Agile and Architecture

Description

The relationship between architecture (both “enterprise” and other forms of architecture) and current Agile, DevOps, and digital product development approaches is too often troubled. However, the hope is that this document has given you a set of tools for reconciling these approaches in a productive way.

The Agile Critique of Architecture

The goal of Enterprise Architecture is to act as a guide, perhaps a pathfinder, who takes the enterprise on a transformational journey – from an incoherent and complex world with LOB separation, product-specific stovepipes, legacy systems estate, and costly operation to a more rationally organized and useful state with multiservice, revenue-generating platforms and an efficient operational regime. On the way, radical surgeries may be required to eliminate duplication, reduce costs, improve reliability, and increase agility in the business. Enterprise Architecture acts as a strategic foundation for business enablement. [Bente et al. 2012]

— Bente et al.
Collaborative Enterprise Architecture: Enriching EA with Lean

Product development organizations often experience architecture and its goals as unwarranted interference, imposing a high cost of delay with little apparent return on investment. Architecture approvals can be required on:

Application designs
Database designs
Selection of technology products

and other such topics. When development cannot proceed without those approvals – or if the approvals come at the cost of expensive rework – the experience can often be challenging. Bente et al. warn: “if Enterprise Architects claim to be the only decision-making body in technical matters, there is a huge risk that they create a bottleneck … The practical consequence is that projects deliberately circumvent the Enterprise Architects …” [Bente et al. 2012].

Enterprise Architecture has presented itself as a solution to complexity, long IT time scales, business frustration, and various other IT problems. These issues are, at the time of writing, being solved, but not by architecture – at least not visibly. Instead, visible and publicized progress has come through the increasing adoption of Agile and DevOps practices that rethink open-loop, slow-feedback, batch-oriented delivery. Architecture has been challenged on several fronts:

It failed to realize the emergent issue of too much enterprise work-in-process, instead championing the proliferation of enterprise processes and their associated queues
Architects’ motivation for “efficiency” and interest in capability mapping did not help the cause of cross-functional teams
- Instead, functional silos were reinforced as supply-centric “capabilities”, and the project-centric anti-pattern of “bringing the team to the work” was promoted as enterprise standard operating procedure, despite the growing evidence of Scrum and Agile success; nor did the iterative, experimental narrative of Lean Startup originate from Enterprise Architecture
Despite a professed interest in systems theory, architecture has failed to adopt a workable systems perspective on digital delivery
- It did not recognize the fundamental problems of stage-gated delivery, big bang releases, queue proliferation, and so forth
- Architecture “gap” analysis resulted in project recommendations, again “bringing the team to the work”
Architecture has often deserved the criticism of “top-down planning”, which in complex systems domains too often does not work
- Architects frequently fall into the trap of the HiPPO (the “highest-paid person’s opinion”)

A sense of Lean Startup experimentation, of placing bets on options and testing hypotheses, is not part of the mainstream Enterprise Architecture culture. Instead, the architecture is presented as an established fact, with “governance” to ensure conformity. Hypothetical “synergies” emerging from “common platforms” are often offered as justification for architecture, with little follow-up in measuring actual value delivered.

Justifications for architecture often invoke “complexity” in the portfolio of systems. In response, architecture has often given in to the desire for a complete “radical surgery” systems re-engineering, the temptation of the “clean slate”. But as Jez Humble et al. accurately note: “A common response to getting stuck in a big ball of mud is to fund a large systems replacement project. Such projects typically take months or years before they deliver any value to users, and the switchover from the old to the new system is often performed in ‘big bang’ fashion. These projects also run an unusually high risk of running late and over budget and being canceled. Systems re-architecture should not be done as a large program of work funded from the capital budget. It should be a continuous activity that happens as part of the product development process.” [Humble et al. 2020]

Architecture methodology, with its focus on identifying capability gaps for feeding into the project portfolio process, has perhaps been too prone to supporting these large, troubled programs. As we know from our earlier Competency Areas, large system changes are inherently risky, and any intervention into a complex system is better undertaken as a series of smaller, incremental changes with frequent monitoring and assessment.

The Architecture Critique of Agile

The Agile community has its own blind spots and challenges. Speed is seen as a good in itself, too often without an economic model. Agile teams often clash with enterprise governance processes that have sound compliance and financial benefits. Phrases like “you aren’t gonna need it” are used to justify lapses of due diligence on critical capabilities, and standard platforms and vendors are seen as unreasonable limitations on team autonomy – to the point where it seems some teams' interest is primarily in padding their resumes with as many new technologies as possible, regardless of the long-term consequences for the organization.

The Limitations of Cost of Delay

Cost of delay is a real and often overlooked issue in understanding the net value of architecture, but it is only one factor and does not eliminate the value proposition of architecture. If the cost of delay is only a few hundred dollars a month, but the risk or technical debt represents millions, then the delay may be appropriate. Don Reinertsen, who has done more than anyone to promote the idea of cost of delay, emphasizes that all decision-making must take place within an economic framework, which means that the other ways in which architecture affects organizational value must also be considered [Reinertsen 2009].
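
The comparison above can be made concrete with a small calculation. The sketch below is illustrative only: the function name, the dollar figures, and the assumption that a review removes a fixed fraction of the risk exposure are all hypothetical, chosen to echo the contrast between a cost of delay of a few hundred dollars a month and a risk measured in millions.

```python
def net_value_of_review(cost_of_delay_per_month, months_delayed,
                        risk_exposure, fraction_of_risk_removed):
    """Net value of holding work for an architecture review.

    Positive: the risk avoided outweighs the delay incurred.
    Negative: the delay costs more than the review saves.
    All inputs are estimates in the same currency (hypothetical model).
    """
    delay_cost = cost_of_delay_per_month * months_delayed
    risk_avoided = risk_exposure * fraction_of_risk_removed
    return risk_avoided - delay_cost


# Hypothetical case 1: cheap delay, large exposure (the case in the text).
cheap_delay = net_value_of_review(300, 2, 2_000_000, 0.10)

# Hypothetical case 2: same exposure, but the blocked work is very valuable.
costly_delay = net_value_of_review(500_000, 2, 2_000_000, 0.10)

print(f"cheap delay:  {cheap_delay:+,.0f}")
print(f"costly delay: {costly_delay:+,.0f}")
```

With these made-up inputs the first review pays for itself and the second destroys value, which is why both the delay and the risk belong in the same economic framework.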

Documentation

Documentation has been a core concern of the Agile movement, being mentioned in one of the four core values of the Agile Manifesto: “Working software over comprehensive documentation” [Agile Alliance 2001].

When documentation primarily takes the form of secondary artifacts, it is appropriate to question the need for it. “The code is the documentation”, some will argue. While it is true that good coding practices result in easier-to-understand (and maintain) source code, the code cannot be the only documentation. As Ruth Malan notes: “… for systems of sufficient scope and complexity to warrant teams (of teams) working on (incremental) implementation and evolution, the sheer mass of code can make it hard to discover the essential structure from bottom-up decisions made entirely through the medium of code” [Malan & Bredemeyer 2010].

In terms of systems theory, a complex software system has emergent behavior, not obvious from just looking at its components. Because the system’s behavior cannot be reduced to its pieces, “self-documenting code” can only go so far. The behavior of the assembled components as a system needs to be represented somehow, in a way that transcends the mere mechanics of the pieces. Abstraction is necessary to understand and communicate emergent behavior, and this leads inevitably to visual representation. Without some attention to documenting overall context and systemic intent and behavior, the effectiveness of the overall human/computer system degrades. For example, Alistair Cockburn reports that the Chrysler Comprehensive Compensation project, one of the first widely reported Agile projects, was eventually halted, and: “… left no archived documentation … other than two-sentence user stories, the tests, and the code. Eventually, enough people left that the oral tradition and group memory were lost” [Cockburn 2006].

In short, failure to sustain a shared mental model of a complex system is a risk that may result in loss of that system’s value.

Sourcing and Technology Standards

Agile and DevOps are software development-centric, and have transformed that world. However, digital organizations do not always build everything. There is a complex web of supplier relationships even for organizations with robust software development capabilities, and many organizations would still prefer to “buy rather than build”. Software may be consuming the world, but that does not mean everyone employs – or should employ – software developers. Agile has not had a primary focus on sourcing, and evaluating commercial software is not a common Agile topic.

Suppose you have an idea for a digital product, and you know that you will be (at least in part) assembling complex services and products produced by others. Suppose further that these services overlap because their providers compete. You need to carefully analyze which services you are going to acquire from which provider, and you will need a strategy. Who analyzes these services, with their capabilities, interfaces, and non-functional characteristics, and makes a final recommendation on how to bring them all into one unified system?

It is easy to say things like: “the teams get to define their own architecture”, but at some point, the enterprise must reckon with the cost of an overly diverse supplier base. This is a very old topic in business, not restricted to IT. At the end of the day, supplier and sourcing fragmentation costs real money. Open source, Commercial-Off-The-Shelf (COTS), cloud, in-house … the options are bewildering and require experience. In a sense, the supplier base itself is an inventory, subject to aging and spoilage. (We can consider this another way of understanding technical debt.) A consistent evaluation approach is important (preferably under an economic framework; see Reinertsen & Hubbard). And at some point, product development teams should not have to do too much of their own R&D on possible platforms for their work.

Architecture as Emergent

The Agile Manifesto is well known for saying: “The best architectures, requirements, and designs emerge from self-organizing teams” [Agile Alliance 2001]. This is one of the more frequently discussed Agile statements. Former Netflix cloud architect Adrian Cockcroft has expressed similar views (quote above).

A key question is whether “architecture” is considered at the single product or multi-product level. At the single product level, collaborative teams routinely develop effective software architectures. However, when multiple products are involved, it is hard to see how all the architectural value scenarios are fulfilled without some investment being directed to the goals of cross-product architectural coordination. It helps when rules of the road are established; both Amazon and Netflix have benefited from having certain widely-accepted platform standards, such as “every product communicates through APIs”. For a long time, Netflix maintained a commitment to Amazon cloud services; it was not acceptable for teams there to decide on a whim to deploy their services on Google Compute Engine™ or Microsoft Azure®, so at least in that sense Netflix has an architecture. The question gets harder when layered products and services with complex lifecycle interactions are involved.

Microservices can reduce the need for cross-team coordination, but coordination needs still do emerge. For example, Mike Burrows of Google provides a detailed description of the Chubby lock service [Burrows 2006], which is a prototypical example of a broadly-available internal service usable by a wide variety of other products.

The purpose of a lock service is to “allow its clients to synchronize their activities and to agree on basic information about their environment”. Chubby was built from the start with objectives of reliability, availability to a “moderately large set of clients”, and ease of understanding. Burrows notes that even with such a cohesive and well-designed internal service, they still encounter coordination problems requiring human intervention. Such problems include:

Use (“abuse”) in unintended ways by clients
Invalid assumptions by clients regarding Chubby’s availability

Because of this, the Chubby team (at least at the time of writing the case study) instituted a review process when new clients wished to start using the lock manager. In terms of Competency Area 7, this means that someone on the product team needed to coordinate the discussions with the Chubby team and ensure that any concerns were resolved. This might conceivably have involved multiple iterations and reviews of designs describing intended use.

Thus, even the most sophisticated microservice environments may have a dependency on human coordination across the teams.

Towards Reconciliation

So how do we reconcile Agile with architecture practices, especially Enterprise Architecture and its concerns for longer lifecycles, aggregate technical debt, and governance? We need to understand why we look to architecture, what utilizing it means, and how it ultimately adds value, or not, in the organization.

Why: Creating the Context

One principle throughout this document has been “respect the team”, because true product value originates there. If teams are constantly fragmented and their cohesion degraded by enterprise operating models and governance mandates, their ability to creatively solve business problems is hampered. Command and control replace emergence, motivation declines, and valuable creativity is lost. Enterprise Architecture must protect the precious resource that is the high-performing, collaborative, creative team. As we have discussed, imposing multiple governance checkpoints itself adds risk. And while it is inevitable that the team will be subject to organization-wide mandates, they should be given the benefit of the doubt when autonomy collides with standardization.

When Enterprise Architecture takes on true Business Architecture questions, including how digital capabilities are to be enabled and enhanced, Agile insights become an input or kind of requirement to Business Architecture. What capabilities require high-performing, cross-functional teams? What capabilities can be supported by project-based temporary teams? And what capabilities should be outsourced? The more valuable and difficult the work, the more it calls for the careful development of a common mental model among a close-knit team over time. Driving organizational capability investment into long-running team structures becomes a strategy that organizational architects should consider as they develop the overall organizational portfolio.

Architecture adds value through constraining choices. This may seem counterintuitive, but the choice is often between re-using a known existing platform or engaging in risky R&D of alternatives. R&D costs money, and itself can impose a delay on establishing a reliable digital pipeline. But ultimately, the fundamental objective remains customer and product discovery. All other objectives are secondary; without fulfilling customer needs, architectural consistency is meaningless. Optimizing for the fast creation of product information, tested and validated against operational reality, needs to be top of mind for the architect.

What: The Architecture of Architecture, or the Digital Pipeline Itself

The digital pipeline ultimately is a finely-tuned tool for this creation of information. It, itself, has an architecture: Business, Application, and Technical. It operates within an economic framework. To understand the architecture of the digital pipeline is in a sense to understand the “architecture of architecture”.

As we have discussed above, architecture, like staff functions generally, is in part a coordination mechanism. It collects and curates knowledge and sustains the organization’s understanding of its complex systems. Architecture also identifies gaps and informs the investment process, in part through collecting feedback from the organization.

If architecture’s fundamental purpose is enabling the right emergent behavior, there are still questions about how it does so. Architecture adds value in assisting when:

Systems are too big for one team
Features are too complex to be implemented in one iteration
Features require significant organizational change management

As a coordination mechanism, it can operate in various ways including planning, controlling, and collaborating. Each may be appropriate for a given challenge or situation. For example, different approaches are required depending on whether the product challenge is a flower or a cog. A flower is not engineered to fill a gap. A cog is. Market-facing experiments need leeway to pivot, whereas initiatives intended to fill a gap in a larger system may require more constraints and control. And how do architects know there is a gap? Identifying one should be a hypothesis-driven process that establishes that there is a valuable, usable, feasible future state.

How: Execution

As an executing capability, architecture operates in various ways:

Planning and analysis
Governance and approvals
Collaboration and guidance

Ideally, planning and analysis occur “upstream” of the creation of a product team. In that guise, architecture functions as a sort of zoning or planning authority – “architecture” is not a process or organization directly experienced by the product team. In this ideal, there is no conflict with product teams because once the team is formed, the architect’s job is done. However, this assumes that all the planning associated with launching a new product or capability was done correctly, and this itself is a kind of waterfall assumption. Some form of feedback and coordination is required in multi-product environments.

It is in the “governance and approval” kind of activity that conflict is most likely to emerge. Cadence and synchronization (i.e., coordination strategies) with the potential to block teams from pursuing their mission should be very carefully considered. If there is a process or a queue of architecture approvals, it should at least be prioritized by the cost of delay of the work it is blocking. More generally, across the organization, the process should be tested against an economic model, such as a nominal or portfolio-level cost of delay. Like other processes, architecture itself can be assessed against such a baseline.
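
One way to run an approval queue on cost of delay is to review first the request with the highest cost of delay per unit of review effort, a policy Reinertsen discusses as weighted shortest job first. The following is a minimal sketch under that assumption; the `ApprovalRequest` type, its fields, and the sample figures are hypothetical and not drawn from any particular tool.

```python
from dataclasses import dataclass


@dataclass
class ApprovalRequest:
    team: str
    cost_of_delay_per_week: float  # estimated value lost per week while blocked
    review_weeks: float            # estimated effort to complete the review


def order_queue(requests):
    """Return requests ordered by cost of delay per week of review, highest first."""
    return sorted(
        requests,
        key=lambda r: r.cost_of_delay_per_week / r.review_weeks,
        reverse=True,
    )


# Hypothetical backlog of pending architecture approvals.
backlog = [
    ApprovalRequest("payments", 40_000, 2.0),
    ApprovalRequest("reporting", 5_000, 0.5),
    ApprovalRequest("mobile", 30_000, 1.0),
]

for request in order_queue(backlog):
    print(request.team)
```

A first-in, first-out queue would have served the payments team first; ordering by the economics of the blocked work moves the mobile team ahead instead.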

Queued approvals are only one way of solving issues. A rich and under-utilized approach is using internal market-type mechanisms, where overall rules are set, and teams make autonomous decisions based on those rules. Don Reinertsen, in The Principles of Product Development Flow, discusses how Boeing implemented distributed decision-making through setting trade-off rules for cost and weight. Rather than constantly routing design approvals through a single control point, Boeing instead set the principle that project managers could “purchase” design changes up to $300 per unit, to save a pound of weight. As Reinertsen notes: “The intrinsic elegance of this approach is that the superiors didn’t actually give up control over the decision. Instead, they recognized that they could still control the decision without participating in it. They simply had to control the economic logic of the decision” [Reinertsen 2009].
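
A decision rule of this kind is simple enough to express in a few lines. The sketch below encodes the $300-per-pound trade-off described above; the function name and the sample changes are hypothetical, and a real rule would likely carry more conditions than this.

```python
def within_tradeoff_rule(added_unit_cost, pounds_saved, max_cost_per_pound=300.0):
    """Return True if a design change can be made without central approval.

    The rule: a change may add up to `max_cost_per_pound` of unit cost
    for every pound of weight it saves.
    """
    if pounds_saved <= 0:
        return False  # the rule only covers changes that actually save weight
    return added_unit_cost <= max_cost_per_pound * pounds_saved


# Hypothetical design changes evaluated locally, with no approval queue.
print(within_tradeoff_rule(added_unit_cost=250, pounds_saved=1))   # within the rule
print(within_tradeoff_rule(added_unit_cost=700, pounds_saved=2))   # exceeds the rule
```

Whoever sets `max_cost_per_pound` controls the economics of every such decision without taking part in any of them.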

One particular work product that architects often are concerned with is documentation. The desire for useful documentation, as mentioned above, reflects architecture’s goals of curating a common ground for collaboration. As Bente et al. note: “In an Agile project, explicit care must be taken to ensure proper documentation; for example, by stating it as part of the condition of satisfaction of a user story or in the definition of done” [Bente et al. 2012].

Architecture Kata

Toyota Kata was discussed in Organization and Culture. In Lean Enterprise, Jez Humble et al. argue that it can provide a useful framework for architecture objectives. Toyota Kata emphasizes end-state goals (“target conditions”) and calls for hands-on investigation and response by participating workers, not consultants or distant executives. Architecture can benefit by understanding “gaps” in the sense of Toyota’s target conditions, and then support teams in their collaborative efforts to understand and achieve the desired state. The architectural impact model can assist in thinking through suitable target conditions for architecture:

Top-line impact through reuse (lowering cost of delay)
Bottom-line impact through portfolio rationalization
Risk impact through minimizing attack surface and reuse of known good patterns and platforms

Keeping the target operating condition specific is preferable. When architecture scopes problems too broadly, the temptation is to undertake large and risky transformation programs. As an alternative, Humble suggests the “strangler pattern” [Fowler 2004]. This pattern uses as a metaphor the Australian “strangler” vines that grow around trees until the original tree dies, at which point the strangler vine is now itself a sturdy, rooted structure.

To use the strangler pattern is not to replace the system all at once, but rather to do so incrementally, replacing one feature at a time. This may seem more expensive, as it means that both the old and new systems are running (and cost savings through sunsetting the old system will be delayed). But the risk of replacing complex systems is serious and needs to be considered along with any hoped-for cost savings through replacement. Humble and Molesky suggest [Humble & Molesky 2011]:

Start by delivering new functionality – at least at first
Do not attempt to port existing functionality unless it is to support a business process change
Deliver something fast
Design for testability and deployability
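
The incremental, feature-at-a-time replacement described above is commonly realized with a routing layer placed in front of both systems. The following is a minimal sketch of that idea, not a reference implementation; the class names, the feature names, and the string results are all hypothetical.

```python
class LegacySystem:
    def handle(self, feature, payload):
        return f"legacy:{feature}"


class NewSystem:
    def handle(self, feature, payload):
        return f"new:{feature}"


class StranglerFacade:
    """Routes each feature to the new system once migrated, else to legacy."""

    def __init__(self, legacy, new):
        self._legacy = legacy
        self._new = new
        self._migrated = set()

    def migrate(self, feature):
        # Cut over a single feature; everything else keeps using legacy.
        self._migrated.add(feature)

    def handle(self, feature, payload=None):
        target = self._new if feature in self._migrated else self._legacy
        return target.handle(feature, payload)


facade = StranglerFacade(LegacySystem(), NewSystem())
before = facade.handle("seat_selection")
facade.migrate("seat_selection")
after = facade.handle("seat_selection")
untouched = facade.handle("fare_rules")
print(before, after, untouched)
```

Because callers only ever see the facade, each cutover is a small, reversible change rather than a single switchover.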

The strangler pattern is proven in practice. Paul Hammant [Hammant 2013] provides a large number of strangler pattern case studies, including:

Airline booking application
Energy trading application
Rail booking application

Of course, there are other ways architecture might add value beyond system replacement, in which case the strangler pattern may not be relevant. In particular, architects may be called on to closely collaborate with product teams when certain kinds of issues emerge. This is not a governance or control interaction; it is instead architecture as a form of shared consulting “bench” or coordination mechanism. Not every product team needs a full-time architect, the reasoning goes, so architects can be assigned to them on a temporary basis; e.g., for one or a few sprints, perhaps of the technical “spike” (discovery/validation/experimentation) variety.

The Challenge of the “Hands-On” Architect

Architect is a broad category. It includes individuals who are talented at single-product designs, as well as those tasked with managing the overall interactions between hundreds of systems.

The solution architect needs to have hands-on technical ability. Many Agile authors are dismissive of “ivory-tower” architects who do not do “hands-on” work and, in fact, if an architect is going to sit with a technical team as a solutions advisor they clearly need the technical skills to do so. On the other hand, not all architects operate at the solutions level, nor are the problems they face necessarily programming problems.

It is well and good for architects to maintain some technical facility, but in the case of true, portfolio-level Enterprise Architects, how to do so may not be obvious. What if someone’s portfolio includes multiple platforms and languages? It is simply not possible to be hands on in all of them. Some of the most challenging systems may be a complex mix of commercial product and customization; e.g., ERP or core banking systems. Choosing to be “hands on” may not even be welcomed by a given team, who may see it as meddlesome. And other teams may feel the architect is “playing favorites” in their choice of platform to be “hands on” with.

Clearly, if the organization is running primarily on, for example, Node.js, having strong JavaScript skills is important for the architect. But in more heterogeneous environments the architect may find strong data management skills to be more useful, as often interfaces between systems become their primary concern.

Another form of being “hands on” is maintaining good systems administration skills, so that the architect can easily experiment with new technologies. This is different from being adept in a given programming language. One recent positive trend is lightweight virtualization. In years past, experimenting with new products was difficult on two fronts:

First, obtaining high-performance computing resources capable of running demanding software
- Sometimes, these resources needed unusual OSs; e.g., “in order to try our software, you have to run it on a specific version of a well-known OS on a specific hardware platform” – not a capability most architects had in their back pocket.
Second, obtaining a demonstration version of software from vendors, who would usually start a sales cycle if you asked for it

Times have changed. Demonstration versions of software are increasingly available with little overhead or risk of triggering unwanted sales calls. Platform requirements are less diverse. And lightweight virtualization (e.g., the combination of Vagrant and VirtualBox) now makes it possible for architects to be hands on; modern laptops can run multiple virtual machines in cluster architectures. Significant experimentation can be carried out in working with systems of various characteristics. Being able to evaluate technologies in such a virtual lab setting is arguably even more useful than being a “coding architect”. Product team developers do the programming; the architect should be more concerned with the suitability and feasibility of the integrated platform.

Evaluating Architecture Outcomes

Finally, how do we evaluate architecture outcomes? If an organization adopts an experimental, Toyota Kata approach, it may find that architecture experiments run on long time horizons. Maintaining an organizational focus on value may be challenging, as the experiments do not yield results quickly. Curating a common ground of understanding may sound like a fine ideal, but how do we measure it?

First, the concept of Net Promoter Score is relevant for any service organization, internal or external. Its single question – “Based on your experience, on a scale of 0-10 would you recommend this product or service to a friend?” – efficiently encapsulates value in a single, easy-to-answer query.
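
For readers unfamiliar with how the score is derived from the answers, the sketch below applies the standard Net Promoter convention: responses on a 0 to 10 scale, with 9 and 10 counted as promoters and 0 through 6 as detractors. The function name and the sample ratings are hypothetical.

```python
def net_promoter_score(ratings):
    """Percentage of promoters (9-10) minus percentage of detractors (0-6)."""
    if not ratings:
        raise ValueError("at least one rating is required")
    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)
    return 100.0 * (promoters - detractors) / len(ratings)


# Hypothetical survey of product teams rating the architecture group.
team_ratings = [10, 9, 9, 8, 7, 6, 3, 10]
score = net_promoter_score(team_ratings)
print(score)
```

The result ranges from -100 (all detractors) to +100 (all promoters); ratings of 7 and 8 count toward the total but toward neither group.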

As digital pipelines become more automated, it may be possible to evaluate the impact of architecture services through the pipelines’ digital exhaust:

Are architecture standards evident in the source and package managers?
Are platform recommendations encountering performance or capacity challenges?
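
As one illustration of the first question, a team’s declared dependencies can be compared against the list of approved platforms. This is a sketch only: the approved list, the manifest contents, and the function names are hypothetical, and a real check would read these names from the organization’s actual source and package repositories.

```python
def audit_dependencies(declared, approved):
    """Split declared dependency names into approved and non-standard groups."""
    declared_names = {name.strip().lower() for name in declared}
    approved_names = {name.strip().lower() for name in approved}
    return {
        "approved": sorted(declared_names & approved_names),
        "nonstandard": sorted(declared_names - approved_names),
    }


def adoption_rate(declared, approved):
    """Fraction of a team's declared dependencies that follow the standard."""
    result = audit_dependencies(declared, approved)
    total = len(result["approved"]) + len(result["nonstandard"])
    return len(result["approved"]) / total if total else 0.0


# Hypothetical standards list and one team's declared dependencies.
APPROVED_PLATFORMS = ["postgresql", "kafka", "redis"]
team_manifest = ["PostgreSQL", "redis", "mongodb", "rabbitmq"]

report = audit_dependencies(team_manifest, APPROVED_PLATFORMS)
print(report["nonstandard"], adoption_rate(team_manifest, APPROVED_PLATFORMS))
```

Tracked over time, a figure like the adoption rate gives architects evidence of whether a standard is being taken up, rather than an assumption that it is.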

In a world of increasing connectivity and automation, there is no reason for architects in the organization to lack visibility into the consequences of their recommendations. Ultimately, if the cost of operating the coordination mechanism that is architecture exceeds the value it provides, then continuing to operate it is irrational.

Evidence of Notability

Agile, DevOps, and architecture often come into contact and even conflict. This friction carries many consequences for organizations wishing to digitally transform, yet not abandon the benefits of architecture.

Limitations

The intersection of Agile and architecture is most significant in organizations that are performing their own digital R&D (i.e., software development).

Related Topics