Continuous Architectural Refactoring

Introduction

This chapter discusses continuous architectural refactoring, a topic of increasing importance in today’s Agile, DevOps-oriented software landscape. The focus here is on software architecture, which is a particularly important target for continuous refactoring.

To discuss the concept properly, it must first be defined, and every word in the term matters. Each word is briefly discussed in turn, though not in the order in which it appears in the term.

Refactoring

The discussion centers on taking a system architecture and changing its structure over time. There are many reasons for doing so: design debt or “cruft” that has inevitably accumulated; changes in the understanding of the important non-functional requirements; remedying suboptimal architectural decisions; changes to the environment; project pivots; and so on. Whatever the reason, it is sometimes necessary to change fundamental aspects of how systems are put together.

Before continuing, a note on the choice of the word “refactoring”. Martin Fowler [Fowler 2019] would likely describe this topic as “restructuring”; he uses “refactoring” to describe small, almost impact-free changes to a system’s codebase that improve its design. This document qualifies the term with the word “architectural” to make it clear that larger-scale, structural changes to systems are being described.

All of which leads to the next question – what is actually meant by “architecture” in this context?

Architectural

There may be as many definitions of “architecture”, in the context of software architecture, as there are software architects to define it. Ralph Johnson of the Gang of Four [GoF 1994] defined software architecture as: “the important stuff (whatever that is)”. This deceptively obvious statement highlights the need for an architect to identify, analyze, and prioritize the non-functional requirements of a system. In this definition, the architecture could be viewed as a plan to implement these non-functional requirements. Ford [Ford 2017] provides a comprehensive list of such requirement types, or “-ilities”. The TOGAF Standard [TOGAF Standard 2018] provides a more concrete description of architecture, namely: “the structure of components, their interrelationships, and the principles and guidelines governing their design and evolution over time”.

This “evolvability”, the ability for architecture to be changed or evolved over time, is becoming critical. There are many reasons for this: the increasingly fast pace of the industry; the adoption of Agile approaches at scale; the cloud-first nature of much new development; the failure of expensive, high-profile, long-running projects; and so on. System evolution has always been an important concept in architectural frameworks: Rozanski [Rozanski 2005] has an “evolution” perspective, and the TOGAF Standard has the concept of “change management”. Organizations are increasingly reluctant to invest in five-year architecture plans or massive up-front architectural efforts, and are instead having to consider how to build in “ease-of-change”. This viewpoint is consistent with that of Martin Fowler [Fowler 2015], who argues that software architecture must address technical characteristics that are both important and hard to change.

Of course, not all architectural changes represent a refactoring. Changing functional and non-functional requirements (or an evolving understanding of them) can also drive architectural change, but that kind of change is not the focus here.

Continuous

The industry has, over the past few years, revisited the “hard to change later” problem in a new light. Instead of looking at how individual requirements will evolve in a system, what if “evolvability” were baked into the architecture as a first-class concept? Evolutionary architectures, as described by Ford [Ford 2017], have no end-state. They are designed to evolve with an ever-changing software development ecosystem, and include built-in protections around important architectural characteristics, adding time and change as first-class architectural elements. Indeed, Ford describes such an architecture as one that “supports guided, incremental change across multiple dimensions”. It is this incremental nature that allows the software architecture to be changed continuously: such change is planned for from the outset, and the backlog includes items that reflect the desired architectural evolution.

Planning for Continuous Architectural Refactoring

The remainder of this chapter discusses the key considerations in planning for the continuous architectural refactoring of a software system. It answers the question of how an organization can position itself to evolve its architectures continuously in response to changing requirements, architectural debt, and other headwinds. These considerations are grouped under three headings:

Understanding and Guiding the Architecture
Creating the Right Technical Environment
Creating the Right Non-Technical Environment

Each sub-section covers a different aspect of the necessary prerequisites for continuous architectural refactoring. Taken together, they offer a complete view of the enablers for successful continuous architectural refactoring.

Understanding and Guiding the Architecture

Before deciding what technical and organizational mechanisms to put in place to facilitate continuous refactoring, it is important to first understand the conditions under which the organization is operating. Once the business and technical constraints relevant to the system have been identified, the structures that will allow it to evolve within those constraints can be put in place. Fitness functions allow the architecture to be tested to ensure that it is fit-for-purpose, while guardrails contribute to the guidance referred to in Ford’s definition of architecture [Ford 2017], keeping development teams from going astray in their system designs.

Constraints

Every organization operates under a range of constraints, which restrict the choices available to a business in achieving its aims. They come in many guises, including financial, cultural, technical, resource-related, regulatory, political, and time-based. The very nature of the word “constraint” implies a limiting, constricting force which will choke productivity and creativity, and it is human nature to try to dismiss them or rail against them. However, constraints need not be negative forces; they force the organization to describe its current reality, and provide guidance as to how that reality should shape its efforts. Some constraints may be long-lived, while others may be eliminated through effort; but to ignore any of them is folly.

Inevitably, some of these constraints will manifest in software as architectural constraints. Technical constraints may mandate an infrastructural topology (e.g., “Organization A only deploys on Infrastructure as a Service (IaaS) Vendor B’s offerings”), an architectural style of development (e.g., “Organization C is a model-driven development software house”), or an integration requirement (e.g., “Financial transactions are always handled by System D”). Financial and resource constraints can shape the composition and skill sets of software development teams, as well as imposing hardware and software limitations. Time-based constraints may manifest as software release cadences, which will influence architectural choices during development. Regulatory constraints can have a big impact on development practices, deployment topology, and even whether development teams are allowed to deploy continuously into production.

Constraints can also take the form of business constraints, or line-of-business constraints in a large organization. These constraints can also shape the architecture and, indeed, drive a decision between refactoring an existing solution and developing a new one. However, business constraints may also be managed through organizational changes; for example, by using the Inverse Conway Maneuver.

When embarking upon a journey of continuous architectural refactoring, the identification and documentation of such constraints is vital. As Rozanski [Rozanski 2005] sagely notes, one of the first jobs for an architect is to “come up with the limits and constraints within which you will work and to ratify these with your stakeholders”.

Fitness Functions

A frequent complaint about the discipline of software architecture is that it is all too easy for teams to regard it as an academic, rather abstract endeavor. Even with relatively mature development teams, whose architectural descriptions accurately describe how the system will implement the most important non-functional requirements, it has been difficult to demonstrate that the system actually does so. Worse, as the nature and importance of these requirements change over time, it is easy for the architectural descriptions to lag behind, with the result that there is no longer a shared understanding of how the system will meet its non-functional requirements. If it is not possible to test that the architecture is meeting its goals, how can anyone have confidence in it? The same charge can be leveled against architectural descriptions that fall out-of-date as functional requirements change; however, that is not the focus here.

As an antidote to such problems, Ford [Ford 2017] introduces the deceptively simple concept of “fitness functions”. Fitness functions objectively assess whether the system is actually meeting its identified non-functional requirements; each fitness function tests a specific system characteristic.

For example, a fitness function might measure the performance of a specific API call: does the call complete in under one second at the 90th percentile? This question is far from abstract; it is a testable embodiment of a non-functional requirement. If an evaluation of the fitness function fails, then this aspect of the system is failing a key non-functional requirement. This is not open to opinion or subjectivity; the results speak for themselves. To take the example further, imagine that one of the proposed architectural refactorings was to implement database replication to meet availability requirements. If this were implemented and the “API performance” fitness function subsequently failed, the team would know early in the development cycle that the architecture is no longer fit-for-purpose in this respect, and could address the problem accordingly.
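The API example lends itself to automation. The following Python sketch assumes that latency samples have already been collected (for instance by a load test, which is not shown) and uses the nearest-rank definition of percentile, one of several definitions in common use. The function names and sample figures are hypothetical and chosen only for illustration.

```python
import math
from typing import Sequence

# Threshold from the example in the text: under one second at the 90th percentile.
P90_THRESHOLD_SECONDS = 1.0


def percentile(samples: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no latency samples collected")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based nearest rank
    return ordered[max(rank, 1) - 1]


def api_latency_fitness(samples: Sequence[float],
                        threshold: float = P90_THRESHOLD_SECONDS) -> bool:
    """Fitness function: passes only if the 90th-percentile latency is below the threshold."""
    return percentile(samples, 90) < threshold


# Illustrative latencies in seconds (invented numbers), e.g. from a load test
# run before and after a hypothetical database-replication refactoring.
before_replication = [0.2, 0.3, 0.35, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.4]
after_replication = [0.3, 0.5, 0.7, 0.9, 1.1, 1.2, 1.3, 1.5, 1.6, 2.0]

print(api_latency_fitness(before_replication))  # True: p90 is 0.9 s
print(api_latency_fitness(after_replication))   # False: p90 is 1.6 s
```

Run as part of the build, a `False` result would fail the pipeline. This is the early warning the replication example describes.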

It follows, therefore, that fitness functions are key enablers of the goal to continuously restructure the architecture. They make it possible to verify that those system characteristics which need to remain constant over time actually do so. They reduce the fear of inadvertently breaking something, and help to retain stakeholders’ confidence. They are a concrete, tangible manifestation of the constraints and architectural goals.
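Fitness functions need not be limited to runtime measurements. Structural characteristics can be protected in the same way. As a hedged illustration, the Python sketch below checks a hypothetical layering rule: `ui` may depend on `service`, `service` on `domain`, and `domain` on nothing. It does this by parsing import statements with the standard `ast` module. The layer names and the rule are assumptions. Real projects would more likely use a dedicated dependency-analysis tool.

```python
import ast
from typing import Dict, List, Set, Tuple

# Hypothetical layering rule: each layer may import only from the layers listed for it.
ALLOWED_DEPENDENCIES: Dict[str, Set[str]] = {
    "ui": {"service"},
    "service": {"domain"},
    "domain": set(),
}


def imported_packages(source: str) -> Set[str]:
    """Return the top-level package names imported (absolutely) by a module's source."""
    packages: Set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            packages.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            packages.add(node.module.split(".")[0])
    return packages


def layering_violations(modules: Dict[str, str]) -> List[Tuple[str, str]]:
    """Fitness function: (module, forbidden layer) pairs that break the layering rule."""
    violations = []
    for module_name, source in modules.items():
        own_layer = module_name.split(".")[0]
        allowed = ALLOWED_DEPENDENCIES.get(own_layer, set())
        for package in imported_packages(source):
            # Only the architecture's own layers are checked; other imports are ignored.
            is_layer = package in ALLOWED_DEPENDENCIES
            if is_layer and package != own_layer and package not in allowed:
                violations.append((module_name, package))
    return sorted(violations)


# In-memory module sources stand in for files read from a real repository.
codebase = {
    "ui.orders": "from service.orders import place_order\n",
    "service.orders": "import domain.order\nimport json\n",
    "domain.order": "from ui.widgets import render\n",  # domain reaching up into the UI
}
print(layering_violations(codebase))  # [('domain.order', 'ui')]
```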

Guardrails

Another mechanism that organizations use to bake evolvability into their system architectures is the concept of architectural guardrails. As with their real-world roadside equivalents, software guardrails are designed to keep people from straying into dangerous territory.

In real terms, guardrails represent a lightweight governance structure. They document how an organization typically “does” things, and, by implication, how development teams are expected to “do” similar things. For example, a guardrail may document not just the specific availability requirements for a new service, but also how the organization goes about meeting such requirements. Guardrails are typically used in combination with an external oversight team, be it an architecture board, guild, or program office. The message from such oversight teams is usually simple: if you stick to the guardrails, you do not need to justify your architectural choices; they will simply be approved. However, where it is impossible to abide by a guardrail, the deviation needs to be discussed. If the reasoning is sound, agreement can be reached and the guardrails may be modified. However, the oversight team retains the right to require a change of approach if there is no good reason for departing from the guardrails.
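Guardrails are primarily documentation and a human review process, but the decision rule described above is simple enough to sketch. In the hypothetical Python example below, a guardrail catalogue maps each concern to the approaches the organization already uses. Conforming choices are approved without justification. Deviations are routed to the oversight team together with their rationale, rather than rejected. The catalogue entries and names are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

# Hypothetical guardrail catalogue: for each concern, approaches the organization already uses.
GUARDRAILS: Dict[str, Set[str]] = {
    "availability": {"multi-zone deployment", "active-passive failover"},
    "messaging": {"managed message queue"},
    "persistence": {"managed relational database", "managed document store"},
}


@dataclass
class DesignProposal:
    service: str
    choices: Dict[str, str]  # concern -> chosen approach
    rationale: Dict[str, str] = field(default_factory=dict)  # concern -> reason for deviating


def review(proposal: DesignProposal) -> Dict[str, List[str]]:
    """Approve choices that follow a guardrail; route the rest to the oversight team."""
    outcome: Dict[str, List[str]] = {"approved": [], "discuss": []}
    for concern, approach in proposal.choices.items():
        if approach in GUARDRAILS.get(concern, set()):
            outcome["approved"].append(concern)
        else:
            # A deviation is not rejected outright: it is flagged for discussion,
            # together with the team's rationale, so the guardrail itself can evolve.
            reason = proposal.rationale.get(concern, "no rationale given")
            outcome["discuss"].append(f"{concern}: {approach} ({reason})")
    return outcome


proposal = DesignProposal(
    service="payments",
    choices={"availability": "multi-zone deployment",
             "persistence": "self-hosted graph database"},
    rationale={"persistence": "fraud detection needs graph traversal"},
)
print(review(proposal))
```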

The key to their power is that they are not mandates. They do not impose absolute bans on teams taking different approaches; rather, they encourage creativity and collaboration, as well as the evolution of the governance structure itself.

Creating the Right Technical Environment

Successful continuous architectural refactoring needs the development team to be empowered to iteratively make architectural changes. There are a number of key technical enablers for this, which are discussed here:

Continuous delivery
Feature toggles
Componentization

In addition, Agile development practices are a key enabler for continuous architectural refactoring. As described in Architecting the Agile Transformation, Agile ways of working promote a number of practices that allow continuous architectural refactoring to be implemented successfully. In particular, rapid iteration and experimentation allow architectural evolution to be readily incorporated into ongoing development activities.

Continuous Delivery

For some years now, the concept of continuous delivery has been key to a solid foundation for software development. Fowler simply defines it as “a software development discipline where you build software in such a way that the software can be released to production at any time” [Fowler 2013]. To do this, he says, you need to continuously integrate your developed software, build it into executables, and test it via automated testing. Ideally, such testing is executed in an environment which is as close as possible to a production environment.

With a seminal work on the topic, Humble [Humble 2010] converted many software teams to the advantages of an Agile manifestation of configuration management, automated build, and continuous deployment. More recently, Forsgren [Forsgren 2018] has provided statistical evidence that adopting continuous delivery helps teams deploy on demand, get continuous actionable feedback, and achieve one of the principles of the Agile Manifesto [Agile Manifesto]: to “promote sustainable development”. Moreover, it is difficult to achieve scalable continuous architectural refactoring without it.

Continuous integration and continuous delivery are important elements in supporting continuous architectural refactoring. They are often considered a single concept and, in many cases, are linked by a single implementation. However, this is not a requirement and, for flexibility, they are discussed separately here.

Continuous integration means frequently merging the work of developers into a single branch. Some source control tooling makes this the default but, irrespective of the technology choice, continuous integration can be implemented through a combination of development practices and build processes. One of its most important elements is the integration of automated testing into the build process, so that there is confidence in the quality of the code on the main branch at all times. The key benefit for architectural refactoring is the removal of “long-running” branches. Such branches militate against architectural change because they extend the window of potential impact of a change until all branches have merged. In practice, this can make managing the impact of an architectural change so cumbersome for developers that the change never happens.

Continuous delivery is about being able to release at any time, which may be realized as releasing on every commit. It is important to note that in organizations with compliance, regulatory, or other mandatory checkpoints, continuous delivery may not mean a fully-automated release to production. Rather, the aim of continuous delivery should be that, as each change is integrated, it should be possible to release that version, and, in particular, that the entire team is confident that it is releasable. The key benefit for architectural refactoring is that developers are empowered to make architectural changes. They know that the combination of continuous integration and continuous delivery will quickly reveal whether a change is compatible in terms of functionality and deployment.

It is possible, and in many cases desirable, to evolve toward a continuous integration/delivery pipeline, rather than trying to move to a fully automated process in one step. The key is to understand the required steps in the process and to automate them one at a time. It is also important to look at the wider environment and find the right solution for your organization, even if that means that some manual checkpoints remain.
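To illustrate evolving a pipeline one step at a time, the following Python sketch models a pipeline as an ordered list of stages. Each stage is either automated or still a manual checkpoint. It is a toy model under stated assumptions: the stage names are hypothetical, and the checks are placeholders for real build, test, and deployment steps. It is not a substitute for an actual CI/CD tool.

```python
from typing import Callable, List, Optional, Set, Tuple

# Each stage is either automated (a check returning True/False) or, for now,
# a manual checkpoint (None) that needs an explicit sign-off to proceed.
Stage = Tuple[str, Optional[Callable[[], bool]]]


def run_pipeline(stages: List[Stage], sign_offs: Set[str]) -> Tuple[bool, List[str]]:
    """Run stages in order, stopping at the first failure or unsigned manual checkpoint."""
    log: List[str] = []
    for name, check in stages:
        if check is None:
            if name not in sign_offs:
                log.append(f"{name}: waiting for manual sign-off")
                return False, log
            log.append(f"{name}: signed off manually")
        elif check():
            log.append(f"{name}: passed")
        else:
            log.append(f"{name}: failed")
            return False, log
    return True, log


# Hypothetical pipeline part-way through its evolution: build and tests are automated,
# the compliance review is still manual. Automating it later means replacing None
# with a check, without changing the rest of the pipeline.
stages: List[Stage] = [
    ("build", lambda: True),
    ("automated tests", lambda: True),
    ("compliance review", None),
    ("deploy to staging", lambda: True),
]

print(run_pipeline(stages, sign_offs=set()))
print(run_pipeline(stages, sign_offs={"compliance review"}))
```

The first run stops at the compliance review. The second completes once that checkpoint has been signed off. This reflects the point that some manual steps may legitimately remain.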

Finally, it is worth heeding the advice of Humble [Humble 2010] that “in software, when something is painful, the way to reduce the pain is to do it more frequently, not less”. Building toward a continuous integration/delivery pipeline is hard, but that is all the more reason to do it. Without one, the effort of delivering manually will increasingly limit your ability to evolve.

Feature Toggles

Feature toggles (or feature flags) are an important mechanism for creating an environment that allows continuous architectural refactoring. They allow features to be developed and included on the main branch (see Continuous Delivery) without exposing them to end users. This lets the development team structure their work based solely on their own needs.

In addition, as described by Kim [Kim 2016], the key enablers arising from the use of feature toggles are the ability to:

Roll back easily to the previous version of the changed behavior
Gracefully degrade performance by allowing services/features to be selectively disabled where performance issues or constraints emerge
Increase resilience through a Service-Oriented Architecture (SOA), where calls to services can be independently enabled/disabled
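The rollback and graceful-degradation benefits can be sketched in a few lines of Python. The toggle store here is a plain dictionary, which is an assumption. In practice, toggles are usually read from configuration or a toggle service so that they can be flipped at runtime without redeploying. The pricing rule and the recommendation service are hypothetical.

```python
from typing import Dict, List

# Hypothetical in-memory toggle store. A real system would read toggles from
# configuration or a toggle service so they can change without a redeployment.
toggles: Dict[str, bool] = {
    "new_pricing_rule": True,
    "personalized_recommendations": True,
}


def is_enabled(name: str) -> bool:
    # Unknown toggles default to off, so an unconfigured feature stays hidden.
    return toggles.get(name, False)


def price(amount: float) -> float:
    if is_enabled("new_pricing_rule"):
        return round(amount * 0.95, 2)  # changed behavior (hypothetical 5% discount)
    return amount  # previous behavior is kept, so rolling back is a toggle flip


def recommendations(user_id: str) -> List[str]:
    if is_enabled("personalized_recommendations"):
        # Stands in for a call to a hypothetical, more expensive recommendation service.
        return [f"picked-for-{user_id}"]
    return ["bestseller-1", "bestseller-2"]  # cheap fallback when the feature is disabled


print(price(100.0))                               # 95.0
toggles["new_pricing_rule"] = False               # roll back the changed behavior
print(price(100.0))                               # 100.0
toggles["personalized_recommendations"] = False   # degrade gracefully under load
print(recommendations("u42"))                     # ['bestseller-1', 'bestseller-2']
```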

Hodgson [Hodgson 2017] details the different categories of feature toggles: ops toggles, permission toggles, experiment toggles, and release toggles. Ops toggles are used to control the operational behavior of the system; an example would be a manual implementation of the circuit breaker pattern. Permission toggles are used to control access to features that have a limited audience; an example would be “premium” features that are only enabled for (higher) paying customers. Experiment toggles are typically used to support A/B testing; an example would be a dynamic switch in system behavior based on the current user. Release toggles, however, are of particular note to the discussion on continuous architectural refactoring. They allow untested or incomplete refactorings and restructurings to be released into a production environment, safe in the knowledge that such code paths will not be executed while the toggle is off.
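Release toggles are the category most relevant here. The Python sketch below tags each toggle with one of Hodgson's four categories. It uses a release toggle to keep an incomplete refactoring (reading orders from a new service rather than the monolith's database) deployed but unreachable until the toggle is switched on. All names are hypothetical. Because release toggles are meant to be transitional, the sketch also lists enabled release toggles as candidates for clean-up once rollout is complete.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class ToggleCategory(Enum):
    RELEASE = "release"
    OPS = "ops"
    PERMISSION = "permission"
    EXPERIMENT = "experiment"


@dataclass
class Toggle:
    name: str
    category: ToggleCategory
    enabled: bool = False  # release toggles ship switched off


REGISTRY: Dict[str, Toggle] = {
    "orders_from_new_service": Toggle("orders_from_new_service", ToggleCategory.RELEASE),
}


def legacy_order_lookup(order_id: int) -> Dict[str, object]:
    return {"id": order_id, "source": "monolith-db"}


def refactored_order_lookup(order_id: int) -> Dict[str, object]:
    # Incomplete refactoring: deployed to production, but unreachable while the toggle is off.
    return {"id": order_id, "source": "orders-service"}


def lookup_order(order_id: int) -> Dict[str, object]:
    if REGISTRY["orders_from_new_service"].enabled:
        return refactored_order_lookup(order_id)
    return legacy_order_lookup(order_id)


def removable_release_toggles() -> List[str]:
    """Enabled release toggles: candidates for removal, with their legacy path, once rollout is complete."""
    return [t.name for t in REGISTRY.values()
            if t.category is ToggleCategory.RELEASE and t.enabled]


print(lookup_order(7)["source"])    # monolith-db
REGISTRY["orders_from_new_service"].enabled = True
print(lookup_order(7)["source"])    # orders-service
print(removable_release_toggles())  # ['orders_from_new_service']
```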

Componentization

The structure of your architecture can itself work against continuous architectural refactoring. As an organization expands, or as the need for flexibility increases, a monolithic architecture, while not inherently bad, can become a key constraint. As Kim [Kim 2016] observes: “… most DevOps organizations were hobbled by tightly-coupled, monolithic architectures that – while extremely successful at helping them achieve product/market fit – put them at risk of organizational failure once they had to operate at scale …”.

The key, therefore, is to evolve your architecture to have sufficient componentization to support your organizational evolution on an ongoing basis. The strangler pattern [Fowler 2004] can be key in this kind of evolution by creating the space for the implementation to evolve behind an unchanging API.
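A minimal, in-process sketch of the strangler idea in Python follows. It assumes a hypothetical monolith and one newly extracted invoicing component. In a real system, the facade would more often be an HTTP proxy or API gateway in front of both. Callers only ever see the facade's `handle()` method, while operations are migrated behind it one at a time.

```python
from typing import Dict, Protocol


class Handler(Protocol):
    def handle(self, operation: str, payload: dict) -> dict: ...


class LegacyMonolith:
    def handle(self, operation: str, payload: dict) -> dict:
        return {"handled_by": "monolith", "operation": operation}


class InvoicingService:
    """A newly extracted component (hypothetical)."""

    def handle(self, operation: str, payload: dict) -> dict:
        return {"handled_by": "invoicing-service", "operation": operation}


class StranglerFacade:
    """The unchanging API: callers always call handle(); routing decides who serves the call."""

    def __init__(self, legacy: Handler) -> None:
        self._legacy = legacy
        self._routes: Dict[str, Handler] = {}

    def migrate(self, operation: str, new_impl: Handler) -> None:
        # Route one operation to its new implementation; everything else stays on the legacy system.
        self._routes[operation] = new_impl

    def handle(self, operation: str, payload: dict) -> dict:
        return self._routes.get(operation, self._legacy).handle(operation, payload)


facade = StranglerFacade(LegacyMonolith())
print(facade.handle("create_invoice", {})["handled_by"])  # monolith
facade.migrate("create_invoice", InvoicingService())
print(facade.handle("create_invoice", {})["handled_by"])  # invoicing-service
print(facade.handle("ship_order", {})["handled_by"])      # monolith
```

Once every operation has been migrated, the legacy handler receives no traffic and can be retired.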

This can be achieved as a staged process, as described by Shoup [Shoup 2014], moving from a monolithic architecture to a layered architecture, and then on to microservices.

Creating the Right Non-Technical Environment

Technical mechanisms such as continuous delivery and feature toggles are powerful enablers of continuous architectural refactoring, but they are certainly not the only ones. For example, what if there has been no buy-in from senior management to do any refactoring? (Hint: architectural refactoring gets continuously prioritized behind functional evolution.) Even if there is such buy-in, to paraphrase the definition of architecture in Ford [Ford 2017], continuous refactoring needs to be guided and incremental. The guidance comes in the form of an architectural roadmap, a best-guess hypothesis of how the architecture needs to evolve. Finally, organizations need to balance the tensions between these forces: sometimes refactoring should take priority, and sometimes new functionality should be built.

It is worth noting that the development team structure is also a key enabler for continuous architectural refactoring, in particular through the Inverse Conway Maneuver. This technique is described separately in Architecture Development.

Justifying Ongoing Investment in Architectural Refactoring

A frequent frustration amongst software developers is the perception that their management team only values things that can be sold. To management, they believe, architectural refactoring is wasted money, occupying development teams for months at a time without a single additional thing being produced that can be sold. And for that matter, why does it take so long for them to add a feature? (Possible answer: that would be because the architecture has not been refactored in years.)

Management teams do have a business to run, and customers do not typically hand over money for architectural refactorings, no matter how elegant they are. Without shiny new things to sell, there may be no money to continue employing the development teams who want to do the refactoring.

As such, this issue has two aspects: firstly, development teams need to learn how to justify such investment; and secondly, such non-functional investment will always have to be balanced with functional requirements.

It is worth, at this point, returning to Fowler’s distinction [Fowler 2019] between code refactoring and architectural restructuring. Fowler strongly promotes the view that code refactoring requires no justification; rather, it is part of a developer’s “day job”. This does not mean that it is necessary to take on a massive code restructuring exercise for a legacy codebase; on the contrary, there may be no reason whatsoever to restructure the code of a stable legacy project. That said, developers should refactor their code when the opportunity arises. Such activity constitutes a “Type 2” decision as documented in [Ries 2011].

Architectural refactoring (restructuring), however, often requires explicit investment because the required effort is significant. In such cases, it is incumbent on development teams and architects to “sell” the refactoring in monetary, time, or customer success terms. For example, “if A is refactored, the build for Product B will be reduced by seven minutes, allowing C to be deployed more frequently each day”; or, “implementing refactoring D will directly address key Customer E’s escalated pain point; their annual subscription and support fee is $12 million”. Note, however, that claims that “refactoring F will increase productivity by G%” should be avoided, as software productivity is notoriously difficult to measure.
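Turning the build-time example into numbers can make such a case concrete. The Python sketch below uses invented figures: an 8-hour deployment window, a 22-minute build reduced by seven minutes, and 30 builds a day. It also uses a deliberately simple model in which deployments are serialized behind the build. Real pipelines are rarely that simple, so the output is an upper bound to be checked against actual data, not a forecast.

```python
def max_serial_deployments(window_minutes: int, build_minutes: int) -> int:
    """Upper bound on deployments per day if each one must wait for a full build."""
    return window_minutes // build_minutes


def hours_saved_per_day(builds_per_day: int, minutes_saved_per_build: int) -> float:
    """Developer waiting time recovered each day by a faster build."""
    return builds_per_day * minutes_saved_per_build / 60


# Hypothetical figures: an 8-hour deployment window, a 22-minute build cut by
# seven minutes to 15, and 30 builds per day across the team.
WINDOW = 8 * 60
print(max_serial_deployments(WINDOW, 22))  # 21
print(max_serial_deployments(WINDOW, 15))  # 32
print(hours_saved_per_day(30, 7))          # 3.5
```

Under these assumptions, the refactoring would raise the ceiling from 21 to 32 deployments a day and recover 3.5 hours of waiting time daily. Unlike a productivity percentage, this kind of time-based claim can later be checked against measured build and deployment data.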

Developing an Architectural Roadmap

An architectural roadmap needs to meet several key criteria to achieve continuous architectural refactoring:

Vision: a target end-state is key to assessing individual changes in moving towards the target state
Step-wise: a number of intermediate states need to be described between the “as-is” and “to-be” architectures, documenting the benefits and challenges of each state
Flexible: the target and intermediate states may evolve as the understanding of the architecture and the constraints themselves evolve
Open: a successful architecture is rarely defined by a committee, but the process and documentation of the architectural roadmap needs to be available to the whole team, and everyone must feel empowered to comment/question
Breadth: the roadmap needs to be broad enough both in scope and planning horizon to meaningfully inform the team’s decision-making
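The Vision and Step-wise criteria suggest a simple shape for recording a roadmap. As a hypothetical sketch, the Python data structure below records the as-is state, the target state, and the intermediate states. It loosely follows the staged monolith-to-microservices progression mentioned under Componentization. It also flags intermediate states whose benefits or challenges have not yet been documented. The Open and Breadth criteria are matters of process and are not modeled.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ArchitectureState:
    name: str
    benefits: List[str] = field(default_factory=list)
    challenges: List[str] = field(default_factory=list)


@dataclass
class Roadmap:
    as_is: ArchitectureState
    to_be: ArchitectureState  # the vision: the target end-state
    steps: List[ArchitectureState] = field(default_factory=list)  # intermediate states

    def path(self) -> List[str]:
        return [self.as_is.name] + [s.name for s in self.steps] + [self.to_be.name]

    def undocumented_steps(self) -> List[str]:
        """Step-wise criterion: each intermediate state should record benefits and challenges."""
        return [s.name for s in self.steps if not s.benefits or not s.challenges]


roadmap = Roadmap(
    as_is=ArchitectureState("monolith"),
    to_be=ArchitectureState("microservices"),
    steps=[
        ArchitectureState("layered monolith",
                          benefits=["clear module boundaries"],
                          challenges=["single deployment unit remains"]),
        ArchitectureState("invoicing extracted", benefits=["independent releases"]),
    ],
)
print(" -> ".join(roadmap.path()))
print(roadmap.undocumented_steps())  # ['invoicing extracted']

# Flexible criterion: the target can be revised as understanding and constraints evolve.
roadmap.to_be = ArchitectureState("service-based architecture")
```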

To create space for Agile implementation, it is also important that the roadmap remains high-level. There is a tension here between the need to keep the project within its constraints and the need to give the team the space and support to make Agile decisions as they implement the architectural roadmap. Beyond the roadmap, and in particular the vision of a Target Architecture, guardrails are key to supporting and enabling emergent architecture, while allowing the overall architecture to remain effective and meet all of its identified requirements.

In particular, the aim is to create an environment in which the risks of architectural change are mitigated by the supporting conditions. This gives the team the freedom to make architectural changes, knowing that the process and culture will support them. To quote from Kim [Kim 2013]: “Furthermore, there is hypothesis-driven culture, requiring everyone to be a scientist, taking no assumption for granted, and doing nothing without measuring.” Measuring the impact of architectural change is discussed in Fitness Functions.

Progressive Transformation (Experience)

Delivering continuous architectural refactoring is more than the sum of the pieces already described in this section. It also needs a pragmatic approach from the entire team, one that asks what is “good enough” at every point to allow the product to evolve (in the right direction) and keep the business moving forward. Simon [Simon 2018] describes this as “liquid software”. The product (and its architecture) can evolve as needed, while the environment ensures that it continues to meet all the requirements placed on it.

The focus can also vary over time; sometimes the business needs to “win”, and the focus shifts to business features at the expense of architectural evolution. But it is critical that the environment for architectural evolution persists. Then, if and when the focus shifts back to architectural concerns, the option to continue evolving the architecture remains open.