APIs, Microservices, and Cloud-Native
This document has now covered modern infrastructure, including container-based infrastructure available via cloud providers, and application development from waterfall, through Agile, and on to DevOps. The industry term for the culmination of all of these trends is “cloud-native”, a term given a formal definition by the CNCF.
While this document does not cover specific programming languages or techniques, there are leading practices for building modern applications that are notable and should be understood by all Digital Practitioners. In software construction a programming language and execution environment must be chosen, but this choice is only the start. Innumerable design choices are required in structuring software, and the quality of these choices will affect the software’s ultimate success and value.
Early computer programs tended to be “monolithic”; that is, they were often built as one massive set of logic and their internal structure might be very complex and hard to understand. (In fact, considerable research has been performed on the limitations of human comprehension when faced with software systems of high complexity.) Monolithic programs also did not lend themselves to reuse, and therefore the same logic might need to be written over and over again. The use of “functions” and re-usable “libraries” became commonplace, so that developers were not continuously rewriting the same thing.
Two of the most critical concepts in software design are coupling and cohesion. In one of the earliest discussions of coupling, Ed Yourdon states: “Coupling [is] the probability that in coding, debugging, or modifying one module, a programmer will have to take into account something about another module. If two modules are highly coupled, then there is a high probability that a programmer trying to modify one of them will have to make a change to the other. Clearly, total systems cost will be strongly influenced by the degree of coupling between modules.” [Yourdon & Constantine 1979].
This is not merely a technical concern; as Yourdon implies, highly-coupled designs will result in higher system costs.
Cohesion is the opposite idea: that a given module should do one thing and do it well. Software engineers have been taught to develop highly-cohesive, loosely-coupled systems since at least the early 1970s, and these ideas continue to underlie the most modern trends in digital systems delivery. The latest evolution is the concept of cloud-native systems, which achieve these ideals through APIs, microservices, and container-based infrastructure.
Three smaller software modules may be able to do the job of one monolithic program; however, those three modules must then communicate in some form. There are a variety of ways that this can occur; for example, communication may be via a shared data store. The preferred approach, however, is that communication occur over APIs.
An API is the public entry point in and out of a software component. It can be understood as a sort of contract; it represents an expectation that if you send a message or signal in a precisely specified format to the API, you will receive a specific, expected response. For example, your online banking service can be seen as having an API. You send it your name, password, and an indication that you want your account balance, and it will return your account balance. In pseudocode, the API might look like:
`GetAccountBalance(user_name, password, account_number) returns amount`
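A minimal sketch of this contract in Python (the account data, credential check, and error behavior are all hypothetical stand-ins):

```python
# Hypothetical account data; a real service would query a secure backend.
ACCOUNTS = {("alice", "s3cret", "ACCT-1"): 125.50}

def get_account_balance(user_name, password, account_number):
    """Honor the API contract: valid inputs return a balance; anything else fails."""
    try:
        return ACCOUNTS[(user_name, password, account_number)]
    except KeyError:
        raise PermissionError("invalid credentials or unknown account")
```

The caller needs to know only the signature and the promised behavior, not how the balance is computed.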
The modern digital world runs on APIs; they are pervasive throughout digital interactions. They operate at different levels of the digital stack; your bank balance request might be transmitted by Hypertext Transfer Protocol (HTTP), which is a lower-level set of APIs for all web traffic. At scale, APIs present a challenge of management: how do you cope with thousands of APIs? Mechanisms must be created for standardizing, inventorying, reporting on status, and many other concerns.
APIs can be accessed in various ways. For example, a developer might incorporate a “library” into a program they are writing. The library (for example, one that supports trigonometric functions) has a set of APIs that are only available if the library is compiled into the developer’s program, and only accessible while the program itself is running. In this case, the API is also dependent on the programming language; in general, a C++ library will not work in Java®.
Increasingly, however, with the rise of the Internet, programs are continually “up” and running, and available to any other program that can communicate over the Internet, or an organization’s internal network. Programs that are continually run in this fashion, with attention to their availability and performance, are called “services”. In some cases, a program or service may only be available as a visual web page. While this is still an API of a sort, many other services are available as direct API access; no web browser is required. Rather, services are accessed through protocols such as Representational State Transfer (REST) over HTTP. In this manner, a program written in Java can easily communicate with one written in C++. This idea is not new; many organizations started moving towards Service-Oriented Architecture (SOA) in the late 1990s and early 2000s. More recently, discussions of SOA have largely been replaced by attention to microservices.
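The language neutrality of REST comes from exchanging plain text formats such as JSON over HTTP. The sketch below (with a made-up endpoint and field names) shows why the service’s implementation language is invisible to the caller:

```python
import json

# The request a client (in any language) would POST to a hypothetical
# endpoint such as https://bank.example/v1/balance:
request_body = json.dumps({"account_number": "ACCT-1"})

# The JSON the service (in any language) would return:
response_body = '{"account_number": "ACCT-1", "balance": 125.5}'
balance = json.loads(response_body)["balance"]
```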
Sam Newman, in Building Microservices, provides the following definition: “Microservices are small, autonomous services that work together” [Newman 2015]. Breaking this down:
“Small” is a relative term
Newman endorses a heuristic that it should be possible to rewrite the microservice in two weeks. Matthew Skelton and Manuel Pais in Team Topologies: Organizing Business and Technology Teams for Fast Flow [Skelton & Pais 2019] emphasize that optimally-sized teams have an upper bound to their “cognitive capacity”; this is also a pragmatic limit on the size of an effective microservice.
“Autonomous” means “loosely-coupled” as discussed above, both in terms of the developer’s perspective as well as the operational perspective
Newman observes that microservices provide the following benefits:
Technology flexibility: as noted above, microservices may be written in any language and yet communicate over common protocols and be managed in a common framework
Resilience: failure of one microservice should not result in failure of an entire digital product
Scalability: monolithic applications typically must be scaled as one unit; with microservices, just those units under higher load can have additional capacity allocated
Ease of deployment: because microservices are small and loosely-coupled, change is less risky; see The DevOps Consensus as Systems Thinking
Organizational alignment: large, monolithic codebases often encounter issues with unclear ownership; microservices are typically each owned by one team, although this is not a hard and fast rule
Composability: microservices can be combined and re-combined (“mashed up”) in endless variations, both within and across organizational boundaries; an excellent example of this is Google Maps™, which is widely used by many other vendors (e.g., Airbnb™, Lyft™) when location information is needed
Replaceability: because they are loosely-coupled and defined by their APIs, a microservice can be replaced without replacing the rest of a broader system; for example, a microservice written in Java can be replaced by one written in Go, as long as the APIs remain identical
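The replaceability point can be sketched as two interchangeable implementations behind one (hypothetical) API; callers depend only on the contract:

```python
class LegacyGreeter:
    """Original implementation; imagine this as the Java service."""
    def greet(self, name):
        return "Hello, " + name

class NewGreeter:
    """Replacement implementation; imagine this as the Go rewrite."""
    def greet(self, name):
        return f"Hello, {name}"

def welcome(greeter, name):
    # Depends only on the greet() contract, not on which class provides it.
    return greeter.greet(name) + "!"
```

Either implementation can be swapped in without touching `welcome`.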
A number of good practices are associated with microservices success. One notable representation of this broader set of concerns is known as the “twelve-factor app”.
Excerpts from the Twelve-Factor App Website
Codebase: One codebase tracked in revision control, many deploys
A copy of the revision tracking database is known as a code repository, often shortened to code repo or just repo … A codebase is any single repo (in a centralized revision control system like Subversion), or any set of repos who share a root commit (in a decentralized revision control system like Git). Twelve-factor principles imply continuous integration.
Dependencies: Explicitly declare and isolate dependencies
A twelve-factor app never relies on implicit existence of system-wide packages. It declares all dependencies, completely and exactly, via a dependency declaration manifest. Furthermore, it uses a dependency isolation tool during execution to ensure that no implicit dependencies “leak in” from the surrounding system. The full and explicit dependency specification is applied uniformly to both production and development.
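One way to honor explicit declaration is to fail fast when a declared dependency is missing. This sketch uses standard-library modules as stand-ins for real packages; an actual app would declare them in a manifest such as requirements.txt or pyproject.toml:

```python
import importlib

# Illustrative manifest; stdlib modules stand in for third-party packages.
DECLARED_DEPENDENCIES = ["json", "sqlite3"]

def check_dependencies(names):
    """Report any declared dependency that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing
```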
Configuration management: Store config in the environment
An app’s config is everything that is likely to vary between deploys (staging, production, developer environments, etc.). Apps sometimes store config as constants in the code. This is a violation of twelve-factor, which requires strict separation of config from code. Config varies substantially across deploys, code does not. Typical configuration values include server or hostnames, database names and locations, and (critically) authentication and authorization information (e.g., usernames and passwords, or private/public keys).
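In Python, this factor reduces to reading configuration from the process environment rather than from constants in code. The variable names and defaults below are illustrative, and the defaults would suit local development only:

```python
import os

def get_config():
    # Each value varies between deploys, so it comes from the environment.
    return {
        "database_host": os.environ.get("DATABASE_HOST", "localhost"),
        "database_name": os.environ.get("DATABASE_NAME", "dev_db"),
    }
```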
Backing services: Treat backing services as attached resources
A backing service [aka a resource] is any service the app consumes over the network as part of its normal operation. Examples include data stores (such as MySQL® or CouchDB®), messaging/queueing systems (such as RabbitMQ® or Beanstalkd), SMTP services for outbound email (such as Postfix), and caching systems (such as Memcached).
In addition to these locally-managed services, the app may also have services provided and managed by third parties. The code for a twelve-factor app makes no distinction between local and third-party services. To the app, both are attached resources, accessed via a URL or other locator/credentials stored in the config. A deploy of the twelve-factor app should be able to swap out a local MySQL database with one managed by a third party – such as Amazon Relational Database Service (Amazon RDS) – without any changes to the app’s code; only the resource handle in the config needs to change.
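A sketch of the attached-resource idea: the app sees only a locator from config, so swapping a local database for a managed one changes an environment variable, not code (the URL below is made up):

```python
import os
from urllib.parse import urlparse

def database_endpoint():
    # The resource handle lives in config; the default is for local development.
    url = urlparse(os.environ.get("DATABASE_URL", "mysql://localhost:3306/dev_db"))
    return url.hostname, url.port, url.path.lstrip("/")
```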
Build, release, run: Strictly separate build and run stages
A codebase is transformed into a (non-development) deploy through three [strictly separated] stages: The build stage is a transform which converts a code repo into an executable bundle known as a build. Using a version of the code at a commit specified by the deployment process, the build stage fetches vendor dependencies and compiles binaries and assets. The release stage takes the build produced by the build stage and combines it with the deploy’s current config. The resulting release contains both the build and the config and is ready for immediate execution in the execution environment. The run stage (also known as “runtime”) runs the app in the execution environment, by launching some set of the app’s processes against a selected release.
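The three stages can be sketched as separate functions to emphasize their strict separation; the artifact shapes are illustrative:

```python
def build(commit):
    # Build stage: code at a specific commit becomes an immutable artifact.
    return {"build_id": f"build-{commit}"}

def release(build_artifact, config):
    # Release stage: one build combined with one deploy's config.
    return {**build_artifact, "config": config}

def run(release_artifact):
    # Run stage: launch processes against a selected release.
    return f"running {release_artifact['build_id']} in {release_artifact['config']['env']}"
```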
Processes: Execute the app as one or more stateless processes
Twelve-factor processes are stateless and share nothing. Any data that needs to persist must be stored in a stateful backing service, typically a database. The memory space or file system of the process can be used as a brief, single-transaction cache. For example, downloading a large file, operating on it, and storing the results of the operation in the database. The twelve-factor app never assumes that anything cached in memory or on disk will be available on a future request or job – with many processes of each type running, chances are high that a future request will be served by a different process.
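A stateless request handler can be sketched with a dict standing in for the backing service; nothing survives in the process between requests:

```python
class BackingStore:
    """Stand-in for a database shared by all processes."""
    def __init__(self):
        self._data = {}
    def put(self, key, value):
        self._data[key] = value
    def get(self, key):
        return self._data.get(key)

def handle_upload(store, payload):
    # Operate on the data locally, persist the result immediately;
    # local variables vanish when the request ends.
    result = len(payload)
    store.put("last_size", result)
    return result
```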
Port binding: Export services via port binding
Web apps are sometimes executed inside a web server container. For example, Hypertext Preprocessor (PHP) apps might run as a module inside Apache® HTTPD, or Java apps might run inside Tomcat®. The twelve-factor app is completely self-contained and does not rely on runtime injection of a web server into the execution environment to create a web-facing service. The web app exports HTTP as a service by binding to a port, and listening to requests coming in on that port. In a local development environment, the developer visits a service URL like http://localhost:5000/ to access the service exported by their app. In deployment, a routing layer handles routing requests from a public-facing hostname to the port-bound web processes.
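A self-contained port-binding sketch using only the standard library; binding to port 0 lets the operating system choose a free port, where a real deploy would use a configured one:

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class HelloHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"hello"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet

# The app itself binds the port; no external web server container needed.
server = HTTPServer(("127.0.0.1", 0), HelloHandler)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()
```

A client can now fetch `http://127.0.0.1:{port}/` directly; in production, a routing layer would map a public hostname to this port.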
Concurrency: Scale out via the process model
Any computer program, once run, is represented by one or more processes. Web apps have taken a variety of process-execution forms. For example, PHP processes run as child processes of Apache, started on-demand as needed by request volume. Java processes take the opposite approach, with the Java Virtual Machine (JVM) providing one massive [process] that reserves a large block of system resources (CPU and memory) on startup, with concurrency managed internally via threads. In both cases, the running process(es) are only minimally visible to the developers of the app. In the twelve-factor app, processes are at the highest level and take strong cues from the UNIX process model for running service daemons. Using this model, the developer can architect their app to handle diverse workloads by assigning each type of work to a process type. For example, HTTP requests may be handled by a web process, and long-running background tasks handled by a worker process.
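The process-type idea can be sketched with a small pool of worker processes; a real app would pair these with separate web processes (the worker count and workload here are illustrative):

```python
import multiprocessing

def double(n):
    # Stand-in for a background worker's unit of work.
    return n * 2

def run_worker_pool(jobs, worker_count=2):
    # Scale out by adding worker processes, not threads inside one process.
    with multiprocessing.Pool(processes=worker_count) as pool:
        return pool.map(double, jobs)
```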
Disposability: Maximize robustness with fast startup and graceful shutdown
The twelve-factor app’s processes are disposable, meaning they can be started or stopped at a moment’s notice. This facilitates fast elastic scaling, rapid deployment of code or config changes, and robustness of production deploys. Processes should strive to minimize startup time. Ideally, a process takes a few seconds from the time the launch command is executed until the process is up and ready to receive requests or jobs. Processes shut down gracefully when they receive a SIGTERM signal from the process manager. For a web process, graceful shutdown is achieved by ceasing to listen on the service port (thereby refusing any new requests), allowing any current requests to finish, and then exiting. For a worker process, graceful shutdown is achieved by returning the current job to the work queue. Processes should also be robust against sudden death. A twelve-factor app is architected to handle unexpected, non-graceful terminations.
Dev/prod parity: Keep development, staging, and production as similar as possible
Historically, there have been substantial gaps between development (a developer making live edits to a local deploy of the app) and production (a running deploy of the app accessed by end users). These gaps manifest in three areas:
The time gap: a developer may work on code that takes days, weeks, or even months to go into production
The personnel gap: developers write code, operations engineers deploy it
The tools gap: developers may be using a stack like NGINX™, SQLite, and OS X, while the production deploy uses Apache, MySQL, and Linux
The twelve-factor app is designed for continuous deployment by keeping [these gaps] between development and production small.
Table 1. Gaps Between Traditional and Twelve-Factor Apps

|  | Traditional App | Twelve-Factor App |
|---|---|---|
| Time Between Deploys | Weeks | Hours |
| Code Authors versus Code Deployers | Different People | Same People |
| Dev versus Production Environments | Divergent | As Similar as Possible |
Logs: Treat logs as event streams

Logs are the stream of aggregated, time-ordered events collected from the output streams of all running processes and backing services. Logs in their raw form are typically a text format with one event per line (though backtraces from exceptions may span multiple lines). Logs have no fixed beginning or end, but flow continuously as long as the app is operating. A twelve-factor app never concerns itself with routing or storage of its output stream; instead, each running process writes its event stream, unbuffered, to standard output. During local development, the developer will view this stream in the foreground of their terminal to observe the app’s behavior. In staging or production deploys, each process’ stream will be captured by the execution environment, collated together with all other streams from the app, and routed to one or more final destinations for viewing and long-term archival.
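A logging setup consistent with this factor writes one event per line to standard output and leaves routing to the environment; the logger name and format below are illustrative:

```python
import logging
import sys

def make_logger(stream=sys.stdout):
    logger = logging.getLogger("twelve_factor_demo")
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.handlers.clear()  # idempotent setup for repeated calls
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger
```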
Admin processes: Run admin/management tasks as one-off processes

The process formation is the array of processes that are used to do the app’s regular business (such as handling web requests) as it runs. Separately, developers will often wish to do one-off administrative or maintenance tasks for the app, such as:
Running database migrations
Running a console – to run arbitrary code or inspect the app’s models against the live database
Running one-time scripts committed into the app’s repo
One-off admin processes should be run in an identical environment as the regular long-running processes of the app. They run against a release, using the same codebase and config as any process run against that release. Admin code must ship with application code to avoid synchronization issues.
It is strongly recommended that the reader review the unabridged set of guidelines at 12Factor.net.
Evidence of Notability
Cloud-native approaches are, at the time of writing, receiving enormous industry attention. Kubecon (the leading conference of the CNCF) attracts wide interest, and all major cloud providers and scores of smaller firms participate in the CNCF ecosystem.
Trillions of dollars of IT investment have been made in older architectures: bare-metal computing, tightly-coupled systems, stateful applications, and every imaginable permutation of not following guidance such as the twelve-factor methodology. The Digital Practitioner should be prepared to encounter this messy, real world.