Digital Operations

Although this Competency Area is titled “Digital Operations”, it also brings in infrastructure engineering at a higher level, assuming that the product is continuing to scale up. This is consistent with industry usage.

Area Description

As the digital product gains more use, running it becomes a distinct concern from building it. For all their logic, computers are still surprisingly unreliable. Servers running well-tested software may remain “up” for weeks, and then suddenly hang and have to be rebooted. Sometimes it is clear why (for example, a log file filled up that no one expected); in other cases, there is simply no explanation.

Engineering and operating complex IT-based distributed systems is a significant challenge. Even with Infrastructure as Code and automated continuous delivery pipelines, operations as a class of work is distinct from software development per se. The work is more interrupt-driven than the “heads-down” focus of developing new features. Questions about scalability, performance, caching, load balancing, and so forth usually become apparent first through feedback from the operations perspective – whether or not there is a formal operations “team”.

The assumption here is still one team with one product, but in this last Competency Area of Context II, the product is assumed to be in considerable use. With today’s technology, correctly deployed and operated, even a small team can support large workloads. This does not come easily, however. Systems must be designed for scale and ease of operations. They need to be monitored and managed for performance and capacity. The topic of configuration management will be covered further at a more advanced level.
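As a minimal illustration of monitoring for capacity, the following Python sketch classifies disk usage against warning and critical thresholds, the kind of check that could flag an unexpectedly growing log file before it hangs a server. The function names (`capacity_status`, `check_path`) and the 80% and 90% thresholds are assumptions chosen for illustration; they are not defined by the DPBoK Standard, and a production system would more likely rely on a dedicated monitoring tool.

```python
import shutil


def capacity_status(used_bytes, total_bytes, warn=0.80, crit=0.90):
    """Classify utilization as 'ok', 'warning', or 'critical'.

    The thresholds are illustrative assumptions, not recommended values.
    """
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    fraction = used_bytes / total_bytes
    if fraction >= crit:
        return "critical"
    if fraction >= warn:
        return "warning"
    return "ok"


def check_path(path="/"):
    """Report the capacity status of the filesystem containing `path`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return capacity_status(usage.used, usage.total)


if __name__ == "__main__":
    # Hypothetical usage: check the filesystem holding the current directory.
    print(check_path("."))
```

Separating the threshold logic from the filesystem call keeps the classification testable without depending on the state of any particular machine.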

The evolution of infrastructure was covered in Digital Infrastructure, and application development in Application Delivery; the DPBoK Standard will continue to build on those foundations. The practices of change, incident, and problem management have been employed in the industry for decades and are important foundations for thinking about operations. Finally, Site Reliability Engineering (SRE) is an important new discipline emerging from the practices of companies such as Google and Facebook.