Operate Value Stream

Figure 1. Operate Value Stream Model

Overview

The value stream “Operate”, as shown in Operate Value Stream Model, monitors and ensures that the availability and performance of Actual Product Instances are within the boundaries of their agreed Service Contract and Key Performance Indicator (KPI) targets. The scope of this value stream includes managing any compliance and security aspects of running Digital Product Instances and underlying systems that may result from deployment and/or fulfillment. The IT4IT Standard describes how to achieve this using integrated, automated, or fully autonomous data flows between the event monitoring (Event, Incident, Change, Configuration, and Problem Management) functions of the digital services. In addition, the data flows may be designed to be reactive, proactive, or predictive, and include a retrospective.

The Operate value stream ensures that the Actual Product Instance is operated in a sustainable way within the agreed terms and conditions of a Service Contract.

Key Stakeholder

The primary stakeholder of the Operate value stream is the Consumer, who can use an Actual Product Instance or system within the boundaries of the Service Contract.

Value

The outcome of the Operate value stream is to ensure that the Actual Product Instance operates, and can be consumed as agreed in the terms of the Service Contract.

Cross-Value Stream Dependencies

The Operate value stream depends on:

  • Deploy:

    • Actual Product Instances can be enabled, changed, or disabled by the deployment of the Product Release

  • Consume:

    • Actual Product Instances can be enabled, changed, or disabled by the fulfillment of the Service Offer

    • Consumption of a Service Offer can result in an Incident that needs to be handled in the Operate value stream

Value streams that depend on the Operate value stream are:

  • Evaluate:

    • Improvements start with drivers that are expected to arise from all seven value streams

      All value streams will capture data to ensure the quality of the Digital Product will meet the requirements. Quality is an aspect of each task within each value stream. The Evaluate value stream uses data produced in all value streams to evaluate drivers.

  • Integrate:

    • Problems related to Actual Product Instances can require a fix from development (system defect)

  • Consume:

    • Fulfillment feedback of a Service Offer that resulted in an Incident and needed to be handled in the Operate value stream

The Operate Value Stream Details Model provides a more comprehensive view of the Operate value stream, showing the included scenarios and the detailed value stream stages.

Figure 2. Operate Value Stream Details Model

The following sections document the scenarios and stages of the Operate value stream.

Operate Scenarios

The Operate value stream is applicable for the following five scenarios:

Remediate an Identified Back-End Issue Scenario

In this scenario, the Operate value stream is triggered by an issue (a breach and/or vulnerability) or a potential future issue (triggered by a threshold breach) in the infrastructure, platforms, applications, user transactions, and/or behavior of the back-end network. The scenario ends with the outcome that the service remains operational without affecting consumer experience and within the boundaries of the Service Contract.

The activities undertaken in the scenario include the detection, diagnosis, evaluation, and resolution of issues, alerts, and anomalies related to the performance, compliance, and security aspects of the organization’s back-end.

  • Performance remediation examples include: “reactive” event detection and correlation through the monitoring of the Digital Product Instance, system, and topology; “proactive” event detection through periodic service restarts, test transactions, or built-in self-healing capabilities; or “predictive” event detection, done by generating test events from the analysis of historical monitoring data and other data sources (e.g., business performance, social media, weather) to identify potential degradation based on predicted consumer behavior

  • Security remediation examples include responses to different types of attack, including unauthorized system access, phishing, password cracking, spyware, malware, Denial of Service (DoS) attacks, and more

  • Compliance remediation examples include inspections to verify compliance with the applicable rules; for example, adherence to legislation, statutes, regulations, standards, policies, contracts, processes, procedures, and operational controls
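As an illustration of the difference between reactive and predictive detection described above, the following is a minimal sketch only. The names Threshold, classify_sample, and predict_breach, and the linear extrapolation used for prediction, are assumptions made for this example; they are not defined by the IT4IT Standard.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Threshold:
    """Hypothetical limits for one monitored metric (e.g., response time in ms)."""
    warning: float  # crossing this signals a potential future issue
    breach: float   # crossing this signals an actual issue


def classify_sample(value: float, threshold: Threshold) -> Optional[str]:
    """Reactive detection: classify the latest sample against the limits."""
    if value >= threshold.breach:
        return "issue"
    if value >= threshold.warning:
        return "potential_issue"
    return None


def predict_breach(samples: list[float], limit: float, horizon: int) -> bool:
    """Predictive detection (assumed technique): extrapolate the average
    change per sample `horizon` steps ahead and compare it with the limit."""
    if len(samples) < 2:
        return False
    step = (samples[-1] - samples[0]) / (len(samples) - 1)
    return samples[-1] + step * horizon >= limit
```

A real monitoring function would typically combine several data sources and a more robust forecasting method; the sketch only shows where a threshold breach and a predicted breach would each raise a candidate Event.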

Ensure Disaster Recovery Objectives Scenario

This scenario is about ensuring that any capabilities listed in the business continuity/disaster recovery plan can work within the boundaries of the Service Contract. The activities in this scenario, which are planned and triggered by the business continuity and/or disaster recovery plan activities calendar, include detection, diagnosis, and remediation tasks related to the plan. The purpose of the activities is to ensure that recovery objectives (e.g., Maximum Acceptable Outage (MAO), Recovery Time Objectives (RTO), and Recovery Point Objectives (RPO)) remain within the boundaries of the agreed Service Contract and Business Impact Analysis (BIA). Real-world examples of these activities include performing backups, ensuring plans and mechanisms for graceful degradation, removing Single Points Of Failure (SPOFs), validating recovery libraries, performing redundancy validations, testing backup sites, and running (scheduled or random) disaster recovery simulations.
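The check that a disaster recovery simulation stayed within its recovery objectives can be sketched as follows. This is an illustrative example under stated assumptions: RecoveryObjectives, DrillResult, and violated_objectives are hypothetical names, and only RTO and RPO are modeled.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class RecoveryObjectives:
    rto: timedelta  # Recovery Time Objective agreed in the Service Contract
    rpo: timedelta  # Recovery Point Objective agreed in the Service Contract


@dataclass(frozen=True)
class DrillResult:
    """Hypothetical measurements taken during a disaster recovery simulation."""
    time_to_restore: timedelta   # how long the service took to come back
    data_loss_window: timedelta  # age of the most recent recoverable data


def violated_objectives(objectives: RecoveryObjectives, result: DrillResult) -> list[str]:
    """Return the names of the objectives that the drill did not meet."""
    violations = []
    if result.time_to_restore > objectives.rto:
        violations.append("RTO")
    if result.data_loss_window > objectives.rpo:
        violations.append("RPO")
    return violations
```

A non-empty result would be a natural trigger for the diagnosis and remediation tasks of this scenario.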

Remediate a Major Event or Incident Scenario

This scenario is triggered by a major Event or Incident (e.g., full system outage, significant application failure); its outcome is that the service is restored within the boundaries of the Service Contract. The activities undertaken in the scenario include: correlating Events/Incidents to detect that a major Event/Incident has occurred; collaborating with multiple teams to research and analyze the underlying cause of the issue and to establish a remediation plan (diagnose); and executing the remediation plan to resolve the major Event/Incident and restore services.
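One possible way to correlate Events in order to detect a major Event/Incident is sketched below. The rule used here (Events from several distinct monitoring sources for the same instance within a short time window) is an assumption chosen for illustration, as are the names Event and detect_major_incidents and the default values.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    instance_id: str  # affected Actual Product Instance
    source: str       # hypothetical monitoring source that raised the Event
    timestamp: float  # seconds since an arbitrary epoch


def detect_major_incidents(
    events: list[Event],
    window_seconds: float = 300.0,
    min_sources: int = 3,
) -> list[str]:
    """Return instance ids for which at least `min_sources` distinct sources
    raised Events within any sliding window of `window_seconds`."""
    by_instance: dict[str, list[Event]] = defaultdict(list)
    for event in events:
        by_instance[event.instance_id].append(event)

    major = []
    for instance_id, group in by_instance.items():
        group.sort(key=lambda e: e.timestamp)
        start = 0
        for end in range(len(group)):
            # Shrink the window until it spans at most `window_seconds`.
            while group[end].timestamp - group[start].timestamp > window_seconds:
                start += 1
            sources = {e.source for e in group[start:end + 1]}
            if len(sources) >= min_sources:
                major.append(instance_id)
                break
    return sorted(major)
```

In practice, correlation would usually also take the system topology into account; this sketch groups by instance only.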

Remediate Consumer Front-End Issue Scenario

This scenario is triggered when an issue is raised by or through the service desk. The outcome of the scenario is restored consumer experience, normal service operations, and service delivery to the consumer within the boundaries of the Service Contract. The activities of the scenario include capturing/detecting, diagnosing, resolving, and evaluating any service interactions raised and escalated by consumers or the service desk. This might include service interactions which have been generated by users or by the service desk, and have subsequently been escalated into an Incident requiring second- or third-line handling for remediation. In cases where handling is not or cannot be performed due to operational decisions or lack of capacity, the risk of working outside of agreed boundaries will be logged and communicated to the consumer, and can be input into the Evaluate value stream.

Perform Scheduled Maintenance Scenario

This scenario is triggered when a planned activity is scheduled or when certain thresholds are met which trigger an alert/event for proactive maintenance. The planned activity or maintenance is reviewed and scheduled. Manual or automated action is taken to complete the activity/maintenance.

Detect Issue Stage

Description

The purpose of the value stream stage “Detect Issue” is to detect issues in an Actual Product Instance or system that have caused or may cause degradation of the service or a breach of the agreed service levels, then to prioritize and assign them for further diagnosis in a structured and repeatable manner.

Table 1. Detect Issue Value Stream Stage

Entrance Criteria:

  • Monitoring and analytical tooling continuously monitors an Actual Product Instance or system for issues or triggers for proactive or self-healing maintenance activities

  • The service desk identifies and raises issues based on support interactions

Exit Criteria:

  • Prioritized list of issues and/or maintenance activities assigned to a resolver group to diagnose or review

Value Item:

  • The Actual Product Instance is continuously monitored for anomalies, issues, or alerts for maintenance activities

  • The service desk supports consumers of the Actual Product Instance and assigns Incidents

Activities:

  • Should proactively monitor the Actual Product Instance to:

    • Discover the Actual Product Instance and system topology

    • Track usage to support charging and licensing

    • Detect current or potential future issues in the Actual Product Instance or system that may impact:

      • Agreed service performance levels

      • Service security

      • Compliance policies

    • Identify and report patterns and trends

    • Detect incoming issues from the service desk or reported through a self-service mechanism

  • Shall create Events and/or Incidents when defined policies or thresholds are breached

  • Should correlate Events and/or Incidents with other Events and/or Incidents to determine the impact on the Digital Product Instance or system

  • Shall assign prioritized Events and/or Incidents to a resolver group considering the impact on value realization based on a complete and up-to-date overview of all managed Events or Incidents
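The prioritization and assignment activity above can be sketched as follows. This is an illustration only: the Issue fields, the 1 to 5 impact score, the category names, and the RESOLVER_GROUPS mapping are all hypothetical and would be replaced by an organization's own policies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Issue:
    issue_id: str
    category: str  # e.g., "performance", "security", "compliance"
    impact: int    # hypothetical score: 1 (low) to 5 (high) impact on value realization


# Hypothetical mapping from issue category to resolver group.
RESOLVER_GROUPS = {
    "performance": "operations",
    "security": "security-team",
    "compliance": "compliance-office",
}


def prioritize_and_assign(
    issues: list[Issue], default_group: str = "service-desk"
) -> list[tuple[str, str]]:
    """Rank all managed issues by impact (highest first, ties broken by id)
    and pair each one with the resolver group for its category."""
    ranked = sorted(issues, key=lambda i: (-i.impact, i.issue_id))
    return [(i.issue_id, RESOLVER_GROUPS.get(i.category, default_group)) for i in ranked]
```

Ranking over the complete list of open issues, rather than one issue at a time, mirrors the requirement to prioritize based on an overview of all managed Events or Incidents.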

Examples of Participating Stakeholders:

  • Compliance Manager

  • Consumer

  • Development Team

  • Security Analyst

  • Service Desk Agent

  • Support Specialist

Diagnose Issue Stage

Description

The purpose of the value stream stage “Diagnose Issue” is to prepare the best/quickest course of action to retain or restore normal operations. Further diagnosis can be performed to identify the root cause(s) of a potential or detected issue.

Table 2. Diagnose Issue Value Stream Stage

Entrance Criteria:

  • Prioritized list of issues and/or maintenance activities assigned to a resolver group to diagnose or review

Exit Criteria:

  • Validated course of action as a (first) measure for issue prevention or issue resolution is defined and assigned

  • Actual Product Instance or system problem areas identified by the problem analyst, as input for the Development team Product Backlog

Value Item:

  • Course of action defined for issue prevention or remediation and assigned to the resolution group

Activities:

  • Should understand severity, business impact, and escalation paths, and find fixes by consulting the knowledge base or accessing up-to-date information from the Digital Product lifecycle

  • Should find quick fixes and/or work-arounds

  • Should identify root causes

  • Shall propose a course of action or activities needed to start, change, restore, or stop service operation levels

  • Shall approve the proposed course of action steps as a valid measure to resolve an Incident or to retain/resolve service operation
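The propose-then-approve sequence in the activities above can be sketched as a small data model. The names CourseOfAction, propose, and approve, the dictionary used as a knowledge base, and the fallback step are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CourseOfAction:
    incident_id: str
    steps: list[str]
    approved_by: Optional[str] = None  # set once the proposal is approved

    @property
    def approved(self) -> bool:
        return self.approved_by is not None


def propose(incident_id: str, symptom: str, knowledge_base: dict[str, list[str]]) -> CourseOfAction:
    """Look up a known fix or work-around for the symptom; fall back to
    root cause analysis when the knowledge base has no entry."""
    steps = knowledge_base.get(symptom, ["escalate for root cause analysis"])
    return CourseOfAction(incident_id=incident_id, steps=list(steps))


def approve(action: CourseOfAction, approver: str) -> CourseOfAction:
    """Mark the proposed course of action as a valid measure."""
    if not action.steps:
        raise ValueError("cannot approve an empty course of action")
    action.approved_by = approver
    return action
```

Only an approved course of action would satisfy the exit criteria of this stage and be handed to the resolution group.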

Examples of Participating Stakeholders:

  • Chief Security Officer

  • Consumer

  • Development Team

  • Risk Analyst

  • Security Analyst

  • Service Desk Agent

  • Vendor Manager

Resolve Issue Stage

Description

The purpose of the value stream stage “Resolve Issue” is to execute the (prepared) course of action to retain, restore, and confirm normal service operations.

Table 3. Resolve Issue Value Stream Stage

Entrance Criteria:

  • Validated course of action as a (first) measure for issue prevention or issue resolution is defined and assigned

Exit Criteria:

  • Resolved issue for the Actual Product Instance or system

  • Resolution acceptance is requested from the consumer

  • The consumer provides feedback on the resolution delivered by the team or team member

  • Updated Knowledge Item usability rating

Value Item:

  • Service level for the Actual Product Instance or system is maintained or restored

  • Consumers have not been impacted or can continue to use the Actual Product Instance or system

Activities:

  • Shall ensure everything is in place to resolve Incidents and/or related Events

  • Should collaborate on the execution of the course of action (e.g., a Pipeline or Runbook) and implementation of changes for the Actual Product Instance or system where applicable

  • Should validate success by performing post-execution checks and obtaining consumer acceptance

  • Shall close related Incidents, Interactions, and/or Events
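The execute, validate, and close sequence above can be sketched as follows. The function name resolve, the representation of runbook steps and post-execution checks as callables, and the returned status strings are hypothetical choices for this example.

```python
from typing import Callable


def resolve(
    steps: list[Callable[[], None]],
    checks: list[Callable[[], bool]],
    consumer_accepts: bool,
) -> str:
    """Run the runbook steps in order, validate the outcome, and return the
    resulting Incident status (status names are assumptions)."""
    for step in steps:
        step()  # execute one step of the course of action
    if not all(check() for check in checks):
        return "reopened"  # post-execution checks failed
    return "closed" if consumer_accepts else "awaiting_acceptance"
```

A usage example with stand-in steps and checks:

```python
log: list[str] = []
status = resolve(
    steps=[lambda: log.append("restart service"), lambda: log.append("clear cache")],
    checks=[lambda: len(log) == 2],
    consumer_accepts=True,
)
```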

Examples of Participating Stakeholders:

  • Chief Security Officer

  • Consumer

  • Data Protection Officer

  • Development Team

  • Security Analyst

  • Service Desk Agent

  • Vendor Manager