The problem: The need for robust cyber recovery
A multinational bank needed a better way to practice and demonstrate its ability to recover from a pervasive cyber attack. While the regulator recognized that the bank had strong preventative measures in place, the bank needed to prove that it could recover quickly and effectively if an attacker penetrated those defenses.
The bank’s existing IT disaster recovery strategies were not sufficient for this task because they mainly assumed that all existing data is good and can simply be failed over from one set of infrastructure to another. In the case of a cyber attack, the bank would instead need to determine the last known good backup that is not infected, recover its applications from a bare metal state, and repopulate them with trusted data. In some cases, modern infrastructure that provides automatic failover during an outage could end up replicating the problem throughout the network even faster, and cyber recovery procedures needed to reflect this risk.
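As an illustration of the "last known good backup" idea, the selection step can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the bank's or Cutover's actual method: the `Backup` record, the `is_clean` scan verdict, and the `compromised_at` timestamp are all hypothetical names introduced for the example.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Optional


@dataclass(frozen=True)
class Backup:
    """Hypothetical backup record; field names are assumptions for this sketch."""
    backup_id: str
    taken_at: datetime
    is_clean: bool  # assumed verdict from an integrity or malware scan


def last_known_good(backups: Iterable[Backup],
                    compromised_at: datetime) -> Optional[Backup]:
    """Return the newest clean backup taken before the assumed compromise time.

    Returns None when no backup qualifies, which signals that recovery
    needs a different source of trusted data.
    """
    candidates = [
        b for b in backups
        if b.is_clean and b.taken_at < compromised_at
    ]
    return max(candidates, key=lambda b: b.taken_at, default=None)


# Example usage with made-up data.
backups = [
    Backup("mon", datetime(2024, 1, 1), is_clean=True),
    Backup("tue", datetime(2024, 1, 2), is_clean=True),
    Backup("wed", datetime(2024, 1, 3), is_clean=False),  # scan flagged it
    Backup("thu", datetime(2024, 1, 4), is_clean=True),   # taken after compromise
]
chosen = last_known_good(backups, compromised_at=datetime(2024, 1, 3, 12))
```

Filtering on both the scan verdict and the compromise time reflects the point above: a recent backup is only useful if it predates the infection and has been verified as clean.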
The solution: Building cyber recovery runbooks
The recovery team separated application recovery from data integrity recovery. It built its platform and application recovery plans for responding to cyber incidents using Cutover runbooks. These outlined the bare metal recovery plans, initially focused on the most important tier 0 and tier 1 business services and the applications that support them. The bank plans to roll this out to all of its services at a later date.
Creating these bare metal recovery plans using Cutover’s automated runbooks enabled the team to capture the complex dependencies between manual and automated tasks to ensure that the recovery would run smoothly.
The next step was to build out hundreds of data integrity recovery plans to repopulate the data for the most critical services once each application has been rebuilt from a bare metal state. This was done using approved, templated recovery plans in Cutover for consistency and governance. This part of cyber recovery is highly complex and not easily automated, and the tasks involved can depend heavily on the scenario, so flexibility in building the runbooks was key.
The application and data integrity plans can be combined to manage the entire cyber recovery process. Cutover provided a single repository view across all cyber recovery plans and gave the bank more confidence in its ability to recover.
Cutover features for cyber recovery:
- The bank used a managed repository of recovery runbooks to enable rapid mobilization spanning hundreds of applications.
- Automated runbooks were used to orchestrate the sequence of tasks and communications across human and machine activities in real time. The linked runbooks functionality enabled the team to set up a parent runbook to manage the recovery as a whole with attached child runbooks to recover each individual application.
- Real-time reporting and analytics enabled efficient control, visibility and stakeholder engagement at scale.
- Cutover’s API and integrations automated repetitive, manual tasks by linking the runbooks to any application across the recovery technology stack, such as the configuration management database (CMDB) service.
- Post-execution analytics drive continuous improvement, ensuring lessons learned are incorporated into updated recovery plans.
- Audit trails and auto-generated compliance logs and reports support audits and regulatory reporting. The template workflow also enables the creation and approval of recovery templates to ensure a consistent recovery approach that follows governance and regulatory guidelines.
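The parent and child runbook arrangement described above can be pictured as a dependency graph in which each child runbook recovers one application and must wait for the components it relies on. The following Python sketch shows one generic way to derive a safe execution order from such dependencies. It does not use Cutover's API or data model; the runbook names and the `recovery_order` helper are hypothetical, and the ordering relies on the standard library's `graphlib` module (Python 3.9 or later).

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def recovery_order(dependencies: Dict[str, Set[str]]) -> List[str]:
    """Return child runbooks in an order that respects their dependencies.

    `dependencies` maps each child runbook to the set of runbooks that
    must finish before it can start. Raises graphlib.CycleError if the
    dependencies are circular.
    """
    return list(TopologicalSorter(dependencies).static_order())


# Hypothetical child runbooks attached to one parent recovery runbook.
child_runbooks = {
    "rebuild-network": set(),
    "rebuild-database": {"rebuild-network"},
    "rebuild-payments-app": {"rebuild-database"},
    "restore-payments-data": {"rebuild-payments-app"},
}

order = recovery_order(child_runbooks)
print(order)
```

In this made-up chain, the data restoration runbook comes last, which mirrors the separation in the case study between rebuilding the application from bare metal and repopulating it with trusted data.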
The outcome: Continuing the cyber recovery maturity journey
The bank foresees several outcomes from this continuing cyber recovery maturity journey:
- Faster recovery with reduced risk
- Reduced regulatory burden, by being able to show the regulator that the bank has a full suite of cyber recovery plans in place and to demonstrate how it would execute them in the event of a cyber attack
- A recovery capability that would enable the bank to reasonably set a recovery time objective (RTO) for cyber recovery, the way it can with IT disaster recovery