In recent years there has been a marked increase in the automation of IT disaster recovery (IT DR). But how do you ensure you’re implementing IT DR automation in the right way and getting maximum value from it?
In our survey of 300 IT decision makers, 76% of enterprises with advanced automation (having a well-defined automation strategy with clear milestones that is regularly reviewed) characterized themselves as more profitable compared to others in the same industry sector. However, profitability is not the only benefit of automation. Below are some examples of ways to automate IT disaster recovery and the benefits you can expect to see from increased automation in this area.
Types of automation for IT disaster recovery
There are many different ways to automate aspects of your IT disaster recovery plans:
Automated recovery platforms
An automated recovery platform is the foundation that hosts your automated recovery plans. These plans include both the automated and manual activities needed to carry out a recovery, and can be executed both for live invocation in an actual disaster and for testing recovery capabilities. They retain the live execution data following the event and are the golden source of truth for Recovery Time Actuals (RTAs) for recovery plans. Automated recovery platforms are most effective when integrated with the other elements of the recovery technology stack described below.
IT service management (ITSM) platforms
An ITSM platform typically has two important functions related to recovering the technology applications and services that underpin critical business processes:
- The configuration management database (CMDB) holds the applications and services information such as location, compute resources, storage and other important details. For example, if a cloud region went down you could use the CMDB to understand which workloads were impacted and organize the recovery of those assets.
- The ticketing system is used to ensure that appropriate governance has been enacted in a recovery event and manage any configuration changes of the organization’s IT assets, e.g. moving an application to different on-premises infrastructure or the cloud for recovery. For IT DR, this is the system of record and governance.
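The cloud-region example above can be sketched in a few lines of Python. This is a minimal illustration, not a real CMDB integration: the `ConfigItem` fields, the region names, and the `tier` criticality ranking are all assumptions made for the example, and the records stand in for whatever your CMDB actually exports.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfigItem:
    """One hypothetical CMDB record; field names are illustrative only."""
    name: str
    region: str
    tier: int  # assumed criticality ranking, 1 = most critical


# Stand-in for a CMDB export; a real system would query the CMDB instead.
CMDB = [
    ConfigItem("payments-api", "eu-west-1", 1),
    ConfigItem("reporting-batch", "eu-west-1", 3),
    ConfigItem("customer-portal", "us-east-1", 1),
    ConfigItem("ledger-db", "eu-west-1", 2),
]


def impacted_workloads(cmdb, failed_region):
    """Return the items hosted in the failed region, most critical first."""
    hits = [ci for ci in cmdb if ci.region == failed_region]
    return sorted(hits, key=lambda ci: (ci.tier, ci.name))


recovery_order = [ci.name for ci in impacted_workloads(CMDB, "eu-west-1")]
print(recovery_order)  # ['payments-api', 'ledger-db', 'reporting-batch']
```

Sorting by tier is only one possible way to organize the recovery; in practice the ordering would more likely come from the dependencies and RTOs recorded for each asset.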
Business continuity management (BCM) platforms
BCM platforms let organizations map out their business processes and assess the associated risks based on how critical the processes are to customers, which resources they require (people, places, systems, third parties), and what their impact tolerances are, to ensure operations are maintained. This is typically the source of the organization’s Recovery Time Objectives (RTOs): the target times within which technology assets must be recovered so that impact tolerances are not exceeded. For recovery, the BCM platform is a key system of record for RTOs.
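To make the relationship between an RTO (the target) and an RTA (what actually happened) concrete, here is a small Python sketch. The function names, the four-hour RTO, and the timestamps are invented for illustration; in practice the RTO would come from the BCM platform and the timestamps from recorded execution data.

```python
from datetime import datetime, timedelta


def recovery_time_actual(outage_start, service_restored):
    """RTA: elapsed time between the outage and the restored service."""
    return service_restored - outage_start


def meets_rto(rta, rto):
    """True when the actual recovery time is within the objective."""
    return rta <= rto


# Hypothetical values for one application.
rto = timedelta(hours=4)
rta = recovery_time_actual(
    outage_start=datetime(2024, 1, 10, 9, 0),
    service_restored=datetime(2024, 1, 10, 12, 30),
)

print(rta, meets_rto(rta, rto))  # 3:30:00 True
```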
Infrastructure as code tools
Infrastructure as code tools, such as Ansible and Terraform, are often used as part of recovery plans to instantiate fresh infrastructure and provision applications as needed. To avoid complex sets of configuration problems, they work best as modular components integrated into the executable automated recovery plans. Interaction between these tools and the automated recovery platform is essential to avoid delays and potential revenue loss from failures.
Monitoring tools
Monitoring tools, such as Datadog and New Relic, are used as the starting point to execute recoveries in scenarios where alarms on monitoring software are well-defined, actionable triggers for a recovery plan.
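As a rough sketch of what a "well-defined, actionable trigger" might look like, the Python below maps an incoming alert to a recovery plan. The alert fields, service names, and plan names are hypothetical and do not reflect the payload format of any particular monitoring product.

```python
from typing import Optional

# Hypothetical trigger table: (service, severity) -> recovery plan to start.
TRIGGERS = {
    ("payments-api", "critical"): "payments-api-failover",
    ("ledger-db", "critical"): "ledger-db-restore",
}


def plan_for_alert(alert: dict) -> Optional[str]:
    """Return the plan an alert should trigger, or None if it is not actionable."""
    key = (alert.get("service"), alert.get("severity"))
    return TRIGGERS.get(key)


critical = {"service": "payments-api", "severity": "critical"}
warning = {"service": "payments-api", "severity": "warning"}

print(plan_for_alert(critical))  # payments-api-failover
print(plan_for_alert(warning))   # None
```

Keeping the trigger table explicit means that only alerts someone has deliberately marked as actionable can start a recovery; everything else falls through to normal incident handling.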
Communications tools
Communications tools, such as text messaging, Microsoft Teams, or Slack, are used to keep everyone involved in the test or recovery up to date. Communications are an important aspect of any disaster recovery, and these tools can be integrated with an automated recovery platform.
The benefits of automated IT disaster recovery
Reduced human error
Automation reduces human error and creates greater efficiency due to a reduction in repetitive and error-prone tasks. Automating these tasks can also help teams recover with greater reliability and free people up to be more productive and focus on the tasks that people are best suited to, such as collaboration and decision making.
Removal of silos between teams
Automated disaster recovery plans, when combined with communication tools, break down silos between teams to foster seamless collaboration during a recovery. When disparate teams can communicate and share data effortlessly, information flows smoothly across departments and there is greater transparency so efforts are not duplicated and tasks are not missed.
Flexibility
Ever-changing IT threats require organizations to be flexible and adapt quickly. Automated disaster recovery systems enable swift adjustments, facilitating seamless transitions in response to an IT outage or cyber attack. This adaptability ensures that organizations are able to recover within impact tolerances and reduce negative knock-on impacts of an IT outage.
Visibility and control
Automated real-time reporting tools and dashboards for disaster recoveries help to keep teams informed about the status and progress of an IT disaster recovery. This creates more transparency and enables better decision making. Automation in disaster recovery can also be used to build repeatable and trackable processes to improve regulatory audits and governance and ensure consistency in recovery practices.
Money savings
Automating disaster recovery plans provides a measurable return on investment (ROI) by decreasing IT costs, which frees up funds to invest in innovation and new projects. Prolonged IT outages are also costly, both in terms of lost revenue and potential fines, so reducing downtime saves money and increases ROI.
How Cutover automates IT disaster recovery
The Cutover collaborative automation platform provides automated functionality in the following key areas for IT disaster recovery:
Automated runbooks
Automated runbooks like Cutover’s integrate with your tech stack to orchestrate complicated IT disaster recovery workflows and procedures. Automated runbooks combine the tasks carried out by people and automated solutions to manage complex operations. Repetitive, manually-intensive tasks are automated but people still have full visibility and control and are able to make informed decisions at critical points. This type of runbook helps to standardize repetitive processes, optimize existing resources, reduce risk and improve IT disaster recovery.
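The idea of a runbook that mixes automated steps with human decision points can be illustrated with a short, generic Python sketch. This is not Cutover’s implementation or API; the `Task` class, the `run` function, and the task names are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Task:
    """A runbook step. `action` is None when a person must carry it out."""
    name: str
    action: Optional[Callable[[], None]] = None


def run(runbook: List[Task], approve: Callable[[Task], bool]) -> List[str]:
    """Execute tasks in order, pausing at manual steps for a human decision."""
    log = []
    for task in runbook:
        if task.action is None:
            if not approve(task):
                log.append(f"{task.name}: halted by operator")
                break
            log.append(f"{task.name}: confirmed manually")
        else:
            task.action()
            log.append(f"{task.name}: automated step done")
    return log


runbook = [
    Task("Declare disaster"),                  # manual decision point
    Task("Fail over database", lambda: None),  # stand-in for real automation
    Task("Validate application"),              # manual check
]

log = run(runbook, approve=lambda task: True)
print(log)
```

The `approve` callback is where a person stays in control: returning `False` at any manual step stops the run, and the log records where it halted.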
Orchestration
Orchestration is the process of automating many tasks as a process or workflow to complete an entire IT-driven process such as IT disaster or cyber recovery. Cutover’s automated orchestration removes the manual burden of managing IT disaster recovery, freeing up people to perform the tasks that they do best, such as collaboration and decision making.
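One common way to think about orchestration is as dependency-ordered execution: each task runs only once the tasks it depends on have finished, and independent tasks can run in parallel. The sketch below uses Python’s standard-library `graphlib` (Python 3.9+) to group tasks into "waves". The task names and dependencies are invented for illustration, and this is a generic technique rather than a description of how Cutover works internally.

```python
from graphlib import TopologicalSorter

# Hypothetical recovery tasks: task -> set of tasks that must finish first.
dependencies = {
    "restore database": {"provision infrastructure"},
    "restore file storage": {"provision infrastructure"},
    "start application": {"restore database", "restore file storage"},
    "run smoke tests": {"start application"},
}


def execution_waves(deps):
    """Group tasks into waves; tasks within a wave can run in parallel."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted for a stable order
        waves.append(ready)
        sorter.done(*ready)
    return waves


waves = execution_waves(dependencies)
for number, wave in enumerate(waves, start=1):
    print(number, wave)
```

Here the two restore tasks land in the same wave because neither depends on the other, which is where orchestration saves time compared with working through a flat checklist.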
Integrations
In many organizations, automation happens in multiple areas at once: individual processes that require excessive manual effort are automated one by one, producing efficiency gains that are often siloed. The next step is to connect these different automated tools to reduce redundancy and ensure a joined-up process across teams.
Cutover works as a central execution platform and integrates with your existing collaboration, pipeline execution, BCM, CMDB, ticketing, and application monitoring tools via REST API to create a more seamless process for IT disaster recovery and increase productivity gains above what can be achieved by each individual tool.
Reporting and analytics
Cutover provides real-time reporting and analytics to give recovery teams the data they need to continuously improve through dashboards, post-implementation review, and an indelible audit trail.
Cutover’s reporting features provide:
- Real-time visibility and control
- The ability to pivot and make quick, informed decisions during a recovery
- Continuous improvement of internal processes
- Better regulatory compliance
Why use Cutover’s Collaborative Automation platform to automate IT disaster recovery?
Cutover’s automated runbooks enable technology teams to standardize and automate IT disaster recovery processes. The Cutover platform combines orchestration, reporting, analytics, and integrations for seamless IT disaster recovery processes across your technology stack.
Case study: Financial services organization improves IT disaster recovery and regulatory compliance
The problem: Difficulty with regulatory reporting for IT disaster recovery
A major financial services company was using an internally built tracking tool, more than ten years old, for IT DR planning and execution. The tool did not enable the bank to standardize plans, and there was no way to see whether every application had a recovery plan in place. A major concern for regulators was that RTAs could be input manually up to seven days after an incident, so there was no way to verify whether the recorded timings were accurate. Regulators required a more robust recovery process that included observability, metadata, and metrics that were automated and verifiable, not input manually.
The solution: Collaborative Automation for IT disaster recovery
After a few events spent migrating and standardizing the recovery plans, the team ran its biggest recovery test to date using Cutover. In just 16 months, the bank had reached a level of maturity that can take years.
They ran an enterprise-wide IT DR simulation spanning all their lines of business. The event consisted of more than a thousand recovery plans and several thousand participants. Thanks to the orchestration capabilities of the Cutover platform, the client was able to execute over 16,000 individual tasks and successfully recover over a thousand applications during the 48-hour event.
Stakeholders used Cutover’s real-time dashboards to view progress throughout the event, which provided live updates and showed the actual timings as they happened rather than needing to rely on manual inputs to track RTAs vs Recovery Time Objectives (RTOs).
The outcome: Cutover drastically improved IT disaster recovery processes and reporting
For a big organization like this one, replacing an entire IT DR strategy with a new tool in a sub-two-year window is an extremely quick change. Furthermore, Cutover began providing return on investment instantly once everyone was onboarded, and those gains only continue to grow as adoption and usage increase. In contrast, the previous tool had taken four to five years to roll out, and the system before that had taken six years. This time, there was a 16-month window to fully migrate onto Cutover and the first successful event took place in the first month of active usage.