June 27, 2024

Strengthen cloud resilience and recovery plans: Navigate operational challenges in the cloud

Recently, Cutover CEO Ky Nichol was joined in a live webinar by Saba Waheed (Lead Cloud Resiliency Architect, Enterprise Resilience Group, USAA) and Dominic Rocheleau (Senior Director of Infrastructure & Cloud, Desjardins) to discuss cloud resilience and recovery.

If you missed the session, you can watch the recording. Below is a quick rundown of some of the insights shared.

Resilience isn’t automatic in the cloud

According to Dominic, the early days of cloud adoption were full of misconceptions about cloud resilience, so operational resilience strategies did not change. Many people in the industry saw the cloud as a magic technology: build a solution on top of it and there was nothing else to do. He noted that many cloud service providers were pushing that narrative too. Far from being a magic bullet for resilience, though, the cloud can actually add complexity and difficulty in this area: “Now you deal with regions, with different layers in the stack. So you need to be stringent about how you build your process around it. And for those who already have great processes for resiliency because they have on-premises data centers, just use the same process. Don't try to minimize the impact on resiliency that your cloud can have.”

There is a shared responsibility between cloud providers and customers

Saba emphasized the need to learn about and understand the shared responsibility model:

“We need to understand what our shared responsibility as a customer is, and what the provider's responsibility is, and make sure we have standards, policies, and guidelines defined and communicated well in the organization. The cloud has really profound capabilities, including scalability and so much flexibility that you can use to build great resilience and ensure your businesses can recover faster. There's so much automation capability that you can capitalize on as well.”

Regulators are recognizing the growing need for cloud resilience guidelines

Dominic noted that regulators are increasingly asking questions about cloud resiliency that many organizations are not yet prepared to answer. Cloud concentration risk is a key concern:

“You can build resiliency inside of a cloud provider solution to work with them to create a high level of resiliency but if your cloud provider fails tomorrow, what’s your plan? We are getting asked this question and don't yet have an answer.”

Cloud providers, organizations and regulators will have to continue to discuss these issues and work together to understand the right path forward.

Organizations must work alongside cloud providers to ensure applications are resilient in the cloud

When it comes to best practices for building resilience in the cloud, Dominic recommends adopting your cloud provider’s best practices as a starting point. Even if you can make something work, it may have detrimental consequences further down the line if it is not architected properly for the cloud. Once you have architected your applications well for the cloud, the next step is to implement a rigorous test process with your cloud provider to ensure you can recover if something goes wrong.

Although responsibility for cloud resilience is shared between the cloud service provider and the customer, Saba says that the vast majority of the areas likely to fail are the customer's responsibility. This is why testing is so important.

Integrations and automation can help streamline the recovery of cloud-based workloads

Saba notes that automation can be especially useful for testing:

“We have tools where we can simulate a failure, so first we would build a scenario manually, test it out, make sure we have our objectives in place, and that our applications are in a steady state. Once you understand the steady state of your application, you can use this for simulation tools to induce a failure and use the automation to validate whether you met the recovery objectives or not. That's one place we could integrate automation once we build that experiment and analyze the results and configurations. Then you could make that experiment into a repeatable execution and embed it into your software development lifecycle. These are the minimum set of experiments that you could automate.”
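The loop Saba describes (confirm steady state, induce a failure, then validate the recovery objective) could be sketched roughly as follows. This is a minimal illustration, not the tooling discussed in the webinar: `run_experiment`, `ExperimentResult`, `FakeService`, and the recovery time objective value are all hypothetical names and numbers chosen for the example, and the "failure" is simulated in memory rather than injected into a real system.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ExperimentResult:
    steady_before: bool      # was the application in steady state at the start?
    recovered: bool          # did it return to steady state afterwards?
    recovery_seconds: float  # measured time between failure and recovery
    met_objective: bool      # recovered within the recovery time objective?


def run_experiment(
    check_steady_state: Callable[[], bool],
    induce_failure: Callable[[], None],
    recover: Callable[[], None],
    rto_seconds: float,
    clock: Callable[[], float],
) -> ExperimentResult:
    """Run one failure experiment and compare recovery time with the objective."""
    # Only experiment on a system that is healthy to begin with.
    if not check_steady_state():
        return ExperimentResult(False, False, 0.0, False)

    induce_failure()
    started = clock()
    recover()
    elapsed = clock() - started

    recovered = check_steady_state()
    return ExperimentResult(True, recovered, elapsed, recovered and elapsed <= rto_seconds)


class FakeService:
    """Hypothetical stand-in for an application, with a simulated clock."""

    def __init__(self, recovery_time: float) -> None:
        self.healthy = True
        self.now = 0.0
        self.recovery_time = recovery_time

    def is_healthy(self) -> bool:
        return self.healthy

    def fail(self) -> None:
        self.healthy = False

    def recover(self) -> None:
        # Simulate the time recovery takes, then restore health.
        self.now += self.recovery_time
        self.healthy = True

    def clock(self) -> float:
        return self.now


service = FakeService(recovery_time=30.0)
result = run_experiment(
    service.is_healthy, service.fail, service.recover,
    rto_seconds=60.0, clock=service.clock,
)
print(result)
```

Because the experiment is an ordinary function with injected checks, the same call could in principle be repeated from a pipeline step, which is one way to read "embed it into your software development lifecycle".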

Best practices for testing protocols for important business services

Saba recommends being proactive and testing regularly before you deploy into production. After you deploy, depending on how mature you are, you could test your production environments and your backup and recovery procedures annually. Make sure you can restore from those backups, and run data integrity checks on what you restore.
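One simple form of data integrity check after a restore is to compare a cryptographic digest of the restored file with a digest recorded from the original. The sketch below assumes file-based backups and uses SHA-256 from Python's standard library; `sha256_of` and `verify_restore` are hypothetical helper names, and real backup tooling may offer its own verification mechanism.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks so large backups do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(original: Path, restored: Path) -> bool:
    """Report whether the restored file is byte-for-byte identical to the original."""
    return sha256_of(original) == sha256_of(restored)
```

In practice the digest of the original would usually be recorded at backup time, so that the check still works when the original is no longer available.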

Watch the full video to get all the insights from the experts in this session, or find out more about Cutover for cloud disaster recovery.

Chloe Lovatt