Last month we held our first-ever resilience-focused meetup in London. We were joined by resilience expert Mark Heywood, Associate Director of Operational Resilience Flora Kalhor, and Cutover SVP of Global Solutions Manish Patel, as well as a number of attendees working across multiple areas of resilience.
This post is a brief recap of what was discussed in that session.
Organizations are facing their biggest resilience challenges ever
Keynote speaker Flora Kalhor shared her journey from training as a chemical engineer to moving into business continuity and then operational resilience across multiple industries. She then spoke about the many challenges facing those working in financial services today and how these challenges are growing in both severity and frequency. She also explained why regulators are placing more emphasis on protecting customers from negative impacts as threats to organizational resilience increase. In the last five to six years, major organizations have been beset by a large number of incidents and this has led regulators like the FCA and PRA to be even more diligent and put in place more resilience requirements to keep customers safe. Although a necessary step, this creates an extra burden for organizations to ensure they’re compliant.
Flora provided the following advice for organizations when it comes to meeting regulatory requirements:
- Understand your business, know your services, and identify which are most important.
- Establish an impact tolerance for each of your services - this should reflect how long the service can be down before customers are seriously affected.
- Plan for failure - incident prevention is still important, but the risk landscape has changed. Regulators now recognize that every organization will at some point be hit by a hack or another incident, and each one needs to be prepared to respond when that happens.
- Carry out exercises to determine and improve your level of resilience - this should not be a tick-box exercise to satisfy regulators but a real test of whether you can bring services back within your set timelines. This approach will also lead to continuous improvement and better preparedness.
The third point, about the inevitability of organizations being affected by outages or cyber attacks, further reflects the need for community in this space. Regulators want to move organizations away from siloed thinking that separates cyber security, business continuity management, IT disaster recovery, etc., and facilitate more knowledge sharing and shared responsibility for the overall resilience of the organization.
Organizations sit at different stages of the resilience maturity curve. Many currently lack both a systematic approach to operational resilience and a sustainable management system that will lead to continuous improvement in the long term. Flora cited some of the most common mistakes organizations make, including:
- Testing only once when required, rather than continuously improving and building repeatability.
- Identifying services but not targeting them at the right level - the most critical systems require much more attention than others and the resources allocated should reflect this.
- A lack of simulated scenario exercises - desktop exercises do not truly challenge your process or reflect your real ability to recover.
- Not focusing enough on third-party risk. (Uptime Institute found that, since 2016, 63% of all publicly reported outages were caused by third-party and commercial IT operators such as cloud, hosting, colocation, and telecommunication providers.)
- A lack of alignment between different areas of resilience - business continuity, cyber security, etc.
- Not taking into account the different regulatory requirements across different countries - even if you are not operating directly in a certain geography, the regulatory requirements of that region may still touch on your processes and should be considered.
Communities build resilience
Although there are a lot of scary threats to resilience, sharing within the resilience community can help everyone be better protected. Fostering and nurturing talent is also a key part of this. Mark spoke about his unlikely route into a career in resilience: he was originally hired as an actor to help organizations role-play resilience scenarios and plan their responses.
Mark then recalled a number of times when the resilience community had come together to achieve something significant. A key example was the Y2K remediation project, for which he spent three years traveling the world patching servers to make them Y2K compliant.
“When I think about that time, I don’t really think about the compliance, or regulators, or the bank, I think about how I and everyone in the tech community worked so hard to ensure that everything worked - and because it worked it’s not an interesting story for the press but it’s a great example of the power of community.”
The talks were followed by a Q&A, which sparked further discussion about the challenges of adopting new technologies, hiring the best talent, and making testing more reflective of reality. Thanks to everyone who came along to listen to the talks and joined the conversation afterward. We hope to see some of you again at the next one, along with some new faces.