Disaster Preparedness: 3 Key Tactics for IT Managers


You can’t prepare for every ‘black swan’ event – consider the current supply chain disruptions impacting the holiday season and creating inflationary pressures. Even planned technology upgrades or simple configuration changes can have catastrophic consequences.

As SkyWest recently reported in its quarterly results, a migration of critical systems to a newly built server in October resulted in a server failure. The outage led to the cancellation of 1,700 flights, disruption for other major airlines and thousands of passengers, and a potential loss of $15-20 million.

By their nature, disasters – especially the black swan events brought on by the pandemic – are not easy to predict. But as an IT manager, you can better prepare for them and reduce their business impact by focusing on three key areas: enforcing change management controls, managing risk, and governing business continuity activities.

1. Apply change management controls

Change management controls are the subject of numerous audit findings at listed companies. It’s often easy to approach them with a ‘tick the box’ mindset just to appease internal and external auditors. Yet even a single poorly managed, untested, or unauthorized change can have a significant negative impact.

To mitigate potential Internet outages due to configuration changes, system changes should include the appropriate assessment, planning, testing, approval, documentation, automation, and communication strategy. Fully test all changes before putting them into production, but be careful not to hamper the pace of innovation: right-size risk assessment and impact testing to suit your company’s culture, industry, and risk appetite.
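As a rough illustration of the checklist above, here is a minimal Python sketch of a pre-production gate. The step names come from the paragraph; the `ChangeRequest` class, the three-level risk scale, and the rule that low-risk changes may skip planning and automation are hypothetical assumptions, not a prescribed process. Testing is always required, in line with the advice to fully test all changes.

```python
from dataclasses import dataclass, field

# Steps named in the article; the identifiers themselves are illustrative.
REQUIRED_STEPS = (
    "assessment", "planning", "testing", "approval",
    "documentation", "automation", "communication",
)

# Hypothetical right-sizing rule: steps a low-risk change may skip.
OPTIONAL_FOR_LOW_RISK = {"planning", "automation"}


@dataclass
class ChangeRequest:
    title: str
    risk: str  # "low", "medium", or "high" (assumed scale)
    completed_steps: set = field(default_factory=set)


def missing_steps(change):
    """Return the required steps not yet completed, in checklist order."""
    required = [
        step for step in REQUIRED_STEPS
        if not (change.risk == "low" and step in OPTIONAL_FOR_LOW_RISK)
    ]
    return [step for step in required if step not in change.completed_steps]


def ready_for_production(change):
    """A change is ready only when no required step is outstanding."""
    return not missing_steps(change)
```

A deployment pipeline could call `ready_for_production` before promoting a change and report `missing_steps` to the requester when it fails. The actual thresholds should reflect your own risk appetite.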

Change management controls should be a part of any software development or configuration process, whether you use waterfall, agile, or DevOps. This includes appropriate segregation of duties (SOD) controls, which also apply in “break-glass” emergencies, when developers may need emergency access to a production environment where they typically do not have access rights.

Many cloud providers publish status pages that report platform outages. Make sure your teams subscribe to these status pages and that all contracts include appropriate clauses specifying that vendors will notify teams of any planned upgrades or major issues in a timely manner.

2. Conduct risk assessments and business impact analyses

Whether it is a social media outage or an airline reservation system made inaccessible by a service failure, risk assessments – both internal and external, involving key technology vendors – can help identify risks before they materialize into disasters. A risk assessment is part of a risk management program that identifies threats and vulnerabilities in the assets used to achieve business objectives.

Have your team determine the likelihood of each risk occurring and the potential impact on the business if it does occur, keeping in mind resource, time, and budget limitations. Business impacts can fall into financial, reputation/brand, customer, legal/regulatory, and operational categories.
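One common way to combine likelihood and impact is a simple score that multiplies the two. The sketch below assumes 1-to-5 scales and rating thresholds of 15 and 8, which are illustrative choices rather than anything this article prescribes. The `Risk` class and function names are likewise hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    name: str
    likelihood: int  # 1 (rare) to 5 (almost certain); assumed scale
    impact: int      # 1 (negligible) to 5 (severe); assumed scale
    category: str    # e.g. "financial", "operational", "legal/regulatory"

    @property
    def score(self):
        # Likelihood multiplied by impact, as in a typical risk matrix.
        return self.likelihood * self.impact


def rating(risk):
    """Map a score to a label; the cut-offs are assumptions to be tuned."""
    if risk.score >= 15:
        return "high"
    if risk.score >= 8:
        return "medium"
    return "low"


def rank_risks(risks):
    """Order risks from highest to lowest score."""
    return sorted(risks, key=lambda r: r.score, reverse=True)
```

Ranking the register this way gives the team a starting order for discussion. The numbers support judgment about limited resources, time, and budget; they do not replace it.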

Once the risks have been identified and the impacts assessed and noted, implement an appropriate response to each risk. Risk treatment options are to accept the risk, mitigate the risk with new or existing controls, transfer the risk to third parties (often through insurance or risk sharing), or avoid the risk by exiting the business activity linked to it.

A risk assessment can be combined with a business impact analysis (BIA), which provides input for business continuity and disaster recovery planning. A BIA identifies recovery time objectives (RTOs), recovery point objectives (RPOs), critical processes, dependencies on critical systems, and many other areas. It comes down to the 80/20 rule: rather than creating costly recovery strategies for 100 percent of business functions, focus on the 20 percent of business processes that are most critical and need to be recovered quickly in the event of an incident.
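To make the 80/20 idea and the RTO/RPO terms concrete, here is a small Python sketch. It assumes each process has already been given a criticality score during the BIA. The `Process` class, the scoring, and the simplification that the backup interval bounds worst-case data loss are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Process:
    name: str
    criticality: int   # higher means more critical; assumed BIA output
    rto_hours: float   # recovery time objective
    rpo_hours: float   # recovery point objective


def top_priority(processes, percent=20):
    """Return the most critical `percent` of processes (at least one)."""
    # Integer ceiling division avoids floating-point rounding surprises.
    count = max(1, (len(processes) * percent + 99) // 100)
    ranked = sorted(processes, key=lambda p: p.criticality, reverse=True)
    return ranked[:count]


def meets_objectives(process, tested_recovery_hours, backup_interval_hours):
    """Compare a tested recovery time and the backup interval with RTO and RPO."""
    # Simplification: worst-case data loss is taken to be the backup interval.
    return (tested_recovery_hours <= process.rto_hours
            and backup_interval_hours <= process.rpo_hours)
```

Running `meets_objectives` against results from a recovery test shows where the current strategy falls short of what the BIA says the business needs.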

Once a BIA is complete, organizations can determine their recovery strategies to maintain business continuity during a disaster. Business continuity plans should be based on the BIA and updated at least once a year. Disaster recovery plans to recover applications, services, and data centers should be documented, tested, and maintained.

3. Establish governance for business continuity management and crisis communications

Finally, establish the appropriate governance for business continuity management (BCM). Tone at the top matters when it comes to emphasizing organizational structure, roles and responsibilities, policies, and funding for BCM initiatives.

Governance includes the involvement of appropriate stakeholders in BCM and clear planning for crisis management. Crisis management should include crisis planning, crisis response and crisis communication. Think about what information employees, board members, customers, suppliers, and the media should know, and designate the appropriate spokespersons to address the issue.

It is also a good idea to prepare statements in advance that can be revised as updates and facts emerge. This is how companies can maintain transparency and honesty without causing unnecessary alarm.

Try as you might, no one can prepare for, or predict, everything. COVID-19 is a black swan event of unprecedented magnitude, global scale, and impact. However, by focusing on change management, risk management, and governance, you can help your business better prepare for the next major disaster.
