h2g2 Episode 14: Keeping the lights on

A Hitch-Hacker’s Guide to the Galaxy – Developing a Cyber Security Roadmap for Executive Leaders

In this blog series, I am looking at steps that your organisation can take to build a roadmap for navigating the complex world of cyber security and improving your cyber security posture.

There’s plenty of technical advice out there for helping security and IT teams who are responsible for delivering this for their organisations. Where this advice is lacking is for executive leaders who may or may not have technical backgrounds but are responsible for managing the risk to their organisations and have to make key decisions to ensure they are protected.

This blog series aims to meet that need, and provide you with some tools to create a roadmap for your organisation to follow to deliver cyber security assurance.

Each post focuses on one aspect to consider in your planning, and each forms a part of the Cyber Security Assessment service which we offer to our member organisations in the UK Higher and Further Education sector, as well as customers within Local Government, Multi-Academy Trusts, Independent Schools and public and private Research and Innovation. To find out more about this service, please contact your Relationship Manager, or contact us directly using the link above.


Episode 14: Keeping the lights on

“Funny,” he intoned funereally, “how just when you think life can’t possibly get any worse it suddenly does.”

Douglas Adams, The Hitchhiker’s Guide to the Galaxy


Key Acronyms to Remember

BC: Business Continuity
BIA: Business Impact Assessment
MTD: Maximum Tolerable Downtime
DR: Disaster Recovery (see episodes 11 and 12, “Be the master of disaster”)
IR: Incident Response (see episode 13, “Action stations!”)

Each plays a distinct but interconnected role in ensuring your organisation’s resilience.

When disaster strikes

In the last episode (“Action stations!”) I looked at how to put in place an effective incident response plan. When you’re in the middle of a cyber incident, you need to ensure that your key business functions can continue operating until you are able to get back to business as usual. We know that it can take four weeks or more to recover from a major cyber incident. With your IT systems out of action for that length of time, what is going to keep your organisation afloat is your Business Continuity (BC) planning.

BC planning isn’t rocket science, but it is often overlooked or under-developed. Some organisations pay only lip service to it, with a high-level policy that leaves the details to be addressed somewhere else.

When IT systems are out of action, people sometimes look to the IT team to provide an alternative solution for keeping business functions running. That is emphatically not their responsibility, especially during a major incident, when all their time and energy needs to be focussed on recovery.

Business Continuity planning is the responsibility of the leaders of business functions. When you ask these leaders to develop BC plans, they’ll need to know what they are planning for: what sort of scenarios (e.g. fire, flood, bomb, pandemic, power or IT outage) and for how long (4 hours, 2 days, 3 weeks, or more).

Finding where it hurts

You’ll need to conduct a Business Impact Assessment (BIA) to determine the impact of disruptions to critical functions, including potential risks to personal safety, property, income and reputation. A key measure to evaluate for each function is its maximum tolerable downtime (MTD). This is the length of time that a function can continue to operate on a contingency basis, specifically during an IT outage, before it ceases to be operational.

In universities, during clearing the MTD might be as little as 30 minutes. At other times, for some services, you might manage for a week or more. For some research activities, the MTD might be a matter of seconds. The MTD figure is a useful measure of criticality of systems and should be recorded in your organisation’s information asset register, where it is used to determine the recovery priorities for systems in the Disaster Recovery (DR) plan.
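For readers who keep their asset register in a spreadsheet or database, the idea of using MTD to set recovery priorities can be sketched in a few lines of Python. This is only an illustration: the `Asset` class, the `recovery_order` function, and the system names and MTD values are all assumptions made up for the example, not part of any real register.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class Asset:
    """One row of a (hypothetical) information asset register."""
    name: str
    mtd: timedelta  # maximum tolerable downtime


# Illustrative entries only; names and MTD values are assumptions.
register = [
    Asset("Finance system", timedelta(days=7)),
    Asset("Clearing admissions portal", timedelta(minutes=30)),
    Asset("Research instrument controller", timedelta(seconds=10)),
]


def recovery_order(assets):
    """Return assets sorted so the shortest MTD comes first."""
    return sorted(assets, key=lambda asset: asset.mtd)


for rank, asset in enumerate(recovery_order(register), start=1):
    print(rank, asset.name, asset.mtd)
```

In practice MTD would be only one input to the recovery order in your DR plan, alongside dependencies between systems and the time of year.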

The way to develop your BC planning is to scope out a set of realistic timeframes for consideration, based on records of past incidents affecting the sector or other businesses. What if recovery takes 1 hour, 4 hours, 2 days, 10 days, 3 weeks, or 3 months? It is incumbent on the owner of each business function to work out how they would continue to operate in any of these time windows if IT is not restored. It might mean thinking outside the box, perhaps a return to manual pen-and-paper systems.

Like the incident response plans we looked at in the last episode, these BC plans need to be tested in peace time to ensure that they are fit for purpose when needed.

Join the dots

A valuable output from a well-directed BC planning exercise is a clear understanding of which business processes are absolutely critical to keeping the organisational show on the road, and how these relate to other systems or parts of the business. For example, you might discover that your payment processes rely on internet access to authorise payments, or that the student records system needs Active Directory to authenticate user logins. A BC planning and testing programme should uncover all these dependencies, which in turn will help inform your recovery procedures, priorities and timescales.
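The dependency mapping described above can be recorded very simply. The sketch below is a minimal example, assuming a hand-maintained map of "what relies on what". The payments and student records entries come from the examples in this post; the link from Active Directory to data centre power, and the function names `all_dependencies` and `affected_by`, are assumptions added for illustration.

```python
# Hypothetical dependency map: each item lists what it directly relies on.
depends_on = {
    "Payments": ["Internet access"],
    "Student records": ["Active Directory"],
    "Active Directory": ["Data centre power"],  # assumed for illustration
    "Internet access": [],
    "Data centre power": [],
}


def all_dependencies(item, graph):
    """Return every direct and indirect dependency of an item."""
    seen = set()
    stack = list(graph.get(item, []))
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(graph.get(dep, []))
    return seen


def affected_by(failed, graph):
    """Return the items that stop working if `failed` is unavailable."""
    return sorted(
        name for name in graph if failed in all_dependencies(name, graph)
    )


print(affected_by("Data centre power", depends_on))
```

Asking "what is affected if this fails?" for each item in the map is one way to spot the hidden dependencies that a BC test is meant to surface.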

Your first task is to determine which business processes are most critical, which are vulnerable and to what types of event, and what the potential impacts are if these processes are disrupted for a day, a week, or longer. Those systems and impacts might vary depending on the time of year, such as clearing, enrolment, exams, payroll pay day and financial year end.

Roll up your sleeves

Start with a high-level strategic BC plan which outlines the scenarios which might disrupt the normal working of your organisation, and the high-level responses required of each business function to each of these.

Identify those incidents which are likely to cause sufficient disruption to exceed the maximum tolerable downtime (MTD) for each business function. Your MTD could be 30 minutes at clearing, 2 hours on payroll pay day, 24 hours for exams. Disruption events include power outage, fire, flood, system downtime, terrorist threat, protest or cyber attack. Don’t forget your supply chain—you may be reliant on third-party systems or services which could affect your business processes if they went down.
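As a rough illustration of this step, the comparison between an expected outage and each function's MTD can be expressed as a simple check. The MTD figures are the examples quoted above; the outage durations and the function name `functions_at_risk` are assumptions for the sake of the sketch, and you would substitute figures from your own BIA.

```python
from datetime import timedelta

# Example MTD figures quoted in the text above.
mtd = {
    "Clearing": timedelta(minutes=30),
    "Payroll (pay day)": timedelta(hours=2),
    "Exams": timedelta(hours=24),
}

# Assumed outage durations, for illustration only.
expected_outage = {
    "Short power outage": timedelta(hours=1),
    "Major cyber attack": timedelta(weeks=4),
}


def functions_at_risk(outage, mtd_by_function):
    """Return the functions whose MTD is shorter than the outage."""
    return [
        name for name, limit in mtd_by_function.items() if outage > limit
    ]


for scenario, outage in expected_outage.items():
    print(scenario, "->", functions_at_risk(outage, mtd))
```

Any function that appears in the output for a scenario needs a documented response for that scenario in its BC plan.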

For each of these scenarios, you need to identify the affected business functions and how you expect each to perform. This could mean deferring or cancelling a process (for example, postponing exams or coursework submission dates) or adopting a “plan B” to keep systems running.

In each case, various teams will be required to deliver the outcome. You might need Student Services to manage communications to students about revised assessment arrangements, or the Admissions team to revert to a manual paper-based process for clearing.

Each business function will need to maintain its own operational BC plan which details how it will deliver continuity for each scenario. This needs to align with the strategic BC plan and all plans should be collated, accessible when required, and reviewed annually.

Contact with the enemy

Devising your BC plans is necessary but not sufficient to ensure preparedness. Testing out the plans is a critical component for resilience.

Testing shows whether, and how well, the plans will work. It helps identify gaps in the plans and areas for improvement. It also helps build the “muscle memory” which is needed to respond as quickly and effectively as possible when you need to put plans into action. The military adage is “no plan survives contact with the enemy”, so don’t be surprised if there are lessons to be learned and improvements to be made.

You should aim to conduct testing of your incident response and business continuity plans every 12 months, using a different scenario each time. Use the lessons learned from these dry runs to update your plans.

A final [deep] thought

In the next episode of A Hitch-Hacker’s Guide to the Galaxy, “Don’t Train in Vain”, we’ll be looking at the human side of cyber security.

For now, you can take useful steps forward by checking out your organisation’s business continuity planning. Do you have a BC plan? When was it last updated? Does it contain all the elements required? Does it integrate with an Incident Response plan? When was it last tested? Do you know what your critical business functions are, and what the maximum tolerable downtime is for each?

James Bisset is a Cyber Security Specialist at Jisc. He has over 25 years’ experience working in IT leadership and management in the UK education sector. He is a Certified Information Systems Security Professional, Certified Cloud Security Professional, Certified in Risk and Information Systems Control and is a member of the GIAC Advisory Board.