Many of us could not imagine a world without social media platforms like Facebook, WhatsApp or Instagram.
However, this became a reality for billions of users on 4 October 2021, when Facebook and its subsidiaries suddenly became unavailable worldwide for more than five hours.
People worldwide depend on social networks for everything from conducting business to staying connected with loved ones. So, this large-scale outage has prompted organisations to ask how it could have happened, and what the implications would be for their own business should they experience a similar fault.
The domino effect
So, what was the catalyst for Facebook’s downtime? Were there malicious actors involved? Was it the result of an attempt to steal user data?
In fact, Facebook released a statement explaining that it was something far less sinister. Configuration changes on the backbone routers that coordinate network traffic between its data centres caused issues that interrupted this communication. And because Facebook runs all its services over the same infrastructure, this had a cascading effect on its other services.
Companies such as Facebook use Border Gateway Protocol (BGP) to advertise routes to their networks, in effect telling the rest of the internet how to reach their data centres. Internet routers need this information to direct requests to the relevant servers. When the faulty configuration change caused those routes to disappear, routers concluded that Facebook’s data centres simply did not exist, rendering its various apps and services unusable.
The outage lasted so long because the network that went down was the same one staff needed to reach the affected systems and fix the issue remotely. On top of this, it also took out Facebook Workplace (an online collaborative software tool) and other communication apps that staff relied on. The fault reportedly also prevented staff from physically entering the company’s data centres, as their site access cards depended on functioning internal systems.
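To make the routing point concrete, here is a minimal sketch of how a withdrawn route makes a destination unreachable. It is a toy model, not real BGP: the RoutingTable class, the "example-datacentre" name and the 203.0.113.0/24 prefix (a range reserved for documentation) are all assumptions made for illustration.

```python
import ipaddress


class RoutingTable:
    """Toy model of the routes a router has learned from announcements."""

    def __init__(self):
        # Maps a network prefix to the (hypothetical) network announcing it.
        self.routes = {}

    def announce(self, prefix, origin):
        """Record that `origin` says it can deliver traffic for `prefix`."""
        self.routes[ipaddress.ip_network(prefix)] = origin

    def withdraw(self, prefix):
        """Forget a previously announced prefix."""
        self.routes.pop(ipaddress.ip_network(prefix), None)

    def lookup(self, address):
        """Return the origin of the most specific matching route, or None."""
        addr = ipaddress.ip_address(address)
        matches = [net for net in self.routes if addr in net]
        if not matches:
            return None  # no route: the destination appears not to exist
        best = max(matches, key=lambda net: net.prefixlen)
        return self.routes[best]


table = RoutingTable()
table.announce("203.0.113.0/24", "example-datacentre")
before = table.lookup("203.0.113.10")  # route known, traffic can be sent

table.withdraw("203.0.113.0/24")
after = table.lookup("203.0.113.10")   # route gone, nowhere to send traffic
```

The servers in this sketch never change. Only the route to them is withdrawn, which is enough for the rest of the network to lose track of them.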
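The circular dependency described above can be checked for in advance. The sketch below is a simplified illustration with hypothetical system names, not a description of Facebook's actual architecture. It assumes that a system fails as soon as anything it depends on fails, and it adds an independent "out_of_band_console" purely to show what a recovery path that survives the fault looks like.

```python
def unavailable_after(failed, depends_on):
    """Return every system that stops working when `failed` goes down.

    Assumes a system fails as soon as any one of its dependencies fails.
    """
    down = {failed}
    changed = True
    while changed:
        changed = False
        for system, deps in depends_on.items():
            if system not in down and any(dep in down for dep in deps):
                down.add(system)
                changed = True
    return down


# Hypothetical dependency map: each system lists what it needs to function.
depends_on = {
    "backbone_network": [],
    "public_apps": ["backbone_network"],
    "remote_admin_tools": ["backbone_network"],
    "internal_chat": ["backbone_network"],
    "badge_access": ["backbone_network"],
    "out_of_band_console": [],  # assumed to run on a separate network
}

# The tools staff would reach for in order to repair a network fault.
recovery_tools = {"remote_admin_tools", "badge_access", "out_of_band_console"}

down = unavailable_after("backbone_network", depends_on)
usable_recovery = recovery_tools - down
```

If usable_recovery comes back empty for any single failure, the tools needed to fix that failure depend on the thing that has failed, which is the trap described here.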
Not only did this affect the businesses and individuals who rely on Facebook’s network of social media products, but it also had significant financial consequences for Facebook itself. Founder Mark Zuckerberg’s personal fortune fell by $7 billion (almost £5.1 billion), and the company lost more than $13 million (nearly £9.5 million) in advertising revenue for every hour it was out of action.
A wake-up call
You may be wondering: if something like this can happen to a digital empire like Facebook, what is stopping it from happening to my business?
In short, the answer is nothing. Without the proper contingencies in place, this unfortunate scenario could befall any organisation. So, here are a few things you should take away from this event to reduce the risk of such a disaster.
Decentralise network control
Having your business’s information stored on one centralised system might seem like the most straightforward approach, but too many mutually dependent systems in a network could lead to a Facebook-scale shutdown. Instead, decentralising control by migrating to cloud architecture can help ensure data is distributed and remotely accessible, so that a fault at one data centre is less likely to affect other networks.
Mitigate against human error
No matter how hard IT professionals work to manage various risks, mistakes will sometimes happen. Human error is a leading cause of system downtime, and phishing attacks remain among the most common cyber threats to businesses. So every company, whatever its size, must enforce training and policies to address this issue and reduce the risk of a breach disrupting uptime. By automating risk assessment and threat modelling with artificial intelligence (AI) or machine learning (ML) and promoting company-wide cyber awareness, business leaders can reduce the likelihood of human error causing a fault.
Invest in the necessary technology
In the modern world, businesses and cyber security teams must have the appropriate software and technology to keep pace with rapid digitisation. Physical data centre infrastructure that depends on a single site can be more vulnerable than distributed cloud-based systems, as hardware can fail unexpectedly. Embracing the latest IoT-enabled technology and upgrading to remote servers can help automate routine processes, reduce system and equipment failures and improve the robustness of cyber security systems.