Just the other day, Facebook suffered a massive outage that also took down WhatsApp and Instagram. The outage lasted for several hours, and many have been wondering what went wrong. Was it an attack on Facebook’s services? It turns out it was nothing so sinister: the cause was a mistake made during routine maintenance.

This is according to a post on Facebook’s engineering blog by Santosh Janardhan, the company’s VP of infrastructure. According to the post, “During one of these routine maintenance jobs, a command was issued with the intention to assess the availability of global backbone capacity, which unintentionally took down all the connections in our backbone network, effectively disconnecting Facebook data centers globally.”

Janardhan notes that Facebook has a system in place to audit these kinds of commands, specifically to avoid incidents like this one, but a bug in that tool let the command through. “Our systems are designed to audit commands like these to prevent mistakes like this, but a bug in that audit tool prevented it from properly stopping the command.”

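The post goes on to describe the result: “This change caused a complete disconnection of our server connections between our data centers and the internet. And that total loss of connection caused a second issue that made things worse.” According to the post, that second issue involved Facebook’s DNS servers, which withdrew their BGP route advertisements once they could no longer reach the data centers, leaving them unreachable from the rest of the internet. Things seem to be up and running again, but for several hours, many users were unable to access their messages and social networks.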

Source: engineering.fb