Thursday 18th February 2016

Networking Infrastructure: Yesterday's Outage

Yesterday, we experienced an outage that affected a subset of Cloud-A customers' instances. Our investigation has confirmed that the cause was a failing core switch on our VM network. This network does have redundant switching; however, the "soft" failure that occurred left the switch in a partially operating state. As a result, some nodes did not detect that the switch was failing to deliver all packets to their destinations, which caused disconnections on our virtual network layer. We are working with our vendor to determine a proper fix for this type of failure. In the meantime, the switch has been removed from our core VM network.