AWS Outage Affects Hundreds of Enterprise Services

Amazon Web Services Inc. (AWS) on Friday experienced an outage in its US-EAST-1 Region (Ashburn, Va.) that reportedly affected hundreds of enterprise services.

"There was a loss of power to one of AWS's redundant Internet connection points in Virginia that created connectivity issues for a small number of AWS customers who are using Direct Connect services in AWS US East," an Amazon spokesperson said. "AWS resolved the issue and is working with its partner to prevent recurrence."

While that "partner" was not named, Internet monitoring company ThousandEyes reported that the outage affected hundreds of critical enterprise services, including Atlassian, Slack and Twilio. According to the Downdetector site, the outage even silenced Alexa, Amazon's voice assistant that works with Amazon Echo speakers.

The AWS Service Health Dashboard provided this timeline of the outage:

7:29 AM PST We are investigating increased packet loss possibly impacting some AWS Direct Connect customers in the US-EAST-1 Region.

8:03 AM PST We continue to work towards resolving the increased packet loss impacting AWS Direct Connect connectivity to the US-EAST-1 Region. Direct Connect connections from the Equinix DC1 - DC6 & DC10 - DC12, Ashburn, VA and CoreSite VA1 & VA2, Reston, VA locations are affected by this issue.

9:43 AM PST Connections in the CoreSite VA1 & VA2 (Reston, VA) location, and some connections in the Equinix DC1 - DC6 & DC10 - DC12 (Ashburn, VA) location are inactive. Inactive connections are not receiving routes advertised from Direct Connect routers. We are now working to restore service on these Direct Connect connections. The AWS VPN service is operating normally and may be an alternative for some workloads.

11:21 AM PST Between 6:23 AM and 10:26 AM PST on March 2, 2018, some customers in the CoreSite VA1 & VA2 (Reston, VA) Direct Connect location and the Equinix DC1 - DC6 & DC10 - DC12 (Ashburn, VA) Direct Connect location experienced a connection loss to the US-EAST-1 Region. The issue has been resolved and the service is operating normally.

Ironically, AWS says its Direct Connect service "can reduce your network costs, increase bandwidth throughput, and provide a more consistent network experience than Internet-based connections."

In another bit of irony, the outage occurred almost exactly one year after a widely reported AWS outage, blamed on an incorrectly typed command, that bogged down much of the Internet.

ThousandEyes said Friday's outage "is a hard-hitting reminder of the vulnerability of the cloud. Enterprises are very quickly adopting Cloud First strategies by moving workloads to IaaS providers like AWS. However, many organizations still do not fully comprehend the unpredictable dependencies that go along with this shift."

The company said the outage affected more than 240 critical services relying on AWS infrastructure.

"This episode serves as a powerful reminder that the cloud is a complex interconnected system," ThousandEyes said. "Outages and natural disasters in one part of the cloud can quickly ripple over into other areas. Cloud vendors offer several ways to directly connect into their infrastructure. However, they do not make you immune from the external dependencies of the Internet. While availability zones offer some level of redundancy, regional outages like these can quickly envelop entire clusters of data centers."

To mitigate the effects of such outages, the network monitoring specialist recommended making geographic redundancy a key part of an organization's fault-tolerance strategy, along with using network monitoring services.

About the Author

David Ramel is an editor and writer for Converge360.