Data center outage.

Posted on May 29, 2012 by Nick

We’ve had a major outage today and have been unable to offer updates because the entire datacenter basically disappeared. Our fantastic host, BigWetFish, has stayed on top of this from the very beginning. Here’s a timeline in GMT-8 (Pacific).

UPDATE

The outage has made the news at The Register. Click here for the article.

Gist: There was a major power outage at the Pulsant / Bluesquare datacenter in Maidenhead, United Kingdom. This was not the fault of Ashkir [dot] com or BigWetFish, and we both apologize for these issues.

Who was affected? This power outage affected all of the Ashkir [dot] com network, as well as several webhosts and businesses throughout the world. It caused a great delay in emails being sent and disrupted many hours of business and charity work. In fact, a majority of the Californian workday was affected, which disrupted email and communication for the offices, personal accounts, and businesses hosted on this network. I apologize profoundly for this and would like to assure you that none of the emails were lost.

Actions: Measures are being reviewed and put into place to prevent this from happening again. It was an unforeseen issue. Personally, I plan on moving to a VPS as funds become more abundant, to give us greater stability and a future in the web business.

What are we going to do: Make sure everything is running smoothly and nothing was lost. So far so good! I do want to eventually move to a VPS for greater stability. However, funds to do so aren’t available right now. This will probably happen around 2013 when it’s time for my hosting plan to renew. :) Thanks for your patience, and once again I apologize. I am updating all support tickets now.

The Timeline:
All times GMT-8
1:51 am (9:51 am GMT) Lost connectivity to the internet. All packet loss. (tweet)
2:11 am (10:11 am GMT) Internet connectivity has been restored. (tweet)
3:12 am (11:12 am GMT) Downtime totaled about twenty minutes. Update is being looked into by BigWetFish. (BWF)
10:53 am (6:53 pm GMT) The earlier issue was determined to be a routing issue affecting all servers inside the data center. Backup systems were brought online quickly to rectify the issue. Engineers are working to restore everything as soon as possible.
2:27 pm (10:27 pm GMT) The issue has arisen again in the data center. BigWetFish is in communication with their partners to get to the bottom of this. It is not affecting BigWetFish or Ashkir [dot] com alone. (tweet)
2:41 pm (10:41 pm GMT) Tsohost, a hosting service in the same data center as BigWetFish, released an update saying that three of their Maidenhead racks were without power. (tweet)
2:59 pm (10:59 pm GMT) BigWetFish’s Control Panel Server for their OnApp Cluster has returned online. The SAN and Hypervisors are back online. They’re starting up the VMs as of now. Some services may need to fsck (file system check). (BWF)
3:29 pm (11:29 pm GMT) All cluster servers on the cloud have returned online. Hypervisor load will be heavy for a short while. (BWF)
3:35 pm (11:35 pm GMT) Server 26 is back online. (tweet)
3:38 pm (11:38 pm GMT) Server 26 (our cloud) was reported as one of the servers affected by the outage. (tweet)
4:00 pm (12:00 am GMT) Everything’s stable. Load is within acceptable ranges, though a tad slow.
4:17 pm (12:17 am GMT) Load is in an acceptable range and things are running faster.
5:20 pm (1:20 am GMT) Hypervisors are stable and everything is going well. Email services are now caught up.
June 1st 2:01 am (6:01 am GMT May 31st) News article is up.
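
For anyone who wants to double-check the clock times in the timeline, here is a minimal Python sketch of the conversion. It assumes the fixed 8-hour gap between the Pacific and GMT times that the timeline uses; the function name `to_gmt` and the sample entries are only for illustration, and it relies on the default English locale for am/pm.

```python
from datetime import datetime, timedelta

# Assumption: the timeline's fixed 8-hour gap between Pacific and GMT times.
OFFSET = timedelta(hours=8)

def to_gmt(local_time: str) -> str:
    """Convert a clock time such as '2:27 pm' to the GMT clock time 8 hours later."""
    parsed = datetime.strptime(local_time, "%I:%M %p")
    shifted = parsed + OFFSET
    # Drop the leading zero and lowercase am/pm to match the timeline's style.
    return shifted.strftime("%I:%M %p").lstrip("0").lower()

# Two sample entries taken from the timeline.
for entry in ["1:51 am", "2:27 pm"]:
    print(entry, "->", to_gmt(entry), "GMT")
```

This only shifts the clock time and ignores the date, so an evening Pacific entry that rolls past midnight lands on the next day in GMT.
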
I would like to thank BigWetFish, my host, for staying on top of this issue at all times and for giving me updates!
