Slow Service, Downtime, & 500 Internal Server Errors

Posted on April 29, 2012 by Nick

Greetings,

Ashkir [dot] com is aware of the slow service, slow page loads, brief downtime, and 500 Internal Server Errors we had today. We've had over a hundred days without a problem, but we're not always lucky. The problem was caught early, and a fix is in progress. All of your websites are working, and we can assure you that no data was lost.

All cloud hypervisors have been rebooted. We are still working on the remaining stability issues.

Ashkir [dot] com would like to apologize for this issue. A full report will be added shortly.

Timeline of Events

All times are GMT-8.
4:30 am Servers start experiencing issues, which become increasingly noticeable.
4:43 am Our hosting provider notifies us that it is aware of the problem.
5:05 am Our hosting provider has a senior technician looking into it.
5:52 am The NOC technicians have identified the issue.
6:06 am The techs are working on the issue, no updates at this time.
6:24 am All cloud hypervisors have been rebooted. Servers going offline.
10:00 am Services are back up, and have been up for a bit!
7:54 pm We notice that websites are sluggish.
7:57 pm CPU load is 7.87 (normal).
8:05 pm CPU load is 10.54 (high).
8:07 pm CPU load jumps to 13.07 (very high). Page loads take around 12 seconds, far above the normal 0.5 seconds.
8:15 pm CPU load is back down to 6.54, and load times are normal again.
8:32 pm CPU load is climbing again, now at 9.18.
8:36 pm Slow loads are being investigated.
8:37 pm Manageable again, but still a bit sluggish. Load times for American visitors are over 5 seconds, higher than normal.
8:49 pm Everything is normal once again. Page generation time is 0.11, and CPU load is 5.9.
10:15 pm CPU load is 9.46. Looking into it. Sorry! Page loads are decent, but slower than normal.
1:10 am Database seems to be slow. Looking into it.
1:17 am Services stopped responding.
1:21 am Websites are responding again. CPU load is 23.2, page generation time is 124.90164113, and database response time is 110.2542832.
1:24 am CPU load is down to 14.0, and services are more stable. I assure you this is being looked into.
1:34 am CPU load is 6.23, which is a good, normal number. Page generation time is 0.993116855621, which is higher than normal but still good. Database response time seems normal.
2:35 am BigWetFish has assured us that a technician is working on the issue.
2:47 am Load is normal right now. Servers are being worked on.
3:00 am All is well.
Please note: our normal CPU load is around 2.0 or less (8.0 is acceptable too), and page load times are normally under 1.0 second.

After notes: It has been a long run: nearly 94 days of perfect uptime and stability, and this was the first day we had issues. We hope that being upfront about these issues does not shake your confidence in us. We apologize; this was a rare problem, and it will likely not happen again. Let's beat the 94 days now. 🙂
