Incident Report for Feb 7 Downtime

At 15:55 NZT today we experienced a sustained outage of all Path of Exile services, lasting 1 hour and 4 minutes until 16:59 NZT. While downtime can happen from time to time, extended downtime of this nature is not acceptable.

By 15:56 NZT our server admin Thomas had been alerted to the problem and began attempting to diagnose it. At that time all Path of Exile servers, including the hot spares in our server host's Dallas data centre, were unreachable. This data centre is where Path of Exile's core infrastructure is hosted. We notified our server host about the problem.

At 16:19 NZT we were notified that there had been a power event in one of the server rooms of that data centre. All servers and network infrastructure in that server room had lost power. At 16:20 NZT power was restored and equipment began to power up again. At 16:51 NZT all of the network infrastructure was back up and our servers were available again. Unfortunately, several of our servers did not come back up, and one of them was a primary database server. At this point we decided to initiate failover of this server to its hot spare. By 16:57 NZT we had prepared the configuration changes needed to move to the hot spare, and we began restarting the realm. At 16:59 NZT the realm was back up and functioning normally.

While this downtime was caused by a power incident outside our control, our ability to recover from such events is our responsibility. For an incident like this, we should need at most 5 minutes of downtime while we move to redundant infrastructure. This incident is our fault because we did not take sufficient steps to isolate our redundant infrastructure. While we had hot spares, we didn't make sure that those hot spares were actually powered by separate power systems. In addition, once our failover sequence began, it took longer than it should have to actually make the switch.
Over the next few days we will be taking steps to move our redundant infrastructure to different locations so that it is completely isolated from failures affecting the primary systems. We will also be working on our failover procedures so that they can be performed faster and with fewer steps. I'm personally very sorry. As the Technical Director of Grinding Gear Games, it is my responsibility to ensure that our infrastructure is sufficiently redundant so that when disasters happen, our users experience the minimum possible disruption. This is not the level of service that you should expect from our company, and I can assure you that we will be making changes so that an incident like this will not happen again. Path of Exile II - Game Director	Posted by Jonathan on Feb 6, 2014, 11:58:09 PM Grinding Gear Games
Dayum yo, fucking Blizzard doesn't even do this. Jonathan, it was two hours, it wasn't a big deal xD. Fuck, 8 hours wouldn't be a big deal; you guys LITERALLY have no downtime 99% of the time. That's good enough for me lol Dys an sohm Rohs an kyn Sahl djahs afah Mah morn narr Last edited by Coconutdoggy#1805 on Feb 6, 2014, 11:59:29 PM	Posted by Coconutdoggy#1805 on Feb 6, 2014, 11:58:57 PM On Probation
An event out of their control, yet the technical director himself personally apologises. This is why you get my money GGG. "Minions of your minions are your minion's minions, not your minions." - Mark	Posted by ciknay#1000 on Feb 7, 2014, 12:00:55 AM
ciknay wrote: "An event out of their control, yet the technical director himself personally apologises. This is why you get my money GGG."	Posted by PoofGoof#3481 on Feb 7, 2014, 12:04:08 AM
Always a class act. Appreciate you taking the time to write this post :)	Posted by kasub#2910 on Feb 7, 2014, 12:06:57 AM
I wish an hour-long outage at my company were responded to in such a manner! IGN: Kulde	Posted by Yxalitis#6223 on Feb 7, 2014, 12:08:15 AM
ciknay wrote: "An event out of their control, yet the technical director himself personally apologises. This is why you get my money GGG." Yep	Posted by phazius#6765 on Feb 7, 2014, 12:08:20 AM
This is why you have, and will continue to get, my money. Thank you.	Posted by djnecro#5304 on Feb 7, 2014, 12:08:30 AM
ciknay wrote: "This is why you get my money GGG."	Posted by Bunnu#1083 on Feb 7, 2014, 12:10:56 AM
+1 mate. Stuff happens. I love pie.	Posted by xSyK0TiC#2841 on Feb 7, 2014, 12:11:44 AM