Friday Sept 16 NZT Site/Beta Outage - Now with Incident report

3.5 hours ago, the Path of Exile site and Beta realm became unable to retrieve information from one of our database servers. This led to people being unable to log in to their accounts on either the site or the game.

This has now been fixed and will be investigated fully.
Lead Developer. Follow us on: Twitter | YouTube | Facebook | Contact Support if you need help!
Last edited by Chris on Oct 31, 2012, 5:46:19 AM
I blame the Earthquakes.
"
IZombie wrote:
I blame the Earthquakes.
I blame aliens, dracula and the flying spaghetti monster. I believe they teamed up against us. This must be investigated, get the Belmonts, Mulder&Scully and Richard Dawkins!
Mew~
Pointless. Aliens have the ultimate coverup ability in the U.S government.





Need I say more.
Cheers Chris for the update, and gl finding the prob.
I wouldn't be here but I was out voted 3 to 1 by the other personalities.
"
Chris wrote:
3.5 hours ago, the Path of Exile site and Beta realm became unable to retrieve information from one of our database servers. This led to people being unable to log in to their accounts on either the site or the game.

This has now been fixed and will be investigated fully.


hope it wasn't lulzsec anti sec anon or some other hacking group :\
Curious. I hope it's nothing serious and that it doesn't cause you too much trouble. Good luck! :D
Here is the incident report for this issue.

This issue was triggered during our automated backup of the game database. We are not absolutely sure what happened yet but the backup caused the database software to panic.

Normally this would lead to the server aborting all open transactions, restarting the database and then retrying all the transactions, continuing where it left off. Unfortunately the DB recovery code was not tested well enough, and the server managed to get into a state in which it thought it had recovered, yet every transaction returned a database error.

While our monitoring did detect the issue, nobody was awake to see the problem and attempt to fix it.
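
One way to catch a "looks recovered but every transaction fails" state is to run a real probe transaction before declaring recovery complete. The following is only a minimal sketch of that idea, not the recovery code described above: it uses Python's built-in sqlite3 as a stand-in for the actual database software, and the `recovery_verified` function and `health_probe` table are hypothetical names.

```python
import sqlite3

def recovery_verified(conn):
    """Return True only if a real write transaction commits and reads back."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("CREATE TABLE IF NOT EXISTS health_probe (n INTEGER)")
            conn.execute("DELETE FROM health_probe")
            conn.execute("INSERT INTO health_probe (n) VALUES (1)")
        row = conn.execute("SELECT n FROM health_probe").fetchone()
        return row == (1,)
    except sqlite3.Error:
        # Any database error means recovery did not really succeed.
        return False

conn = sqlite3.connect(":memory:")  # stand-in for the game database
healthy = recovery_verified(conn)

conn.close()
broken = recovery_verified(conn)  # a closed connection errors on every statement
```

The point of the design is that recovery is judged by the result of an actual transaction rather than by the recovery routine's own belief that it finished.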

When I woke up this morning I restarted the realm and this brought everything back up.

Here are the steps we will be taking in response to this issue.

1) Investigate the cause of the original issue and attempt to fix it. We will be asking the developers of our database software questions about what occurred. This should prevent this specific issue happening again.

2) Fix the code that recovers the database after a failure. This should prevent problems occurring with the database in the future.

3) Investigate our options for adding support for sending an SMS message to our developers so that if something goes wrong during the middle of the night we can investigate it immediately.

4) Potentially add an automatic restart to our monitoring so that if the realm remains in a bad state for more than 5 minutes, it is restarted automatically. This is something that we have been considering but are reluctant to do because we might end up causing more problems if there are false positives.
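
As a rough illustration of step 4, the decision logic could look like the sketch below. It is not the monitoring actually in use: the `RestartPolicy` class, the restart cap and the one-hour window are assumptions added for illustration. Only the 5-minute threshold comes from the step above. The cap exists so that a false positive cannot cause endless restarts; once it is hit, the policy hands over to a human instead.

```python
class RestartPolicy:
    """Decide when a sustained bad state justifies an automatic restart."""

    def __init__(self, unhealthy_after=300.0, max_restarts=2, window=3600.0):
        self.unhealthy_after = unhealthy_after  # seconds of continuous failure
        self.max_restarts = max_restarts        # assumed cap per window
        self.window = window                    # assumed window, in seconds
        self.bad_since = None
        self.restarts = []

    def observe(self, healthy, now):
        """Return "ok", "wait", "restart" or "escalate" for one health check."""
        if healthy:
            self.bad_since = None  # any healthy check resets the timer
            return "ok"
        if self.bad_since is None:
            self.bad_since = now
        if now - self.bad_since <= self.unhealthy_after:
            return "wait"
        # Forget restarts that fell out of the window, then apply the cap.
        self.restarts = [t for t in self.restarts if now - t < self.window]
        if len(self.restarts) >= self.max_restarts:
            return "escalate"  # stop restarting and page a person
        self.restarts.append(now)
        self.bad_since = None  # give the restarted realm a fresh grace period
        return "restart"

policy = RestartPolicy()
decisions = [
    policy.observe(False, 0),     # first failure starts the timer
    policy.observe(False, 200),   # still under 5 minutes
    policy.observe(True, 250),    # brief recovery resets the timer
    policy.observe(False, 260),
    policy.observe(False, 561),   # bad for more than 300 s
    policy.observe(False, 570),
    policy.observe(False, 900),   # second restart
    policy.observe(False, 910),
    policy.observe(False, 1300),  # cap reached
]
```

Timestamps are passed in rather than read from the clock so the logic can be tested without waiting.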

While we are in beta, server problems are expected, but longer term downtime is not acceptable. We apologise for this issue and hopefully with these steps it will not happen again!
Path of Exile II - Game Director
Auto reboots are quite tricky. I have a small computing cluster in my basement, and when I set up a watchdog to reboot a machine if it stopped processing data for me, I managed to find at least two cases where false positives put me in a reboot loop. As a game server, you also have the problem of losing state (player positions, for example) and dumping players in town, so that is even more tricky.

In the end, my advice is just to be really careful with auto reboot. SMS sending is usually pretty easy. Depending on the carrier, a lot of them (here in NA, anyway) support 5555551234@mobile.carrier.net type email addresses that will send a short SMS. If not, there are SMS gateways that will be happy to take your money... Or, if everyone is using smartphones, set email to notify on arrival and just drop an email on downtime (my eventual solution to my problems here).
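
For what it's worth, the email-to-SMS route can be sketched with nothing but the Python standard library. Everything specific here is an assumption: the gateway address is the placeholder from above, `monitor@example.com` and the local SMTP host are made up, and the 140-character cut is just a conservative guess since gateway limits vary by carrier.

```python
import smtplib
from email.message import EmailMessage

def build_alert(to_addr, text, from_addr="monitor@example.com"):
    """Build a short plain-text alert suitable for an email-to-SMS gateway."""
    msg = EmailMessage()
    msg["From"] = from_addr
    msg["To"] = to_addr
    msg["Subject"] = "Realm alert"
    msg.set_content(text[:140])  # keep it short; limits vary by carrier
    return msg

def send_alert(msg, host="localhost", port=25):
    """Hand the message to an SMTP server (host and port are assumptions)."""
    with smtplib.SMTP(host, port) as smtp:
        smtp.send_message(msg)

alert = build_alert("5555551234@mobile.carrier.net",
                    "Realm DB errors for 5+ minutes")
# send_alert(alert)  # uncomment once a real SMTP server is configured
```

The same `build_alert` message works for the plain "email everyone" option too; only the recipient address changes.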
Last edited by namespace on Mar 11, 2021, 6:06:55 PM
Thanks heaps for the update. I would just like to say that level of dedication is huge. I was up and playing when it happened. I really appreciate the level of commitment and the quality updates pertaining to your game software. It's great to see that you guys are so open. Thanks heaps for your effort, you make a brilliant game and it's very satisfying to be able to help you guys test it.
