Random Disconnects on All Realms (WinMRT Results Inside)

"
SlippyCheeze wrote:

All I can really say is that the first poster had no signs of network problems (i.e. they should talk to GGG), and yours shows signs of packet loss starting inside your network and carrying on from there.


From the OP:
"

| po1.fcr01.sr01.ams01.networklayer.com - 4 | 357 | 346 | 34 | 39 | 277 | 34 |


Now I'm not an expert at reading WinMTR logs, but that line caught my eye because he seems to have packet loss at exactly the same hop as I did. Or does the first number (4) mean something else here? Mind you, you can play the game uninterrupted for anywhere from minutes to half an hour, and then you get a sudden freeze (most likely when the packet loss occurs), and nothing but exiting the game helps at that point.
IGN: ihaveseenthefail

Up front: you turn out to be wrong, but your assumptions are very reasonable and sensible. Also, let me know if I over-explain or under-explain any of this, because it is hard to get the level right when I don't know you well. :)

So, the way this diagnostic works is that a series of packets is sent out with incrementing "TTL", or "time to live", fields. When a router forwards a packet to another host, it decrements the TTL, and when the TTL hits zero, it throws the packet away. It also sends an error message (an ICMP "time exceeded" message) back to the original host, telling it about that.
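A toy model can make the TTL mechanism concrete. The Python sketch below is an illustration under simplifying assumptions (every router answers every expired probe, and the hop names are made up); it is not how WinMTR itself is implemented. It walks a probe along a hypothetical path, decrementing the TTL at each router, and reports which node would answer.

```python
def hop_that_replies(path, ttl):
    """Return the node that answers a probe sent with the given starting TTL.

    `path` lists the routers in order, with the destination host last.
    Toy model only: every router is assumed to answer every expired probe.
    """
    if ttl < 1:
        raise ValueError("TTL must be at least 1")
    for router in path[:-1]:
        ttl -= 1  # each router decrements the TTL before forwarding
        if ttl == 0:
            # TTL expired here: the router discards the probe and sends an
            # ICMP "time exceeded" message back to the sender
            return router
    return path[-1]  # the probe reached the destination, which replies itself


# Hypothetical path; these names are placeholders, not real hosts.
path = ["home-router", "isp-edge", "transit-po1", "game-server"]
for ttl in range(1, len(path) + 1):
    print(ttl, hop_that_replies(path, ttl))
```

Sending probes with TTL 1, 2, 3, ... therefore gets one answer from each hop in turn, which is how the hop list is built.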

WinMTR basically counts the packets sent with each TTL value N and where the error messages for them come from. It uses that to build the list of hops, and to estimate packet loss at each hop by counting the "missing" error messages.
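The counting part can be sketched the same way. This assumes each probe's outcome has been recorded as a (TTL, reply received?) pair; the function name and the probe counts are hypothetical, and real tools also deal with timing and probes still in flight, which this ignores.

```python
from collections import Counter


def loss_per_ttl(probes):
    """Estimate loss per hop from (ttl, got_reply) pairs.

    Loss for a TTL is the share of probes sent with that TTL for which
    no error message (or final reply) came back.
    """
    sent = Counter(ttl for ttl, _ in probes)
    received = Counter(ttl for ttl, got_reply in probes if got_reply)
    return {
        ttl: 100.0 * (sent[ttl] - received[ttl]) / sent[ttl]
        for ttl in sorted(sent)
    }


# Hypothetical run: 200 probes per TTL, with 8 replies missing at TTL 2.
probes = [(1, True)] * 200 + [(2, True)] * 192 + [(2, False)] * 8 + [(3, True)] * 200
print(loss_per_ttl(probes))  # {1: 0.0, 2: 4.0, 3: 0.0}
```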

Some routers, meanwhile, will throw away some or all of the TTL-expired packets but *not* send back an error message. Generating the messages costs time and CPU, and we "know" that they are almost always triggered either by a serious error (where the message won't help much) or by someone running a WinMTR-style test.
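How a router decides which expired probes to ignore is up to its vendor and configuration. One common approach for limiting ICMP generation is a token bucket; the hypothetical class below assumes that approach, with arbitrary rate and burst values, purely to show how a router can answer some probes and silently drop others. Nothing here says po1 is configured this way.

```python
class IcmpRateLimiter:
    """Hypothetical token-bucket limiter for ICMP error generation.

    The router may send at most `burst` messages at once and regains
    `rate` tokens per second; expired probes that find the bucket empty
    are discarded without a reply.
    """

    def __init__(self, rate, burst):
        self.rate = rate
        self.burst = burst
        self.tokens = float(burst)
        self.last = 0.0

    def allow(self, now):
        # Refill tokens for the time elapsed since the last check, capped at burst.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True  # send the ICMP "time exceeded" message
        return False  # drop the expired probe silently


limiter = IcmpRateLimiter(rate=1.0, burst=2)
print([limiter.allow(t) for t in (0.0, 0.0, 0.0, 1.0)])
```

With a burst of 2 and a refill of 1 token per second, a third probe arriving in the same instant goes unanswered, which would show up in WinMTR as "loss" at that hop even though no traffic was dropped.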

Because of that, we sometimes see a trace that looks like this:

| Host - Loss % | Sent | Recv | Best | Avrg | Wrst | Last |
| ae6.dar01.sr01.ams01.networklayer.com - 0 | 400 | 400 | 33 | 33 | 53 | 53 |
| po1.fcr01.sr01.ams01.networklayer.com - 4 | 357 | 346 | 34 | 39 | 277 | 34 |
| f0.4a.3a25.ip4.static.sl-reverse.com - 0 | 400 | 400 | 34 | 34 | 37 | 34 |

That is, your cited hop above, but with the one before and one after, for context.
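If you want to pull these rows apart programmatically, here is a small parser. It assumes the column order of WinMTR's text export is Host - Loss % | Sent | Recv | Best | Avrg | Wrst | Last, so the "4" after the hostname is the loss percentage; check the header line of your own export before relying on that. The `WinMtrHop` and `parse_winmtr_row` names are made up for this example.

```python
from typing import NamedTuple


class WinMtrHop(NamedTuple):
    host: str
    loss_pct: int
    sent: int
    recv: int
    best: int
    avg: int
    worst: int
    last: int


def parse_winmtr_row(row):
    """Parse one '| host - loss | sent | recv | best | avrg | wrst | last |' row."""
    cells = [c.strip() for c in row.strip().strip("|").split("|")]
    # The first cell is "hostname - loss"; split on the last " - " only,
    # so hyphens inside hostnames (e.g. "sl-reverse") are left alone.
    host, loss = cells[0].rsplit(" - ", 1)
    return WinMtrHop(host.strip(), int(loss), *(int(c) for c in cells[1:7]))


rows = [
    "| ae6.dar01.sr01.ams01.networklayer.com - 0 | 400 | 400 | 33 | 33 | 53 | 53 |",
    "| po1.fcr01.sr01.ams01.networklayer.com - 4 | 357 | 346 | 34 | 39 | 277 | 34 |",
    "| f0.4a.3a25.ip4.static.sl-reverse.com - 0 | 400 | 400 | 34 | 34 | 37 | 34 |",
]
for hop in map(parse_winmtr_row, rows):
    print(f"{hop.host}: loss {hop.loss_pct}%, sent {hop.sent}, received {hop.recv}")
```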

Now, when we see no packet loss at the hop before, a sudden jump at one hop, and then the packet loss going back down to zero afterwards, we can assume (and it is just an assumption, like the rest of this) that it is a router behaving as described above. It isn't responding to every probe that expires at it, but it passes every single packet bound for the next hop through correctly.
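That reasoning can be written down as a rough heuristic. This sketch (a hypothetical helper, not a feature of WinMTR) flags any hop whose loss is higher than the worst loss at every later hop; the `margin` of one percentage point is an arbitrary allowance for low sample counts, and the hostnames are shortened.

```python
def nonpersistent_loss_hops(hops, margin=1.0):
    """Return hosts whose loss does not carry on to any later hop.

    `hops` is a list of (host, loss_percent) pairs in path order. A hop is
    flagged when its loss exceeds the worst loss at every later hop by more
    than `margin` percentage points. That pattern suggests the router is
    skipping ICMP replies rather than dropping forwarded traffic, but it is
    a heuristic, not proof.
    """
    flagged = []
    for i, (host, loss) in enumerate(hops[:-1]):
        worst_downstream = max(later for _, later in hops[i + 1:])
        if loss > worst_downstream + margin:
            flagged.append(host)
    return flagged


# Loss figures from the OP's three WinMTR rows quoted above.
op_trace = [("ae6.dar01", 0.0), ("po1.fcr01", 4.0), ("f0.4a.3a25", 0.0)]
print(nonpersistent_loss_hops(op_trace))  # ['po1.fcr01']
```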

If it weren't passing them through, we would see (statistically) the same or a greater level of packet loss at the next hop, because both packets "to" po1 and packets going "past" po1 would be equally hit by the drops. (Packet loss drops are statistically random, so in the common case they should hit everything equally. Very occasionally you will see a slightly lower number of drops at a host after the packet loss, just because of statistics and low sample counts, but they should follow the general rule.)
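A quick simulation illustrates why forwarded-packet loss has to show up downstream. This is a simplified model with made-up parameters: the suspect router either only skips ICMP replies for a share of the probes that expire at it, or also drops the same share of the packets it forwards.

```python
import random


def simulate(n_probes, drop_rate, drops_forwarded, seed=1):
    """Return (loss % seen at the suspect hop, loss % seen at the next hop).

    Toy model with made-up parameters. The suspect router fails to answer
    roughly `drop_rate` of the probes that expire at it. If
    `drops_forwarded` is True, it also drops roughly `drop_rate` of the
    probes it forwards to the next hop; otherwise it forwards them all.
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    lost_here = lost_next = 0
    for _ in range(n_probes):
        # Probe whose TTL expires at the suspect router.
        if rng.random() < drop_rate:
            lost_here += 1
        # Probe with one more TTL, which has to pass through the suspect router.
        if drops_forwarded and rng.random() < drop_rate:
            lost_next += 1
    return 100.0 * lost_here / n_probes, 100.0 * lost_next / n_probes


print("skips ICMP only:     ", simulate(10_000, 0.04, drops_forwarded=False))
print("drops forwarded too: ", simulate(10_000, 0.04, drops_forwarded=True))
```

In the first case the next hop shows no loss at all; in the second it shows roughly the same loss as the suspect hop.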

So: po1 looks like a busy router that doesn't answer every TTL expiration with an ICMP error message. This will be consistent across tests, because it is a property of that individual machine, but the loss also consistently returns to the baseline level at the next hop.

For the OP that was a return to zero; for you, a return to 0.4 percent packet loss. So, we can say that the traces show strong evidence that:

A) po1 is a router that discards some ICMP errors instead of sending them out, which is "vaguely wicked" according to the standards that define the Internet, but not significant in the real world.

B) The original trace shows no evidence of packet loss or latency jumps along the network path during the sampled period, so the problem *probably* isn't caused by network issues in their case. (i.e. GGG should investigate further for latency within their servers.)

C) Your trace shows some packet loss, and while 0.4 percent sounds very small, it can be enough to cause notable problems, especially if the losses are clustered together in time. (e.g. you lost 0.4 percent of 400 packets or whatever, but you lost them all in the same chunk of time, so it delayed things more significantly.)
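To illustrate point C, this sketch compares two invented 1,000-packet samples with the same 0.4 percent loss: one with the four losses spread out, one with them back to back. The packet positions are arbitrary; the point is that the average hides the burst.

```python
def longest_loss_burst(outcomes):
    """Length of the longest run of consecutive lost packets (False = lost)."""
    longest = current = 0
    for delivered in outcomes:
        current = 0 if delivered else current + 1
        longest = max(longest, current)
    return longest


# Two hypothetical 1,000-packet samples, each losing 4 packets (0.4%).
spread_out = [True] * 1000
for i in (100, 350, 600, 850):
    spread_out[i] = False
clustered = [True] * 1000
for i in range(500, 504):
    clustered[i] = False

for name, sample in (("spread out", spread_out), ("clustered", clustered)):
    lost = sample.count(False)
    print(f"{name}: {100 * lost / len(sample):.1f}% loss, "
          f"longest burst {longest_loss_burst(sample)}")
```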

One reason I personally recommend PingPlotter over WinMTR, incidentally, is that it shows a time-oriented graph of packet loss and latency, which can make that time relationship very obvious, while WinMTR only gives us the "statistically, over the whole time tested" results.
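The "time relationship" can be shown by splitting a run into fixed windows instead of reporting one whole-run figure. This is only an illustration of the idea, with invented samples (one probe per second, three consecutive losses); it is not how PingPlotter actually computes its graphs, and `loss_timeline` is a hypothetical helper.

```python
def loss_timeline(samples, window):
    """Group (timestamp_seconds, delivered) samples into fixed time windows.

    Returns (window_start, loss_percent) pairs in time order.
    """
    buckets = {}
    for t, delivered in samples:
        start = int(t // window) * window  # start of the window this sample falls in
        sent, lost = buckets.get(start, (0, 0))
        buckets[start] = (sent + 1, lost + (0 if delivered else 1))
    return [(start, 100.0 * lost / sent) for start, (sent, lost) in sorted(buckets.items())]


# Hypothetical 30-second run: one probe per second, lost at t = 21, 22, 23.
samples = [(t, t not in (21, 22, 23)) for t in range(30)]
for start, loss in loss_timeline(samples, window=10):
    print(f"{start:>3}s  {loss:5.1f}% loss")
```

Over the whole 30 seconds that is 10 percent loss, but the windowed view shows it all landed in the last ten seconds.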
Last edited by SlippyCheeze#7036 on Apr 9, 2018, 12:36:49 PM

Most excellent reply! Thank you for taking the time to explain how it works; it's quite evident I haven't really looked into how WinMTR actually operates.

Now this would explain a lot. Here's my theory: what we're most likely seeing here is a malfunctioning router, exactly at the hop highlighted by WinMTR. In-game it behaves exactly as if the TCP connection had been severed while the game was running. Everything just freezes, all menus etc. are completely unresponsive, and the only thing that works is the "Exit path of exile" button in the ESC menu.

So my hypothesis is that the malfunctioning router is occasionally dropping all connections (and hence the ICMP response isn't received by the requesting end either). At least that would match both symptoms above.

It's also worth mentioning that the 0.4% packet loss is a single lost packet in 228-229 packets sent. I suspect my cable modem is causing it, but it hasn't really caused problems for my networking in more than half a decade. I'll most likely get it replaced quite soon. I'm also using a wired connection (Windows gaming box -> Switch -> ClearOS [this is also where I was running mtr from] -> Cable Modem).
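A quick check of that arithmetic, assuming the tool rounds loss to one decimal place: a single lost packet out of n sent is 100/n percent, so 228 or 229 sent both display as 0.4%, but so would any count from 223 to 285.

```python
# One lost packet out of n sent is 100/n percent.
for sent in (228, 229):
    print(sent, round(100 / sent, 2))  # prints 228 0.44, then 229 0.44

# Which send counts would display as "0.4%" for a single lost packet,
# assuming the value is rounded to one decimal place?
candidates = [n for n in range(1, 1000) if round(100 / n, 1) == 0.4]
print(min(candidates), max(candidates))
```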

Here's another snapshot from mtr:

My traceroute [v0.85]
gateway.systema.localnetwork (0.0.0.0)    Mon Apr 9 19:33:53 2018
Keys:  Help   Display mode   Restart statistics   Order of fields   quit
                                             Packets                     Pings
    Host                                    Loss%   Snt   Last    Avg   Best   Wrst  StDev
 1. [ redacted ].dnainternet.fi              0.0%   106   13.1   14.0    8.3   71.6    8.7
 2. hel5-sr3.dnaip.fi                        0.0%   105    8.5   11.8    8.3   27.4    3.8
 3. hel5-tr3.dnaip.fi                        1.0%   105   12.3   12.0    8.1   42.2    4.4
 4. rai1-tr1.dnaip.fi                        0.0%   105   18.7   19.6   15.8   31.5    3.1
 5. tuk2-sr1.dnaip.fi                        1.0%   105   16.2   18.8   15.8   29.1    2.7
 6. netnod-ix-ge-b-sth-1500.softlayer.com    0.0%   105   18.1   19.4   15.8   34.4    3.1
 7. b2.11.6132.ip4.static.sl-reverse.com     0.0%   105   19.9   20.7   17.1   41.1    3.2
 8. bd.11.6132.ip4.static.sl-reverse.com     0.0%   105   36.2   38.8   35.1   83.2    5.3
 9. ae6.dar01.sr01.ams01.networklayer.com    1.0%   105   35.6   39.1   34.9   92.8    7.6
10. po1.fcr01.sr01.ams01.networklayer.com   48.1%   105   35.6   41.8   34.9   83.1    9.8
11. f0.4a.3a25.ip4.static.sl-reverse.com     0.0%   105   51.9   38.0   35.0   59.3    3.8

I'm not ruling out that our ISP is at fault here. However, there's probably something going on at the po1.fcr01 hop.
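If you want to check a snapshot like this mechanically, this sketch parses mtr display rows (a few rows from the snapshot above are reproduced) and compares each hop's loss with the final hop. The regular expression is an assumption about the row layout, not an official mtr format, and `parse_mtr` is a hypothetical helper.

```python
import re

# Hop number, host (may contain spaces, e.g. "[ redacted ]"), Loss%, Snt.
ROW = re.compile(r"^\s*(\d+)\.\s+(.+?)\s+(\d+(?:\.\d+)?)%\s+(\d+)\b")


def parse_mtr(report):
    """Return [(hop_number, host, loss_percent, sent)] for each hop row found."""
    hops = []
    for line in report.splitlines():
        m = ROW.match(line)
        if m:  # header and key lines do not match and are skipped
            hops.append((int(m[1]), m[2], float(m[3]), int(m[4])))
    return hops


snapshot = """\
 1. [ redacted ].dnainternet.fi 0.0% 106 13.1 14.0 8.3 71.6 8.7
10. po1.fcr01.sr01.ams01.networklayer.com 48.1% 105 35.6 41.8 34.9 83.1 9.8
11. f0.4a.3a25.ip4.static.sl-reverse.com 0.0% 105 51.9 38.0 35.0 59.3 3.8
"""
hops = parse_mtr(snapshot)
final_loss = hops[-1][2]
for num, host, loss, sent in hops:
    note = "does not persist to the final hop" if loss > final_loss else ""
    print(f"{num:>2}. {host:<40} {loss:5.1f}%  {note}")
```

In this snapshot, the 48.1% at po1 does not carry through to the final hop, which shows 0.0%.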

Edit: I forgot to mention that the game froze while mtr was running in the background; it's not quite evident which side the issue is on, my ISP or the "other end".
IGN: ihaveseenthefail
Last edited by jormaerik#5937 on Apr 9, 2018, 12:53:03 PM

Same issue here. My WinMTR results look the same, but my ping never shows any change: it's constantly around 30 ms, then a freeze, and often a disconnect follows.

So, for the record: I can't say conclusively that the strange router is behaving correctly, only that it looks likely to be.

The only step beyond what y'all have done that could help debug this is to grab a tool like Wireshark, filter it to the GGG / PoE traffic only, and then look at what happened just prior to the disconnection.
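As a starting point for "what happened just prior to the disconnection", here is a sketch that finds the longest silence in the game traffic and returns the last few packet times before it. It assumes you have already exported the arrival times (in seconds) of the filtered packets from the capture tool; the helper name and sample numbers are hypothetical.

```python
def last_packets_before_gap(timestamps, context=3):
    """Find the longest silence in a sorted list of packet timestamps.

    Returns (gap_seconds, the last `context` timestamps before the gap).
    """
    if len(timestamps) < 2:
        raise ValueError("need at least two packets")
    # Pair each gap length with the index of the packet that starts it.
    gaps = [(b - a, i) for i, (a, b) in enumerate(zip(timestamps, timestamps[1:]))]
    gap, i = max(gaps)
    return gap, timestamps[max(0, i + 1 - context): i + 1]


# Hypothetical capture: steady traffic, then 5 seconds of silence.
times = [0.0, 0.05, 0.10, 0.15, 0.20, 5.20, 5.25]
print(last_packets_before_gap(times))
```

From there, the packets at those timestamps in the capture are the ones worth inspecting.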

That is getting into the realm of advanced network debugging, though, so I'm not gonna try and guide you through it.

The other path is to talk to GGG tech support and/or your ISP's tech support, and ask them to look into the problem. I'd suggest supplying a link to this thread as background. They may disclaim responsibility, etc., but both sides do have the capacity to look further into the issue; the real question is how much time it takes to do so.

Good luck. I know these problems are painful, and I honestly wish I could have said "yah, it's just your home network / the wire from your modem to the ISP". Not because I want you to have problems, but because those would have been the easiest network problems to solve. :)
