Technical solution to eliminate desync in single-player sessions

NB: It's worth noting that a number of the OSI layers specify that if a packet fails a check such as the FCS, the packet should be dropped.

Hmm.
Stay out of the shadows ... They bite
"
RogueMage wrote:

In geometric terms, what the client does is predict the server's next projected player position, and then rotate the client's player to face more toward the projected server position. It then calculates a movement speed correction that will make the client's next player position update tend to converge on the server's projected position. In addition, it combines these corrections with its own projected simulation of the player's movement.
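
To make that concrete, here is a minimal sketch of such a correction step in Python; the gain constants, names, and update model are illustrative assumptions, not anything from PoE's actual client:

    import math

    TURN_GAIN = 0.3    # fraction of the heading error corrected per update (assumed)
    SPEED_GAIN = 0.5   # how aggressively speed closes the position error (assumed)

    def correct_toward_server(client_pos, client_heading, base_speed,
                              server_pos, server_vel, dt):
        """Blend the client's simulation toward the server's projected position.

        client_pos/server_pos: (x, y); server_vel: (vx, vy);
        client_heading: radians; dt: seconds until the next update.
        Returns (new_heading, corrected_speed) for the client's next step.
        """
        # Predict where the server's player will be at the next update.
        projected = (server_pos[0] + server_vel[0] * dt,
                     server_pos[1] + server_vel[1] * dt)

        # Rotate the client's player partway toward the projected position.
        to_target = math.atan2(projected[1] - client_pos[1],
                               projected[0] - client_pos[0])
        error = math.atan2(math.sin(to_target - client_heading),
                           math.cos(to_target - client_heading))  # wrap to [-pi, pi]
        new_heading = client_heading + TURN_GAIN * error

        # Adjust movement speed so the next client position tends to
        # converge on the server's projection.
        distance = math.hypot(projected[0] - client_pos[0],
                              projected[1] - client_pos[1])
        corrected_speed = base_speed + SPEED_GAIN * (distance / dt - base_speed)

        return new_heading, corrected_speed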


Pardon my interjection, but this would result in a very jarring experience for people with high latency or packet loss on their connections (such as those playing on wireless).

Any packet delays will result in data entering the client out of order, and these out of order packets can fuck up the entire mechanism.

If you use TCP to reorder packet data, then essentially you're stuck until the correct packet comes, which would then result in a player-reorient and movespeed acceleration. In practice, this would mean that people playing over WiFi would experience a slight pause, then suddenly your character faces south instead of north and for 0.2s you move faster than you do when you use a quicksilver flask. It breaks the "smoothness" of the running animation because you move slow, then fast, then slow, then fast, etc...

If you use UDP and don't bother with reordered packet data, you then run into the problem of implementing correct timestamps, which involves client-server time synchronization, etc.

Additionally, this presumes that the server can perfectly and fully predict client movement, which we all know is pretty much impossible. A wrong prediction from the server, based on your proposal, would cause all sorts of chaos, because the question then becomes, "Who do we follow for prediction? The server, or the client?"

If you say that the client is to be followed, then there is no point to having the server predict something, and this would be a waste of bandwidth, since any corrections the server might suggest would be essentially ignored, because, well, follow the client.

If you say that the server is to be followed, players with bad connections and who kite in zig-zag movements would have all sorts of graphical glitches, such as facing north (because the server predicted that they'll be moving north) while moving west, firing a projectile southeast while facing the other way, etc. When you turn while the server predicts that you move straight, you'd have animations playing of you moving straight very slowly then suddenly jumping and turning around faster than Superman changing clothes in a phone booth.

Don't get me wrong; I like this suggestion, I just don't see a way of feasibly making this work for a client-server architecture (if this would be peer-to-peer, yeah, I can see this working easily).
"
Skogenik wrote:
Can anyone familiar with Wireshark (or your preferred network analyzer) do something for me?

First however:

Background and definitions

Just to be ultra clear, when I say Server I mean the Path of Exile server, and the Client is the local executable.

I explicitly filtered Wireshark to ignore other traffic using this filter:
"ip.dst eq 37.58.0.0/16 or ip.src eq 37.58.0.0/16"

The /16 mask tells Wireshark to match anything in the 37.58.0.0 to 37.58.255.255 range, i.e. anything from that 37.58 subnet.
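
If you want to sanity-check what a CIDR mask matches, a quick Python one-off using the stdlib ipaddress module does it (the addresses below are just illustrative):

    from ipaddress import ip_address, ip_network

    poe_net = ip_network("37.58.0.0/16")          # first 16 bits fixed: 37.58.x.x
    print(ip_address("37.58.123.45") in poe_net)  # True: inside the /16
    print(ip_address("37.59.123.45") in poe_net)  # False: second octet differs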

I obtained the destination IP by using Process Explorer, viewing the PathOfExile process and then opening the TCP/IP tab; the subnet did jump around as I left instances, however.

I captured 45 minutes of network data, and using the above filter gave me 22,715 packets, around 1.5 MB of data in total.

The data was collected whilst I was playing through the Ship Graveyard level, spawning zombies and looting; basically trying to play as normally as possible. I also performed an /oos at one point.

Findings

When I captured traffic between client and server, I noticed a lot of corrupted packets originating from the server.

To break down the numbers a bit:
Over the period the server sent me 9,980 packets, with an average size of 76 bytes.
The client sent 12,735 packets, with an average length of 56 bytes.

Now I noticed that many of the packets showed up in red and black, indicating an error in the packet, so I added this new filter:
"ip.src eq 37.58.0.0/16 and eth.fcs_bad"

It matched 6,094 out of the 9,980 total, all because they had an invalid frame check sequence.

So there were only 3,886 good packets sent from the server in the period.
In other words, 61% of the packets from the server were bad.

Thoughts

It's notable that every one of these bad packets was a TCP/IP ACK, and I was receiving 3 of these a second, intermingled with a total of 20 packets per second of everything else.

Now as I've mentioned before, I've been pretty lucky with desync issues, and have rarely had them affect me noticeably. However, could it be that some people's network hardware does have a problem and could be choking on all these bad packets?

There are also a couple of differences: all of the Client ACKs have no byte data and no padding.

The Server's ACKs either have a byte of data (ranging from 01 to FF) and 5 bytes of padding, or no data and 6 bytes of padding; there are 764 of the latter in my sample.

After said padding, each of these packets has a 4-byte frame check sequence (FCS) checksum.

So, back to my original question: I'd be interested to see what results other people are getting. Can someone duplicate my findings? I'm based in the UK, so if someone in a different region could test this, it would probably be quite helpful for GGG to know about.

Lastly, if there are any more statistics from this dump that I can give, please let me know and I'll do what I can.


Frame check sequences are handled at the data link layer. Errors there indicate an interface problem on the last link (i.e., router/WiFi access point to your PC), not an error in transmission from the server.

TCP header checksum errors are what you're looking for, not frame check errors.
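
(In the Wireshark vintage that exposes eth.fcs_bad, the analogous display filter would be something along the lines of
"ip.src eq 37.58.0.0/16 and tcp.checksum_bad"
though newer builds replaced that boolean field with tcp.checksum.status.)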

I know what you're saying, but the thing here is that the FCS generated by the Source node, i.e. their server, isn't the same as the FCS that my network thinks it should be.

It's a two-pass thing: the Server generates the FCS and the client validates it by generating what it thinks it should be.

So it's a toss-up: either my Router can't do the math, or the source can't.

I've been using Wireshark for a while now on my home equipment as it's relevant to a work project that I'm involved in; I have never seen this amount of errors before, and when I have seen errors, certainly not in a consistent and expected pattern.

I did have a thought that this might be something that GGG were doing themselves, as an extra method to secure the data transmission, or perhaps obfuscate it; but the more I think about it, the more I doubt that that's the case.

In this instance, my gut is telling me, there's no smoke without fire; but this is precisely why I'd like someone to attempt to recreate my findings.
Stay out of the shadows ... They bite
"
Skogenik wrote:
I know what you're saying, but the thing here is that the FCS generated by the Source node, i.e. their server, isn't the same as the FCS that my network thinks it should be.

It's a two-pass thing: the Server generates the FCS and the client validates it by generating what it thinks it should be.

So it's a toss-up: either my Router can't do the math, or the source can't.

I've been using Wireshark for a while now on my home equipment as it's relevant to a work project that I'm involved in; I have never seen this amount of errors before, and when I have seen errors, certainly not in a consistent and expected pattern.

I did have a thought that this might be something that GGG were doing themselves, as an extra method to secure the data transmission, or perhaps obfuscate it; but the more I think about it, the more I doubt that that's the case.

In this instance, my gut is telling me, there's no smoke without fire; but this is precisely why I'd like someone to attempt to recreate my findings.


It's not the server generating the FCS; it's your router.

Any time a frame with an incorrect FCS is received somewhere along the line, it's dropped. If this occurs at the server, the network switch it's connected to will drop the frame and you won't receive it to begin with. The fact that you can detect the frame transmit error means that it happened on your last hop/link.

EDIT: Note that this is the encapsulating Ethernet frame, not the TCP/IP packet. As such, the "source" is considered to be the source MAC address, not the source IP. IP is used for end-to-end relay of data, Ethernet for next-hop relay.
Last edited by Sachiru on Nov 24, 2013, 9:54:06 PM
"
SkyCore wrote:

I'm not suggesting that UDP streaming magically removes desyncing. All I'm saying is that it corrects desync at regular intervals, just as /oos does. It's not perfect, but it's easy to implement and it's an improvement.


What do you mean by 'regular'? Any kind of regular /oos is going to look like stagger whenever it happens. You need to define what you mean by regular, because if it's just an /oos that happens every 2 seconds, then that isn't much better than using a macro that does /oos every 2 seconds (at least from a user POV).
"
ScrotieMcB wrote:
"
RogueMage wrote:
...there is no need for a UDP update stream to incorporate combat resolution into its positional updates, since these events are already transmitted via the game's TCP communication systems. What the UDP stream provides is a series of server positional updates that the client can incorporate to reduce positional discrepancies between its simulation and the server's. With a client that can more closely track the server's simulation, the player's view of the battlefield will more accurately reflect the tactical situation seen by the server. The player's actions and expectations will then more closely align with the server's actual combat results.
Incorrect. UDP is best used on ephemeral data, that which changes rapidly; thus, the question is "what is ephemeral?"
Spoiler
We can agree that monster movement is ephemeral, but is monster life ephemeral? Are stuns and freezes ephemeral? You could make the argument that current monster life, stuns (with ending-time in terms of initial hit) and freezes (same ending-time designation) should be sent via UDP, because, like movement, a TCP packet that arrives after the stun or freeze is useless and does not require rerouting to destination. Note that, under this implementation, the server wouldn't be sending damage numbers to the client at all, because damage isn't part of the gamestate — however, current life and status effects (if any) of the various actors is part of the gamestate, so it would send that instead. TCP would be reserved for more permanent stats, like picking up items and items stats (once identified).
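
As a purely hypothetical illustration of that split, the ephemeral UDP update could be a compact datagram like this (all field names and sizes are invented for the sketch):

    import struct

    # Hypothetical layout: a sequence number and server timestamp, then one
    # fixed-size record per visible actor carrying the "ephemeral" state:
    # position, current life, and stun/freeze end times.
    HEADER = struct.Struct("<IQ")     # seq (uint32), server_time_ms (uint64)
    ACTOR = struct.Struct("<IhhIII")  # actor_id, x, y, life, stun_end_ms, freeze_end_ms

    def pack_update(seq, server_time_ms, actors):
        """actors: iterable of (actor_id, x, y, life, stun_end_ms, freeze_end_ms)."""
        return HEADER.pack(seq, server_time_ms) + b"".join(
            ACTOR.pack(*a) for a in actors)

    def unpack_update(datagram):
        seq, server_time_ms = HEADER.unpack_from(datagram)
        count = (len(datagram) - HEADER.size) // ACTOR.size
        actors = [ACTOR.unpack_from(datagram, HEADER.size + i * ACTOR.size)
                  for i in range(count)]
        return seq, server_time_ms, actors

Item pickups and identified item stats, being permanent, would stay on the TCP side.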

Sorry, but with that digression into UDP vs. TCP design philosophy, I think you're descending into the kind of territory that qwave was exploring: "How PoE's Client Implementation Should Be Fundamentally Redesigned". I believe Rhys' last post pretty much ruled out the feasibility of any major redesign projects:

"
Rhys wrote:
qwave has suggested a radical change to our core game systems. The concept is not impossible, I think, but it isn't really feasible. It's simply too big of a change.


"
ScrotieMcB wrote:
The reason your statements are incorrect, instead of merely debatable, is that the reasoning behind using UDP is bandwidth (and header) efficiency — the idea is to be more efficient with bandwidth on GGG's end to allow more continuous transmission of data from server to client. This falls apart if the servers are continuously transmitting both UDP data and TCP data — instead of gaining header efficiency, you're losing it by splitting your data up amongst multiple packets...

Sorry again, but the term "incorrect" means something very different to an engineer than the concerns you're raising. While transmission efficiency is an important consideration, it's only one of many potential benefits an additional UDP stream could provide. If I've commandeered your original UDP suggestions into a different approach than what you had in mind, I apologize, but I'm thinking strictly in practical terms rather than what may ideally have been the best approach to start with.

At this point in time, GGG has released a fully developed and tested ARPG using a TCP-based, open-loop client-server simulation system. I won't attempt to summarize the details previously discussed, other than to note that GGG has committed to moving this design forward rather than re-architecting any of its core systems. I think that's a wise decision, based on several considerations:

1. PoE is not broken. It works reliably up until the open-loop client and server simulations diverge significantly. At that point it abruptly resyncs the client with the server and continues the game session uninterrupted.

2. PoE is not inefficient. Its frame rate does not bog down from network traffic overload until well after transmission latency reaches unplayable extremes.

3. GGG's resources are limited. This factor alone limits the scope of further improvements in simulation performance to projects that can leverage existing, well-tested game systems without invoking extensive re-engineering, high-risk scheduling, or purely speculative benefits.

"
ScrotieMcB wrote:
This also serves as an answer to your post directly addressed to me. Yes, you need to reinvent the wheel, and yes, UDP would have to replace TCP for all communication of ephemeral data; otherwise it's simply not worth it. In order for a hybrid UDP/TCP architecture to work, TCP needs to be used very sparingly to gain efficiency.

Here I'm afraid you're doing more grandstanding than technical analysis. Understand that working engineers value conservative low-risk approaches that produce the most practical benefits while imposing minimal side-effects on existing systems and procedures. Instead of thinking idealistically, I often try to come up with the most minimal change that could make a significant difference, and then whittle it down to a practical size.
Last edited by RogueMage on Nov 24, 2013, 10:28:35 PM
"
Sachiru wrote:
"
RogueMage wrote:

In geometric terms, what the client does is predict the server's next projected player position, and then rotate the client's player to face more toward the projected server position. It then calculates a movement speed correction that will make the client's next player position update tend to converge on the server's projected position. In addition, it combines these corrections with its own projected simulation of the player's movement.


Pardon my interjection, but this would result in a very jarring experience for people with high latency or packet loss on their connections (such as those playing on wireless).

Any packet delays will result in data entering the client out of order, and these out of order packets can fuck up the entire mechanism.

Spoiler
If you use TCP to reorder packet data, then essentially you're stuck until the correct packet comes, which would then result in a player-reorient and movespeed acceleration. In practice, this would mean that people playing over WiFi would experience a slight pause, then suddenly your character faces south instead of north and for 0.2s you move faster than you do when you use a quicksilver flask. It breaks the "smoothness" of the running animation because you move slow, then fast, then slow, then fast, etc...

If you use UDP and don't bother with reordered packet data, you then run into the problem of implementing correct timestamps, which involves client-server time synchronization, etc.

Additionally, this presumes that the server can perfectly and fully predict client movement, which we all know is pretty much impossible. A wrong prediction from the server, based on your proposal, would cause all sorts of chaos, because the question then becomes, "Who do we follow for prediction? The server, or the client?"

If you say that the client is to be followed, then there is no point to having the server predict something, and this would be a waste of bandwidth, since any corrections the server might suggest would be essentially ignored, because, well, follow the client.

If you say that the server is to be followed, players with bad connections and who kite in zig-zag movements would have all sorts of graphical glitches, such as facing north (because the server predicted that they'll be moving north) while moving west, firing a projectile southeast while facing the other way, etc. When you turn while the server predicts that you move straight, you'd have animations playing of you moving straight very slowly then suddenly jumping and turning around faster than Superman changing clothes in a phone booth.

Don't get me wrong; I like this suggestion, I just don't see a way of feasibly making this work for a client-server architecture (if this would be peer-to-peer, yeah, I can see this working easily).

Please try reading my earlier descriptions again; they should provide answers to most of the hypothetical cases you raised.

As you speculated, UDP packet delays and mis-ordering are resolved by latency monitoring and server timestamping. These are not difficult engineering challenges, and the client simply discards corrupt or untimely data updates. If the connection becomes too laggy to transmit the UDP stream with acceptable latency and reliability, the client will simply revert back to the same open-loop simulation behavior that it produces today.
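
A minimal sketch of that discard rule, assuming each datagram carries a server sequence number and timestamp (hypothetical fields) and that latency monitoring gives the client an estimate of current server time:

    last_applied_seq = 0

    def accept_update(seq, sent_ms, est_server_now_ms, max_age_ms=150):
        """Apply an update only if it's newer than anything already applied
        and recent enough to still describe the current gamestate."""
        global last_applied_seq
        if seq <= last_applied_seq:                   # late or duplicate: discard
            return False
        if est_server_now_ms - sent_ms > max_age_ms:  # too stale: discard
            return False
        last_applied_seq = seq
        return True

If too many consecutive updates fail these checks, the client simply reverts to today's open-loop behavior, as described above.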

The current open-loop simulator system already uses clearly defined methods of handling discrepancies between client and server simulations. The UDP positional update stream would change none of this; it would be used solely to enable the client to more closely track the server. A well-tuned feedback control system is not jarring or chaotic as you feared: its loop gain and transient responses are tightly regulated to preclude instability and oscillation. These techniques have been used reliably in countless aeronautic and industrial systems and refined over many decades of engineering practice.
Last edited by RogueMage on Nov 24, 2013, 10:40:38 PM
"
Sachiru wrote:


It's not the server generating the FCS; it's your router.

Any time a frame with an incorrect FCS is received somewhere along the line, it's dropped. If this occurs at the server, the network switch it's connected to will drop the frame and you won't receive it to begin with. The fact that you can detect the frame transmit error means that it happened on your last hop/link.

EDIT: Note that this is the encapsulating Ethernet frame, not the TCP/IP packet. As such, the "source" is considered to be the source MAC address, not the source IP. IP is used for end-to-end relay of data, Ethernet for next-hop relay.


If you're going to cite Wikipedia, I'll just paste in the relevant bit from the same article:


"
wikipedia wrote:

All frames and the bits, bytes, and fields contained within them, are susceptible to errors from a variety of sources. The FCS field contains a number that is calculated by the source node based on the data in the frame. This number is added to the end of a frame that is sent. When the destination node receives the frame the FCS number is recalculated and compared with the FCS number included in the frame. If the two numbers are different, an error is assumed, the frame is discarded. The sending host computes a checksum on the entire frame and appends this as a trailer to the data. The receiving host computes the checksum on the frame using the same algorithm, and compares it to the received FCS. This way it can detect whether any data was lost or altered in transit. It may then discard the data, and request retransmission of the faulty frame.

"
For added amusement, read the notes for RFC1662 here.

I think it makes it quite clear that the FCS is calculated at Source *and* Target (not intervening hops).

Think about it, regardless of the facts: why would the target node generate the FCS and drop the frame?
How would it know what FCS was generated by the Server?

It can't and doesn't; all it could know is that it generated a checksum for the data that it received.
The only way for the Target to know if the packet is corrupted is to test that its own checksum comes out the same as the one that the server created. Anything else would be nonsensical and useless.
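
That two-pass scheme is easy to demonstrate with Ethernet's CRC-32; here is a sketch using Python's zlib, where 0x2144DF1C is CRC-32's well-known "good frame" residue (the same trick the RFC notes below describe):

    import struct
    import zlib

    def append_fcs(payload: bytes) -> bytes:
        # Source node: compute CRC-32 over the frame and append it
        # (little-endian), as Ethernet does with its 4-byte FCS.
        return payload + struct.pack("<I", zlib.crc32(payload))

    def fcs_ok(frame: bytes) -> bool:
        # Target node: run the same CRC over the data *plus* the received
        # FCS. An intact frame always yields the constant "good FCS" residue.
        return zlib.crc32(frame) == 0x2144DF1C

    frame = append_fcs(b"some frame data")
    assert fcs_ok(frame)                                    # intact frame passes
    assert not fcs_ok(frame[:-1] + bytes([frame[-1] ^ 1]))  # one flipped bit fails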

The fact is though, the RFC states quite clearly that the Server generates the FCS and appends it to the end of the frame, and for a very good reason.

"

The FCS was originally designed with hardware implementations in
mind. A serial bit stream is transmitted on the wire, the FCS is
calculated over the serial data as it goes out, and the complement of
the resulting FCS is appended to the serial stream, followed by the
Flag Sequence.

The receiver has no way of determining that it has finished
calculating the received FCS until it detects the Flag Sequence.
Therefore, the FCS was designed so that a particular pattern results
when the FCS operation passes over the complemented FCS. A good
frame is indicated by this "good FCS" value.
"


In regards to your edit, I have confirmed that the MAC address contained in the packet header is the MAC address belonging to the IP addresses I was talking about; I figured it would be simpler to explain it that way.

Edit: Typos and layout and stuff.
Stay out of the shadows ... They bite
Last edited by Skogenik on Nov 24, 2013, 10:37:28 PM
@Skogenic: see "Offloading" on the Wireshark wiki. TL;DR: pretty much every outgoing packet is going to trigger a checksum error in Wireshark.

Also, in my case I needed to do a xxx.0.0.0/8, not /16, because the IP addresses used a larger range. Filtering out the HTTP and the outgoing packets (I checked the area flowchart during the race), here's what I found:

Only a small number of packets were marked bad by Wireshark, and I'm not too worried about them: a few TCP out-of-orders (and associated errors; Wireshark marks the resent packets bad), and a few keep-alive ACKs marked bad too, but nothing else that I found.
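
If anyone wants to rule offloading out while capturing, one option on a Linux capture box (the interface name here is just an example) is to disable transmit checksum offload first:

    ethtool -K eth0 tx off

Alternatively, Wireshark's TCP protocol preferences let you turn checksum validation off entirely.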
"
RogueMage wrote:
"
ScrotieMcB wrote:
This also serves as an answer to your post directly addressed to me. Yes, you need to reinvent the wheel, and yes, UDP would have to replace TCP for all communication of ephemeral data; otherwise it's simply not worth it. In order for a hybrid UDP/TCP architecture to work, TCP needs to be used very sparingly to gain efficiency.
Here I'm afraid you're doing more grandstanding than technical analysis. Understand that working engineers value conservative low-risk approaches that produce the most practical benefits while imposing minimal side-effects on existing systems and procedures. Instead of thinking idealistically, I often try to come up with the most minimal change that could make a significant difference, and then whittle it down to a practical size.
This isn't about idealism; it's about practicality. Using both protocols for continuous transmission isn't practical, because it isn't economical with bandwidth.
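
To put rough numbers on that (back-of-envelope, using 20-byte IPv4 and minimum 20-byte TCP headers versus 8-byte UDP headers, ignoring link-layer framing):

    payload = 56                  # bytes; roughly the client average from Skogenik's capture
    ip, tcp, udp = 20, 20, 8      # IPv4, minimum TCP, and UDP header sizes

    single_tcp = ip + tcp + payload                                # 96 bytes on the wire
    split = (ip + tcp + payload // 2) + (ip + udp + payload // 2)  # 124 bytes
    print(single_tcp, split)      # splitting the same data costs 28 extra header bytes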
When Stephen Colbert was killed by HYDRA's Project Insight in 2014, the comedy world lost a hero. Since his life model decoy isn't up to the task, please do not mistake my performance as political discussion. I'm just doing what Steve would have wanted.
Last edited by ScrotieMcB on Nov 24, 2013, 10:55:00 PM
