Development Manifesto - Desync

Lately I've been seeing lots of posts about desync and about GGG not fixing it or making it a priority. There was a Development Manifesto on this subject in the past, written by Chris. As it is currently not accessible, I want to share it with newcomers and people who don't know it exists. It's an extremely informative post, so please read it before creating another thread.

"
Chris wrote:
"Desync" is a very hot topic at the moment. At best it's a minor annoyance when it occurs and at worst it can cause characters to get killed in situations where they thought there were no monsters around. We have many changes coming that will substantially improve the situation, but would like to also explain how our synchronisation systems work in case you're interested, and to make it clear that game state synchronisation is a problem that all online games need to deal with.

In this article I'm going to try to clearly explain:

  • How different types of online games handle latency
  • How our system of action prediction works
  • Why sync problems occur with this system and how they manifest
  • Why desync has to exist and why rubber-banding is good
  • Why some other games don't appear to have similar problems
  • What we're planning to do to improve synchronisation

How different types of online games handle latency

Any game has calculations that occur to determine the result of actions. In RPGs, these can range from combat calculations (who did what damage) to important economic transactions involving game items. To prevent players cheating, it's important that these calculations are not done on the gamer's computer, because they can easily modify the result of such calculations.

Because of this, all calculations that affect someone's progress must be done on servers that we control. These servers exist all over the world (Texas, Amsterdam and Singapore, currently), but due to the speed of light and other physical limitations, it's not instant to send or receive data from them. We typically see response times between our players and the servers of around 50-250ms.

All online games have this situation. The server has to dictate whether things happen or not, but there's a 50-250ms delay before data gets to the server and back. There are three ways that games can solve this:

  • Trust the client. This means people can cheat, but the results are instant. We will not do this.
  • Wait until data arrives back from the server before doing anything. This is a very common strategy in RTS and MOBA games. If you click to move, the unit will only start moving once the server says so, which is 50-250ms later. If you are close to the server, you'll quickly get used to the lag and everything feels pretty good. If you're far away (New Zealand, for example), it feels like you're playing drunk. Every time you issue an order, nothing happens for a quarter of a second. This does not work for Action RPGs.
  • Start predicting the result of the action as though the server said yes, immediately. When the server later gets back to you with a result, factor it in. This is what Action RPGs including Path of Exile do. It means that when you click to move, or click to attack, it occurs instantly and feels great. The problem is what happens when the server decides that the action can't have occurred - that's when the game gets badly out of sync.

Action RPGs have to use the third system (action prediction) to feel responsive. The problem is, the second you start moving, you're implicitly out-of-sync by definition. Your client has drawn the first few frames of movement (to be nice and responsive), but the server has no idea you clicked a button yet until the data arrives. Action prediction is mandatory for this type of game but results in you being slightly out-of-sync almost all of the time. This is generally no problem, but once too many predictions get made based on incorrect data, very bad things happen. The challenge is detecting and correcting the situation before this occurs.
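
The third approach can be sketched in a few lines. This is only a minimal illustration and not Path of Exile's actual code: the `PredictingClient` class, its one-dimensional position and the `on_server_result` callback are hypothetical names invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class PredictingClient:
    """Hypothetical client that shows moves before the server confirms them."""
    position: float = 0.0
    next_action_id: int = 0
    pending: set = field(default_factory=set)  # action ids awaiting a verdict

    def move(self, delta: float) -> int:
        # Predict: apply the move right away so the game feels responsive.
        action_id = self.next_action_id
        self.next_action_id += 1
        self.pending.add(action_id)
        self.position += delta
        return action_id

    def on_server_result(self, action_id: int, accepted: bool,
                         server_position: float) -> None:
        # The verdict arrives roughly one round trip after the click.
        self.pending.discard(action_id)
        if not accepted:
            # The server disagreed: snap to its authoritative position.
            self.position = server_position


client = PredictingClient()
first = client.move(5.0)    # drawn on screen immediately
client.on_server_result(first, accepted=True, server_position=5.0)
second = client.move(3.0)   # the client now shows 8.0
client.on_server_result(second, accepted=False, server_position=6.0)
print(client.position)      # the rejected move was corrected by the server
```

The server never trusts the client's position. The client only guesses ahead and accepts whatever the server later decides.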

How our system of action prediction works

Let's say you're playing with 200ms round-trip latency and you click a monster that is 2 seconds of travel distance away from you. Assume your attack animation has its contact point after 300ms (which is where damage is dealt).

  • 0ms: You click the monster. Your character starts running towards it on the client.
  • 100ms: Your click arrives at the server. The character there starts running towards the monster also. At this stage your local character is already 5% of the way there.
  • 2000ms: Your character arrives at the monster on the client. It's not there yet on the server. You don't even know if it'll ever arrive for sure (it might still get interrupted by an attack). Your client starts to animate the sword swing.
  • 2100ms: Your character arrives at the monster on the server. The server immediately performs the combat calculation in advance of the contact point and sends the tentative result back to the client.
  • 2200ms: You receive the notification from the server about what type of damage you will deal and roughly how much. Thankfully it arrived before the contact point of the animation! This is not always the case.
  • 2300ms: You hit the contact point on the client. Because you have the damage information in advance, you can draw a pleasing blood splatter, fire effect and so on. This hit has not even occurred yet on the server.
  • 2400ms: You hit the contact point on the server. The damage is locked in and actually applied to the monster. It dies. Experience and item drops are calculated and sent to the client.
  • 2500ms: Your client receives an experience update and the information of what items to show falling to the ground.

Despite the fact that your information is delayed by 100ms, it arrived before the contact point and the only indication of playing under latency that the client noticed was the fact that it took a tenth of a second for the item drops to arrive. At no point in that process was any gameplay calculation compromised in a way that would enable players to cheat the system.
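
The arithmetic behind this timeline can be reproduced with a short sketch. It assumes latency is symmetric (each direction takes half the round trip), as the example does. The function names `attack_timeline` and `result_arrives_in_time` are made up for illustration.

```python
def attack_timeline(round_trip_ms, travel_ms, contact_ms):
    """Return (time_ms, event) pairs for one click-to-attack, sorted by time."""
    one_way = round_trip_ms // 2  # assumption: symmetric latency
    events = [
        (0, "client: click, start running"),
        (one_way, "server: click arrives, start running"),
        (travel_ms, "client: arrive at monster, start swing"),
        (travel_ms + one_way, "server: arrive, compute tentative result"),
        (travel_ms + round_trip_ms, "client: receive tentative result"),
        (travel_ms + contact_ms, "client: contact point, draw effects"),
        (travel_ms + one_way + contact_ms, "server: contact point, apply damage"),
        (travel_ms + round_trip_ms + contact_ms, "client: receive experience and drops"),
    ]
    return sorted(events, key=lambda event: event[0])


def result_arrives_in_time(round_trip_ms, contact_ms):
    """The tentative result beats the client's contact point only if the
    round trip is no longer than the time from swing start to contact."""
    return round_trip_ms <= contact_ms


for time_ms, event in attack_timeline(200, 2000, 300):
    print(f"{time_ms:>5}ms  {event}")
```

With a 200ms round trip and a 300ms contact point, the result arrives 100ms early. With a 400ms round trip it would arrive after the client had already drawn the hit.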

Why sync problems occur with this system and how they manifest

The above example assumes that everything went smoothly. It's entirely possible for the 2 second travel time to be completely different on both ends, or for a lag spike to occur, causing the timing to get completely out of sync. If the attack is interrupted on the server before it starts (during movement) but not on the client, then you have a long animation playing that can't be cancelled, because the communication time is a significant fraction of the animation's length.

Even if no strange lag occurs, the monsters that are nearby are pathfinding on the client to where they think you are - which by definition is different than on the server because of latency. These entities have to find paths that go around the other monsters, which of course are in subtly different positions on both ends. The differing paths further contribute to the monsters being in the wrong place.

It's worth stressing that in 99% of combat events, everything feels fine. Although the simulation is out-of-sync due to the speed of data transmission, the timing generally works out and monsters who are following weird paths get to you at roughly the right times and in roughly the right places. It's hard to really know that anything's wrong... except when it's horribly wrong.

Unfortunately, when things are very out of sync, players have a pretty bad time. They take damage out of nowhere or find that they're actually trapped between monsters that didn't appear in the right places on their client. We have code to detect these situations and hopefully resync (rubber-band) the entities back into place quickly, but it's often not good enough.

Why desync has to exist and why rubber-banding is good

The key thing to understand is that Action RPGs have to use an action prediction system like this. If they wait for confirmation of every action from the server then it feels terrible to control.

Even if our resyncing code was perfect, there would be situations where the game gets out of sync just because of tiny timing differences. Imagine you're running near a large rock, and you arbitrarily click on the other side of it. Both the client and the server attempt to find the shortest path around the rock. Because your client is ahead of the server by definition (as the movement was processed there approximately 50-250ms earlier, so that it was responsive), there are cases where the client may choose to go a different way around the rock than the server. If you were hit by a monster en-route, then your movement will be interrupted in a different place on both simulations. You are now out of sync. Intelligent resync code would detect this and rubber-band you across the rock back to where you're meant to be.
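
The rock example can be made concrete with a toy sketch. It assumes the rock is a thin wall at x = 0 spanning y = -1 to y = +1 and that the shortest path bends around one of its two ends. The function `way_around_rock` and all coordinates are hypothetical.

```python
import math


def way_around_rock(start, target, rock_half_height=1.0):
    """Pick the shorter of the two routes around a wall at x = 0."""
    top = (0.0, rock_half_height)
    bottom = (0.0, -rock_half_height)
    via_top = math.dist(start, top) + math.dist(top, target)
    via_bottom = math.dist(start, bottom) + math.dist(bottom, target)
    return "top" if via_top <= via_bottom else "bottom"


target = (5.0, 0.0)
client_start = (-5.0, 0.05)   # the client is slightly ahead of the server,
server_start = (-5.0, -0.05)  # so the two start points differ a little

print(way_around_rock(client_start, target))
print(way_around_rock(server_start, target))
```

A difference of a tenth of a unit in the starting position is enough for the two simulations to pick opposite sides of the rock.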

The key observation here is that improved resync code involves more rubberbanding than we have at the moment. If we do it properly, monsters and players will be corrected to better positions more frequently, to prevent anything getting drastically out of place. Many players interpret the rubber-banding itself as "desync", when in reality it's what is fixing the problem as it is detected. It's not going to be easy explaining that the increased rate of rubber-banding is not only good, but also the ideal solution.
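
The trade-off described above can be illustrated with a toy model, assuming the position error grows by a fixed amount every tick and a resync resets it to zero once it exceeds a threshold. The function `simulate_drift` and its numbers are invented for the example and say nothing about the real thresholds.

```python
def simulate_drift(ticks, drift_per_tick, threshold):
    """Return (number_of_resyncs, worst_error_seen) for a toy desync model."""
    error = 0.0
    resyncs = 0
    worst = 0.0
    for _ in range(ticks):
        error += drift_per_tick       # the simulations drift apart
        worst = max(worst, error)
        if error > threshold:         # detected: rubber-band into place
            error = 0.0
            resyncs += 1
    return resyncs, worst


loose = simulate_drift(ticks=100, drift_per_tick=1.0, threshold=10.0)
tight = simulate_drift(ticks=100, drift_per_tick=1.0, threshold=2.0)
print("loose threshold:", loose)
print("tight threshold:", tight)
```

The tighter threshold rubber-bands far more often, but nothing ever ends up as far out of place.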

Why some other games appear to not have similar problems

Games using the "wait until server responds" method (RTS and MOBA games) have much higher input latency but don't have the same sync issues that we do. They have their own class of game state synchronisation problems that we thankfully don't have to deal with.

Games using client action prediction like ours run into exactly the same sync issues that we do unless they cheat on certain aspects of the simulation. For example, it's common for Action RPGs to do some combination of the following:

  • Entities can hit each other from a long distance away
  • There's no chance to hit - all hits occur for sure
  • Various speed/collision concessions that make it easy to speedhack and/or walk through monsters with modified clients
  • Attack animations cannot be interrupted (i.e. what we treat as Stun).

Unfortunately, we don't want to do any of those things! They each individually ruin part of the hardcore experience: by allowing combat/movement cheats, preventing accuracy from existing as a mechanic, preventing stunlock, preventing people getting blocked in, etc.

Due to the fact that we want to have hardcore game mechanics (i.e. ones where position matters and it's difficult to cheat in PvP), the only option for us has been to put a lot of work into improving our combat simulation and resync code.

What we're planning to do to improve synchronisation

There are a lot of changes that we're experimenting with that may individually improve the synchronisation of the combat simulation (along with their potential drawbacks):

  • Have monsters on the client attack your server location rather than client location to reduce entropy. Maybe compromise on them attacking a mid-way point between the two. The drawback here is that they'll appear to be swinging at the air, but they're technically more in sync.
  • Display blood and elemental effects at the contact point on the client rather than as damage confirmation. This will mean that combat feels more impactful, but we lose the communicated visual information about whether damage was actually dealt. It could be that this is easier to apply to effects from spells because they generally don't have a hit/miss calculation.
  • Resync entities that successfully hit you when nothing is on the client near you. This may actually pull the entity even more out-of-sync if you're in the wrong place yourself.
  • Resync everything in an area around a desynced entity. This reduces overall entropy massively but would be pretty jarring.
  • Delay actions if the client was ahead on its path. This will solve the case where monsters die before you get to them (if you were out of sync) but technically results in lower combat efficiency for players in these cases.
  • Improve the distance-based resyncing that occurs for things that are far away from where they should be. It doesn't currently take movement speed into account properly. This is why Rhoas feel quite out of sync when charging.
  • Measure overall entropy around the player and force a resync if it exceeds some threshold. The problem is that by the time the resynced information gets to the client, more actions could have occurred.
  • Fix bugs with specific skills that cause them to act differently on the client and server (Whirling Blades, for example, sometimes fails to trigger based on distance on one end).
  • Other changes too subtle/difficult to explain clearly here
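
One of the ideas in this list, measuring overall entropy around the player, could look roughly like the sketch below. It assumes "entropy" means the summed distance between each nearby entity's client and server positions, which is only one possible reading. The names `local_entropy` and `should_force_resync`, the radius and the thresholds are all hypothetical.

```python
import math


def local_entropy(entities, player_pos, radius):
    """Sum the client/server position errors of entities near the player.

    `entities` is a list of (client_pos, server_pos) pairs of (x, y) tuples.
    """
    total = 0.0
    for client_pos, server_pos in entities:
        if math.dist(server_pos, player_pos) <= radius:
            total += math.dist(client_pos, server_pos)
    return total


def should_force_resync(entities, player_pos, radius, threshold):
    """True when the area around the player is too far out of sync."""
    return local_entropy(entities, player_pos, radius) > threshold


entities = [
    ((0.0, 0.0), (3.0, 4.0)),       # 5 units out of place, near the player
    ((10.0, 0.0), (10.0, 0.0)),     # perfectly in sync
    ((0.0, 0.0), (100.0, 100.0)),   # badly off, but outside the radius
]
print(should_force_resync(entities, (0.0, 0.0), radius=20.0, threshold=4.0))
```

As the list notes, the hard part is not the measurement but the delay: by the time the correction reaches the client, more actions may have happened.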

At this stage it looks like the biggest gains will come from improving the resync code so that it rapidly and reliably resyncs the combat situation if things get too desynchronised. This will mean more rubber-banding (as explained earlier), but will massively reduce deaths that occur from the player not being able to see the true locations of entities.

I explained the above changes with their drawbacks because I want to make it clear that this problem is intrinsically difficult to solve. We're fighting against both the laws of physics (travel speed of data) and the desire to not compromise gameplay mechanics. I have full confidence that we will incrementally deploy changes during Open Beta that substantially improve this situation.

I'm sorry about the wall of text. I hope I explained it clearly enough. I am also sorry that it has taken us this long to prepare changes for this issue. We are very careful to not introduce additional problems to the combat simulation and want to make very sure that the changes we're deploying are big improvements. I will let you know as soon as we have a specific patch in mind that we'll start introducing changes in.
No longer a forum dweller, please use PM for contact purposes.
    I know this one and it's ages old.

    It's the one time in PoE history that GGG spoke about desync.

    But they haven't fixed or addressed ANYTHING so far.

    That is the biggest problem for people.

    Take action please, GGG.
    Suggestion about a new Trading System,
    based on Chris' idea,
    which he mentioned in SoE Episode 20

    http://www.pathofexile.com/forum/view-thread/1114823
    We've gone quite some time without a desync post... why stir the pot?
    "
    Shagsbeard wrote:
    We've gone quite some time without a desync post... why stir the pot?


    to "remember" ppl that desync exist?


    ;)
    Thanks for this, was interesting :)
    In another thread I've made a claim that more servers would help with desync issues, since more people would live closer to a server and by this reducing the travel time. I had this post in mind when I wrote that.
    But apparently I misinterpreted the post, I think, or remembered it wrong.

    Now I don't understand. Is travel time a serious issue which, when reduced, would help with desync, or not?
    Last edited by Jojas on Dec 18, 2013, 9:33:16 AM
    "
    Shagsbeard wrote:
    We've gone quite some time without a desync post... why stir the pot?

    So sharing an informative post to avoid future unintelligent desync threads is stirring the pot? When did you give up on logic, Shagsbeard?
    "
    Jojas wrote:
    In another thread I've made a claim that more servers would help with desync issues, since more people would live closer to a server and by this reducing the travel time. I had this post in mind when I wrote that.
    But apparently I misinterpreted the post, I think, or remembered it wrong.

    Now I don't understand. Is travel time a serious issue which, when reduced, would help with desync, or not?


    A lower ping will always help, but due to the nature of PoE's networking model, it won't necessarily get that much better.

    PoE predicts actions and corrects occasionally (vaguely speaking). If you let the uncorrected system predict ahead for a while, it breaks down: error builds up until the simulation is massively out of sync. Then the resync comes (the teleport part), and people think that is the desync. The desync is effectively (although not entirely) whatever happens between resyncs.

    More and better servers (and somewhat different methods) could mostly alleviate the horrible desync issues, for certain kinds of users (low ping, high bandwidth).

    Unfortunately it seems that this isn't practical for GGG (likely cost or profit issues).
    Also worth a read, as the GGG responses there are more in-depth and more up to date than the manifesto, and most of the communication is done by the actual man in charge of the problem:

    http://www.pathofexile.com/forum/view-thread/626664/
    http://www.pathofexile.com/forum/view-thread/626664/filter-account-type/staff

    It is also, overall, a pleasant and constructive thread, aside from the ad hominem attacks all over the place.
    Last edited by Nightmare90 on Dec 18, 2013, 9:41:04 AM
    Ok... that was a bit snarky of me. But really, this topic has been beaten to death. People who have gotten over it have either adapted or moved on to other games. Very few are interested in rehashing the same topic using year-old information.

    The solution comes down to money. It's a problem that can only be fixed by throwing money at it (figuratively). They've balanced their costs with making the game enjoyable and we're at a sweet spot in the compromise. Desync isn't bad enough to make the game unplayable, and the costs are not so high as to make them have to charge us to play.
