FIX FUCKING DESYNC. THERES A SUGGESTION!

"
Chris wrote:
The ideas mentioned here are merely our current line of thinking on these topics. They are undergoing substantially more design and testing before getting onto Beta. Comments in this subforum have been disabled so that we can update the manifesto without having to delete all the out-of-date comments. For more information please see The Purpose of the Development Manifesto.

"Desync" is a very hot topic at the moment. At best it's a minor annoyance when it occurs and at worst it can cause characters to get killed in situations where they thought there were no monsters around. We have many changes coming that will substantially improve the situation, but would like to also explain how our synchronisation systems work in case you're interested, and to make it clear that game state synchronisation is a problem that all online games need to deal with.

In this article I'm going to try to clearly explain:
  • How different types of online games handle latency
  • How our system of action prediction works
  • Why sync problems occur with this system and how they manifest
  • Why desync has to exist and why rubber-banding is good
  • Why some other games don't appear to have similar problems
  • What we're planning to do to improve synchronisation


How different types of online games handle latency
Any game has calculations that occur to determine the result of actions. In RPGs, these can range from combat calculations (who did what damage) to important economic transactions involving game items. To prevent players cheating, it's important that these calculations are not done on the gamer's computer, because they can easily modify the result of such calculations.

Because of this, all calculations that affect someone's progress must be done on servers that we control. These servers exist all over the world (Texas, Amsterdam and Singapore, currently), but due to the speed of light and other physical limitations, it's not instant to send or receive data from them. We typically see response times between our players and the servers of around 50-250ms.

All online games have this situation. The server has to dictate whether things happen or not, but there's a 50-250ms delay before data gets to the server and back. There are three ways that games can solve this:
  • Trust the client. This means people can cheat, but the results are instant. We will not do this.
  • Wait until data arrives back from the server before doing anything. This is a very common strategy in RTS and MOBA games. If you click to move, the unit will only start moving once the server says so, which is 50-250ms later. If you are close to the server, you'll quickly get used to the lag and everything feels pretty good. If you're far away (New Zealand, for example), it feels like you're playing drunk. Every time you issue an order, nothing happens for quarter of a second. This does not work for Action RPGs.
  • Start predicting the result of the action as though the server said yes, immediately. When the server later gets back to you with a result, factor it in. This is what Action RPGs including Path of Exile do. It means that when you click to move, or click to attack, it occurs instantly and feels great. The problem is what happens when the server decides that the action can't have occurred - that's when the game gets badly out of sync.


Action RPGs have to use the third system (action prediction) to feel responsive. The problem is, the second you start moving, you're implicitly out-of-sync by definition. Your client has drawn the first few frames of movement (to be nice and responsive), but the server has no idea you clicked a button yet until the data arrives. Action prediction is mandatory for this type of game but results in you being slightly out-of-sync almost all of the time. This is generally no problem, but once too many predictions get made based on incorrect data, very bad things happen. The challenge is detecting and correcting the situation before this occurs.

How our system of action prediction works
Let's say you're playing with 200ms round-trip latency and you click a monster that is 2 seconds of travel distance away from you. Assume your attack animation has its contact point after 300ms (which is where damage is dealt).

0ms: You click the monster. Your character starts running towards it on the client.
100ms: Your click arrives at the server. The character there starts running towards the monster also. At this stage your local character is already 5% of the way there.
2000ms: Your character arrives at the monster on the client. It's not there yet on the server. You don't even know if it'll ever arrive for sure (it might get interrupted by an attack still). Your client starts to animate the sword swing:
2100ms: Your character arrives at the monster on the server. The server immediately performs the combat calculation in advance of the contact point and sends the tentative result back to the client.
2200ms: You receive the notification from the server about what type of damage you will deal and roughly how much. Thankfully it arrived before the contact point of the animation! This is not always the case.
2300ms: You hit the contact point on the client. Because you have the damage information in advance, you can draw a pleasing blood splatter, fire effect and so on. This hit has not even occurred yet on the server.
2400ms: You hit the contact point on the server. The damage is locked in and actually applied to the monster. It dies. Experience and item drops are calculated and sent to the client.
2500ms: Your client receives an experience update and the information of what items to show falling to the ground.

Despite the fact that your information is delayed by 100ms, it arrived before the contact point and the only indication of playing under latency that the client noticed was the fact that it took a tenth of a second for the item drops to arrive. At no point in that process was any gameplay calculation compromised in a way that would enable players to cheat the system.

Why sync problems occur with this system and how they manifest
This above example assumes that everything went smoothly. It's entirely possible for the 2 second travel time to be completely different on both ends, or for a lag spike to occur causing the timing to get completely out of sync. If the attack is interrupted on the server before it starts (during movement) but not on the client, then you have a long animation playing that can't be cancelled because the communication time is a decent length of the animation.

Even if no strange lag occurs, the monsters that are nearby are pathfinding on the client to where they think you are - which by definition is different than on the server because of latency. These entities have to find paths that go around the other monsters, which of course are in subtlety different positions on both ends. The differing paths further contribute to the monsters being in the wrong place.

It's worth stressing that in 99% of combat events, everything feels fine. Although the simulation is out-of-sync due to the speed of data transmission, the timing generally works out and monsters who are following weird paths get to you at roughly the right times and in roughly the right places. It's hard to really know that anything's wrong... except when it's horribly wrong.

Unfortunately, when things are very out of sync, players have a pretty bad time. They take damage out of nowhere or find that they're actually trapped between monsters that didn't appear in the right places on their client. We have code to detect these situations and hopefully resync (rubber-band) the entities back into place quickly, but it's often not good enough.

Why desync has to exist and why rubber-banding is good
The key thing to understand is that Action RPGs have to use an action prediction system like this. If they wait for confirmation of every action from the server then it feels terrible to control.

Even if our resyncing code was perfect, there would be situations where the game gets out of sync just because of tiny timing differences. Imagine you're running near a large rock, and you arbitrarily click on the other side of it. Both the client and the server attempt to find the shortest path around the rock. Because your client is ahead of the server by definition (as the movement was processed there approximately 50-250ms earlier, so that it was responsive), there are cases where the client may choose to go a different way around the rock than the server. If you were hit by a monster en-route, then your movement will be interrupted in a different place on both simulations. You are now out of sync. Intelligent resync code would detect this and rubber-band you across the rock back to where you're meant to be.

The key observation here is that improved resync code involves more rubberbanding than we have at the moment. If we do it properly, monsters and players will be corrected to better positions more frequently, to prevent anything getting drastically out of place. Many players interpret the rubber-banding itself as "desync", when in reality it's what is fixing the problem as it is detected. It's not going to be easy explaining that the increased rate of rubber-banding is not only good, but also the ideal solution.

Why some other games appear to not have similar problems
Games using the "wait until server responds" method (RTS and MOBA games) have much higher input latency but don't have the same sync issues that we do. They have their own class of game state synchronisation problems that we thankfully don't have to deal with.

Games using client action prediction like ours run into exactly the same sync issues that we do unless they cheat on certain aspects of the simulation. For example, it's common for Action RPGs to do some combination of the following:
  • Entities can hit each other from a long distance away
  • There's no chance to hit - all hits occur for sure
  • Various speed/collision concessions that make it easy to speedhack and/or walk through monsters with modified clients
  • Attack animations cannot be interrupted (i.e. what we treat as Stun).


  • Unfortunately, we don't want to do any of those things! They each individually ruin part of the hardcore experience: by allowing combat/movement cheats, preventing accuracy from existing as a mechanic, prevent stunlock, preventing people getting blocked in, etc.

    Due to the fact that we want to have hardcore game mechanics (i.e. ones where position matters and it's difficult to cheat in PvP), the only option for us has been to put a lot of work into improving our combat simulation and resync code.

    What we're planning to do to improve synchronisation

    There are a lot of changes that we're experimenting with that may individually improve the synchonisation of the combat simulation (along with their potential drawbacks):

    • Have monsters on the client attack your server location rather than client location to reduce entropy. Maybe compromise on them attacking a mid-way point between the two. The drawback here is that it means they'll appear they are swinging at the air, but they're technically more in sync.
    • Display blood and elemental effects at the contact point on the client rather than as damage confirmation. This will mean that combat feels more impactful, but we lose the communicated visual information about whether damage was actually dealt. It could be that this is easier to apply to effects from spells because they generally don't have a hit/miss calculation.
    • Resync entities that successfully hit you when nothing is on the client near you. This may actually pull the entity even more out-of-sync if you're in the wrong place yourself.
    • Resync everything in an area around a desynced entity. This reduces overall entropy massively but would be pretty jarring.
    • Delay actions if the client was ahead on its path. This will solve the case where monsters die before you get to them (if you were out of sync) but technically results in lower combat efficiency for players in these cases.
    • Improve the distance-based resyncing that occurs for things that are far away from where they should be. It doesn't currently take movement speed into account properly. This is why Rhoas feel quite out of sync when charging.
    • Measure overall entropy around the player and force a resync if it exceeds some threshold. The problem is that by the time the resynced information gets to the client, more actions could have occured.
    • Fix bugs with specific skills that cause them to act differently on the client and server (Whirling Blades, for example, sometimes fails to trigger based on distance on one end).
    • Other changes too subtle/difficult to explain clearly here


    At this stage it looks like the biggest gains will come from improving the resync code so that it rapidly and reliably resyncs the combat situation if things get too desynchronised. This will mean more rubber-banding (as explained earlier), but will massively reduce deaths that occur from the player not being able to see the true locations of entities.

    I explained the above changes with their drawbacks because I want to make it clear that this problem is intrinsically difficult to solve. We're fighting against both the laws of physics (travel speed of data) and the desire to not compromise gameplay mechanics. I have full confidence that we will incrementally deploy changes during Open Beta that substantially improve this situation.

    I'm sorry about the wall of text. I hope I explained it clearly enough. I am also sorry that it has taken us this long to prepare changes for this issue. We are very careful to not introduce additional problems to the combat simulation and want to make very sure that the changes we're deploying are big improvements. I will let you know as soon as we have a specific patch in mind that we'll start introducing changes in.
  • Report Forum Post

    Report Account:

    Report Type

    Additional Info