Investigating hang - giving some info about the bug [BUG]

Hello,

My game keeps freezing, so I took some time to investigate what's happening. Here is what I found:

The event log looks like this:
"
AppName PathOfExileSteam.exe
AppVersion 0.0.0.0
AppTimeStamp 67888689
ModuleName KERNELBASE.dll
ModuleVersion 10.0.26100.2454
ModuleTimeStamp 398a1cce
ExceptionCode e0000001
FaultingOffset 00000000000c837a
ProcessId 0x920


After some reverse engineering, I saw that the event places the hang here:
"
.text:00000001800C836E 48 8D 4C 24 20 lea rcx, [rsp+0F8h+ExceptionRecord] ; ExceptionRecord
.text:00000001800C8373 48 FF 15 EE 77 1A 00 call cs:__imp_RtlRaiseException
.text:00000001800C837A 0F 1F 44 00 00 nop dword ptr [rax+rax+00h]
.text:00000001800C837F 48 8B 8C 24 C0 00 00 00 mov rcx, [rsp+0F8h+var_38]
.text:00000001800C8387 48 33 CC xor rcx, rsp ; StackCookie


It points at the nop instruction, which, as you might know, does nothing and is probably just compiler padding (not getting into it). The offset is simply the return address of the call just before it, so what the event actually reports is a crash (Kernel-Power type) raised by the call to RtlRaiseException in ntdll.dll. The code above is from kernelbase.dll, more exactly from RaiseException, which is more or less a wrapper around RtlRaiseException. Since the exception code is 0xe0000001, it is obviously not a standard Windows exception code, so I started looking into the PoE2 assembly, and only one piece of code in the main executable uses this exception code:
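
To show why 0xe0000001 cannot be one of the standard codes, here is a small Python sketch that splits a 32-bit exception code into the usual status-code bit fields (severity, customer flag, reserved bit, facility, code). The function name decode_exception_code is made up for illustration; nothing in it comes from the game itself.

```python
def decode_exception_code(code):
    """Split a 32-bit Windows status/exception code into its bit fields."""
    code &= 0xFFFFFFFF
    return {
        "severity": (code >> 30) & 0x3,   # 0 = success, 1 = informational, 2 = warning, 3 = error
        "customer": (code >> 29) & 0x1,   # 1 = application-defined code
        "reserved": (code >> 28) & 0x1,
        "facility": (code >> 16) & 0xFFF,
        "code": code & 0xFFFF,
    }


if __name__ == "__main__":
    # The code from the event log, next to a well-known system code for comparison.
    for value in (0xE0000001, 0xC0000005):
        print(hex(value), decode_exception_code(value))
```

For 0xE0000001 the customer flag is set, which marks the code as application-defined. A system code such as 0xC0000005 (access violation) has that bit clear.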

"
.text:00000001400D7610 Function proc near ; DATA XREF: Function+4↓o
.text:00000001400D7610 ; sub_1400E30D0+E↓o ...
.text:00000001400D7610 sub rsp, 28h
.text:00000001400D7614 lea rdx, Function ; Function
.text:00000001400D761B mov ecx, 16h ; Signal
.text:00000001400D7620 call signal
.text:00000001400D7625 xor r9d, r9d
.text:00000001400D7628 xor r8d, r8d
.text:00000001400D762B xor edx, edx
.text:00000001400D762D mov ecx, 0E0000001h
.text:00000001400D7632 add rsp, 28h
.text:00000001400D7636 jmp cs:RaiseException
.text:00000001400D7636 Function endp


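Judging by the assembly code, this looks like a handler for signal 0x16 (22 decimal, which is SIGABRT in the MSVC runtime) that re-registers itself and then raises the custom exception 0xE0000001 through RaiseException (I might be wrong).

I saw that the above function is passed to another function, labeled signal by the decompiler, as seen below: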
"

.text:00000001400E30D0 push rbx
.text:00000001400E30D2 sub rsp, 20h
.text:00000001400E30D6 mov rbx, rcx
.text:00000001400E30D9 call sub_141AF08E0
.text:00000001400E30DE lea rdx, Function ; Function <---- this is the function that raises the exception Event Viewer reports; it is passed as the 2nd parameter to signal. The 1st parameter is a 32-bit integer with the value 0x16 (22 decimal).
.text:00000001400E30E5 mov ecx, 16h ; Signal
.text:00000001400E30EA call signal
.text:00000001400E30EF lea rcx, sub_1400D7660
.text:00000001400E30F6 call set_terminate
.text:00000001400E30FB lea rcx, sub_1400D7680
.text:00000001400E3102 call set_unexpected
.text:00000001400E3107 nop




The signal function looks roughly like this ( just to help you identify the function easier ):
"

if ( (unsigned __int64)Function - 3 <= 1 )
goto LABEL_32;
if ( (unsigned int)Signal <= 0x16 )
{
v4 = 6324292;
if ( _bittest(&v4, Signal) )
{
v5 = 0;
v6 = 0LL;
_vcrt_lock_0(3LL);
if ( (Signal == 2 || Signal == 21) && !byte_1439F1AE2 )
{
if ( SetConsoleCtrlHandler(ctrlevent_capture, 1) )
{
byte_1439F1AE2 = 1;
}
else
{
LastError = GetLastError();
*_doserrno() = LastError;
v5 = 1;
}
}
global_action_nolock = get_global_action_nolock(Signal);
if ( global_action_nolock )
{
v6 = (void (__cdecl *)(int))__ROR8__(
(unsigned __int64)*global_action_nolock ^ _security_cookie,
_security_cookie & 0x3F);
if ( Function != (_crt_signal_t)2 )
*global_action_nolock = (void (*)(int))(_security_cookie ^ __ROR8__(
Function,
64 - ((unsigned __int8)_security_cookie & 0x3Fu)));
}
_vcrt_unlock_0(3LL);
if ( !v5 )
return v6;
goto LABEL_32;
}
}
if ( (unsigned int)Signal > 0xB )
goto LABEL_32;
v10 = 2320;
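
Two details of the decompiled signal code above can be checked with a short Python sketch: which signal numbers the constants 6324292 and 2320 select through _bittest, and how the handler pointer is hidden with the security cookie. The signal names are the ones from the MSVC signal.h header. The helper names and the cookie value used in the demo are placeholders of mine, not values taken from the game.

```python
# Signal numbers as defined in the MSVC <signal.h> header.
MSVC_SIGNALS = {
    2: "SIGINT", 4: "SIGILL", 6: "SIGABRT_COMPAT", 8: "SIGFPE",
    11: "SIGSEGV", 15: "SIGTERM", 21: "SIGBREAK", 22: "SIGABRT",
}
MASK64 = (1 << 64) - 1


def signals_in_mask(mask):
    """List the signals whose bit is set in a _bittest-style mask."""
    return [MSVC_SIGNALS.get(bit, "signal %d" % bit)
            for bit in range(mask.bit_length()) if (mask >> bit) & 1]


def ror64(value, count):
    """64-bit rotate right, the operation the decompiler shows as __ROR8__."""
    count %= 64
    value &= MASK64
    return ((value >> count) | (value << (64 - count))) & MASK64


def encode_pointer(pointer, cookie):
    # Mirrors: _security_cookie ^ __ROR8__(Function, 64 - (cookie & 0x3F))
    return cookie ^ ror64(pointer, 64 - (cookie & 0x3F))


def decode_pointer(encoded, cookie):
    # Mirrors: __ROR8__(*global_action_nolock ^ _security_cookie, cookie & 0x3F)
    return ror64(encoded ^ cookie, cookie & 0x3F)


if __name__ == "__main__":
    print(signals_in_mask(6324292))  # mask tested when Signal <= 0x16
    print(signals_in_mask(2320))     # mask used when Signal <= 0xB
    cookie = 0x00002B992DDFA232      # placeholder cookie, not the real one
    handler = 0x1400D7610            # address of Function in the listing
    stored = encode_pointer(handler, cookie)
    print(hex(stored), hex(decode_pointer(stored, cookie)))
```

The first mask covers SIGINT, SIGABRT_COMPAT, SIGTERM, SIGBREAK and SIGABRT, so signal 0x16 takes the branch that stores the encoded handler. The second mask covers SIGILL, SIGFPE and SIGSEGV.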


I also want to mention that the hang usually happens when I am loading another instance (for example, going from Clearfell to The Grelwood). In another hang a few days ago I also saw some addresses that looked like kernel addresses, so (and this is just my "reverse engineering" intuition) it might be a bug related to multithreading and the video driver, but I might be awfully wrong about this. I will try to investigate more when I have more time. Until then, I hope this is useful to you :)
Last bumped on Jan 27, 2025, 9:50:14 PM
I was able to generate a crash dump, and this is roughly what the stack trace looks like right before the exception occurs:

"
# 27 Id: 1ac.395c Suspend: 1 Teb: 000000c9`ed546000 Unfrozen
# Child-SP RetAddr Call Site
00 000000c9`fd5fbd08 00007ff6`622ac207 PathOfExileSteam+0xd7636
01 000000c9`fd5fbd10 00007ff6`622a27e0 PathOfExileSteam+0x235c207
02 000000c9`fd5fbd80 00007ff6`6237cfb8 PathOfExileSteam+0x23527e0
03 000000c9`fd5fbdb0 00007ff6`622a2740 PathOfExileSteam+0x242cfb8
04 000000c9`fd5fbde0 00007ff6`6229965e PathOfExileSteam+0x2352740
05 000000c9`fd5fbe10 00007ffb`8ca03846 PathOfExileSteam+0x234965e
06 000000c9`fd5fbee0 00007ff6`60bade19 ntdll!RcConsolidateFrames+0x6
07 000000c9`fd5ffaf0 00007ffb`8be1e8d7 PathOfExileSteam+0xc5de19
08 000000c9`fd5ffb50 00007ffb`8c9bfbcc KERNEL32!BaseThreadInitThunk+0x17
09 000000c9`fd5ffb80 00000000`00000000 ntdll!RtlUserThreadStart+0x2c
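
To line up the module offsets in this trace with the addresses in the disassembly listings, a sketch like the following can help. It assumes the listings use an image base of 0x140000000 (inferred from addresses such as 0x1400D7610, not confirmed from the PE header), and the helper names are hypothetical.

```python
LISTING_IMAGE_BASE = 0x140000000  # assumed from the addresses in the listings


def listing_address(module_offset, image_base=LISTING_IMAGE_BASE):
    """Turn 'PathOfExileSteam+0x...' into the address used in the disassembly."""
    return image_base + module_offset


def runtime_base(runtime_address, module_offset):
    """Recover where the module was loaded from an address and its offset."""
    return runtime_address - module_offset


if __name__ == "__main__":
    # Frame 00 and frame 07 from the trace above.
    print(hex(listing_address(0xD7636)))
    print(hex(listing_address(0xC5DE19)))
    # RetAddr of frame 00 is the call site of frame 01; same idea for 06 and 07.
    print(hex(runtime_base(0x00007FF6622AC207, 0x235C207)))
    print(hex(runtime_base(0x00007FF660BADE19, 0xC5DE19)))
```

Under that assumption, frame 00 lands on 0x1400D7636, the jmp cs:RaiseException at the end of Function, and frame 07 lands on 0x140C5DE19, which is 9 bytes into sub_140C5DE10. Both return addresses give the same load address, which is a quick sanity check on the offsets.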


This is pretty much the code that creates the thread that produces the hang:
"

.text:0000000140C5B723 mov rax, gs:58h
.text:0000000140C5B72C mov ecx, 13Ch
.text:0000000140C5B731 mov rax, [rax]
.text:0000000140C5B734 mov byte ptr [rsp+0A8h+dwCreationFlags], 1
.text:0000000140C5B739 xor r9d, r9d
.text:0000000140C5B73C mov edx, 28h ; '('
.text:0000000140C5B741 mov r8d, 10h
.text:0000000140C5B747 movzx ecx, word ptr [rcx+rax]
.text:0000000140C5B74B call sub_140C5F1A0
.text:0000000140C5B750 mov rbx, rax
.text:0000000140C5B753 mov [rsp+0A8h+arg_0], rax
.text:0000000140C5B75B test rax, rax
.text:0000000140C5B75E jz short loc_140C5B7AC
.text:0000000140C5B760 mov byte ptr [rax], 0
.text:0000000140C5B763 mov [rax+8], r14
.text:0000000140C5B767 lea rax, sub_140C5B580
.text:0000000140C5B76E mov [rbx+10h], rax
.text:0000000140C5B772 mov [rbx+18h], rdi
.text:0000000140C5B776 mov dword ptr [rbx+20h], 80000h
.text:0000000140C5B77D mov eax, r14d
.text:0000000140C5B780 xchg al, [rbx]
.text:0000000140C5B782 mov edx, [rbx+20h] ; dwStackSize
.text:0000000140C5B785 mov [rsp+0A8h+lpThreadId], r14 ; lpThreadId
.text:0000000140C5B78A mov [rsp+0A8h+dwCreationFlags], r14d ; dwCreationFlags
.text:0000000140C5B78F mov r9, rbx ; lpParameter
.text:0000000140C5B792 lea r8, sub_140C5DE10 ; lpStartAddress
.text:0000000140C5B799 xor ecx, ecx ; lpThreadAttributes
.text:0000000140C5B79B call cs:CreateThread
.text:0000000140C5B7A1 mov [rbx+8], rax
.text:0000000140C5B7A5 test rax, rax
.text:0000000140C5B7A8 jz short loc_140C5B806
.text:0000000140C5B7AA jmp short loc_140C5B7AF
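
The stores in this listing suggest that the block returned by sub_140C5F1A0 (requested with size 28h) is a small structure that is later handed to CreateThread as lpParameter. Below is a rough Python ctypes sketch of that layout. All field names are my guesses, and the assumption that r14 is zero at this point comes only from it being used for lpThreadId and dwCreationFlags.

```python
import ctypes


class ThreadStartBlock(ctypes.Structure):
    """Hypothetical layout of the block passed to the new thread."""
    _fields_ = [
        ("flag", ctypes.c_uint8),            # +0x00: mov byte ptr [rax], 0 / xchg al, [rbx]
        ("thread_handle", ctypes.c_uint64),  # +0x08: later receives the CreateThread result
        ("callback", ctypes.c_uint64),       # +0x10: address of sub_140C5B580
        ("owner", ctypes.c_uint64),          # +0x18: rdi, the first argument of the function
        ("stack_size", ctypes.c_uint32),     # +0x20: 80000h, read back as dwStackSize
    ]


def field_offsets(struct_type):
    """Return a name -> byte offset mapping for a ctypes structure."""
    return {name: getattr(struct_type, name).offset
            for name, _ctype in struct_type._fields_}


if __name__ == "__main__":
    block = ThreadStartBlock(callback=0x140C5B580, stack_size=0x80000)
    print({name: hex(off) for name, off in field_offsets(ThreadStartBlock).items()})
    print(hex(ctypes.sizeof(ThreadStartBlock)), block.stack_size // 1024, "KiB stack")
```

With natural alignment on a 64-bit build, the offsets match the [rbx+8], [rbx+10h], [rbx+18h] and [rbx+20h] stores, and the total size matches the 28h passed in edx. The requested stack size of 80000h is 512 KiB.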


Judging by
"
.text:0000000140C5B723 mov rax, gs:58h
it seems to have something to do with TLS data, since on x64 [gs:58h] holds the linear address of the thread-local storage array.

Judging by this code:
"
.text:0000000140C5B701 loc_140C5B701: ; CODE XREF: sub_140C5B690+5E↑j
.text:0000000140C5B701 cmp dword ptr [rdi+20h], 0
.text:0000000140C5B705 jnz short loc_140C5B723
.text:0000000140C5B707 cmp dword ptr [rdi+28h], 0
.text:0000000140C5B70B jnz short loc_140C5B723
.text:0000000140C5B70D call cs:GetCurrentThread
.text:0000000140C5B713 xor edx, edx ; nPriority
.text:0000000140C5B715 mov rcx, rax ; hThread
.text:0000000140C5B718 call cs:SetThreadPriority
.text:0000000140C5B71E jmp loc_140C5B7D5
.text:0000000140C5B723 ; ---------------------------------------------------------------------------
.text:0000000140C5B723
.text:0000000140C5B723 loc_140C5B723: ; CODE XREF: sub_140C5B690+75↑j
.text:0000000140C5B723 ; sub_140C5B690+7B↑j
.text:0000000140C5B723 mov rax, gs:58h
.text:0000000140C5B72C mov ecx, 13Ch
.text:0000000140C5B731 mov rax, [rax]


The code that produces the hang runs only if [rdi+0x20] or [rdi+0x28] is not 0. Since rdi comes from rcx, rdi is the first parameter passed to this function.

The C code for this function looks something like this:
"

int __fastcall sub_140C5B690(__int64 a1, __int64 a2, __int64 a3)
{
__int64 (__fastcall ***v5)(_QWORD, _BYTE *); // rcx
_BYTE *v6; // rdx
HANDLE CurrentThread; // rax
_QWORD *Thread; // rax
__int64 v9; // rdx
_QWORD *v10; // rbx
__int64 v11; // rbp
void *v12; // rcx
__int64 v13; // rcx
_BYTE pExceptionObject[24]; // [rsp+30h] [rbp-78h] BYREF
_BYTE v16[56]; // [rsp+48h] [rbp-60h] BYREF
_BYTE *v17; // [rsp+80h] [rbp-28h]

*(_QWORD *)(a1 + 56) = a2;
v17 = 0LL;
v5 = *(__int64 (__fastcall ****)(_QWORD, _BYTE *))(a3 + 56);
if ( v5 )
v17 = (_BYTE *)(**v5)(v5, v16);
sub_140103A40(v16, a1 + 64);
if ( v17 )
{
v6 = v16;
LOBYTE(v6) = v17 != v16;
(*(void (__fastcall **)(_BYTE *, _BYTE *))(*(_QWORD *)v17 + 32LL))(v17, v6);
}
if ( *(_DWORD *)(a1 + 32) || *(_DWORD *)(a1 + 40) )
{
Thread = (_QWORD *)sub_140C5F1A0(
*(unsigned __int16 *)(*(_QWORD *)NtCurrentTeb()->ThreadLocalStoragePointer + 0x13CLL),
40,
16,
0,
1);
v10 = Thread;
if ( Thread )
{
*(_BYTE *)Thread = 0;

......... snip ...........


I found only one call to this function in the main executable, inside sub_140C5BA30, and the assembly code looks like this:
"

.text:0000000140C5BE16 mov rdx, r12
.text:0000000140C5BE19 mov rcx, rbx
.text:0000000140C5BE19 ; } // starts at 140C5BD78
.text:0000000140C5BE1C call sub_140C5B690 <------ this is the call
.text:0000000140C5BE21 sub rbx, 0FFFFFFFFFFFFFF80h
.text:0000000140C5BE25 cmp rbx, rsi
.text:0000000140C5BE28 jnz short loc_140C5BDE0


The function making this call takes a variable number of arguments.

This is just a snippet in C just to help you identify the code:

"
v12 = "IDLE";
if ( a2 )
{
switch ( a2 )
{
case 1:
v13 = "MEDIUM";
break;
case 2:
v13 = "LOW";
break;
case 3:
v13 = "IDLE";
break;
default:
sub_1400CF8A0(&pExceptionObject, "Invalid job type");
CxxThrowException(&pExceptionObject, (_ThrowInfo *)&stru_142E85530);
}


It seems to be a problem with the first parameter. It also seems that the main executable contains only one call to sub_140C5BA30:

"
.text:0000000140C5C62D call sub_140C5BA30
.text:0000000140C5C632 add r14d, ebx
.text:0000000140C5C635 inc edi
.text:0000000140C5C637 cmp edi, 4
.text:0000000140C5C63A jb loc_140C5C580
.text:0000000140C5C640 mov byte ptr [rsi+48Ch], 1
.text:0000000140C5C647 lea r11, [rsp+170h+var_10]
.text:0000000140C5C64F mov rbx, [r11+20h]
.text:0000000140C5C653 mov rsi, [r11+30h]
.text:0000000140C5C657 mov rsp, r11
.text:0000000140C5C65A pop r14
.text:0000000140C5C65C pop rdi
.text:0000000140C5C65D pop rbp
.text:0000000140C5C65E retn


This looks like a catch block. Judging by the strings, it also looks like a function that creates some kind of jobs:
"
sub_14010ABD0(&v15, L"[JOB] Start");


If I go one frame further up the call stack, I see some more strings like:
"
v9 = sub_14010ABD0(&v236, L"[ENGINE] Cache directory: ");
v12 = sub_14010ABD0(&v236, L"[ENGINE] Download directory: ");
v16 = sub_14010ABD0(&v236, L"[ENGINE] Settings directory: ");


After that there is a call to the functions I've posted above:

"
if ( *(_BYTE *)(a2 + 489) )
{
v23 = sub_140C5CA20();
sub_140C5C430(*(_QWORD *)(v23 + 8));
}


I hope this helps you investigate the issue. I will try to analyze further when I have some time.
bump
//
Last edited by IceCool10#6669 on Jan 28, 2025, 7:13:51 PM
"

I also want to mention that the hang usually happens when I am loading another instance (for example, going from Clearfell to The Grelwood). In another hang a few days ago I also saw some addresses that looked like kernel addresses, so (and this is just my "reverse engineering" intuition) it might be a bug related to multithreading and the video driver, but I might be awfully wrong about this. I will try to investigate more when I have more time. Until then, I hope this is useful to you :)


Yeah, there is a problem with the multi-threading code, based on my testing of another issue that has been happening in both games. There is a chance that the two issues are closely related. Ever since they updated their engine's particle code, there is a high chance that some of the old code is incompatible with the new instructions. Since this engine is very old, that is most likely the case in some parts of it, which can be a nightmare if it doesn't work and would require rewriting from scratch.

This is the other issue's post: https://www.pathofexile.com/forum/view-thread/3706277
