

Access Violation in ntdll.dll


4 replies to this topic

#1 eddie

    Senior Member

  • Members
  • 751 posts

Posted 31 March 2008 - 05:24 AM

Hey!

Now, this isn't strictly game programming related, but I find we tend to have a lot of specific-error-knowledge tucked away in our heads at times, and I'm wondering if anyone has seen something like this, and can point me at my error.

I've been building a build system that's mostly Lua, with some specific C++ code written in to do the menial stuff. It works fine under gcc in Linux, but on Windows, with gcc (MinGW) or using Visual Studio 2005, I get a crash that I can't piece together.

Basically, whenever I get to a certain point in my application's run, I get an Access Violation inside ntdll, that appears to be from a mutex lock.

I'm not sure what this 'means', exactly, in terms of what I'm doing wrong. The string it's destructing seems alright, and there are no other threads running at that point, so I'm not really sure what's going on.

Here's my code that ends up calling other functions that end up generating the access violation:


static int LUA_ProcessExecute(lua_State* pLuaState)
{
	char const * pCommand = luaL_checkstring(pLuaState, 1);

	int returnValue = Process::Execute(pCommand); // Access violation comes from pCommand being converted to std::string, and then destructing the temporary

	lua_pushnumber(pLuaState, returnValue);

	return 1;
}


And here's the call stack.

 	ntdll.dll!_RtlpWaitForCriticalSection@4()  + 0x5b bytes
 	ntdll.dll!_RtlEnterCriticalSection@4()  + 0x46 bytes
 	mke.exe!_lock(int locknum=4)  Line 349	C
>	mke.exe!operator delete(void * pUserData=0x012541c0)  Line 45 + 0x7 bytes	C++
 	mke.exe!std::allocator<char>::deallocate(char * _Ptr=0x012541c0, unsigned int __formal=2560)  Line 141 + 0x9 bytes	C++
 	mke.exe!std::basic_string<char,std::char_traits<char>,std::allocator<char> >::_Tidy(bool _Built=true, unsigned int _Newsize=0)  Line 2076	C++
 	mke.exe!std::basic_string<char,std::char_traits<char>,std::allocator<char> >::~basic_string<char,std::char_traits<char>,std::allocator<char> >()  Line 906	C++
 	mke.exe!LUA_ProcessExecute(lua_State * pLuaState=0x00b75408)  Line 13	C++
 	mke.exe!luaD_precall(lua_State * L=0x00b75408, lua_TValue * func=0x00d664a8, int nresults=1)  Line 319 + 0x16 bytes	C
 	mke.exe!luaV_execute(lua_State * L=0x00b75408, int nexeccalls=8)  Line 587 + 0x14 bytes	C
 	mke.exe!luaD_call(lua_State * L=0x00b75408, lua_TValue * func=0x00b757b0, int nResults=-1)  Line 377 + 0xb bytes	C
 	mke.exe!lua_call(lua_State * L=0x00b75408, int nargs=0, int nresults=-1)  Line 783 + 0x11 bytes	C
 	mke.exe!luaB_dofile(lua_State * L=0x00b75408)  Line 329 + 0xd bytes	C
 	mke.exe!luaD_precall(lua_State * L=0x00b75408, lua_TValue * func=0x00b75790, int nresults=0)  Line 319 + 0x16 bytes	C
 	mke.exe!luaV_execute(lua_State * L=0x00b75408, int nexeccalls=1)  Line 587 + 0x14 bytes	C
 	mke.exe!luaD_call(lua_State * L=0x00b75408, lua_TValue * func=0x00b75780, int nResults=1)  Line 377 + 0xb bytes	C
 	mke.exe!f_call(lua_State * L=0x00b75408, void * ud=0x0013fbac)  Line 801 + 0x16 bytes	C
 	mke.exe!luaD_rawrunprotected(lua_State * L=0x00b75408, void (lua_State *, void *)* f=0x00418da0, void * ud=0x0013fbac)  Line 118 + 0x1f bytes	C
 	mke.exe!luaD_pcall(lua_State * L=0x00b75408, void (lua_State *, void *)* func=0x00418da0, void * u=0x0013fbac, int old_top=224, int ef=208)  Line 463 + 0x11 bytes	C
 	mke.exe!lua_pcall(lua_State * L=0x00b75408, int nargs=0, int nresults=1, int errfunc=13)  Line 822 + 0x20 bytes	C
 	mke.exe!main(int argc=1, char * * argv=0x00b73730)  Line 313 + 0x17 bytes	C++
 	mke.exe!__tmainCRTStartup()  Line 327 + 0x19 bytes	C
 	mke.exe!mainCRTStartup()  Line 196	C
 	kernel32.dll!_BaseProcessStart@4()  + 0x23 bytes



I don't want to overwhelm with information, but if someone smells something familiar and needs more information to test a theory, by all means, ask. I'd love to find a better starting point than re-reading every API function's documentation.

Thanks!

#2 Nick

    Senior Member

  • Members
  • 1227 posts
  • Location: Ottawa, Ontario, Canada

Posted 31 March 2008 - 07:05 AM

Where is the character string allocated? Is it properly null-terminated? Maybe it's longer than the allocated buffer.

Try copying the character string to a newly allocated buffer.

Could you show us the disassembly code of the access violation? Hit Alt+8.

#3 eddie

    Senior Member

  • Members
  • 751 posts

Posted 31 March 2008 - 05:54 PM

Hey Nick,

Thanks for posting.

I've modified the crash area a bit, in the hopes the error would become more apparent. Either it hasn't, or I'm just dumb.

New code:



static int LUA_ProcessExecute(lua_State* pLuaState)
{
	char const * pCommand = luaL_checkstring(pLuaState, 1);

	std::string const kString = pCommand;
	int returnValue = Process::Execute(kString);

	lua_pushnumber(pLuaState, returnValue);

	return 1; // Crashes here, same call stack, while destructing kString
}



The string comes from the Lua API (it's created in Lua), and I'm simply using their API to retrieve it. I have fairly high confidence their stuff is working correctly, but there could be a chance I'm mis-using it.

That said, looking at the string, it's extraordinarily long:


C:\Program Files\Microsoft Visual Studio 8\Common7\Tools\../../VC/bin/lib.exe /nologo /out:"l:/Data/Dev/projects/mke/build/debug/src/3rdparty/lua-5.1.3/src/lua.lib" "l:/Data/Dev/projects/mke/build/debug/src/3rdparty/lua-5.1.3/src/lapi.o" "l:/Data/Dev/projects/mke/build/debug/src/3rdparty/lua-5.1.3/src/lauxlib.o" "l:/Data/Dev/projects/mke/build/debug/src/3rdparty/lua-5.1.3/src/lbaselib.o" "l:/Data/Dev/projects/mke/build/debug/src/3rdparty/lua-5.1.3/src/lcode.o" "l:/Data/Dev/projects/mke/build/debug/src/3rdparty/lua-5.1.3/src/ldblib.o" "l:/Data/Dev/projects/mke/build/debug/src/3rdparty/lua-5.1.3/src/ldebug.o" "l:/Data/Dev/projects/mke/build/debug/src/3rdparty/lua-5.1.3/src/ldo.o" "l:/Data/Dev/projects/mke/build/debug/src/3rdparty/lua-5.1.3/src/ldump.o" "l:/Data/Dev/projects/mke/build/debug/src/3rdparty/lua-5.1.3/src/lfunc.o" "l:/Data/Dev/projects/mke/build/debug/src/3rdparty/lua-5.1.3/src/lgc.o" "l:/Data/Dev/projects/mke/build/debug/src/3rdparty/lua-5.1.3/src/linit.o" "l:/Data/Dev/projects/mke/build/debug/src/3rdparty/lua-5.1.3/src/liolib.o" "l:/Data/Dev/projects/mke/build/debug/src/3rdparty/lua-5.1.3/src/llex.o" "l:/Data/Dev/projects/mke/build/debug/src/3rdparty/lua-5.1.3/src/lmathlib.o" "l:/Data/Dev/projects/mke/build/debug/src/3rdparty/lua-5.1.3/src/lmem.o" "l:/Data/Dev/projects/mke/build/debug/src/3rdparty/lua-5.1.3/src/loadlib.o" "l:/Data/Dev/projects/mke/build/debug/src/3rdparty/lua-5.1.3/src/lobject.o" "l:/Data/Dev/projects/mke/build/debug/src/3rdparty/lua-5.1.3/src/lopcodes.o" "l:/Data/Dev/projects/mke/build/debug/src/3rdparty/lua-5.1.3/src/loslib.o" "l:/Data/Dev/projects/mke/build/debug/src/3rdparty/lua-5.1.3/src/lparser.o" "l:/Data/Dev/projects/mke/build/debug/src/3rdparty/lua-5.1.3/src/lstate.o" "l:/Data/Dev/projects/mke/build/debug/src/3rdparty/lua-5.1.3/src/lstring.o" "l:/Data/Dev/projects/mke/build/debug/src/3rdparty/lua-5.1.3/src/lstrlib.o" "l:/Data/Dev/projects/mke/build/debug/src/3rdparty/lua-5.1.3/src/ltable.o" 
"l:/Data/Dev/projects/mke/build/debug/src/3rdparty/lua-5.1.3/src/ltablib.o" "l:/Data/Dev/projects/mke/build/debug/src/3rdparty/lua-5.1.3/src/ltm.o" "l:/Data/Dev/projects/mke/build/debug/src/3rdparty/lua-5.1.3/src/lua.o" "l:/Data/Dev/projects/mke/build/debug/src/3rdparty/lua-5.1.3/src/luac.o" "l:/Data/Dev/projects/mke/build/debug/src/3rdparty/lua-5.1.3/src/lundump.o" "l:/Data/Dev/projects/mke/build/debug/src/3rdparty/lua-5.1.3/src/lvm.o" "l:/Data/Dev/projects/mke/build/debug/src/3rdparty/lua-5.1.3/src/lzio.o" "l:/Data/Dev/projects/mke/build/debug/src/3rdparty/lua-5.1.3/src/print.o"


... Maybe the APIs I'm using don't allow such a length? I'll have to look into that. Could be CreateProcess barfing.

That said, to answer your other questions while I look into that:

1.- It is NULL terminated
2.- I haven't tried copying it to a new buffer, but I'll try that next (out of curiosity, what am I looking for with this test?)
3.- Sure. Bear in mind this is the disassembly from ntdll.dll!_RtlpWaitForCriticalSection@4() + 0x5b bytes



_RtlpWaitForCriticalSection@4:
7C918F8F  mov         edi,edi
7C918F91  push        ebp
7C918F92  mov         ebp,esp
7C918F94  sub         esp,68h
7C918F97  push        ebx
7C918F98  push        esi
7C918F99  mov         esi,dword ptr [ebp+8]
7C918F9C  xor         ebx,ebx
7C918F9E  cmp         esi,offset _LdrpLoaderLock (7C97C0D8h)
7C918FA4  mov         dword ptr [ebp-8],ebx
7C918FA7  sete        byte ptr [ebp+0Bh]
7C918FAB  mov         eax,dword ptr fs:[00000018h]
7C918FB1  movzx       ecx,byte ptr [ebp+0Bh]
7C918FB5  mov         dword ptr [eax+0F84h],ecx
7C918FBB  cmp         byte ptr [_LdrpShutdownInProgress (7C97C030h)],bl
7C918FC1  jne         _RtlpWaitForCriticalSection@4+34h (7C919493h)
7C918FC7  mov         al,byte ptr [_RtlpTimoutDisable (7C97C148h)]
7C918FCC  neg         al
7C918FCE  push        edi
7C918FCF  sbb         eax,eax
7C918FD1  not         eax
7C918FD3  and         eax,offset _RtlpTimeout (7C97C140h)
7C918FD8  mov         edi,eax
7C918FDA  mov         eax,dword ptr [esi+10h]
7C918FDD  cmp         eax,ebx
7C918FDF  mov         dword ptr [ebp-4],eax
7C918FE2  je          _RtlpWaitForCriticalSection@4+7Eh (7C919086h)
7C918FE8  mov         eax,dword ptr [esi]
7C918FEA  inc         dword ptr [eax+10h]   ; Access violation happens here
7C918FED  mov         eax,dword ptr [ebp-4]
7C918FF0  and         eax,1
7C918FF3  mov         dword ptr [ebp-18h],eax
7C918FF6  mov         eax,dword ptr [esi]
7C918FF8  inc         dword ptr [eax+14h]
7C918FFB  test        byte ptr ds:[7FFE02F0h],1
7C919002  jne         _RtlpWaitForCriticalSection@4+0A6h (7C9422EEh)
7C919008  cmp         dword ptr [ebp-18h],ebx
7C91900B  push        edi
7C91900C  push        ebx
7C91900D  jne         _RtlpWaitForCriticalSection@4+129h (7C936EBFh)
7C919013  push        dword ptr [ebp-4]
7C919016  call        _NtWaitForSingleObject@12 (7C90E9B4h)
7C91901B  cmp         eax,102h
7C919020  je          _RtlpWaitForCriticalSection@4+13Dh (7C942379h)
7C919026  cmp         eax,ebx
7C919028  jl          _RtlpWaitForCriticalSection@4+1FEh (7C942436h)
7C91902E  cmp         byte ptr [ebp+0Bh],bl
7C919031  pop         edi
7C919032  je          _RtlpWaitForCriticalSection@4+227h (7C91904Ch)
7C919034  mov         eax,dword ptr fs:[00000018h]
7C91903A  mov         eax,dword ptr [eax+24h]
7C91903D  mov         dword ptr [esi+0Ch],eax
7C919040  mov         eax,dword ptr fs:[00000018h]
7C919046  mov         dword ptr [eax+0F84h],ebx
7C91904C  pop         esi
7C91904D  pop         ebx
7C91904E  leave
7C91904F  ret         4
7C919052  nop
7C919053  nop
7C919054  nop
7C919055  nop
7C919056  nop


Thanks again for your help. It's given me somewhere to start at least. Further pointers, insights, etc., would be appreciated.

#4 eddie

    Senior Member

  • Members
  • 751 posts

Posted 31 March 2008 - 06:11 PM

*gulp*.

Just realized what I was doing wrong. I was copying that string into a buffer that was too small: CreateProcess says it can take only up to 2K characters on Win2000, so I set that as the buffer size without asserting that the string actually fit. Upping that to 32K (apparently the maximum on anything newer than Win2K) makes things hunky dory. Makes sense; I just can't believe I didn't assert on it earlier.
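For anyone landing here with the same crash, here is a minimal sketch of the kind of check that would have caught this. The internals of Process::Execute aren't shown in this thread, so the helper name MakeCommandBuffer and the constant kMaxCommandLine are hypothetical, and the 32K figure is simply the one quoted above (check the CreateProcess documentation for the exact limit on your target). The idea is to refuse an over-long command instead of silently writing past the end of a fixed-size buffer:

```cpp
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical limit, taken from the 32K figure mentioned in the post.
const std::size_t kMaxCommandLine = 32 * 1024;

// Returns a mutable, null-terminated copy of `command`.
// Throws std::length_error if the command plus its terminator
// would not fit within `maxChars`.
std::vector<char> MakeCommandBuffer(const std::string& command,
                                    std::size_t maxChars = kMaxCommandLine)
{
    if (command.size() + 1 > maxChars)
        throw std::length_error("command line too long");

    // The buffer is sized from the string itself, so it cannot overflow.
    std::vector<char> buffer(command.begin(), command.end());
    buffer.push_back('\0');
    return buffer;
}
```

The resulting buffer's data() pointer could then be handed to whatever API needs a writable command line. Sizing the buffer from the string, rather than from a constant, means a too-long command becomes a reported error rather than memory corruption that only shows up later in an unrelated delete.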

That said, Nick, I'm curious as to what you were looking for, so I can follow the same debugging path in the future. Would you mind explaining what you were tracking down?

Thanks!

#5 Nick

    Senior Member

  • Members
  • 1227 posts
  • Location: Ottawa, Ontario, Canada

Posted 31 March 2008 - 07:49 PM

Access violation errors in a system DLL are practically always caused by memory corruption inflicted by the application.

So I was looking for things that could cause such corruption. A string that isn't null-terminated and a buffer overflow are the number one causes. By copying the string elsewhere you can test whether the crash is caused by the string content itself. Looking at the assembly code, even when you don't really understand what's happening, you can often find registers holding pointer values that are 'dangerously close' to your application data. This can hint at which memory buffers you should check for overflow.
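One way to check a suspect buffer for overflow is to pad it with known guard bytes and verify them after the code that writes to it has run. The following is only a sketch of that idea, and GuardedBuffer, kGuardSize and kGuardByte are made-up names for illustration. The Visual C++ debug heap does something broadly similar with its own fill pattern, and _CrtCheckMemory() from <crtdbg.h> can be called to validate it.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical guard parameters, chosen only for illustration.
const std::size_t kGuardSize = 16;
const char kGuardByte = '\xAB';

// A char buffer followed by guard bytes. A write that runs past
// capacity() lands in the guard area, where intact() can detect it.
class GuardedBuffer
{
public:
    explicit GuardedBuffer(std::size_t capacity)
        : capacity_(capacity), storage_(capacity + kGuardSize, kGuardByte)
    {
        // Zero the usable part; the tail keeps the guard pattern.
        std::memset(storage_.data(), 0, capacity_);
    }

    char* data() { return storage_.data(); }
    std::size_t capacity() const { return capacity_; }

    // True while every guard byte after the usable area is untouched.
    bool intact() const
    {
        for (std::size_t i = 0; i < kGuardSize; ++i)
            if (storage_[capacity_ + i] != kGuardByte)
                return false;
        return true;
    }

private:
    std::size_t capacity_;
    std::vector<char> storage_;
};
```

Calling intact() right after the suspicious copy narrows the search: if it returns false, that copy wrote past the buffer. It only catches overruns smaller than the guard area, so it is a debugging aid, not a safety net.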

Anyway, I'm glad you found the cause and fixed it!




