This is a follow-up to my previous post «Trouble with UTF8Encode», where I ran into trouble that I didn't think was solvable. Surprisingly enough, I ran into the solution much sooner than I expected.
Several days ago I was putting my SQLite wrapper for Delphi 7 to real-life use rather than in a demo application, and in the process I stumbled upon some very weird behavior: it kept doing something that prevented SQLite from opening a database. It wasn't anything serious, I thought; perhaps I had locked the database somewhere earlier. However, as I moved through the code I came to understand that there was no such place.
In the end, after more than an hour of debugging and commenting things out, the weirdness hit its apex: everything worked fine if I added an extra character in some completely different part of the program, and it said «Cannot open the database» if I removed it.
And then I made some change that made sqlite3.dll crash with an Access Violation. «Whoa,» I thought, «this is no longer my app; what's wrong with this thing?» And I fired up IDA Pro.
To my own surprise, I understood what was going on as soon as I looked at the code preceding the call to sqlite3_open(). What surprised me to no end was that during my very long acquaintance with Delphi I had never stumbled upon this glitch; the credit probably goes to Delphi's string management, which is smarter than that of traditional C++.
I've decided to write this blog entry not so much for my own memory as to help others who might also run into this problem. For demonstration I've written a tiny console app:
```pascal
program PCharTest;

{$APPTYPE CONSOLE}

function Foo(S: String): PChar;
begin
  Result := PChar(S + S);
end;

begin
  WriteLn(Foo('some sample string'));
  ReadLn;
end.
```
If you compile and run it you'll see an empty string or some garbage, or the program will just crash. Why? This is exactly the problem I'm going to tell you about.
But first let me give a quick idea of how I thought Delphi was managing strings.
I knew that Delphi does a good job managing strings: they have reference counters and, much like objects in Java, are tracked automatically, with the compiler emitting code that deallocates them when they're no longer referenced.
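The sharing that reference counting makes possible can be observed without touching the string header. This is only a minimal sketch, assuming Delphi 7 where String is AnsiString; the program and variable names are made up for illustration:

```pascal
program RefCountDemo;

{$APPTYPE CONSOLE}

var
  S1, S2: String;
begin
  S1 := StringOfChar('a', 3);          // built at run time, so it lives on the heap
  S2 := S1;                            // no copy: both variables share one block
  WriteLn(Pointer(S1) = Pointer(S2));  // TRUE
  S2[1] := 'b';                        // writing makes S2 unique (copy-on-write)
  WriteLn(Pointer(S1) = Pointer(S2));  // FALSE
  WriteLn(S1, ' ', S2);                // aaa baa
end.
```

The block is released only when the last variable referring to it goes away, which is exactly the bookkeeping a raw PChar does not take part in.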
However, as it turned out, the above only applies to native Delphi strings: the so-called short, long, ANSI, Wide and others (Wide strings don't have reference counters but they're still managed by the RTL). When dealing with C strings you have to be careful, and in some cases (when strings are used outside of the function where they're allocated) you even need to adopt the habit of manual memory management that C programming requires.
Let me clarify how the above demo program works. Let's look at the disassembly:
```asm
mov     [ebp+varTemp], edx
mov     [ebp+argS], eax
mov     eax, [ebp+argS]
call    System::__linkproc__ LStrAddRef(void *)
lea     eax, [ebp+varTemp]
mov     ecx, [ebp+argS]
mov     edx, [ebp+argS]
call    System::__linkproc__ LStrCat3(void)
mov     eax, [ebp+varTemp]
call    System::__linkproc__ LStrToPChar(System::AnsiString)
mov     ebx, eax
xor     eax, eax
pop     edx
pop     ecx
pop     ecx
mov     fs:[eax], edx
push    offset YYYYYY
lea     eax, [ebp+varTemp]
mov     edx, 2
call    System::__linkproc__ LStrArrayClr(void *,int)
```
I have cleaned up the code, removed the stack frame and demangled the library function names, so the real disassembly will look different for you. However, it's good enough for our needs.
The above listing stands for this Delphi code:
```pascal
function Foo(S: String): PChar;
begin
  Result := PChar(S + S);
end;
```
It begins by incrementing the reference counter of S, then it concatenates S with itself and converts the result into PChar. This point is important and we will look into LStrToPChar (which can be found in System.pas) in a moment; here let's just note that the function ends by releasing the temporary string and S (had this argument been declared as `const S: String`, the compiler wouldn't have added a reference to it and no LStrAddRef call would have been made). And here is LStrToPChar:
```asm
        test    eax, eax
        jz      short handle0
        retn

zeroByte db 0

handle0:
        mov     eax, offset zeroByte
        retn
```
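What do we see here? Naturally, the conversion `String -> PChar` is done simply by, well, treating the String as if it were a PChar (an empty string, which is a nil pointer, is replaced with a pointer to a zero byte). This works because each Pascal string has the following structure, with offsets counted from the address the string variable holds (WideString doesn't have the refcount):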
```
-8   refcount
-4   length
 0   first character, if any
...
XX   zero byte
```
In other words, each Pascal string already has a null terminator appended, so there's absolutely no overhead in converting it into a C string.
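That the typecast costs nothing can be checked from Delphi itself. The following is only a small sketch, assuming Delphi 7 (String = AnsiString, PChar = PAnsiChar); the program name is invented for the example:

```pascal
program PCharAlias;

{$APPTYPE CONSOLE}

var
  S: String;
  P: PChar;
begin
  S := StringOfChar('x', 5);         // non-empty string built at run time
  P := PChar(S);                     // no copy is made, this is only a typecast
  WriteLn(Pointer(S) = Pointer(P));  // TRUE: both point at the first character
  WriteLn(P[Length(S)] = #0);        // TRUE: the terminator follows the last character
end.
```

Since P is just another view of the memory owned by S, it stays valid only for as long as S does.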
But as a result Delphi doesn't track C strings, and it's up to the programmer to ensure that the original Delphi string is still alive, that is, that it hasn't gone out of scope and been freed.
Now we can understand what goes on when we try to return a PChar string as a function result: first we make a Delphi string, then we «convert» it (actually, simply typecast it) into PChar, and then the temporary Delphi string is deallocated and the function returns a pointer to a freed memory block.
How do we work around this problem? Well, you could use a global variable to store that temporary Delphi string: since global variables never go out of scope, they're never deallocated. However, I'd say this is the very last resort, because one can hardly be sure that the variable won't be used twice.
So the only way left is to manually typecast Delphi strings into PChars in the places where they're actually used (typically calls to WinAPI or other external C functions). Returning PChar or PWideChar simply won't do the trick.
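To make the «used twice» concern concrete, here is a rough sketch of the global-variable approach; GTemp and the rest of the program are hypothetical names, not taken from my wrapper:

```pascal
program GlobalTemp;

{$APPTYPE CONSOLE}

var
  GTemp: String;  // hypothetical global that keeps the temporary string alive

function Foo(const S: String): PChar;
begin
  GTemp := S + S;          // assigning here releases whatever GTemp held before
  Result := PChar(GTemp);  // valid only until the next assignment to GTemp
end;

var
  P1, P2: PChar;
begin
  P1 := Foo('first');
  WriteLn(P1);             // firstfirst
  P2 := Foo('second');     // the block P1 points to has just been released
  WriteLn(P2);             // secondsecond
  // P1 is dangling from here on and must not be used
end.
```

The first pointer silently goes stale as soon as the function is called again, which is the original bug in a less obvious form.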
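A minimal sketch of that approach, applied to the demo program: the function returns a Delphi string and the typecast happens only at the call site. lstrlen from the Windows unit merely stands in for «some external C function»; the variable name Tmp is made up for the example:

```pascal
program PCharFixed;

{$APPTYPE CONSOLE}

uses
  Windows;

// Return a managed Delphi string instead of a PChar.
function Foo(const S: String): String;
begin
  Result := S + S;
end;

var
  Tmp: String;
begin
  Tmp := Foo('some sample string');  // Tmp keeps the string alive
  WriteLn(PChar(Tmp));               // typecast only where a C string is needed
  WriteLn(lstrlen(PChar(Tmp)));      // the pointer is valid for the whole call
end.
```

As long as Tmp is in scope and isn't reassigned, every PChar(Tmp) points to live memory.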