Never return PChar in Delphi

This is a follow-up to my previous post «Trouble with UTF8Encode», where I ran into a problem I didn't think was solvable. Surprisingly enough, I found the solution much sooner than I expected.

Intro

Several days ago I was putting my SQLite wrapper for Delphi 7 to real-life use rather than in a demo application, and in the process I stumbled upon some very weird behavior: it kept doing something that prevented SQLite from opening a database. It wasn't anything serious, I thought; perhaps I had locked the file somewhere earlier. However, as I went through the code I came to understand that there was no such place.
In the end, after more than an hour of debugging and commenting things out all around, the weirdness reached its apex: everything worked fine if I added an extra character in some completely different part of the program, and it would say «Cannot open the database» if I removed it.

And then I made some change that made sqlite3.dll crash with an Access Violation. «Whoa,» I thought, «this is no longer my app; what's wrong with this thing?» And I fired up IDA Pro.

To my own surprise, I understood what was going on as soon as I looked at the code preceding the call to sqlite3_open(). What surprised me to no end was that in all my long acquaintance with Delphi I had never stumbled upon this glitch; the credit probably goes to Delphi's string management, which is smarter than that of traditional C++.

The point

I've decided to write this blog entry not so much for my own memory as to help others who might run into this problem. For demonstration I've written a tiny console app:

program PCharTest;

{$APPTYPE CONSOLE}

function Foo(S: String): PChar;
begin
  Result := PChar(S + S);
end;

begin
  WriteLn(Foo('some sample string'));
  ReadLn;
end.

If you compile and run it, you'll see an empty string or some garbage, or the program will just crash. Why? This is exactly the problem I'm going to tell you about.

But first let me give a quick idea of how I thought Delphi was managing strings.

I knew that Delphi does a good job managing strings: they have reference counters and, somewhat like objects in Java, are automatically tracked by the compiler, which deallocates them when they're no longer referenced.

However, as it turned out, the above only applies to native Delphi strings: so-called short, long, ANSI, Wide and others (Wide strings don't have reference counters but they're still managed by the RTL). When dealing with C strings you have to be careful, and in some cases (when strings are used outside of the function where they're allocated) you even need to adopt the habit of manual memory management that C programming requires.
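To make the manual-management remark concrete, here is a minimal sketch of a function that hands out a C string the caller has to free. `FooAlloc` is a hypothetical name used only for illustration; `StrNew` and `StrDispose` are the SysUtils routines for allocating and freeing a heap copy of a null-terminated string.

```pascal
program ManualPChar;

{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical variant of Foo: the result is a heap copy made with StrNew,
// so it stays valid after the temporary Delphi string has been released.
function FooAlloc(const S: String): PChar;
begin
  Result := StrNew(PChar(S + S));
end;

var
  P: PChar;
begin
  P := FooAlloc('some sample string');
  try
    WriteLn(P);
  finally
    StrDispose(P);  // the caller owns the buffer and must free it
  end;
end.
```

The price is exactly the C-style discipline mentioned above: every caller has to remember the matching `StrDispose`.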

The scenario

Let me clarify how the above demo program works. Let's look at the disassembly:

mov     [ebp+varTemp], edx
mov     [ebp+argS], eax
mov     eax, [ebp+argS]
call    System::__linkproc__ LStrAddRef(void *)

lea     eax, [ebp+varTemp]
mov     ecx, [ebp+argS]
mov     edx, [ebp+argS]
call    System::__linkproc__ LStrCat3(void)

mov     eax, [ebp+varTemp]
call    System::__linkproc__ LStrToPChar(System::AnsiString)

mov     ebx, eax
xor     eax, eax
pop     edx
pop     ecx
pop     ecx
mov     fs:[eax], edx
push    offset YYYYYY
lea     eax, [ebp+varTemp]
mov     edx, 2
call    System::__linkproc__ LStrArrayClr(void *,int)

I have cleaned up the code, removed the stack frame setup and demangled the library function names, so the real disassembly will look different for you. However, it's good enough for our needs.

The above listing stands for this Delphi code:

function Foo(S: String): PChar;
begin
  Result := PChar(S + S);
end;

It begins by incrementing the reference counter of S, then it concatenates S with itself, and then it converts the result into PChar. This last point is important and we will look into LStrToPChar (which can be found in System.pas) a bit later; for now let's just note that the function ends by releasing its local strings: LStrArrayClr is called with a count of 2, which covers both the temporary that holds S + S and the function's own reference to S (if the argument were declared as const S: String, the compiler wouldn't have taken that extra reference and wouldn't need to release it).

LStrToPChar function:

asmtest    eax, eax
jz      short handle0
retn

zeroByte db 0
handle0:
mov     eax, offset zeroByte
retn

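What do we see here? Naturally, the conversion String -> PChar is done simply by, well, treating String as if it was PChar (an empty string is a nil pointer, so in that case a pointer to a static zero byte is returned instead). This works because a String variable is a pointer to the first character, and each Delphi 7 long string has the following structure (WideString doesn't have the refcount):

-8  refcount
-4  length
 0  first character, if any
...
XX  zero byte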

In other words, each Delphi string already has a null terminator appended, so there's absolutely no overhead in converting it into a C string.
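A small sketch to check this claim, assuming Delphi 7, where String is an AnsiString; the program name is arbitrary. It compares the address held by the string variable with the address produced by the cast and looks at the byte right after the last character.

```pascal
program CastIsFree;

{$APPTYPE CONSOLE}

var
  S: String;
  P: PChar;
begin
  S := 'some sample string';
  P := PChar(S);

  // The cast copies nothing: both values point at the same first character.
  WriteLn(Pointer(S) = Pointer(P));

  // The terminator is already stored right after the last character.
  WriteLn(P[Length(S)] = #0);
end.
```

Both comparisons should come out true, which is also why P is only valid for as long as S is.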

But as a result Delphi doesn't track C strings, and it's up to the programmer to ensure that the original Delphi string is still on the heap or somewhere else and hasn't gone out of scope and been freed.

Now we can understand what goes on when we try to return a PChar string as a function result: first we make a Delphi string, then we «convert» it (actually, simply typecast) into PChar, and then the temporary Delphi string is deallocated and the function returns a pointer to a freed memory block.

How to work around this problem? Well, you could use a global variable to store that temporary Delphi string; since global variables never go out of scope, they're never deallocated. However, I'd say it's the very last resort, because one can hardly be sure that this variable won't be used twice.

So the only way left is to typecast Delphi strings into PChars manually in the places where they're actually used (typically calls to WinAPI or other external C functions). Returning PChar or PWideChar simply won't do the trick.
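As a rough sketch of that approach, the demo can be rewritten so that Foo returns a managed String and the cast happens only at the point of use. The variable name `Doubled` is made up for this example, and `StrLen` from SysUtils merely stands in for an external C function that expects a PChar.

```pascal
program ReturnString;

{$APPTYPE CONSOLE}

uses
  SysUtils;

// Returns a managed Delphi string instead of a PChar.
function Foo(const S: String): String;
begin
  Result := S + S;
end;

var
  Doubled: String;
begin
  Doubled := Foo('some sample string');

  // Doubled keeps the data alive, so the pointer stays valid during the call.
  // StrLen stands in for an external C function here.
  WriteLn(StrLen(PChar(Doubled)));
  WriteLn(PChar(Doubled));
end.
```

The reference-counted string now travels all the way to the caller, and the raw pointer exists only for the duration of the call that needs it.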
