This post is more of a reminder for myself than a real problem-solution article, because when I wrote it I still didn't know the reasons (actually, I do now – check my follow-up article: «Never return PChar in Delphi»)… but let's start with the synopsis.
Long story short, a few days ago I was writing a Delphi unit that made heavy use of the UTF8Encode() function, which converts a Delphi WideString into a UTF8String. If we dig deeper, however, it turns out to be only a wrapper for UnicodeToUtf8(), which works on null-terminated strings rather than normal Pascal strings (why this is so I do not know – it doesn't even call a single WinAPI function).
The fragment of code that had me nervously pondering and debugging everything around it looked like this:
```pascal
var
  Strings: array[0..High(VA)] of PChar;
begin
  for ... do
    Strings[...] := UTF8Encode(...);
end;
```
The funny part was that each subsequent call to UTF8Encode() would not just return the new string but also overwrite the string (PChar) that it had returned earlier and that was already stored in Strings[]. UTF8Encode() begins like this:
```pascal
function Utf8Encode(const WS: WideString): UTF8String;
var
  L: Integer;
  Temp: UTF8String;
begin
  Result := '';
  if WS = '' then Exit;
  SetLength(Temp, Length(WS) * 3); // SetLength includes space for null terminator
  ...
```
Tracing showed that for some reason the Temp variable would take on the same value that UTF8Encode() had returned previously, and that it resided at the same address that the previous call to SetLength() had allocated for it!
Why this happens remains a complete mystery to me. I tried several workarounds that were supposed to copy the string or increase its refcount, but none of them worked. They were:
```pascal
UniqueString(UTF8EncodeResult);
```

```pascal
Strings[...] := UTF8EncodeResult[1] + Copy(UTF8EncodeResult, 2, Length(UTF8EncodeResult));
```

```pascal
Move(UTF8EncodeResult[1], Strings[...][1], Length(UTF8EncodeResult));
```

The outcome remained the same: the next call to UTF8Encode() would affect the previously written string.
None of this would be so strange if I hadn't faced the same problem several years ago. I don't remember how I solved it (I probably worked around it somehow, too), but I do remember not understanding a thing the compiler was doing. This time it was similar.
After an hour or two I finally decided to replace PChar in my Strings[] array declaration with String – and it magically worked!
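For the record, here is a minimal sketch of what the working version might look like. It is not the original unit: the program name, the sample words and the Source array are made up for illustration, and it uses PAnsiChar and UTF8String explicitly so that it does not depend on what PChar means in a given compiler version. A likely explanation for the difference, in line with the title of the follow-up article, is that a PChar is a plain pointer that takes no part in string reference counting, so nothing keeps the buffer returned by UTF8Encode() alive once the temporary string is released. An array of strings, by contrast, owns its buffers.

```pascal
program KeepStringsAlive;

{$APPTYPE CONSOLE}

uses
  SysUtils;

var
  // Hypothetical sample input; the original post elides its data.
  Source: array[0..2] of WideString;
  // Reference-counted strings: each element owns its own buffer
  // for as long as the array itself is alive.
  Strings: array[0..2] of UTF8String;
  P: PAnsiChar;
  I: Integer;
begin
  Source[0] := 'alpha';
  Source[1] := 'beta';
  Source[2] := 'gamma';

  // Store the encoded strings themselves, not pointers into them.
  for I := Low(Source) to High(Source) do
    Strings[I] := UTF8Encode(Source[I]);

  for I := Low(Strings) to High(Strings) do
  begin
    // Borrow a raw pointer only at the point of use, while the
    // owning string is still alive and unchanged.
    P := PAnsiChar(Strings[I]);
    WriteLn(P);
  end;
end.
```

The point of this layout is that the pointer is taken as late as possible and never outlives the string it points into, which is usually all that an API expecting a null-terminated string needs.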