
Part 3 – finding Clues and The Case

How do we find our way through megabytes of asm code? And how do games actually execute scripts? The answers are here!

The files

  1. ScenarioRunner – our subject.
  2. WinAPI help files (WIN32.HLP is the main file) – desk reference of a Win32 hacker.
  3. Intel Opcodes & Assembler Instructions – help docs.

Download the tutorial


Did you call us? We are – the Imported Ones!

Strings are beacons, but there's also another kind: imported functions. They also connect us to the program's source code, although more subtly than strings. We don't see them on the screen; we rather feel them being used somewhere deep in the core, hehe.
Import tables are the number one target of exe protectors. They use various tricks so that debuggers and disassemblers like IDA and Olly can't see those functions… at least not without extra effort.
The table is simply an array of DWords, each one a pointer to an imported function's first instruction (the Windows loader fills them in when the program starts). These pointers are the targets of CALL instructions. In rare cases a JMP is used in place of a CALL; this is usually how Delphi's compiler does it.
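To make the idea concrete, here is a toy model in C. It is only a sketch: it doesn't read a real PE file, and `fake_ReadFile`, `fake_CloseHandle`, `import_table` and `call_read_file` are hypothetical names. The array plays the role of the import table, `load_imports()` plays the Windows loader, and `call_read_file()` is the equivalent of a CALL (or a JMP stub) through a table entry.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for functions that would live in a system DLL. */
static size_t fake_ReadFile(const char *name)    { return strlen(name); }
static size_t fake_CloseHandle(const char *name) { (void)name; return 1; }

typedef size_t (*import_fn)(const char *);

/* Toy import table: one pointer per imported function, all NULL until
   "the loader" fills them in. */
static import_fn import_table[2];

/* Plays the role of the Windows loader resolving imports at startup. */
void load_imports(void) {
    import_table[0] = fake_ReadFile;
    import_table[1] = fake_CloseHandle;
}

/* Equivalent of CALL DS:__imp_ReadFile (or of a JMP stub): the code never
   holds the function's address itself, it always goes through the table. */
size_t call_read_file(const char *name) {
    assert(import_table[0] != NULL);   /* imports must be resolved first */
    return import_table[0](name);
}
```

Every call goes through that one slot, so changing `import_table[0]` redirects all of them at once. That also explains why protectors like to mangle this table.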

For example, a program draws some text on the screen, and that text just doesn't look good in another language. This is often an issue with Japanese games: they use monospaced square fonts, which look unnatural at best for Western languages.
So we want to replace the font the game uses. We know there's an API function, CreateFont(), which among other things accepts the face name of the font to create. We find the call, fix the name and voila! The game displays a neat font for our language.
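The «fix the name» step can also be done by patching the face-name string directly in the exe image. Below is a minimal C sketch that works under a few assumptions:
- the name is stored as a plain NUL-terminated ANSI string (the CreateFontA case; CreateFontW would use UTF-16);
- the new name must fit into the old slot;
- `patch_font_name` and the font names used below are only examples.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Finds the NUL-terminated string `old_name` inside `image` and overwrites
   it with `new_name`, zero-padding the rest of the old slot.
   Returns 1 on success, 0 if not found or if the new name doesn't fit. */
int patch_font_name(unsigned char *image, size_t size,
                    const char *old_name, const char *new_name) {
    size_t old_len = strlen(old_name);
    size_t new_len = strlen(new_name);
    assert(old_len > 0);
    if (new_len > old_len || size < old_len + 1)
        return 0;
    for (size_t i = 0; i + old_len + 1 <= size; i++) {
        /* Compare including the terminating NUL so that a longer string
           that merely starts with old_name isn't matched. */
        if (memcmp(image + i, old_name, old_len + 1) == 0) {
            memset(image + i, 0, old_len + 1);
            memcpy(image + i, new_name, new_len);
            return 1;
        }
    }
    return 0;
}
```

For example, `patch_font_name(image, size, "MS Gothic", "Arial")` on a buffer holding the exe's bytes. Before patching, it's worth confirming that the match is really the string passed to the CreateFont call you found, since the same bytes may appear elsewhere.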

Another bit of info about functions. On Windows there are two versions of almost every system function that deals with strings: one ending in «A» and one ending in «W» (e.g. TextOutA and TextOutW). «A» stands for ANSI, while «W» stands for Unicode (also called «Wide» because each character takes 2 bytes instead of 1). On NT 5.0+ the «A» functions, AFAIK, are just wrappers that convert their strings and call the «W» versions, since the OS core works solely in Unicode.
Names without a suffix (e.g. TextOut) are macros in the Visual Studio/Windows SDK headers. They expand to the A or W version depending on whether UNICODE is defined, which makes switching easy. Such functions don't exist in the system DLLs.
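Here's a minimal C sketch of how such an A/W pair and the suffix-less macro fit together. `MyTextOutA`, `MyTextOutW` and `MyTextOut` are hypothetical names that only mimic the pattern. Real «A» wrappers convert using the system's ANSI code page, while this sketch simply uses the C library's `mbstowcs`.

```c
#include <assert.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical «W» function: the real work is done on wide strings.
   Returns how many characters it would draw. */
size_t MyTextOutW(const wchar_t *text) {
    assert(text != NULL);
    return wcslen(text);
}

/* Hypothetical «A» wrapper: convert the narrow string to a wide one,
   then call the «W» version. Returns 0 if the conversion fails. */
size_t MyTextOutA(const char *text) {
    assert(text != NULL);
    size_t len = mbstowcs(NULL, text, 0);   /* length in wide characters */
    if (len == (size_t)-1)
        return 0;                           /* invalid multibyte sequence */
    wchar_t *wide = malloc((len + 1) * sizeof *wide);
    if (wide == NULL)
        return 0;
    mbstowcs(wide, text, len + 1);
    size_t drawn = MyTextOutW(wide);
    free(wide);
    return drawn;
}

/* The suffix-less name exists only in the header, as a macro. */
#ifdef UNICODE
#define MyTextOut MyTextOutW
#else
#define MyTextOut MyTextOutA
#endif
```

Without UNICODE defined, `MyTextOut("Hi")` compiles to a call to `MyTextOutA`. That's why only the A or W name ever shows up in an imports list.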

Let's get more specific to our problem. We need to find the function that is the interpreter's loop. The general idea of a script interpreter is a loop that reads instructions from a script and has some switch..case block (in Delphi it's just case) and… well, we'll see what's next once we find it.
At the very least we'll learn which functions the engine supports and what their opcodes are. In a few words, an opcode is the index of a function that the interpreter recognizes and eventually «interprets».

To give you an idea, here's a sample interpreter loop written in PHP (as a compromise between C++ and Delphi :P):

PHP
function RunScript($script) {
  $pos = 0;
  while (strlen($script) > $pos) {
    // $script[$pos] is a one-character string; ord() turns it into the opcode number.
    // ReadStrFrom() is assumed to take $pos by reference and advance it past the string.
    $opcode = ord($script[$pos++]);
    switch ($opcode) {
    case 0:
      WriteConsole('A message: '.ReadStrFrom($script, $pos));
      break;
    case 1:
      $varName = ReadStrFrom($script, $pos);
      SetVar($varName, ReadConsole());
      break;
    case 2:
      $scriptName = ReadStrFrom($script, $pos);
      RunScriptNamed($scriptName);
      break;
    ...
    default:
      throw new Exception('Unknown command opcode '.$opcode);
    }
  }
}

Of course, better code would use named constants instead of magic numbers (opcodes), encapsulate script traversal and so on, but the above code is just an example anyway.
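For comparison, here is a runnable C version of the same idea. It's a sketch rather than any real engine's code, built on a few assumptions:
- the opcodes 0 to 2 mirror the PHP sample;
- arguments are NUL-terminated strings right after the opcode byte;
- only the message command actually does something.

A compiler may well turn the `switch` on the opcode byte into the kind of jump table discussed later in this part.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical opcodes mirroring the PHP sample. */
enum { OP_MESSAGE = 0, OP_SET_VAR = 1, OP_RUN_SCRIPT = 2 };

/* Reads the NUL-terminated string that starts at *pos and moves *pos past
   its terminator. Returns NULL if the string runs past the end of the script. */
static const char *read_str(const unsigned char *script, size_t len, size_t *pos) {
    const unsigned char *start = script + *pos;
    const unsigned char *nul = memchr(start, '\0', len - *pos);
    if (nul == NULL)
        return NULL;
    *pos = (size_t)(nul - script) + 1;
    return (const char *)start;
}

/* The interpreter loop: fetch an opcode byte, dispatch on it, repeat.
   Returns the number of executed commands, or -1 on an unknown opcode
   or a truncated argument. */
int run_script(const unsigned char *script, size_t len) {
    assert(script != NULL || len == 0);
    size_t pos = 0;
    int executed = 0;
    while (pos < len) {
        unsigned char opcode = script[pos++];
        const char *arg;
        switch (opcode) {
        case OP_MESSAGE:
            if ((arg = read_str(script, len, &pos)) == NULL)
                return -1;
            printf("A message: %s\n", arg);
            break;
        case OP_SET_VAR:
        case OP_RUN_SCRIPT:
            /* Real handlers are out of scope here; just skip the argument. */
            if ((arg = read_str(script, len, &pos)) == NULL)
                return -1;
            break;
        default:
            fprintf(stderr, "Unknown command opcode %u\n", (unsigned)opcode);
            return -1;
        }
        executed++;
    }
    return executed;
}
```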

I suggest this approach: find the calls to ReadFile, set BPs on them and wait for something to happen.

Open the Imports tab (View | Open subviews | Imports) and start typing the function's name. If you press F1 you'll get some help on IDA's lists; they have a few handy features like searching with Alt+T/Ctrl+T… Aha, gotcha.
Let's press Enter, and IDA takes us to that function's record in the import table. But we need the code that refers to it, not the record itself, so let's press Ctrl+X. Great, here's… only one place? That's strange, but it's Delphi after all, and I mostly deal with C/C++ programs.

Go to that part. Ah, we see some kind of wrapper function: JMP DS:__imp_ReadFile. We need some real code, so let's find what refers to this function with Ctrl+X. This time we get two matches, great. Go to both and set BPs there.

For some reason there's also a second entry in the imports table for ReadFile – I dunno why there are both of them, maybe it's some compiler trick but that's not a problem – set a BP there too.

By the way, one wonderful feature of IDA is its Graph view. It doesn't work for top-level «functions» (because they aren't really functions), but most code lives in subroutines, so it works most of the time. If you're not in Graph view yet, try pressing Space. If IDA complains, just keep trying each time you enter a new function, and eventually it'll show you a bunch of nice code blocks.

Now I have 3 BPs set. Let's roll! F9. Caught one! Let's review once again what we need to find: something that hints at a connection with runme.dat. Let's see which arguments ReadFile has… the file handle, the buffer and the number of bytes to read are our beacons.
The file handle is, of course, the best link we can have, since it uniquely identifies an open file. But to match it to a real file on disk we need to know the handle of that file. It's returned by CreateFile and is different each time. We can set BPs on CreateFile calls, watch for the one called with lpFileName = '...\runme.dat' and note down the handle it returns (btw, do you remember that functions usually return their result in the EAX register?).
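The bookkeeping described above could be sketched in C like this. It pairs CreateFile's return value with its lpFileName, then looks the pair up again at each ReadFile. This is a hypothetical helper of the kind you might keep in a debugger script, not an IDA or WinAPI API, and handle values are treated as plain 32-bit numbers.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define MAX_TRACKED 64
#define INVALID_HANDLE 0xFFFFFFFFu   /* INVALID_HANDLE_VALUE in a 32-bit process */

struct tracked_handle {
    uint32_t handle;      /* what CreateFile returned in EAX */
    char     name[260];   /* the lpFileName it was called with */
    int      in_use;
};

static struct tracked_handle handles[MAX_TRACKED];

/* Call when a CreateFile breakpoint returns. */
void on_create_file(uint32_t handle, const char *file_name) {
    assert(file_name != NULL);
    if (handle == INVALID_HANDLE)
        return;                                  /* the call failed */
    for (int i = 0; i < MAX_TRACKED; i++) {
        if (!handles[i].in_use) {
            handles[i].handle = handle;
            snprintf(handles[i].name, sizeof handles[i].name, "%s", file_name);
            handles[i].in_use = 1;
            return;
        }
    }
}

/* Call on CloseHandle: after this, Windows may hand out the same value again. */
void on_close_handle(uint32_t handle) {
    for (int i = 0; i < MAX_TRACKED; i++)
        if (handles[i].in_use && handles[i].handle == handle)
            handles[i].in_use = 0;
}

/* Call at a ReadFile breakpoint with its hFile argument.
   Returns the file name, or NULL if this handle wasn't seen. */
const char *file_for_handle(uint32_t handle) {
    for (int i = 0; i < MAX_TRACKED; i++)
        if (handles[i].in_use && handles[i].handle == handle)
            return handles[i].name;
    return NULL;
}
```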

OllyDbg has a very handy Handles window for this kind of thing. It shows all handles opened by the program (as numeric values) along with their string representations, and not only for CreateFile but for other functions too. However, we're in IDA now, so let's carry on with it.

That approach is complicated, though. We're either lazy (lazy programmers, duh) or we just want to avoid more steps than necessary (we can always come back and do them later). So first we'll examine the other clues we have left: the number of bytes to read and the buffer. Let's try the bytes to read.

Let's see: our first catch has hFile = 0x4C (doesn't tell us much) and nNumberOfBytesToRead = 0x0142. Let's hit Shift+/ to open IDA's calculator, which can also be used for base conversion (although 010 Editor's tool is more convenient). Enter 0x0142 and we get 322 in decimal, which I suspect… Eureka! Check the size of runme.dat: it's exactly 322 bytes. How handy, the program reads the whole file into the buffer (so it seems).

Well, to tell the truth, you'll rarely get such a coincidence, at least not on the first call to ReadFile. Files usually start with some kind of header and signature a dozen or so bytes long. But if you keep watching ReadFile calls you may eventually see the program read some large chunk of data. Compare its size with the size of some file, and you might find they differ by only a few tens of bytes.
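The size check we just did by hand, including the «file minus a small header» case, fits in a few lines of C. This is a sketch: `looks_like_payload` and its `max_header` tolerance are hypothetical, and you'd pick the tolerance per game.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Returns the size of an open stream in bytes (or -1 on error),
   restoring the current position afterwards. */
long stream_size(FILE *f) {
    assert(f != NULL);
    long saved = ftell(f);
    if (saved < 0 || fseek(f, 0, SEEK_END) != 0)
        return -1;
    long size = ftell(f);
    fseek(f, saved, SEEK_SET);
    return size;
}

/* Heuristic: could a ReadFile of `requested` bytes be "the whole file
   except a header of at most max_header bytes"? max_header = 0 means
   an exact match, like 0x0142 == 322 for runme.dat. */
int looks_like_payload(uint32_t requested, long file_size, long max_header) {
    long long gap = (long long)file_size - (long long)requested;
    return gap >= 0 && gap <= max_header;
}
```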

We got lucky: our test subject is naive and walks straight into our arms by reading the whole script into memory at once.

Now it's time to track down what it's gonna do with all that data it has just read.

The exe is paused before the ReadFile call. We'll step over it with F8, but first open the register that's PUSHed as lpBuffer in a new IDA window. Right-click on the register name and use the context menu, or just put the cursor on it and hit Ctrl+Enter (plain Enter opens it in the same code window). You can go back any time with Esc.

Now we're looking at the memory area the file contents will be read into. Let's finally press F8. We instantly see that the area is now filled with data: our precious scenario bytecode. It's about time we used hardware breakpoints.

Hardware breakpoints let us set BPs without modifying the program's code (even in memory). A normal (software) BP works by overwriting the first byte of the instruction to break on with INT 3 (opcode 0xCC), so it only fires when the CPU executes that location. On data, such as a string in memory, it's useless. The program never runs the string's bytes as asm code, it only reads them, so the BP never triggers and the program reads the patched byte instead of the real one.
Hardware BPs are a feature of the CPU itself, implemented with its debug registers. x86 CPUs have four breakpoint address registers (DR0 to DR3), so you can have at most 4 hardware BPs at once, each watching only a few bytes (1, 2 or 4; 8 in 64-bit mode).
Still, 4 are usually enough, since you can put normal BPs on executable code without any limit.
Also note when they trigger, unlike software BPs. Hardware read/write BPs fire after the accessing instruction has executed. For repeated string instructions like REPE MOVS* they fire in the middle of the instruction, stopping after the iteration that touched the watched bytes. For simple instructions like MOV or CMP they fire right after the instruction.
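To see why a software BP on data goes wrong, here's a small C simulation of what a debugger does when it sets one: it saves the original byte and writes 0xCC (INT 3) over it. The `soft_bp` structure and the sample bytes are made up for illustration. The point is that anything that merely reads those bytes now sees 0xCC.

```c
#include <assert.h>
#include <stddef.h>

#define INT3_OPCODE 0xCC   /* one-byte encoding of INT 3 on x86 */

/* A software breakpoint: the patched address and the byte it replaced. */
struct soft_bp {
    unsigned char *addr;
    unsigned char  saved;
};

/* Set: remember the original byte, then overwrite it with INT 3. */
void bp_set(struct soft_bp *bp, unsigned char *addr) {
    assert(bp != NULL && addr != NULL);
    bp->addr = addr;
    bp->saved = *addr;
    *addr = INT3_OPCODE;
}

/* Clear: put the original byte back. */
void bp_clear(struct soft_bp *bp) {
    assert(bp != NULL && bp->addr != NULL);
    *bp->addr = bp->saved;
}
```

While such a BP is set on the first byte of the bytecode, an interpreter reading it would fetch opcode 0xCC instead of the real one and never trap. A hardware BP on the same address leaves the bytes alone.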

Don't forget to delete BPs set on memory locations after the debugged program terminates. The exception is memory that belongs to the exe's own global data, but that's not important in our simple case. Memory addresses usually change each time the program starts, so you'll need to re-set all memory BPs each time you start it from the debugger.

Go to the buffer window you opened earlier, put the cursor somewhere inside lpBuffer (or on its first byte) and press F2. IDA will most likely have checked the hardware BP flag for you already. The mode should be Read.
We don't need the BPs on the ReadFile calls anymore, so you can remove them, although I suggest simply disabling them (from their context menu) so you can get back to them quickly if necessary. You can open the Breakpoints tab with Ctrl+Alt+B or with the Debugger | Breakpoints | Breakpoint list menu command.

Now press F9 and wait until something happens… Here we go – «Hardware breakpoint... has been triggered». That's great, let's see what we've got here…

REPE MOVSD. Well, that looks scary, but it's simply an asm instruction that copies a block of memory from one location to another: ECX DWords from [ESI] to [EDI]. For more info, e.g. how many bytes this instruction copies and where, open the docs I've uploaded (N-Z.pdf, Intel's manual) and search for the instruction there. I use Foxit Reader's plain Search, since the links in the PDF's contents only work in A-M.pdf.

We could now take on the challenge of setting BPs on every REPE instruction we meet until we hit something useful or run out of hardware BPs. We'll go another route instead: let's just press (or better, hold) F8 until we find something of interest. Holding F8 takes us up the call tree towards the root. We don't enter new functions (we're not holding F7) but gradually return from all subroutines. Along the way we need to keep our eyes open so we can catch something of value.

…after nearly five functions I got tired of this so I decided to press F9 again – maybe we'll find something faster in another part. IDA has shown me the same function again, just another branch. Okay, let's be more patient this time…

After a dozen returns I stumble upon a graph that looks like a case statement.

Screenshot: IDA's graph view of the case statement (http://proger.i-forge.net/Hacking/Stage Once/img/ida-case-graph.png)

See those boxes branching out from one root and then joining together at the bottom? If you think about it, that's exactly how a case statement can be visualized.

IDA has even identified it for us (Olly can do this too, although not like these neat colorful blocks) by putting comments like «switch jump» all around the disassembled code.

Can it be the interpreter's loop we're looking for?

Let's look at the comments and strings we have in this function. Hmm…

You can use Ctrl+Wheel Up/Down to zoom, «1» for 100% zoom and W to fit-window zoom.

Well, so far the code doesn't tell me much about its purpose. One thing that looks interesting to me is a referenced string that says «opcode %.2x». Let's note where it's used and why we don't see it in the console output.

I've got an idea: I'll disable all BPs for now and set one at the beginning of this func… no, rather at the beginning of the case statement. As IDA tells us, it's here:

JMP     off_41374A[EAX*4] ; switch jump

A word on case statement mechanics, and why it can't accept strings as keys in compiled languages.
You might think that, for the computer, a case statement is exactly the same as a series of IF statements. You might then wonder why the compiler refuses to take a string as the case variable. I also thought case was the same as if+if+if+..., but for compilers it can be totally different. When the labels are integers in a compact range, each case label is actually an index into a «jump table», which is simply an array of addresses. The case value is an integer, so the compiler can use it directly as an index into that array of addresses. After a single range check there's nothing left to compare: just JMP caseArray[caseValue] and you're done! (For sparse labels, compilers may fall back to a chain of comparisons.)
In the above code fragment, for example, off_41374A is nothing other than the base address of that array, and EAX is caseValue. It is multiplied by 4 because on a 32-bit CPU every address is a DWord, in other words 4 bytes in size.
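Here's a C sketch of what the compiler effectively builds for such a case statement. Its core is an array of code addresses (modelled with function pointers) indexed by the case value and guarded by a single unsigned range check. The handlers and their return codes are hypothetical. In the real exe the array lives at off_41374A and holds plain 4-byte addresses of code blocks inside one function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical case bodies; each returns a code so the dispatch is checkable. */
static int case_0(void)       { return 100; }
static int case_1(void)       { return 101; }
static int case_2(void)       { return 102; }
static int case_default(void) { return -1; }

typedef int (*case_body)(void);

/* The jump table: entry N is the address of the code for "case N". */
static const case_body jump_table[] = { case_0, case_1, case_2 };
#define CASE_COUNT (sizeof jump_table / sizeof jump_table[0])

int dispatch(unsigned int case_value) {
    /* The one comparison: out-of-range values (including "negative"
       ones, thanks to the unsigned compare) go to default. */
    if (case_value >= CASE_COUNT)
        return case_default();
    assert(jump_table[case_value] != NULL);
    /* No per-case comparisons: just an indexed jump. */
    return jump_table[case_value]();
}
```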

So we put a BP there and hit F9. Let's look at the EAX register: either check the General registers window, or hover the mouse over EAX and wait for IDA's hint.

Once IDA has shown you that memory hint you can use Wheel Up/Down to show more/less lines.
You can also hover over many other places – like values in General registers tab.
Also, be sure not to hover over the off_41374A[EAX*4] operand, only over a standalone «EAX» (somewhere in the code above or below). Otherwise IDA will calculate the address (off_41374A + EAX * 4) and show a hint for that location rather than the value of EAX.

IDA shows that EAX = 0x01. This doesn't tell us much, at least not yet. Let's scroll back a little and review how EAX gets this value. Here's what we see (try to guess what it does before reading on):

CODE:00413735 CALL    sub_413AA0
CODE:0041373A XOR     EAX, EAX
CODE:0041373C MOV     AL, BL
CODE:0041373E CMP     EAX, 6           ; switch 7 cases
CODE:00413741 JA      short loc_4137B5 ; default

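First, we clear EAX with XOR, then set its lowest byte (AL) to the value of BL. As you remember, BL is the low byte of the 16-bit register BX, which is itself the lower half of the 32-bit register EBX. CMP EAX, 6 followed by JA then sends any value above 6 to the default branch. JA is an unsigned comparison, so this single check covers all 7 cases, 0 to 6. We need to track how BL is set.
But before we go further, let's rename the function we're in now (N in IDA). I named it «$InterpretInstruction».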
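The four instructions before the switch jump can be emulated in C as below (a sketch; `case_index_from_ebx` is a made-up name). It shows why only BL matters. The XOR clears all of EAX, MOV AL, BL copies just the low byte of EBX, and the unsigned JA sends everything above 6 to the default branch.

```c
#include <assert.h>
#include <stdint.h>

/* Emulates:
     XOR EAX, EAX      ; EAX = 0
     MOV AL, BL        ; low byte of EAX = low byte of EBX
     CMP EAX, 6
     JA  loc_4137B5    ; unsigned "above" -> default
   Returns the case index 0..6, or -1 for the default branch. */
int case_index_from_ebx(uint32_t ebx) {
    uint32_t eax = 0;                             /* XOR EAX, EAX */
    eax = (eax & 0xFFFFFF00u) | (ebx & 0xFFu);    /* MOV AL, BL */
    if (eax > 6u)                                 /* CMP EAX, 6 / JA */
        return -1;
    return (int)eax;                              /* index into off_41374A */
}
```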

…And now let's take a break and breathe in deeply a few times.

What do we need to do? I mean, what's our goal with this program? We want to change script lines, and when we do, it crashes. So we need to (or must, if the editor is pressing us) find out why.
Come to think of it, I almost rushed into finding which function actually puts that instruction byte into place, but we don't need to know that. What we really need to know is what the engine does with that opcode. Which of those zillion disassembled functions picks it from the bytecode stream doesn't matter.
We might eventually run into that function along the way, but hunting for it now isn't required and would be a waste of time, IMO.

So we'll skip to the next part. Let's assume that we have found the interpreter's case statement. We can verify it in a few ways but since it's a tutorial I'll show you how Olly can help us with its great BP logging functions.

Let's turn the page of our enlightenment to the new level now…

Part 4 – getting to the Crash Point »

Comments

29 April 2012

Thread #9

Dedication

I'm back, finally! Hey, thanks for fixing the PHP error, I thought I'd never see this place again!

29 April 2012

Thread #9.1

Proger_XP

No problem, feel free to comment :)

30 April 2012

Thread #9.1.4

Proger_XP

Then you can imagine what it feels like after you complete «Part 5» when hacking a real game :)

30 April 2012

Thread #9.1.3

Dedication

I probably had the same feeling when I completed Part 1 and 2 steps. I see what you mean :D!

30 April 2012

Thread #9.1.2

Proger_XP

«This particular» – you mean hacking? Because I like riddles. With reverse engineering you have a problem – something that was written by another person (or even a team). And it successfully operates so it's a system that works in one direction. When reversing it you learn how to make it do the, well, the reverse – or if it can't you write your own program that can do this. And it's a great feeling when you've found the key – if we think about the original program as a keyhole.

That's about it :)

29 April 2012

Thread #9.1.1

Dedication

How did you learn about this? I mean what motivates you for this particular activity (just asking:) )

27 March 2012

Thread #8

Dedication

Hello again! XD. I won't be working on this because I'm going on a trip to Europe for two weeks. I hope I can receive some support from you in the future!

27 March 2012

Thread #8.1

Proger_XP

Sure, come back whenever you want. Hope you'll have a great time in Europe! 

27 March 2012

Thread #8.1.1

Dedication

thanks! 

23 March 2012

Thread #7

Dedication

Hello again Proger. I've been wondering for 5 days now, but I cannot find the right arguments for ReadFile. I'm not sure if I'm doing it right. Also, setting down beacons got confusing because I am not sure if that's the right ReadFile place. Can you post a video about that? Sorry, I am asking a lot, but I get stuck along the way :).

23 March 2012

Thread #7.3

Proger_XP

I've added the video – feel free to tell me if something is still unclear. 

23 March 2012

Thread #7.2

Proger_XP

> Sorry, I am asking a lot, but I get stuck along the way :).

Don't worry, if you're asking then I just have to make the guide clearer. I'll see what I can do about the video, I have a few things in mind.

p.s: if you want to have consistent avatar don't forget to include your e-mail… it should be saved if you have cookies enabled anyway. 

25 March 2012

Thread #7.2.1

Dedication

ok! Thanks :) 

23 March 2012

Thread #7.1

Dedication

all of it is confusing! sorta 

13 July 2011

Thread #6

Sledge

So let's say I'm trying to figure out a routine that is used to decode a VN script file. I've already located the 'CreateFile' function that reads the encoded script file (I know I'm on the right track because the first given argument on the stack points to a null-terminated char pointer with a suspicious file name), can I expect the decoding routine to be executed right after 'CreateFile' has been called? Could it be that even after the file has been read into memory, it still takes some time for the routine I'm looking for to be called?

13 July 2011

Thread #6.2

Proger_XP

By the way, in my reply to this question I didn't mean you shouldn't try to find the routine that works with that scenario buffer (e.g. interpreting opcodes). I only meant you shouldn't try to find it manually by looking at nearby called functions. That's a waste of time, given the amount of asm code compilers generate, and you can't even hope to recognize the necessary function once you see it.

So you just try setting breakpoints on imported functions, buffers and other places of interest in an attempt to track down the routine you're looking for.

13 July 2011

Thread #6.1

Proger_XP

> can I expect the decoding routine to be executed right after 'CreateFile' has been called?

No, almost certainly not. Some games even preload scenario (and other) files and access them later. Some even preprocess them, splitting them apart and later using bits of files that were read early on.

Usually you'd go in two directions: from below and from above. From below means starting at a file reading function (i.e. the buffer with the scenario bytecode) and looking for a function that does something with it, until you finally reach the interpretation loop.

From above means starting at some API function like GetGlyphOutline or TextOut that uses data coming from the scenario. This helps when you couldn't trace that data from below.

Have you read the tutorial to the end, by the way? Or are you using it as a reference for hacking some real VN? I'm just curious, there's nothing wrong with either way.

13 July 2011

Thread #6.1.4

Proger_XP

> can I always expect CreateFile to return a handle to the passed file (arg1) and store it in EAX?

Yes, as long as the call succeeds. Read up on the calling convention called stdcall («standard calling convention»), which almost all WinAPI functions use; the few variadic ones like wsprintf use cdecl. For example, on Wikipedia:

Registers EAX, ECX, and EDX are designated for use within the function. Return values are stored in the EAX register.

If it fails, CreateFile returns INVALID_HANDLE_VALUE (0xFFFFFFFF). Any other value in EAX is the file's handle, which the app will pass to ReadFile later to read from that specific file. Keep in mind that once the app calls CloseHandle on it, Windows may reuse the same value for another file, so the match only holds until the handle is closed.

13 July 2011

Thread #6.1.3

Proger_XP

> I'll only move to part 4 once these imported functions become crystal clear to me.

I see, I support your decision.

About irrational opcodes: that's exactly the feeling I had at first too. Then, as I've written in the intro, one day I just thought «Hey, so these are those assembler commands? They seem pretty basic now, eh?». Hopefully you'll get that feeling one day too, as I did :)

13 July 2011

btw, another question, can I always expect CreateFile to return a handle to the passed file (arg1) and store it in EAX? So let's say I step over (F8) a 'CreateFile' call, and then EAX becomes 0x50. So, every time I see a ReadFile that has 0x50 as the first argument, can I be sure that this ReadFile is referring to THAT file?

13 July 2011

I see. No I haven't read it to the end YET, I've read up until part 3, I'll only move to part 4 once these imported functions become crystal clear to me. My goal is to be able to reverse engineer VN. By the way, I appreciate you answering my questions here, so far your documents have become a great help for me on understanding those opcodes that once seemed to be completely irrational before.

12 July 2011

Thread #5

Sledge

So, when setting up a BP on a ReadFile function, if I jump to the memory address at [ESP+4] (the second argument of ReadFile, which is lpBuffer, considering the return address hasn't been pushed yet, since the CALL is about to be executed), will I be at the exact location of memory where hFile will be written? Does this work for all cases where ReadFile is called?

12 July 2011

Thread #5.2

Proger_XP

> Does this work for all cases where ReadFile is called?

Forgot to answer this one. The above scheme works for any function that takes its arguments on the stack, because that's how the stack works. That's always the case with WinAPI calls. Compilers can, however, optimize an application's internal functions by passing arguments in registers (CPU's & FPU's) instead, as it's more efficient. Delphi's default «register» calling convention, for example, passes the first three arguments in EAX, EDX and ECX. However, that's rarely the case with C++.

12 July 2011

Thread #5.1

Proger_XP

Yes: if you're at the CALL instruction and it hasn't been executed yet, [ESP+4] contains the lpBuffer address, which you can follow in a new window. After you press F8 and ReadFile has run, you'll instantly see that buffer filled with data; before that it usually contains junk.

I'll illustrate this. Go to Debugger → Debugger windows → Stack view while debugging, and IDA will open a new tab. Right-click in there and choose Jump to ESP. The highlighted line is [ESP+0] (hFile). Below it is another 4-byte chunk (DWord), [ESP+4] (lpBuffer); further below are [ESP+8], [ESP+0Ch] and so on.
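The stack view layout can be reproduced with a toy model in C. This sketch simulates a 32-bit stack growing downwards and pushes ReadFile's five arguments the way the caller does (right to left, as stdcall requires), so hFile ends up on top. The structure and helper names are hypothetical, and the argument values in the usage below are only examples.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_DWORDS 16

/* A toy 32-bit stack growing downwards; esp is a byte offset into mem. */
struct stack32 {
    uint32_t mem[STACK_DWORDS];
    uint32_t esp;
};

void push(struct stack32 *s, uint32_t value) {
    assert(s->esp >= 4);
    s->esp -= 4;
    s->mem[s->esp / 4] = value;
}

/* Reads the DWord at [ESP+offset], as IDA's stack view shows it. */
uint32_t at_esp(const struct stack32 *s, uint32_t offset) {
    uint32_t addr = s->esp + offset;
    assert(addr % 4 == 0 && addr / 4 < STACK_DWORDS);
    return s->mem[addr / 4];
}

/* Caller side of ReadFile(hFile, lpBuffer, nNumberOfBytesToRead,
   lpNumberOfBytesRead, lpOverlapped): the last argument is pushed first. */
void push_readfile_args(struct stack32 *s, uint32_t hFile, uint32_t lpBuffer,
                        uint32_t nToRead, uint32_t lpRead, uint32_t lpOverlapped) {
    push(s, lpOverlapped);
    push(s, lpRead);
    push(s, nToRead);
    push(s, lpBuffer);
    push(s, hFile);
}
```

Right before the CALL, `at_esp(&s, 0)` is hFile, `at_esp(&s, 4)` is lpBuffer and `at_esp(&s, 0x0C)` is lpNumberOfBytesRead. After the CALL executes, everything shifts by 4, because the return address is pushed on top.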

However, I rarely use the stack view directly myself; the code view is more convenient. For example, while the program is paused on CALL ReadFile you see the following lines before the CALL:

mov     eax, [ebx+14h]
push    eax             ; lpBuffer
mov     eax, [ebx]
push    eax             ; hFile

As you can see, lpBuffer was in EAX but was overwritten by hFile, so now we can't navigate to the buffer using EAX. However, the same value is still stored at [ebx+14h], as the MOV tells us. So just double-click somewhere inside [ebx+14h] and IDA will navigate to that memory location.

Note, however, that [ebx+14h] is not the buffer itself but a pointer to it (that's why the argument is called lpBuffer: «lp» stands for «long pointer»). Right now IDA shows the memory at that location as a sequence of bytes; if you press «D» twice, it converts that block into a DWord that says something like

dd offset unk_416194

instead of

db 94h
db  61h ; a
db  41h ; A
db    0
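Why do those four bytes turn into unk_416194? x86 stores a DWord little-endian, lowest byte first, so 94h 61h 41h 00h read back as 00416194h. A quick Python check of that conversion, which is what IDA does when you press «D» until it shows dd (the unk_ prefix is the auto-name IDA gives an unexplored location):

```python
import struct

raw = bytes([0x94, 0x61, 0x41, 0x00])   # the four db lines, in memory order
pointer = struct.unpack("<I", raw)[0]   # "<I" = little-endian unsigned 32-bit
print(f"unk_{pointer:X}")               # prints unk_416194
```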

Now double-click on that «unk_XXX» and you'll finally reach the buffer. You can alternatively put the cursor on it and press Ctrl+Enter or right-click and select Jump in a new window to open that block in a new tab.

Usually this method is more convenient than using Stack view and keeping stack offsets in mind.

Even if the above sounds complicated, don't be afraid to try; it only looks that way in words. Once you've done it a few times you'll remember the process and it'll be a piece of cake.

7 July 2011

Thread #4

Sledge

Another question: in part 2 you said: "CALL can be thought of as a shortcut for PUSH EIP; JMP addr, RET – as POP EIP." So what if more data is pushed onto the stack before returning to the pushed EIP? How will the program know which address to return to, since the stored EIP has been pushed down and is no longer at the top of the stack?

7 July 2011

Thread #4.1

Proger_XP

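That's the point (see my reply to your previous question): a function must always leave the stack the way it found it. Everything it PUSHes must be POPped (or otherwise released) before RET, so that the return address is back on top when RET executes; otherwise even a 1-byte shift in ESP will crash the whole program, or make it misbehave in less obvious ways. The same goes for the arguments: once they have been cleaned up as well (by RET n in the callee or by the caller), ESP after the CALL has exactly the value it had before the arguments were pushed.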
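A toy Python model of why an unbalanced function breaks the return: the stack is a list whose end is the top, CALL pushes the return address, and RET pops whatever happens to be on top into EIP. The addresses are made up; the only point is that a single PUSH left behind makes RET jump to the wrong place.

```python
def call(stack: list, return_address: int, body) -> int:
    """Simulate CALL + function body + plain RET; return the new EIP."""
    stack.append(return_address)    # CALL: push the return address
    body(stack)                     # the function's own pushes and pops
    return stack.pop()              # RET: pop the top of the stack into EIP


def balanced(stack: list) -> None:
    stack.append(0xDEAD)            # PUSH some register...
    stack.pop()                     # ...and POP it back before returning


def unbalanced(stack: list) -> None:
    stack.append(0xDEAD)            # PUSH without a matching POP


RETURN_ADDRESS = 0x401005           # hypothetical address right after the CALL

print(hex(call([], RETURN_ADDRESS, balanced)))     # back right after the CALL
print(hex(call([], RETURN_ADDRESS, unbalanced)))   # "returns" to 0xdead instead
```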

7 July 2011

Thread #3

Sledge

I see. So from what I can see in your tutorial, all arguments are pushed onto the stack, right? How does the program know which stack offset contains which argument? Must they all be at the top of the stack?

7 July 2011

Thread #3.1

Proger_XP

  1. Must they all be at the top of the stack?

How do you imagine that? Functions accept multiple arguments, and they can't all be on top of the stack :)

The point is that when the compiler builds the EXE, it calculates the exact stack offset of every parameter, so it knows which parameter is where. When the function itself modifies the stack (by PUSHing, POPping or reserving space), the compiler tracks each change and adjusts the offsets accordingly. That would be terrible work for a human, but for a machine it's not too hard, of course. Sometimes, however, it's impossible even for a compiler (for example, when the amount of stack space a function reserves is only known at run time), so it uses a trick called a stack frame (see below).

I can't illustrate this mechanism with Delphi programs because Delphi usually passes parameters in CPU registers, which is faster than manipulating the stack. But I can explain it in words.

Let's say we have this code:

PUSH EAX   ; let this be the first argument
CALL AFunc
NOP        ; exit point: execution continues here after RET 4
...

AFunc proc near
  ; the func's code
  MOV ECX, [ESP+04h]
  RET 4
AFunc endp

What happens here with ESP is:

  1. PUSH EAX – ESP is decreased by 4, since EAX is 32 bits (4 bytes) wide and the stack grows toward lower addresses. EAX's value is then written to the 4 bytes ESP now points to, so at this moment we could refer to the newly pushed argument (EAX's value) as [ESP+0].
  2. CALL AFunc – CALL is a JMP that additionally pushes the return address onto the stack, so ESP is decreased by another 4 bytes (EIP is a 32-bit register). To access the first argument we now need [ESP+4], and that's exactly what AFunc does:
  3. MOV ECX, [ESP+04h] – refers to that argument.
  4. RET 4 – as I outlined in the asm intro, the number after RET doesn't say what to return but how many bytes to remove from the stack. In our example the function accepted one 32-bit argument, which has to be removed (here by the callee itself; with the cdecl convention the caller does it instead, typically with ADD ESP, 4). Otherwise the stack would be left 4 bytes off for the caller, and sooner or later a RET, which always takes its return address from the top of the stack, would pop our argument instead and «return», or «jump», to whatever location it happens to hold (overwriting the return address on the stack is what buffer overflow attacks exploit, by the way). So RET 4 pops the return address (increasing ESP by 4), then increases ESP by another 4 bytes to discard the argument, and finally sets EIP to the popped address, so execution continues after the original CALL (at the exit point).
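The four steps can be replayed with a tiny Python model of a 32-bit stack: ESP is an integer address, PUSH subtracts 4 and stores a DWord, and RET n pops the return address and then adds n to ESP. The starting ESP (0x12FF80), the value in EAX and the exit-point address are arbitrary assumptions chosen for the demo.

```python
class Stack:
    """Minimal model of the 32-bit x86 stack: it grows down in 4-byte slots."""

    def __init__(self, esp: int):
        self.esp = esp
        self.mem = {}                       # address -> DWORD value

    def push(self, value: int) -> None:
        self.esp -= 4                       # make room below the current top
        self.mem[self.esp] = value

    def pop(self) -> int:
        value = self.mem[self.esp]
        self.esp += 4
        return value

    def read(self, offset: int) -> int:     # the value at [ESP+offset]
        return self.mem[self.esp + offset]

    def call(self, return_address: int) -> None:
        self.push(return_address)           # CALL = PUSH return address + JMP

    def ret(self, n: int = 0) -> int:
        eip = self.pop()                    # RET n: pop the return address...
        self.esp += n                       # ...then discard n bytes of arguments
        return eip


s = Stack(esp=0x12FF80)                     # arbitrary starting ESP
EAX = 0x1234                                # arbitrary argument value
EXIT_POINT = 0x401005                       # hypothetical address of the NOP

s.push(EAX)                                 # 1. PUSH EAX: argument at [ESP+0]
first = s.read(0)
s.call(EXIT_POINT)                          # 2. CALL AFunc: argument now at [ESP+4]
ecx = s.read(4)                             # 3. MOV ECX, [ESP+04h]
eip = s.ret(4)                              # 4. RET 4
print(hex(first), hex(ecx), hex(eip), hex(s.esp))
```

The last value printed is the starting ESP again: after RET 4 the stack is exactly as it was before PUSH EAX.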

Now let's see a more complex example:

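PUSH EAX  ; arg #1
PUSH EBP  ; arg #2
CALL AFunc  ; just before the CALL: arg #1 = [ESP+4], arg #2 = [ESP+0]
...

AFunc proc near
  PUSH EAX  ; save the registers this function uses internally
  PUSH EBP
  ; everything is now 8 bytes further from ESP than at entry:
  ; return address = [ESP+08h], arg #2 = [ESP+0Ch], arg #1 = [ESP+10h]
  MOV EBP, [ESP+10h]   ; EBP = arg #1
  MOV EAX, [ESP+0Ch]   ; EAX = arg #2
  ; ... work with the arguments here ...
  POP EBP   ; it's crucial that the values are
  POP EAX   ; restored (popped off) in the proper, reverse, order
  RET 8
AFunc endp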

I think it is clear now what the function does from the above explanation: EBP will hold argument #1 and EAX – argument #2 (the opposite of what was originally passed).
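The offsets in this example come from simple bookkeeping: count how many bytes the function has pushed since entry and add that to the entry offsets (return address at [ESP+0], argument #2 at [ESP+4], argument #1 at [ESP+8], because argument #2 was pushed last). After the two PUSHes that gives [ESP+10h] for argument #1 and [ESP+0Ch] for argument #2. This is the kind of tracking the compiler does, and IDA runs a similar analysis of its own. A rough Python sketch, assuming a toy instruction format that only knows 32-bit PUSH/POP, ADD/SUB ESP by a constant and MOV reg, [ESP+n]; the `resolve` function and its tuple format are hypothetical:

```python
def resolve(instructions, args_top_down):
    """Name the stack slot behind every MOV reg, [ESP+n] in a function body.

    instructions:  ("push",), ("pop",), ("sub_esp", n), ("add_esp", n) or
                   ("load", reg, n), meaning MOV reg, [ESP+n]
    args_top_down: argument names as seen at function entry, starting at
                   [ESP+4] (the argument the caller pushed LAST comes first)
    """
    pushed = 0                                  # bytes pushed since entry
    names = {}
    for op, *operands in instructions:
        if op == "push":
            pushed += 4
        elif op == "pop":
            pushed -= 4
        elif op == "sub_esp":
            pushed += operands[0]
        elif op == "add_esp":
            pushed -= operands[0]
        elif op == "load":
            reg, offset = operands
            at_entry = offset - pushed          # the same slot, measured at entry
            if at_entry < 0:
                names[reg] = "saved register or local"
            elif at_entry == 0:
                names[reg] = "return address"
            elif at_entry % 4 == 0 and at_entry // 4 - 1 < len(args_top_down):
                names[reg] = args_top_down[at_entry // 4 - 1]
            else:
                names[reg] = "unknown slot"
    return names


# The second AFunc example: PUSH EAX / PUSH EBP, then the two loads.
afunc = [("push",), ("push",), ("load", "EBP", 0x10), ("load", "EAX", 0x0C)]
print(resolve(afunc, ["arg #2", "arg #1"]))
```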

Also, you'll very often encounter a construction like this at the start of a function:

PUSH EBP
MOV EBP, ESP

…and this at their end:

MOV ESP, EBP
POP EBP
RET ...

This is called a stack frame. EBP receives the value ESP had right after the prologue and stays fixed for the whole function, so the compiler doesn't have to recalculate argument offsets after every PUSH it uses. It can always refer to arguments with positive offsets from EBP and to local variables with negative ones (locals are also allocated on the stack, typically with SUB ESP, n, but after the function was CALLed, not before like arguments). At the end, MOV ESP, EBP throws the locals away and POP EBP restores the caller's EBP. For example:

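MOV EAX, [EBP+8]   ; first argument ([EBP+0] = saved EBP, [EBP+4] = return address)
MOV EAX, [EBP-4]   ; a DWord local variable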
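With the frame in place, the offsets no longer depend on later PUSHes. Assuming the standard PUSH EBP / MOV EBP, ESP prologue and 32-bit arguments pushed right to left (stdcall or cdecl), [EBP+0] holds the caller's saved EBP, [EBP+4] the return address, and arguments start at [EBP+8]. A small Python helper that prints the operand for the n-th argument or local; the helper names are made up, and numbering locals downward from EBP is an assumption, since compilers lay locals out as they see fit:

```python
def ebp_arg(index: int) -> str:
    """Operand for argument `index` (1-based, source order) inside an EBP frame."""
    if index < 1:
        raise ValueError("arguments are numbered from 1")
    # [EBP+0] = caller's saved EBP, [EBP+4] = return address, then the arguments.
    return f"[EBP+{4 + 4 * index:02X}h]"


def ebp_local(index: int) -> str:
    """Operand for the index-th DWORD local (1-based), counting down from EBP."""
    if index < 1:
        raise ValueError("locals are numbered from 1")
    return f"[EBP-{4 * index:02X}h]"


print(ebp_arg(1), ebp_arg(2), ebp_local(1))   # [EBP+08h] [EBP+0Ch] [EBP-04h]
```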

You can read more about stack frames on the Internet.

7 July 2011

Thread #2

Anonymous

I'm still getting used to IDA; I'll get the hang of it soon. At the moment I have a problem: I can't find where the arguments passed to ReadFile are shown. At some point you said: «Let's see, our first catch has hFile = 0x4C». Where exactly did you see that 0x4C was passed as the hFile argument?

7 July 2011

Thread #2.1

Proger_XP

http://proger.i-forge.net/Hacking/Stage Once/img/ida-func-arg-tip.png

You need to hover the mouse over the register that holds the function's argument, EAX in our case.

P.S. If you're the same person, why not pick a name? :)

7 July 2011

Thread #1

Anonymous

Are these imported functions part of some standard OS library, or are they functions written by the compiler? Is there any chance I'll bump into programs that read files without calling (or jumping to) the «ReadFile» import mentioned above?

7 July 2011

Thread #1.1

Proger_XP

Well, the imported function list shows functions imported from libraries. A library is a DLL file, and it isn't necessarily an OS DLL: a program may ship its own custom DLLs. However, reading a file almost always ends up in a call to WinAPI's ReadFile imported from kernel32.dll, either directly or through a runtime library such as the C runtime's fread. The exceptions are programs that map files into memory (CreateFileMapping/MapViewOfFile) or use their own low-level disk access mechanism; the latter is certainly not the case for visual novels, or games for that matter, as it's way too troublesome.

In some rare cases you might stumble upon a program whose logic is split between an EXE and one or more DLLs, where the EXE merely calls a bunch of DLL functions and none of them come from the OS (e.g. instead of ReadFile from kernel32 it calls some function from mylibrary.dll or whatever). However, (1) exported/imported functions usually have names, which may hint at their purpose, and (2) you can always load a DLL and debug it almost like a normal EXE.

That said, you won't find many VNs that avoid calling OS functions directly and go through their own DLLs instead, so don't worry about that much yet.
