Wednesday, March 09, 2011

Insomni'hack GPGPU reversing

One of the reversing challenges was exotic: we were given a ciphertext (ohv'c~f3ehnw4byzzky), a GPGPU file (kernel.bin) of the encryption routine, and the Instruction Set Architecture (ISA) of this GPU.

Since I have never played with GPGPUs (code/assembly/whatever), it seemed hard at first glance. But actually not! Open kernel.bin and see that there's some kind of disassembly. Good! And pretty small, roughly 80 lines of code.

I will explain how I reversed it and reimplemented it in Python to have the same encryption routine, discovered that it was also the decryption routine, and just run it against the ciphertext to obtain the plaintext.

Registers & variables

It first defines some registers:
.reg .b32  r<126>; /* define r0..125 */ 
.reg .b64  x<126>; /* define r0..125 */ 
.reg .b32  f<128>; /* define f0..127 */ 
.reg .pred p<32>; /* define p0..31 */ 
.reg .u32  sp;
And variables:
.reg .b8   wb0,wb1,wb2,wb3; /* 8-bit write buffer */ 
.reg .b16  ws0,wc1,ws2,ws3; /* 16-bit write buffer */ 
.reg .b32  tb0,tb1,tb2,tb3; /* read tex buffer */ 
.reg .b64  vl0,vl1; /* 64-bit vector buffer */ 
.reg .b16  cvt17_0,cvt16_1; /* tmps for conversions */ 
No need to translate this in Python.

Stack & parameters

It defines a 16-bit (2 bytes) aligned stack of 8-bit (1 byte) values.
.local .align 16 .b8 decode_stack[28];
Then it defines the 3 parameters of the routine: length and input/output strings.
.entry decode(
    .param.b32 length  /* length */,
    .param.b32 input  /* input */,
    .param.b32 output  /* output */
)
So let's just build a stack of 28 integers values as a dictionary then length as an integer, input and output as lists of integers (representing characters):
stack = {}
for i in range(28):
    stack[i] = 0

input = [ord(c) for c in "ohv'c~f3ehnw4byzzky"]
output = [0]*len(input)
length = len(input)

Initialization

Then the code begins. Instructions are self-explanatory:
mov.u32 sp, decode_stack;
Here, stack pointer goes to the previously defined stack. Let's be dumb and copy that in Python:
sp = stack
Then:
mov.u32 r0, 22;
st.local.u8 [sp+4], r0;
[...]
st.local.u8 [sp+22], r2;
This translates easily into:
r0 = 22
sp[4] = r0
[...]
sp[22] = r2
That's all for the initialization of the decryption routine.


Loop begins?

From here, it looks like r0 will be used as an increment variable:
mov.u32 r0, 0;
Then it ends the stack with a 0 and sets r1, r2 and r3 to output, input and length:
st.local.u8 [sp+23], r0;
ld.param.u32 r1, [output + 0];
ld.param.u32 r2, [input + 0];
ld.param.u32 r3, [length + 0];
This can be translated in python with:
sp[23] = 0
r1 = output
r2 = input
r3 = length

Indeed this is a loop

Then we see some kind of label, comparing r0 (increment variable) to r3 (length), and we can assume that when it's equal it branches to p0 which then returns:
LBB1_1:
setp.hs.u32 p0, r0, r3;
@p0 ret;
At the end of the code we also see that it increments r0 and goes back to this label:
add.u32 r0, r0, 1;
bra LBB1_1;
Looks like a loop to me! What if we simply turn this into:
for r0 in range(r3):
  # and loop code here

Decryption

Inside the loop, we first read:
add.u32 r4, 4, sp;
add.u32 r4, r4, r0;
ld.local.u8 r4, [r4+0];
r4 is set as sp+4+r0 and then to the value loaded at this address. It is equivalent to:
r4 = sp[4+r0]
Then we read:
add.u32 r5, r2, r0;
ld.global.u8 r5, [r5+0];
Same thing, it is simply loading r2 (input):
r5 = r2[r0]
Oh oh, the "encryption":
xor.b32 r4, r5, r4;
Just a XOR (between stack and input):
r4 = r5 ^ r4
And finally:
add.u32 r5, r1, r0;
st.global.u8 [r5+0], r4;
Saving result in r1 (output):
r1[r0] = r4

Conclusion

It was just a XOR with a hard-coded key saved in the so-called stack, so the decryption routine is the same as the encryption one. We can just use our python implementation and then print the output list of integers (representing characters) as a string with:
print "Output: %r" % "".join(chr(output[i]) for i in range(len(output)))
And it finally gives (reverse.py):
$ python reverse.py
Output: 'you got some skillz'
Not difficult but definitely exotic, I liked it! Thanks goes to milkmix (@milkmix_), creator of this reversing challenge, who also made Windows and Linux ones.

No comments:

Post a Comment