Debugging story: a crash while mapping a texture

By Gynvael Coldwind | Sun, 16 Jul 2017 00:10:51 +0200
Recently on my Polish livestreams I've been writing a somewhat simple raytracer (see screenshot on the right; source code; test scene by ufukufuk), with the intention of talking a bit about optimization, multithreading, distributed rendering, etc. As expected, there were a multitude of bugs along the way, some more visual than others. My favorite one so far was a mysterious buffer overflow resulting in a C++ exception being thrown when rendering in 4K UHD (3840x2160), but not in 1080p (1920x1080). While trying to find the root cause I also ran into a standard C library bug in the sqrt function (though it turned out to be unrelated in the end), which made the hunt even more entertaining.
It all started with a thrown C++ exception:

.............terminate called after throwing an instance of 'std::out_of_range'
what(): vector::_M_range_check: __n (which is 9223372036854775808) >= this->size() (which is 845824)

This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.

Well, I had to agree that 9223372036854775808 is larger-or-equal to 845824. I immediately knew where the exception was thrown, as there was only one place in the whole code base where I used the at() method of a std::vector which could throw this exception*.
* The difference between using some_vector[i] and some_vector.at(i) is that the former doesn't do any range checking (at least in popular implementations) while the latter does, throwing an exception when i points out of bounds.
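The difference can be demonstrated in a few lines (a minimal sketch; AtThrows is just an illustrative helper, not from the raytracer):

```cpp
#include <cassert>
#include <stdexcept>
#include <vector>

// Returns true if accessing index i of v via at() throws std::out_of_range.
// operator[] would perform no such check (in typical release builds) and
// reading out of bounds through it is undefined behavior.
bool AtThrows(const std::vector<int>& v, size_t i) {
  try {
    (void)v.at(i);
    return false;
  } catch (const std::out_of_range&) {
    return true;
  }
}
```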

V3D Texture::GetColorAt(double u, double v, double distance) const {
  (void)distance;  // TODO(gynvael): Add mipmaps.

  u = fmod(u, 1.0);
  v = fmod(v, 1.0);
  if (u < 0.0) u += 1.0;
  if (v < 0.0) v += 1.0;

  // Flip the vertical.
  v = 1.0 - v;

  double x = u * (double)(width - 1);
  double y = v * (double)(height - 1);

  size_t base_x = (size_t)x;
  size_t base_y = (size_t)y;

  size_t coords[4][2] = {
    { base_x,
      base_y },
    { base_x + 1 == width ? base_x : base_x + 1,
      base_y },
    { base_x,
      base_y + 1 == height ? base_y : base_y + 1 },
    { base_x + 1 == width ? base_x : base_x + 1,
      base_y + 1 == height ? base_y : base_y + 1 }
  };

  V3D c[4];
  for (int i = 0; i < 4; i++) {
    c[i] = image.at(coords[i][0] + coords[i][1] * width);  // the at() that threw

I started by running the raytracer with a debugger attached and waiting for the bug to trigger again - it took about 20 minutes (I felt like I was debugging something located on Mars). However, it turned out that GDB doesn't break on C++ exceptions by default (TIL: you have to issue the catch throw command first):

...........terminate called after throwing an instance of 'std::out_of_range'
what(): vector::_M_range_check: __n (which is 9223372036854775808) >= this->size() (which is 845824)

This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.
[Thread 26080.0x7b58 exited with code 3]
[Thread 26080.0x7024 exited with code 3]
[Thread 26080.0x8648 exited with code 3]
[Thread 26080.0x84fc exited with code 3]
[Thread 26080.0x81e8 exited with code 3]
[Thread 26080.0x525c exited with code 3]
[Thread 26080.0x53a8 exited with code 3]
[Thread 26080.0x87d0 exited with code 3]
[Thread 26080.0x7de0 exited with code 3]
[Thread 26080.0x41b0 exited with code 3]
[Thread 26080.0x63e4 exited with code 3]
[Thread 26080.0x7e10 exited with code 3]
[Thread 26080.0x6c28 exited with code 3]
[Thread 26080.0x84dc exited with code 3]
[Thread 26080.0x7044 exited with code 3]
[Thread 26080.0x816c exited with code 3]
[Inferior 1 (process 26080) exited with code 03]
(gdb) where
No stack.

I decided to change the strategy and replace the at() method with operator[], so that instead of a C++ exception a Windows exception (ACCESS_VIOLATION) would be raised, which is caught by default by the debugger. A quiet voice in my head told me "that won't work if the index multiplied by the size of the element overflows and falls back within the valid range" - and sure enough, after 40 minutes of rendering I got a 4K image without any crashes. The reason was that the actual index - 9223372036854775808 (0x8000000000000000 in hexadecimal) - multiplied by the size of the vector element (12 bytes) overflows to the value 0, and index 0 is perfectly valid.

The next strategy change involved actually catching the exception in C++ (as in a try-catch block) and doing lots of debug prints on such an event. This (after another 20 minutes) yielded good results:

.....u v: nan nan
x y: nan nan
base x y: 9223372036854775808 9223372036854775808
0: 9223372036854775808 9223372036854775808
1: 9223372036854775809 9223372036854775808
2: 9223372036854775808 9223372036854775809
3: 9223372036854775809 9223372036854775809
w h: 826 1024

The problem seemed to be that u and v were NaN, a special IEEE-754 floating point value with the habit of propagating all around once some expression yields it as a result, i.e. almost any operation on NaN will yield NaN, with the small exception of a sign change, which produces -NaN.

One more exception is casting it to an integer. The C++ standard is pretty clear here - it's undefined behavior. However, in low-level land there actually needs to be some effect, and usually there is an explanation for that effect. I started looking into it by running a simple snippet like cout << (size_t)NAN, which produced the value 0 (and not the expected 0x8000000000000000). After doing some more experiments and reading the Intel manual (the "Indefinites" and "Results of Operations with NaN Operands or a NaN Result for SSE/SSE2/SSE3 Numeric Instructions" sections) I figured out that the x86 instructions themselves (both fist/fistp from the FPU group and cvttsd2si/vcvttsd2si from the SSE/AVX group) return 0x8000000000000000 (called the indefinite value) in such a case (the more general rule being: all bits apart from the most significant one are cleared); the #IA fault might be raised as well, but it's almost always masked out. That said, since technically this is UB, the compiler can do whatever it feels like, e.g. emit a plain 0 as the result of a (size_t)NAN cast at compile time. To quote our IRC bot (courtesy of KrzaQ):

<@Gynvael> cxx: { double x = NAN; cout << (size_t)x; }
<+cxx> 9223372036854775808
<Gynvael> cxx: { cout << (size_t)NAN; }
<+cxx> 0
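Since the cast is UB either way, code that can legitimately encounter NaN should reject it before converting. A minimal guard (SafeIndex is a hypothetical helper, not from the raytracer's code base):

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Converts a floating point coordinate to an index, treating NaN (and other
// non-finite or out-of-range values) as "no valid index" instead of relying
// on the UB float-to-integer cast.
bool SafeIndex(double x, std::size_t limit, std::size_t* out) {
  if (!std::isfinite(x) || x < 0.0 || x >= (double)limit) return false;
  *out = (std::size_t)x;
  return true;
}
```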

Back to the original problem. Looking at the code, u and v actually come from outside of the function and are modified only in the first lines of the function:

V3D Texture::GetColorAt(double u, double v, double distance) const {
  u = fmod(u, 1.0);
  v = fmod(v, 1.0);
  if (u < 0.0) u += 1.0;
  if (v < 0.0) v += 1.0;

The fmod function never yields NaN for finite arguments, so u and v must have arrived at the function already containing NaN. The function itself is called from the main shader, and the interesting values are the result of a call to the V3D Triangle::GetUVW(const V3D& point) function. That function uses the barycentric interpolation method to calculate the UVW texel coordinates, given the three points making up a triangle, the UVW values for each of them, and an "intersection" point on the triangle for which the interpolated value is calculated. It looks like this:

V3D Triangle::GetUVW(const V3D& point) const {
  V3D::basetype a = vertex[0].Distance(vertex[1]);
  V3D::basetype b = vertex[1].Distance(vertex[2]);
  V3D::basetype c = vertex[2].Distance(vertex[0]);

  V3D::basetype p0 = point.Distance(vertex[0]);
  V3D::basetype p1 = point.Distance(vertex[1]);
  V3D::basetype p2 = point.Distance(vertex[2]);

  V3D::basetype n0 = AreaOfTriangle(b, p2, p1);
  V3D::basetype n1 = AreaOfTriangle(c, p0, p2);
  V3D::basetype n2 = AreaOfTriangle(a, p1, p0);

  V3D::basetype n = n0 + n1 + n2;

  return (uvw[0] * n0 + uvw[1] * n1 + uvw[2] * n2) / n;
}

The barycentric interpolation method itself is actually pretty simple - the final result is the weighted average of the UVWs of the three points making up the triangle, where each weight is the area of the triangle formed by the provided "intersection" point and the two triangle vertices other than the one the weight is calculated for (see the slides linked above for a better explanation).

After adding some more debug prints and waiting another 20 minutes, it turned out that AreaOfTriangle returned NaN in some cases where the triangle in question was degenerate - actually a line (i.e. one of the three points was located exactly on the edge between the two other points). This led me to the AreaOfTriangle function itself:

static V3D::basetype AreaOfTriangle(
    V3D::basetype a, V3D::basetype b, V3D::basetype c) {
  V3D::basetype p = (a + b + c) / 2.0;
  V3D::basetype area_sqr = p * (p - a) * (p - b) * (p - c);

  return sqrt(area_sqr);
}

The function is rather simple - it uses Heron's formula to calculate the area given the lengths of all three edges of the triangle: calculate half of the perimeter (variable p in my code), multiply it by the differences between p and each edge length, then take the square root of the result. The last bit is what caught my attention - as far as I knew, the sqrt() function returns NaN when fed a negative value.

I started by verifying both in the documentation and experimentally for the value -1 (no, these are not complex numbers, so no i). The documentation said "if a domain error occurs, an implementation-defined value is returned (NaN where supported)" (some sources stopping at "a domain error occurs"), MSDN mentioned that it "returns an indefinite NaN", and the glibc man page agreed. The experimental part had much weirder results: on Ubuntu sqrt(-1) yielded the value -NaN (well, I guess -NaN is still technically a NaN), but on Windows I got -1 (wait, what?). I tried a couple of other values on Windows and it turned out that sqrt was just returning the negatives exactly as I passed them. Given that MSDN claimed a NaN would be returned, I launched IDA and found that the mingw-w64 GCC 7.0.1 compiler I'm using actually ships its own implementation of the sqrt function, which happened to suffer from a regression in recent months. Oh well.

Wait. Why did I actually get NaN in my raytracer if there is such a bug? It turned out that I compile my app with -O3, and with -O3 sqrt actually works correctly (most likely because the optimizer emits the sqrt instruction inline instead of calling the buggy library function) and returns NaN (a positive one for a change).

This left the question of "why is the value passed to sqrt negative?". The answer is the one you would expect: floating point inaccuracies. I printed the exact (well, exact-ish, i.e. in decimal form) lengths of the triangle edges that were passed to the function upon the crash:


Given that one of the points lies on the edge between the two other, a should be exactly equal to b+c, however that wasn't the case:

a   = 66.2214810193791976189459091983735561370849609375
b+c = 66.2214810193791834080911939963698387145996093750

This also means that one of the factors in the area_sqr calculation - here (p - a), since p turned out to be slightly smaller than a - will yield a negative value (even though 0 would be the correct result):

p = 66.22148101937918340809119399636983871459960937500

After the multiplications and the sqrt call we arrive at the infamous NaN, which solves the whole riddle.

The fix was rather easy - since I know that there is no such thing as "negative area", I can just check if area_sqr is negative and correct it to 0.0 in such a case:

// It seems that due to floating point inaccuracies it's possible to get a
// negative result here when we are dealing with a triangle having all points
// on the same line (i.e. with a zero size area).
if (area_sqr < 0.0) {
  return 0.0;
}
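Folded back into the helper, the patched function might look like this (a sketch assuming V3D::basetype is double):

```cpp
#include <cassert>
#include <cmath>

using basetype = double;  // stand-in for V3D::basetype

// Heron's formula with a clamp for degenerate (collinear) triangles, whose
// area_sqr can come out slightly negative due to floating point inaccuracies.
static basetype AreaOfTriangle(basetype a, basetype b, basetype c) {
  basetype p = (a + b + c) / 2.0;
  basetype area_sqr = p * (p - a) * (p - b) * (p - c);

  if (area_sqr < 0.0) {
    return 0.0;  // there is no such thing as a negative area
  }
  return sqrt(area_sqr);
}
```

For a 3-4-5 right triangle this yields the expected area of 6, and degenerate inputs now yield 0 instead of NaN.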

To sum up, the invalid index calculation was caused by floating point inaccuracies earlier in the calculations, which made the area of a degenerate triangle the square root of a negative number. I must admit I really had fun debugging this, especially since I ran into both the interesting NaN-to-integer cast scenario and the mingw-w64 sqrt bug on the way.

And that's about it.

Blind ROP livestream followup

By Gynvael Coldwind | Thu, 13 Jul 2017 00:10:50 +0200
Yesterday I did a livestream during which I tried to learn the Blind Return Oriented Programming technique. Well, that isn't fully accurate - I had already read the paper and made one attempt the evening before, so I knew that the ideas I wanted to try live should work. And they did, but only in part - I was able to find a signaling gadget (i.e. a gadget that sent some data to the socket), which could be used later on, as well as a "pop rsi; ret"-equivalent gadget (using a different method than described in the paper - it was partly accidental and heavily challenge-specific). But throughout the rest of the livestream I was able neither to find a puts function nor to use the gadget I already had to get a "pop rdi; ret"-equivalent gadget. The bug, as expected, was trivial.

Long story short (a more verbose version follows), my mistake was in the following code:

if err == "CRASH" and len(ret) > 0:
    return (err, ret)

The correct form is:

if err == "DISCONNECTED" and len(ret) > 0:
    return (err, ret)

And that's it. If you were on the YT chat or IRC you probably saw me "facepalming" (is that even a word? it should be) and pointing out the bug even before the stream went offline (i.e. during the mission screen). In the next 5 minutes I had both gadgets I needed, so that was the only bug there. Oh well.

Full version of the story:
I'll start by saying that if you don't know what ROP is (Return Oriented Programming, which is actually an exploitation technique), you might want to start there as BROP (or Blind ROP) is a technique that heavily builds on top of ROP. Here are some random links about ROP:

The Geometry of Innocent Flesh on the Bone: Return-into-libc without Function Calls (on the x86) (Hovav Shacham)
Return-Oriented Programming: Exploits Without Code Injection
Doing ret2libc with a Buffer Overflow because of restricted return pointer - bin 0x0F (LiveOverflow)
ROP with a very small stack - 32C3CTF teufel (pwnable 200) (LiveOverflow)
And also:
Return-oriented exploiting
Hacking Livestream #20: Return-oriented Programming

There were some follow-up papers and techniques, e.g. Return-Oriented Programming without Returns, but the above should be enough to get you started.

One such follow-up was the Hacking Blind paper by Andrea Bittau, Adam Belay, Ali Mashtizadeh, David Mazieres and Dan Boneh from Stanford University - a really cool piece of research showing that you don't actually need the binary to find gadgets. I highly recommend reading the paper, but as a summary I'll note that it boils down to first locating the binary in memory (by basically reading the stack using e.g. Ben Hawkes' byte-by-byte brute force technique, also described by pi3 in Phrack), then finding a gadget which has an observable effect (e.g. it hangs the connection or sends some additional data through the socket), and finally finding further gadgets using pretty clever heuristics that boil down to observing one of three signals (a crash, the normal behavior, or the aforementioned observable effect). So yeah, it's the blind SQLi of ROP ;).

The original paper showcases several optimizations, but I decided to start with the basics (I'm a fan of reinventing the wheel). I prepared a small forking-server binary (basically an echo server) that had PIE and stack cookies disabled (ASLR was also disabled system-wide), compiled it statically and ran it on a virtual machine.

Both the binary and the source code created before and after the stream are available on my github (the binary listens on port 1337; please note that when learning blind ROP you should not look at the binary in IDA/etc and pretend you don't have access to it).

The first step was to look for the bug (though that was trivial in this case, as the input is basically a single length+data pair) and the size of the overflowed buffer. This was done by using growing data lengths (from 1 byte upwards) and checking whether the "Done!" message was transmitted via the socket (denoted as "OK" in my code) or whether the connection abruptly stopped before that (denoted as "DISCONNECTED" or "ERROR" in my socket code, or just "CRASH" later on - I mixed these up on the stream, which is why I couldn't get it to work to the full extent in the end). It turned out that the buffer was 40 bytes long.
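The length scan can be sketched as a loop over growing payload lengths; the probe callback below is a hypothetical stand-in for the actual socket code, returning the classification strings used above:

```cpp
#include <cassert>
#include <functional>
#include <string>

// Returns the largest payload length that still behaves normally ("OK"),
// i.e. the inferred buffer size; lengths past it start crashing the target.
// The probe callback (hypothetical) sends `len` bytes and classifies the
// result as "OK", "CRASH", etc.
int FindBufferSize(const std::function<std::string(int)>& probe, int max_len) {
  int last_ok = 0;
  for (int len = 1; len <= max_len; len++) {
    if (probe(len) != "OK") break;
    last_ok = len;
  }
  return last_ok;
}
```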

The next step was to find a signaling gadget. A signaling gadget (similar to a signaling behavior in blind SQLi) can either be some data transmitted over the socket or a connection hang (i.e. the server application entering an infinite loop or blocking on some input, which can be detected e.g. using the socket timeout mechanism). During the livestream I found a gadget which did both, i.e. it printed a 16-hexdigit number (so it was probably a printf("%.16x", arg2) function call) and then started reading from the socket (which basically stopped the application and caused a socket timeout on my side).

This gadget was actually really useful, since it being (probably) a printf("%.16x", arg2) call meant that jumping just after the instruction that puts the second argument in the RSI register (64-bit Linux calling convention) would output the current value of the RSI register. This also meant that I could easily find a "pop rsi; ret"-equivalent gadget (I keep writing "-equivalent" as found gadgets might be a lot larger and have unpredictable side effects, but in the end they would e.g. also pop a value from the stack and place it in a given register) just by using the following ROP chain:

[unknown-scanned address]
[address of the printf("%.16x", rsi) gadget]

The unknown-scanned address would be a sequential or random address within the assumed range of the executable section of the binary. Since one probably needs quite a lot of attempts to find such a gadget (tens of thousands), I decided (as the paper suggested) to use multiple threads/connections (20 threads worked fine in my case). Part of the output log looked like this (this is actually from the pre-stream version):

0x37b4 PRINT
Data size:
0x37ef PRINT
Data size:
Data size:

Due to multithreading this isn't really readable, but one can observe some non-4141414141414141 values (mainly 00000000006ee3c3) and then, in the end, the magical 4141414141414141 value right next to the "PRINT 0x38e2" information (which means: the gadget is at address image_base+0x38e2).
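Assembling such a probe boils down to padding the 40-byte buffer and appending two little-endian addresses; a sketch (BuildProbe is a hypothetical helper, not the stream's Python code):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Builds one BROP probe: padding up to the return address, then the address
// being scanned, then the known print gadget. The 40-byte buffer size is the
// one found earlier in the post.
std::vector<uint8_t> BuildProbe(uint64_t scanned, uint64_t print_gadget) {
  std::vector<uint8_t> payload(40, 'A');  // fill the 40-byte buffer
  auto push64 = [&](uint64_t v) {         // append a little-endian qword
    for (int i = 0; i < 8; i++) payload.push_back((uint8_t)(v >> (i * 8)));
  };
  push64(scanned);       // candidate "pop rsi; ret"-like gadget
  push64(print_gadget);  // the printf("%.16x", rsi) gadget found earlier
  return payload;
}
```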

The next step was to find the puts function (followed probably by a crash) that could be used to easily find a "pop rdi; ret"-equivalent gadget (the idea was to call puts(image_base+1), which would echo "ELF" when the correct gadget is found), but for some reason it didn't work (yeah, this is where the "CRASH"/"DISCONNECTED" mix-up started to be a problem). I then tried to use the print gadget I already had, jumping 7 or 8 bytes after it to skip setting the RDI register (and using that instead of puts, risking a false negative due to an accidental format tag like %s at the beginning of the binary). This didn't work either (though probably because I wasn't patient enough with the 8-byte variant - well, a stream is time limited).

After some more minutes of trying I decided to finish the stream at that point (it was already 1.5h long) and promised to do this blogpost about what went wrong. So yeah, I mixed up the name of the signal from my socket-helper implementation (serves me right for using strings instead of any sane type of enums/constants).

When I spotted the problem I was able to find both the puts gadget I wanted and the "pop rdi; ret"-equivalent gadget within the next few minutes. Having these two, all that was left was dumping the binary, as puts(controlled_rdi) allows one to do it by sequentially reading the memory (and assuming that there is a \0 at the end of every piece of data). It must be noted that it took several hours to dump the binary (as it was over 1MB in size).

Having the binary brings this back to standard ROP exploitation, which wouldn't take much more time (especially if the magic gadget can be found in the binary; I guess one can look for this gadget in BROP as well).

So there you have it :). Again I would like to recommend reading the paper, especially the sections about the "BROP-gadget" and PLT/GOT methods - I really liked them.

And that's it.

Announcing Bochspwn Reloaded and my REcon Montreal 2017 slides

By j00ru | Tue, 20 Jun 2017 16:14:58 +0000
A few days ago at the REcon conference in Montreal, I gave a talk titled Bochspwn Reloaded: Detecting Kernel Memory Disclosure with x86 Emulation and Taint Tracking. During the presentation, I introduced and thoroughly explained the core concept, inner workings and results of my latest research project: a custom full-system instrumentation based on the Bochs […]

Google Capture the Flag 2017 Quals start today

By Gynvael Coldwind | Fri, 16 Jun 2017 00:10:49 +0200
Google Capture the Flag 2017 (Quals) start today at 2am CEST tonight! The format will be pretty standard - teams / jeopardy / 48h / dynamic scoring, with one interesting addition - the submitted write-ups might win rewards ($100 - $500) as well - see the rules for details. In any case, I hope it turns out to be a fun and challenging CTF - I personally have every reason to hope the best, as the people behind the scenes are both amazing and experienced. And hey, there is even one task from me :)

I wish you all best of luck!


Windows Kernel Local Denial-of-Service #5: win32k!NtGdiGetDIBitsInternal (Windows 7-10)

By j00ru | Mon, 24 Apr 2017 09:39:26 +0000
Today I’ll discuss yet another way to bring the Windows operating system down from the context of an unprivileged user, in a 5th and final post in the series. It hardly means that this is the last way to crash the kernel or even the last way that I’m aware of, but covering these bugs indefinitely […]

Windows Kernel Local Denial-of-Service #4: nt!NtAccessCheck and family (Windows 8-10)

By j00ru | Mon, 03 Apr 2017 10:59:46 +0000
After a short break, we’re back with another local Windows kernel DoS. As a quick reminder, this is the fourth post in the series, and links to the previous ones can be found below: Windows Kernel Local Denial-of-Service #3: nt!NtDuplicateToken (Windows 7-8) Windows Kernel Local Denial-of-Service #2: win32k!NtDCompositionBeginFrame (Windows 8-10) Windows Kernel Local Denial-of-Service #1: […]

Next livestream: creating Binary Ninja plugins (by carstein)

By Gynvael Coldwind | Tue, 21 Mar 2017 00:10:43 +0100
Tomorrow (Wednesday, 22nd of March) at 8pm CET on my weekly Hacking Livestream I'll be joined by Michal 'carstein' Melewski to talk about creating plugins for Binary Ninja. Well, actually Michal will talk and show how to do it, and I'll play the role of the show's host. The plan for the episode is the following (kudos to carstein for writing this down):

1. Little intro to Binary Ninja - 5 minutes
2. Working with Binary Ninja API - console and headless processing
- how to use console effectively
- documentation

3. Basics of API
- Binary view
- Functions,
- Basic Blocks
- Instructions,
- Low Level Intermediate Language (infinite tree)

4. Syscall problem,
- first scan in console,
- simple annotation (getting parameter value)
- going back the instruction list
- detecting same block?

5. Future of API and what can you do with it?
- links to presentation (ripr, type confusion)

See you tomorrow!

P.S. We'll have a single personal Binary Ninja license to give away during the livestream, courtesy of Vector 35 folks (thanks!). Details will be revealed on the stream.

My 0CTF 2017 Quals write-ups

By Gynvael Coldwind | Mon, 20 Mar 2017 00:10:42 +0100
During the weekend I played 0CTF 2017 Quals - we finished 15th and therefore sadly didn't qualify. The CTF itself was pretty fun since the tasks always had a non-standard factor in them that forced you to explore new areas of a seemingly well-known domain. In the end I solved 4 tasks myself (EasiestPrintf, char, complicated xss and UploadCenter) and put down write-ups for them during breaks I took at the CTF.

*** EasiestPrintf (pwn)
You've got printf(buf) followed by an exit(0), an unknown stack location and a non-writable .got - this was mostly about finding a way to get EIP control (and there were multiple ways to do it).

*** char (shellcoding)
ASCII ROP, i.e. only character codes from the 33-126 range were allowed.

*** Complicated XSS (web)
XSS on a subdomain, mini-JS sandbox and file upload.

*** UploadCenter (pwn)
A controlled mismatch of size passed to mmap and munmap.

I've added my exploits to the write-ups as well.

That's it.

Windows Kernel Local Denial-of-Service #3: nt!NtDuplicateToken (Windows 7-8)

By j00ru | Tue, 07 Mar 2017 15:34:35 +0000
This is the third post in a series about unpatched local Windows Kernel Denial-of-Service bugs. The list of previous posts published so far is as follows: Windows Kernel Local Denial-of-Service #2: win32k!NtDCompositionBeginFrame (Windows 8-10) Windows Kernel Local Denial-of-Service #1: win32k!NtUserThunkedMenuItemInfo (Windows 7-10) As opposed to the two issues discussed before, today’s bug is not in the graphical […]

Windows Kernel Local Denial-of-Service #2: win32k!NtDCompositionBeginFrame (Windows 8-10)

By j00ru | Mon, 27 Feb 2017 14:49:32 +0000
Another week, another way to locally crash the Windows kernel with an unhandled exception in ring-0 code (if you haven’t yet, see last week’s DoS in win32k!NtUserThunkedMenuItemInfo). Today, the bug is in the win32k!NtDCompositionBeginFrame system call handler, whose beginning can be translated into the following C-like pseudo-code: NTSTATUS STDCALL NtDCompositionBeginFrame(HANDLE hDirComp, PINPUT_STRUCTURE lpInput, POUTPUT_STRUCTURE lpOutput) […]

Windows Kernel Local Denial-of-Service #1: win32k!NtUserThunkedMenuItemInfo (Windows 7-10)

By j00ru | Wed, 22 Feb 2017 16:24:23 +0000
Back in 2013, Gynvael and I published the results of our research into discovering so-called double fetch vulnerabilities in operating system kernels, by running them in full software emulation mode inside of an IA-32 emulator called Bochs. The purpose of the emulation (and our custom embedded instrumentation) was to capture detailed information about accesses to user-mode memory […]

Finishing oxfoo1m3 crackme

By Gynvael Coldwind | Mon, 20 Feb 2017 00:10:39 +0100
On the last episode of Hacking Livestream (#10: Medium-hard RE challenge - see below) I showed how to approach a medium-hard reverse-engineering challenge. The example I used was the oxfoo1m3 challenge found in the "Level5-professional_problem_to_solve" directory of the archive (this one), which I picked using such complex criteria as "something that runs on Ubuntu" and "something 32-bit so people with the free version of IDA can open it". As expected (and defensively mentioned several times during the stream), I was not able to complete this challenge during the livestream itself (which is only one hour, and that includes news and updates, and Q&A). However, I did finish the task two days ago. It turned out I was close to the goal - it took only around 30 minutes of additional work (which makes me wonder if Level5 is actually close to an RE300 challenge; probably it's closer to RE200). Anyway, here is the promised part 2 of the solution.

Note 1: While I'll write down a short recap of the initial steps and discoveries, please take a look at the recording of the episode #10 for details (crackme starts at 15m40s). If you've already seen it, just jump to part 2 in the second half of this post.

Note 2: Since this post is meant to have some educational value, I'll assume that the readers have only basic knowledge of RE techniques, and therefore I'll try to be verbose on some topics which are most likely well known amongst the more senior folks.

Part 1: Recap

Initial recon (i.e. reconnaissance phase) consisted of a couple of rather simple steps:
  • Obvious things first: just running the binary and seeing how it behaves. Surprisingly, we could already derive the possible password length just by looking at what was left in the standard input buffer after the crackme stopped (compare the input string with the "asjdhajsdasd" letters seen as the next commands).

  • Running the binary via strace - this actually didn't go too well due to ptrace(PTRACE_TRACEME) protection.

  • Viewing the file in a hex editor - showed quite a lot of "X" characters.

  • Checking the entropy level chart (though at this point it was already known it would be low, as per the hex editor inspection). You can read more about this entropy-based recon technique in this post from 2009.

  • And finally loading the binary into IDA / Binary Ninja.

Up to this moment we had already learned that it's a rather small crackme with no non-trivial encryption (though the entropy test did not rule out trivial encryption - i.e. substitution ciphers like single-byte-key XORs, etc - and such was indeed encountered later on) that actively tries to make debugging and reverse engineering harder. Not much, but it does give one a broad picture, and it usually takes a few minutes tops, which makes it worth it.


We've also found the first anti-debugging trick, which was the aforementioned ptrace(PTRACE_TRACEME) call.

The idea behind it is related to the fact that a process can be debugged (ptrace() is the debugging interface on Linux) by at most one debugging process at a time. By calling ptrace(PTRACE_TRACEME) the calling process tells the kernel that it wants to be debugged by its parent (i.e. that its parent process should be notified of all debugger-related events). This call will result in one of two outcomes:

  • The call succeeds and the parent process becomes the debugger for this process (whether it wants it or not). The consequence of this is that no other debugger can attach to this process after this call.
  • The call fails, ptrace(PTRACE_TRACEME) returns -1 and errno is set to EPERM (i.e. the "operation not permitted" error). This happens only* if another debugger is already tracing this process, which means it can be used to detect that someone/something is debugging the process and act accordingly (e.g. exit complaining that a debugger is present, or misbehave in weird ways).

* - This isn't entirely true; it can also fail if the calling process was created from a SUID binary owned by root and was executed by a non-root user.

So the anti-RE engineer wins regardless of what ptrace returns.
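The whole trick boils down to a few lines (a Linux-specific sketch; DebuggerPresent is an illustrative helper, not the crackme's actual code):

```cpp
#include <cassert>
#include <sys/ptrace.h>

// If a debugger is already tracing us, PTRACE_TRACEME fails with -1/EPERM.
// If it succeeds, our parent becomes our tracer and no other debugger can
// attach to this process afterwards - so the check "wins" either way.
bool DebuggerPresent() {
  return ptrace(PTRACE_TRACEME, 0, nullptr, nullptr) == -1;
}
```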

The usual way to go around this is to find the ptrace(PTRACE_TRACEME) call and NOP it out, though in the past (uhm, during a competition in 2006) I remember doing an ad-hoc kernel module which would fake this function for chosen processes.

Another way is to catch the syscall with a debugger and fake the return value; however, in our case this turned out not to be possible, since the ELF file was modified in a way that prevented loading it in GDB:

I'll get back to bypassing the anti-GDB trick in part 2 down below.

Simple assembly obfuscation schema

Analyzing the binary in IDA quickly led to the discovery of a simple yet moderately annoying obfuscation schema:

As one can see in the screenshot above, between each "real" instruction there is a 30-byte block which basically jumps around and does quite a lot of nothing. Its sole purpose is to make the assembly dead listing (i.e. the output of a disassembler) less readable.

There are several ways to mitigate this kind of protection:
  • One is to simply ignore it, as I did on the livestream. I figured that since it consists only of 30-byte blocks, I can just manually make sure all the "real" instructions are disassembled correctly and then skip the non-meaningful ones.
  • Another idea would be to write a simple script which just changes these instructions to NOPs - this would be pretty easy as it seems the whole block consists always of the same bytes (i.e. the call's argument is relative - therefore constant, the offset to _ret is constant, and everything else also uses always the same immediate values, instructions and registers). This simplifies the dead-listing analysis as only the meaningful instructions are left in the end.
  • Yet another way - tracing + simple filtering - is shown in the second half of this post.
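The second approach above - mass-replacing the identical junk blocks with NOPs - can be sketched as follows. The JUNK bytes below are a made-up stand-in, since the crackme's actual 30-byte block isn't reproduced here; the idea is just that byte-identical blocks fall to a plain replace:

```python
# Hypothetical 30-byte junk block (the real crackme's block differs).
JUNK = bytes([0xeb, 0x01, 0xcc]) * 10
NOPS = b"\x90" * len(JUNK)  # same-length NOP sled, so offsets stay intact

def denoise(code):
    # Every junk block is byte-identical, so a plain replace suffices.
    return code.replace(JUNK, NOPS)

# Tiny demo: two "real" instructions separated by one junk block.
obfuscated = b"\x31\xc0" + JUNK + b"\xc3"  # xor eax,eax / junk / ret
clean = denoise(obfuscated)
print(clean == b"\x31\xc0" + NOPS + b"\xc3")  # True
```

Replacing with NOPs (rather than deleting) keeps all addresses unchanged, which matters if the code references itself by offset.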

Trivial encryption layer

After finding all of the "real" instructions (there was only a handful of them) it turned out that the code is doing in-memory decryption of another area of the code before jumping there:

LOAD:080480C2 mov esi, offset loc_8048196
LOAD:080480CE mov edi, esi
LOAD:080480EE cld
LOAD:0804810D mov ecx, 0A80h
LOAD:08048135 lodsb
LOAD:08048154 xor al, 0x58; 'X'
LOAD:08048174 stosb
LOAD:08048193 loop loc_8048135

The decryption is actually just XORing with a single-byte key - letter 'X', or 0x58 - which explains both why the entropy was low (as mentioned before, certain types of ciphers - which I personally call "trivial ciphers", though it's not really a correct name - don't increase the entropy of the data; XOR with a single key falls into this category) and why we saw so many X'es in the hex editor (it's common for a binary to have a lot of nul bytes all around, and 0^0x58 gives just 0x58).
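The claim that a single-byte XOR doesn't change entropy is easy to verify: XORing every byte with a fixed key merely permutes the byte values, so the histogram shape - and thus the Shannon entropy - stays the same. A quick check (Python 3):

```python
import collections
import math

def shannon_entropy(data):
    # Shannon entropy in bits per byte.
    counts = collections.Counter(data)
    n = float(len(data))
    return -sum((c / n) * math.log(c / n, 2) for c in counts.values())

data = bytes(range(256)) * 4 + b"AAAA" * 64   # arbitrary sample data
xored = bytes(b ^ 0x58 for b in data)

# Same multiset of byte counts on both sides, so the entropies match
# (up to floating-point summation order).
same = abs(shannon_entropy(data) - shannon_entropy(xored)) < 1e-9
print(same)  # True
```

This is exactly why entropy scanning flags "real" ciphers and compression but stays quiet on trivial XOR layers.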

Again, there are several ways to go about it:

  • The first one, which I learned during the livestream from one of the viewers [uhm, I'm sorry, I don't remember your name; let me know and I'll put it here], is using the XOR decrypt function in Binary Ninja (if you're using this interactive disassembler): in the hex view select the bytes, then right-click and select Transform → XOR → 0x58 as key.
  • The second would be doing a similar thing in IDA - as far as I know this would require a short Python/IDA script though.
  • And yet another method, which I usually prefer, is writing a short Python script that takes the input file, deciphers the proper area and saves it as a new file.

The reason I personally prefer the last method is that it's easy to re-run the patching process with all kinds of changes (e.g. switching manual software breakpoints around, commenting out parts of the patches, etc.).

In any case, the initial script looked like this (Python 2.7):

# Read original file.
d = bytearray(open("oxfoo1m3", "rb").read())

OFFSET = 0x196 # Offset in the file of encrypted code.
LENGTH = 0xA80 # Length of encrypted code.

# Decipher.
k = bytearray(x ^ 0x58 for x in d[OFFSET:OFFSET+LENGTH])
d = d[:OFFSET] + k + d[OFFSET+LENGTH:]

# Change XOR key to 0 (i.e. a XOR no-op).
d[0x154 + 1] = 0

# Write new file.
open("deobfuscated", "wb").write(bytes(d))

Executing the script yielded a deciphered executable, and the analysis could continue.

The end of the livestream

The last thing I did during the livestream was to look around the code and try to figure out where the PTRACE_TRACEME protection is called from (so I could no-op it). I tried to do this by injecting a CC byte (which translates to int3, i.e. the software breakpoint; it's also one of the very few x86 opcodes worth remembering) at a few spots that looked interesting (one of them was an "int 0" instruction), then running the modified version and analyzing the core file it generated (i.e. the structured memory state which is dumped when a process is killed due to certain errors on Linux; courtesy of the ulimit -c unlimited setting in Bash, amongst other things). This resulted in a weird crash where EIP was set to 0xf001 (which happens to be the crackme author's nickname), which made me think there was some kind of a checksum mechanism in play (that turned out to be close, but not accurate).

And at that point the livestream ended.

Part 2: End game

A few days after the livestream I found some time to get back to the challenge. I decided to continue with a different approach - instruction-grained tracing. But before I could do that, I had to deal with the problem of not being able to run the challenge under a debugger (due to the ELF file being modified in a way which GDB disliked).

The method I would use in an environment that has a JIT debugger (i.e. a Just-In-Time debugger - a registered application which starts debugging an application when it crashes; it's a Windows thing) would be inserting a CC byte at the program's entry point. This would make the crackme crash at the first instruction and automatically attach the debugger - pretty handy.

On Ubuntu however I had to insert an infinite loop instead (a method also suggested by one of my viewers), i.e. bytes EB FE - another opcode worth writing down and remembering. This was done by adding the following line to my Python patching script:

d[0x80:0x82] = [0xeb, 0xfe] # jmp $
# original bytes: e8 01
# set *(short*)0x08048080=0x01e8

Thanks to this approach, after executing the crackme it basically "hangs" (it's technically running, but forever executing the same infinite-loop instruction), so there is time to calmly find the PID of the process, attach the debugger, replace the EB FE bytes with the original ones and start the trace.

As previously, I like to do this with a script (this time a GDB script):

set *(short*)0x08048080=0x01e8
set disassembly-flavor intel
set height 0

break *0x8048195

# Most basic form of tracing possible.
set logging file log.txt
set logging on
continue
while 1
x/1i $eip
stepi
end

The script above can be executed using the following command:
gdb --pid `pgrep deobfuscated` -x script.gdb

The result is a log.txt file containing a runtime trace of all executed instructions from 0x8048195 up until the application crashes.

0x08048195 in ?? ()
=> 0x8048195: ret
0x0804809e in ?? ()
=> 0x804809e: call 0x80480a4
0x080480a4 in ?? ()
=> 0x80480a4: pop edx
0x080480a5 in ?? ()
=> 0x80480a5: add edx,0xb
0x080480ab in ?? ()
=> 0x80480ab: push edx
0x080480ac in ?? ()
=> 0x80480ac: ret
0x080480ae in ?? ()

The first thing to do was to filter the log, i.e. remove all lines containing obfuscation instructions, as well as lines that don't contain any instructions at all (e.g. "0x080480ac in ?? ()"). This can be done with a simple set of regexes in your favorite text editor. In my case (gvim) it boiled down to:


Please note that the above approach is pretty aggressive, in the sense that it might remove actual meaningful instructions; in the end that didn't matter in this case (in other cases it might though, so keep this in mind).
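For illustration, here is a rough Python equivalent of that editor-based filtering. The set of junk addresses below is a hypothetical example, not the real list from the crackme:

```python
import re

# Made-up example set of addresses belonging to junk blocks.
JUNK_ADDRS = {0x80480a4, 0x80480a5, 0x80480ab, 0x80480ac}

def filter_trace(lines, junk_addrs=JUNK_ADDRS):
    out = []
    for ln in lines:
        # Keep only lines of the form "=> 0xADDR: insn".
        m = re.match(r"=> 0x([0-9a-f]+):", ln)
        if not m:  # e.g. "0x080480ac in ?? ()" - no instruction here
            continue
        if int(m.group(1), 16) in junk_addrs:  # obfuscation instruction
            continue
        out.append(ln)
    return out

trace = [
    "0x08048195 in ?? ()",
    "=> 0x8048195: ret",
    "0x080480a4 in ?? ()",
    "=> 0x80480a4: pop edx",
]
print(filter_trace(trace))  # ['=> 0x8048195: ret']
```

The same caveat applies: an address-based filter is aggressive and may drop a meaningful instruction that happens to live at a listed address.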

This resulted in a rather short assembly snippet (I've added a few comments):

=> 0x804869c: xor eax,eax
=> 0x80486c9: xor ebx,ebx ; EBX=0 (PTRACE_TRACEME)
=> 0x80486f6: mov ecx,ebx
=> 0x80486f8: inc ecx
=> 0x80486f9: mov edx,ebx
=> 0x8048726: mov esi,eax
=> 0x8048753: add eax,0xd
=> 0x8048783: shl eax,1 ; EAX=26 (ptrace)
=> 0x80488b2: mov edx,0x8048a4e ; Address of int →0← argument.
=> 0x80488e2: mov ch,BYTE PTR [edx] ; Grabbing the argument byte.
=> 0x804890f: mov cl,0x10
=> 0x804893c: shl cl,0x3 ; CL = 0x80
=> 0x804896a: mov BYTE PTR [edx],cl ; It's now int →0x80←.
=> 0x8048997: xor ch,cl ; Checking if the old argument
; was not 0x80.
=> 0x80489c4: je 0x8048653 ; If it was, go crash.
=> 0x8048a4d: int 0x80 ; Call to ptrace()
=> 0x8048aa7: mov edx,0x8048a4e
=> 0x8048ad7: mov BYTE PTR [edx],cl
=> 0x8048826: or al,al ; Check if ptrace() failed.
=> 0x8048853: jne 0x8048653 ; It did, go crash.
=> 0xf001: go.gdb:10: Error in sourced command file:
Cannot access memory at address 0xf001
Detaching from program: , process 30843

Analyzing the above code shows that the int 0 instruction discovered earlier has its argument replaced with 0x80 (the Linux system call interface), but there is also a check verifying that the argument wasn't already 0x80 (this is probably what I mistakenly assumed to be a checksum mechanism; keyword: probably, as I have some doubts whether I actually triggered it). After that, at 0x8048a4d, ptrace(PTRACE_TRACEME) is called and the return value is checked; if it's non-zero, the 0x8048653 branch is taken and the crackme crashes (note that since this is the filtered trace, we don't see the actual instructions that cause the crash; but it's basically push 0xf001 + ret).

To mitigate this protection it's enough to NOP out (i.e. replace with byte 0x90 - the no-operation instruction) the jump at 0x8048853. To do this I added the following line to my Python patching script:

# nop-out ptrace check
d[0x853:0x859] = [0x90] * 6

Once this was done and the executable was regenerated, I ran the tracing again and filtered it with the aforementioned regexes. This resulted in a slightly larger listing and another crash, but the content itself was really interesting and strongly hinted at the solution:

=> 0x804832d: mov edx,0xb
=> 0x804835d: int 0x80
=> 0x80483e2: mov esi,0x8048223 ; Input address.
=> 0x8048412: mov edi,0x8048223
=> 0x8048442: mov ecx,0xb
=> 0x8048472: lods al,BYTE PTR ds:[esi]
=> 0x804849e: xor al,dl
=> 0x80484cb: inc dl
=> 0x8048524: neg ecx
=> 0x8048551: add ecx,0x8048239 ; String "myne{xtvfw~".
=> 0x8048582: cmp al,BYTE PTR [ecx]
=> 0x8048584: je 0x80485a4

The code above basically fetches one byte of input, XORs it with 0xB (a key which is then increased to 0xC, 0xD, and so on for subsequent bytes), and compares the result with one byte of the weird "myne{xtvfw~" string. This means that to get the password one needs to XOR the weird string with 0xB+index:

>>> m = "myne{xtvfw~"
>>> ''.join([chr(ord(x) ^ (i + 0xb)) for i, x in enumerate(m)])

This resulted in another weird string which, guess what, indeed was the solution :)

To sum up, what I really liked about this crackme was that it used multiple protections, but all of them were pretty simple and none was overdone - this makes it a great challenge for early-intermediate reverse engineers to practice their skills.

On why my tbreak tracing trick did not work

By Gynvael Coldwind | Thu, 02 Feb 2017 00:10:38 +0100 | @domain:
On yesterday's livestream (you can find the video here - Gynvael's Hacking Livestream #8) one viewer asked a really good question: while analyzing a rather large application, how does one find the functions responsible for a certain functionality of interest? As an answer, I chose to demonstrate the simple trace-comparison trick I had seen years ago in paimei (a now-deprecated reverse-engineering framework), though my version was done in GDB (any other tracing engine might have been used; also, if you understand some Polish, see this presentation by Robert Swięcki from last year's Security PWNing Conference in Warsaw). As expected, the trick didn't yield the correct result when compared with another method I showed (backtracking from a related string), and I kept wondering why.

The general idea behind the trick goes like this: you set up tracing (and recording), start the application, and then do everything except the thing you are actually interested in. Then you run the application again (with tracing and recording), but this time you try to do only the thing you are after, touching other functions of the application as little as possible. In the end you compare the traces - whatever is on the second list but not on the first one is probably what you've been looking for.
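The trace comparison itself boils down to a set difference. A minimal sketch, with made-up addresses standing in for recorded traces:

```python
def new_in_second(trace_a, trace_b):
    # Addresses hit only while exercising the interesting functionality.
    return sorted(set(trace_b) - set(trace_a))

# Hypothetical recorded traces (lists of hit addresses).
everything_else = [0x401000, 0x401050, 0x4010a0]
with_feature    = [0x401000, 0x4010a0, 0x41751c, 0x417530]

print([hex(a) for a in new_in_second(everything_else, with_feature)])
# ['0x41751c', '0x417530']
```

Note that order and hit counts are discarded; for this trick only the set of addresses matters.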

In case of GDB and temporary breakpoints (which are one-shot, i.e. they become disabled after the first hit) it's even easier, as you can do this in a single run, first exploring all/some/most of the non-interesting functions, and then hitting the exact function you need, which in turn will display temporary breakpoint hits for whatever remaining breakpoints were still set.

So here's what I did (with pictures!):

1. I've generated an LST file of the target application (well, the target was actually GDB as well) from IDA (which is basically a direct dump of what you see in the main screen of IDA).

2. I've grepped for all the functions IDA found.

3. And converted the addresses to a GDB script that set temporary breakpoints (tbreak).

4. Then I ran GDB in GDB, executed some commands and waited for all possible temporary breakpoints to hit (and become disabled). After that, I executed the command whose implementation I was looking for (info w32 selector) and only one tbreak fired - 0x005978bc.

5. Using the known related string backtracking method I found the implementation at a totally different address - 0x0041751c. Oops.
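Steps 2 and 3 above can be sketched in a few lines of Python (the addresses below are just examples taken from this post, not the full grepped list):

```python
# Turn a list of function addresses (as grepped from an IDA .lst dump)
# into a GDB script full of temporary breakpoints.
def make_tbreak_script(addrs):
    return "\n".join("tbreak *0x%08x" % a for a in sorted(addrs))

funcs = [0x5978bc, 0x41751c]  # example addresses from this post
print(make_tbreak_script(funcs))
# tbreak *0x0041751c
# tbreak *0x005978bc
```

The resulting file is then loaded with gdb -x, and tbreak's one-shot behavior does the "recording" for free.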

So what did I do wrong? The answer is actually on the last screenshot. Let's zoom it in:

As one can observe, the addresses are dark red (or is it brown?), which means that IDA didn't recognize this part of the code as a function. That in turn means that a breakpoint for this function was never on the list of temporary breakpoints, so there was no chance for it to show up.

How to correct the problem? There are two ways:
1. If using this exact method, make sure that all the functions are actually marked as such in IDA before generating the tbreak list.
2. Or just use a different tracing method - e.g. branch tracing or another method offered by the CPU.

I've re-tested this after the stream adding the missing function breakpoint, and guess what, it worked:

(gdb) info w32 selector

Temporary breakpoint 1, 0x00417414 in ?? ()

Temporary breakpoint 4544, 0x005978bc in ?? ()
Impossible to display selectors now.

And that solves yesterday's mystery :)

Fixing Atari 800XL - part 1

By Gynvael Coldwind | Thu, 19 Jan 2017 00:10:37 +0100 | @domain:
So my old Atari 800XL broke and I decided to fix it. Well, that's not the whole story though, so I'll start from the beginning: a long time ago I was a proud owner of an Atari 800XL. Time passed, I eventually moved to the PC, and the Atari was lent to distant relatives, staying with them for several years. About 15 years later I got to play with my wife's old CPC 464 (see these posts: 1 2 3 4 - the second one is probably the most crude way of dumping ROM/RAM you've ever seen) and thought that it would be pretty cool to check out the old 800XL as well. My parents picked it up from the relatives (I currently live in another country) and soon it was reunited with me once again! Unfortunately (or actually, fortunately) technological development has moved quite rapidly over the last 20 years, so I found myself without a TV I could connect the Atari to. And so I ordered an Atari Monitor → Composite Video cable somewhere and hid the Atari at the bottom of the wardrobe, only to get back to it last week.

After connecting the 800XL via Composite Video to my TV tuner card (WinFast PxTV1200 (XC2028)) it turned out that the Atari was alive (I had actually thought it wouldn't start at all due to old age) - it boots correctly, but the video is "flickery":

So I decided to fix it. Now, the problem is I have absolutely no idea about electronic circuitry - my biggest achievement ever in this field was creating a joystick splitter for the CPC 464 (though I am actually proud of myself for having predicted the ghosting problem and fixed it before soldering anything). Which means that this whole "I will fix the Atari" statement actually means "I will learn something about electronic circuits and probably break the Atari and it will never ever work again and I will cry" (though I hope to avoid the latter).

This blog post is the first of an unknown number of posts containing my notes from the process of attempting to fix my old computer. To be more precise, in this post I'm actually describing some things I've already done to try to pinpoint the problem (this includes dumping frames directly from the GTIA - this was actually fun to do). Spoiler: I still have no idea what's wrong, but at least I know what actually works correctly.

Part 1 - Debugging, some soldering, more debugging

My guess is that fixing electronic equipment is like fixing a nasty bug in unknown code. You need to familiarize yourself with the code in general and then start the process of elimination to finally pinpoint the cause of the problem. After you have done that, you need to understand in depth the problem and the cause. And then you fix it.

So from a high level perspective I expect the problem to be in one of four places:

1. The Atari motherboard itself.
2. The Atari Monitor socket → Composite video cable I'm using.
3. My TV tuner card.
4. The application I'm using for the TV tuner.

Starting from the bottom, I first tried VLC to get the image from the tuner - this worked, but I still got the same artifacts, and I wasn't able to convince VLC to actually use the parameters I was passing to it (I suspected the tuner was selecting the wrong PAL/NTSC mode). So I switched to CodeTV, which did allow me to set various PAL and NTSC modes, but it turned out not to fix my issue - actually, after seeing the effect of decoding the signal as e.g. NTSC, I decided this has absolutely nothing to do with my problem. So that's one point off the list.


Next was the TV tuner card. My card - a WinFast PxTV1200 (XC2028) - is a few years old and I never had much luck with it, so I ordered a new one - a WinTV-HVR-1975. It should arrive next week, so whether it fixes anything will be revealed in the next part. I'll also play with some other options once I have it.

The second point on the list is the cable, and I'm yet to order a replacement, so I'll update this debugging branch in the next part as well.

Which brings us to the fun part - the motherboard itself.

I started by pestering my Dragon Sector teammate and our top electronics specialist - q3k - about which oscilloscope I should buy (I have some limited oscilloscope training and figured it would be useful) and he recommended a RIGOL DS1054 + a Saleae logic analyzer (I got the Logic Pro 16 one) + some other useful pieces of equipment for later.

The second thing I did was to desolder all of the old electrolytic capacitors (making quite detailed notes about where they were and what types they were), as this seems to be a pretty standard thing to do when it comes to old computers. Once that was done (and I was able to read all of the labels, as some were initially hidden, especially in the case of the axial ones) I ordered new ones and patiently waited a few days for them to arrive (along with some IC sockets which I plan to use later).

Once they did, I cleaned the motherboard where the old ones had been and soldered the new ones in (I'm obviously not going to show you a hi-res photo of this, as my soldering skills are that of a newbie) and connected the Atari to test it. It did boot, but the video problem was still there.

Chatting on IRC, a colleague (hi skrzyp!) mentioned that it might be a GTIA problem (Graphic Television Interface Adaptor - the last IC in the chain that produces Atari's video), so I decided to dump the video frame right at the moment it leaves the GTIA. Researching the topic on the internet, I learnt that Luminance Output 3 (i.e. pin 24) actually should be the Composite Video output (or what later becomes Composite Video); I also found out that pin 25 is the Composite Sync Output, which meant I had all I needed to start.

Given that I didn't know much about Composite output I started by connecting my oscilloscope to the Composite Sync Output pin in hopes it will tell me something.

And indeed it did. It seemed that the sync signal is high for about 60μs and then low for about 5μs - this gives a cycle of roughly 64-65μs, and 1/64μs is exactly 15625 Hz. Now, this is meaningful since my Atari was PAL, and PAL is actually 25 FPS of 625 scanlines. Sure enough, 25 * 625 is also 15625, so there seems to be a correlation here - i.e. I figured that the signal goes low at the end of a scanline and goes back up at the beginning of the next one. Furthermore, after inspecting some "random flickering" on my oscilloscope, I found that there is a series of 3 much shorter high states from time to time - I assumed this was the end-of-frame marker (or start-of-frame; it didn't really matter to me).
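The arithmetic above checks out, assuming the measured cycle is really the nominal 64μs PAL line period:

```python
# PAL horizontal timing sanity check: a 64 us line period corresponds to
# the same line rate as 25 frames/s * 625 lines/frame.
line_period_s = 64e-6
lines_per_second = round(1.0 / line_period_s)
print(lines_per_second)  # 15625
print(25 * 625)          # 15625
```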

After this I connected the second channel's probe to the Luminance Output 3, set the trigger to channel 2 and got this:

It took me a few moments to figure out that what I'm actually seeing is the top pixel row of the classic "READY" prompt, but yup, that was it. So it seemed that getting a frame dump shouldn't be a problem.

The next step was to connect the logic analyzer and acquire some samples. The one I have actually samples analog data at 6.25MHz - this sadly meant that my horizontal resolution wouldn't be good at all, as it leaves under 400 samples per scanline (6.25M / 25 FPS / 625 lines == ~400). Still, I figured it would be enough to see if there are any artifacts in the frame at this point.

A single frame in Saleae Logic looked like this (this was taken when Atari's Self Test was on):

I've exported the data to a nice 400MB CSV file and written the following Python script to convert it into a series of grayscale raw bitmaps (each CSV file contained about 50 frames):

import sys

def resize(arr, sz):
  # Nearest-neighbor resize of a list of samples to sz elements.
  fsz = float(sz)
  arr_sz = len(arr)
  arr_fsz = float(arr_sz)
  k = arr_fsz / fsz
  out_arr = []
  for i in range(sz):
    idx = int(i * k)
    if idx >= arr_sz:
      idx = arr_sz - 1
    out_arr.append(arr[idx])
  return out_arr

def scanline_to_grayscale(scanline):
  MAX_V = 7.0  # Maximum voltage observed on the luminance pin.
  scanline = resize(scanline, 400)
  scanline = map(lambda x: chr(int((x / MAX_V) * 255)), scanline)
  return ''.join(scanline)

if len(sys.argv) != 2:
  print "usage: <fname.csv>"
  sys.exit(1)

f = open(sys.argv[1], "r")

# Ignore headers.
f.readline()

CLOCK_BOUNDARY_V = 2.0  # Threshold between the sync low and high states.
FRAME_BOUNDARY_COUNT = 100 # Actually it's ~21 vs ~365. 100 is an OK value.

scanline = None
clock_high = None

ST_INIT = 0
ST_LOW = 1
ST_HIGH = 2

state = ST_INIT
frame_sync = 0
frame = -1

fout = None

for ln in f:
  ln = ln.split(', ')
  t, clock, composite = map(lambda x: float(x), ln[:3])

  clock_high = (clock > CLOCK_BOUNDARY_V)

  if state == ST_INIT:
    if clock_high:
      state = ST_LOW

  if state == ST_LOW and clock_high:
    state = ST_HIGH
    scanline = []

  if state == ST_HIGH and clock_high:
    scanline.append(composite)

  if state == ST_HIGH and not clock_high:
    state = ST_LOW

    if len(scanline) < FRAME_BOUNDARY_COUNT:
      # A short high state is one of the three frame sync pulses.
      frame_sync += 1
      if frame_sync == 3:
        frame_sync = 0
        frame += 1
        print "Dumping frame %i..." % frame
        if fout is not None:
          fout.close()
        fout = open("frame_%.i.raw" % frame, "wb")
    elif fout is not None:
      frame_sync = 0
      fout.write(scanline_to_grayscale(scanline))

The script generated about 50 frames for a dump of "READY" screen and a similar amount for "Self Test" one. All the frames looked more or less like this:

So, apart from the low resolution and lack of color (both expected) they actually did look correct.

Which means that the GTIA I have is probably OK. One thing to note is that my script actually had access to the composite sync clock, and the TV tuner doesn't - so if the timing were slightly off, my script would not catch it.

Nevertheless it was a pretty fun exercise. The next thing on my list is to read more about Composite Video and verify that all the timings at the GTIA output, and later on the Composite Video cable, are actually OK. Once I do that, I'll have to learn to analyze the circuitry that lies between the GTIA and the monitor output socket, check if all the paths look good, etc. Should be fun :)

And that's all in today's noob's notes on ECs and old computers. Stay tuned!

P.S. Please note that there are probably several errors in this post - as mentioned, I don't know too much about this field, so errors/misconceptions/misunderstandings are bound to occur. There might be some updates to this post later on.

Hacking Livestream #5 - solving picoCTF 2013 (part 1)

By Gynvael Coldwind | Wed, 23 Nov 2016 00:10:32 +0100 | @domain:
Tomorrow (sorry for the late notice) at 7pm CET (GMT+1) I'll do another livestream on CTFs - this time I'll try to show how to solve several picoCTF 2013 challenges in the time frame of the stream (2 hours). PicoCTF 2013 was an entry-level CTF created by the well-known team Plaid Parliament of Pwning - so expect the challenges to range from 10 points (or 30 seconds) to 100 points (several minutes). The stream will actually be a really good opportunity for folks wondering what CTFs are about and how to start with them to have some of their questions answered (at least I think so). Anyway, the details: As always, the stream will be recorded and will be available immediately afterwards on my channel.

See you tomorrow!

Slides about my Windows Metafile research (Ruxcon, PacSec) and fuzzing (Black Hat EU) now public

By j00ru | Tue, 15 Nov 2016 14:12:22 +0000 | @domain:
During the past few weeks, I travelled around the world to give talks at several great security conferences, such as Ruxcon (Melbourne, Australia), PacSec (Tokyo, Japan), Black Hat Europe (London, UK) and finally Security PWNing Conference (Warsaw, Poland). At a majority of the events, I presented the results of my Windows Metafile security research, which […]

Django. Restricting user login

By sil2100 | Wed, 05 Oct 2016 19:52:00 GMT | @domain:
For a Django-based sub-project I'm working on, I had the need to restrict user login to only one session active and logged-in at once. As currently I am almost a complete newbie in this framework, I tried finding a ready solution around the web and failed, as nothing really fit my needs. After gathering some bits and pieces of information from around the internet I wrote a quick and very simple piece of auth code to do the login restriction I wanted.

Windows system call tables updated, refreshed and reworked

By j00ru | Mon, 15 Aug 2016 13:07:11 +0000 | @domain:
Those of you interested in the Windows kernel-mode internals are probably familiar with the syscall tables I maintain on my blog: the 32-bit and 64-bit listings of Windows system calls with their respective IDs in all major versions of the OS, available here (and are also linked to in the left menu): Windows Core (NT) […]

Launchpad API. Confusing binary builds

By sil2100 | Thu, 11 Aug 2016 20:29:00 GMT | @domain:
Another post on the Launchpad API - I'll try to make it my last one, no worries. It's just that recently I've been dealing with it so much that I feel like sharing some of its caveats and my experiences with it. Today's post will be a short story about a certain edge case one would need to watch out for, titled: "accessing source packages through binary builds can be confusing".

Hacking Livestream #4 - DEF CON CTF (Friday)

By Gynvael Coldwind | Tue, 09 Aug 2016 00:10:22 +0200 | @domain:
I'm back from Black Hat / DEF CON, so it's time to do another live hacking session! The next one will be Friday, 12th of August, same time as usual (7pm UTC+2) at (aka YouTube). I'll talk about this year's DEF CON CTF (while it's still fresh in my memory), i.e. the formula, the tasks, the metagame, etc. I'll also show a couple of bugs and exploit one or two of them (i.e. whatever I can fit into 2h of the stream).

When: August 12, 19:00 UTC+2
What: DEF CON CTF 2016

See you there!

P.S. Feel free to let whoever might be interested know, the more the merrier :)

Live sec/hack session #3 - Thursday

By Gynvael Coldwind | Tue, 26 Jul 2016 00:10:19 +0200 | @domain:
Just a short note - I'll attempt to finish solving bart's CrackMeZ3S reverse-engineering challenge on Thursday, 28th of July, 19:00 UTC+2 (i.e. the usual time).

Where: (YouTube - yes, I decided to move there)
When: July 28, 19:00 UTC+2
What: CrackMeZ3S part 2 (see part 1 announcement post / video)

See you then :)

P.S. The linear disassembly view I've mentioned I'm lacking in Binary Ninja? Seems it's already there in the new version - sweet!

Disclosing stack data (stack frames, GS cookies etc.) from the default heap on Windows

By j00ru | Mon, 25 Jul 2016 09:29:02 +0000 | @domain:
In the previous blog post, I discussed a modest technique to “fix” the default process heap in order to prevent various Windows API functions from crashing, by replacing the corresponding field in PEB (Process Environment Block) with a freshly created heap. This of course assumes that the attacker has already achieved arbitrary code execution, or is […]

LiveCoding or YouTube? Input needed

By Gynvael Coldwind | Sun, 24 Jul 2016 00:10:18 +0200 | @domain:
As I mentioned on Friday's livestream, I'm considering moving my streams to YouTube due to several factors (quality, less technical issues, etc). Keyword here is "considering", however I would like to make a decision before the next stream - thus this post and my request for your feedback.

The table below presents things I will take into account when deciding:

YouTube:
  • Resolution: 1080p
  • Bitrate: 3000kbps
  • Unlogged users don't see chat.
  • Rewind-during-live feature.
  • Adjustable quality during live for the viewer.
  • High delay between recording/streaming (20-50 sec).
  • HTML5 player by default.
  • Just works, or at least I did not receive a significant amount of negative feedback about crashes/lags/etc.
  • Has an API; chat is custom, but has a REST API.
  • Around 2/3 of my viewers would prefer to move to YT***.

LiveCoding:
  • Resolution: 720p
  • Bitrate: 1500kbps (2500kbps said to be rolled out in 2-3 weeks)
  • Sound quality is lacking; might improve after the 2500kbps rollout.
  • Unlogged users don't see chat.
  • Fast support, direct contact with the project owner*
  • Very low delay between recording/streaming (~2-5 sec).
  • Flash player by default. Can't enable the HTML5 player without disabling Flash in the browser.
  • Several reports of crashes or the player not working (on the last stream the player/stream crashed, even for me).
  • Strictly a coding service**. I.e. easier for people to randomly discover my stream just by going to when I'm streaming.
  • Has an API; chat is XMPP.
* One might point out that given that I actually work at Google I would have direct contact with YT's engineers as well - while that is true, I prefer not to bother them with personal projects, unless it's a valid bug report or valid (in my head) feedback for a given feature of course.
** One might argue that my streams are not strictly speaking coding (well, it's security/hacking/reverse-engineering with a large dose of coding) but I would say it still fits and I did not hear otherwise.
*** Based on a Twitter poll as well as chat responses during the stream.

Another thing one might point out is that LiveCoding has a neat dark layout, which is better at nighttime. It turns out that YouTube has a similar one - just change the "www" to "gaming" in the address when you watch the stream (e.g. vs

There are probably more tiny details here and there, but they are either not as obvious as the things listed above, or not really important in my case.

At this point I'm leaning towards moving to YouTube, as I did with my other live streams. Is there anything I missed and should take into account? Please let me know in the comments down below - thanks!

Live security/hacking/coding session #2

By Gynvael Coldwind | Wed, 20 Jul 2016 00:10:16 +0200 | @domain:
I'll be doing more livestreaming this Friday, same time (19:00 UTC+2) on (which at this moment points to, but looking at this poll I'll probably move to YouTube with my next streams). There are two items on my list for Friday's:
  • Either "Zippy" (WEB 300 from CONFidence CTF 2016, by mlen) or "Revenge of Bits" (STEGANO 200 from the same one, by me).
  • And CrackMeZ3S by bart after that. Please note that I might be struggling a lot with this one, as I did not solve/see it before, and I plan to keep it this way (a couple of viewers requested that I show my approach to unknown targets - well, that's the plan for this stream).
Apart from that one more thing: we actually have an IRC channel for my streams (well, Polish-language streams so far), but there is no reason for English speakers not to join; it's #gynvaelstream @ freenode. Or perhaps I should make a separate channel for the English streams? Let me know what you think in the comments.

In any case, see you Friday!

After live session #1 - how did you like it?

By Gynvael Coldwind | Sat, 16 Jul 2016 00:10:15 +0200 | @domain:
So my first livestream in English took place yesterday evening (i.e. evening in my timezone) and it went rather smoothly - nothing crashed, broadcasting was not interrupted at any time, and I was even able to go through both the ReRe (Python RE 500) and EPZP (x86-64 Linux RE 50) challenges. The archived video is already up on YouTube (see also below) and what's left to do is to ask about your opinion: what do you think? Or, to be more precise, what do you think about the stream quality, the content, the way I was presenting things (i.e. talking about what is happening, but sacrificing speed due to that), the chat, and so on? What topics would you like to hear about next (another CTF challenge or maybe something else)? Please use the comment section below - your opinion is welcome!

EDIT: see also this twitter poll: LiveCoding or YouTube?

Livestream starts at 15:20

See you next week!