Live security/hacking/coding session #2

By Gynvael Coldwind | Wed, 20 Jul 2016 00:10:16 +0200 | @domain: favicongynvael.coldwind.pl
I'll be doing more livestreaming this Friday, same time (19:00 UTC+2) on gynvael.coldwind.pl/live-en (which at this moment points to livecoding.tv/gynvael, but looking at this poll I'll probably move to YouTube with my next streams). There are two items on my list for Friday's:
  • Either "Zippy" (WEB 300 from CONFidence CTF 2016, by mlen) or "Revenge of Bits" (STEGANO 200 from the same on, by me).
  • And CrackMeZ3S by bart after that. Please note that I might be struggling a lot with this one, as I did not solve/see it before, and I plan to keep it this way (a couple of viewers requested that I show my approach to unknown targets - well, that's the plan for this stream).
Apart from that one more thing: we actually have an IRC channel for my streams (well, Polish-language streams so far), but there is no reason for English speakers not to join; it's #gynvaelstream @ freenode. Or perhaps I should make a separate channel for the English streams? Let me know what you think in the comments.

In any case, see you Friday!

After live session #1 - how did you like it?

By Gynvael Coldwind | Sat, 16 Jul 2016 00:10:15 +0200 | @domain: favicongynvael.coldwind.pl
So my first livestream in English took place yesterday evening (i.e. evening in my timezone) and it went rather smoothly - nothing crashed, broadcasting was not interrupted at any time and I even was able to go through both ReRe (Python RE 500) and EPZP (x86-64 Linux RE 50) challenges. The archived video is already up on YouTube (see also below) and what's left to do is ask about about your opinion: what do you think? Or, to be more precise, what do you think about stream quality, the content, the way I was presenting things (i.e. talking about what is happening, but sacrificing speed due to that), the chat, and so on? What topics would you like to hear about next (another CTF challenge or maybe something else)? Please use the comment section below - your opinion is welcomed!

EDIT: see also this twitter poll: LiveCoding or YouTube?

Livestream starts at 15:20


See you next week!

Windows user-mode exploitation trick – refreshing the main process heap

By j00ru | Tue, 12 Jul 2016 09:53:51 +0000 | @domain: faviconj00ru.vexillium.org
During the weekend of May 21-23 (directly after the CONFidence CTF that we organized with Dragon Sector), qualifications to the famous DEF CON CTF 2016 took place. We obviously participated in what is probably the most binary heavy, challenging and competitive CTF of the year, eventually ending up 9th on the final scoreboard, which was sufficient […]

Live security/hacking/coding session #1

By Gynvael Coldwind | Mon, 11 Jul 2016 00:10:14 +0200 | @domain: favicongynvael.coldwind.pl
A few days ago I've posted a short note on Twitter asking if anyone would be interested in a livestream about hacking/security/coding in English - I figured that since I'm already doing them in Polish, I might as well try to do one in English. The response was overwhelmingly positive (thanks!), so it seems time has come to set a date:
  • When: 15 July 2016, 19:00 UTC+2
  • Where: gynvael.coldwind.pl/live-en - the link is not yet working, but it will lead to https://livecoding.tv/gynvael (the platform might change for the next episode, depending on whether everything runs smoothly and the quality if good).
  • Topic: a CTF challenge or two, probably exploitation or reverse engineering; since this is the first episode I'll take it easy and go with something I'm already familiar with - either a challenge created by me or one I've already solved in the past.
  • What to expect: Broken Slavic-sounding English (I'm not a native, my accent is far from perfect and my vocabulary is scarce - you have been warned). Since I'm used to tutoring, I'll try to explain everything that I'm doing in a clear way.
Feel free to pass the link to this post to others whom might also be interested - the more, the merrier (plus, we get to do a stress test of the streaming infrastructure, which is a good thing for future episodes).

See you Friday!

Details on a (not so recent now) stack-based buffer overflow in the Adobe CFF rasterizer in FreeType2 (CVE-2014-2240, CVE-2014-9659)

By j00ru | Tue, 07 Jun 2016 13:53:14 +0000 | @domain: faviconj00ru.vexillium.org
This blog has experienced a long time of inactivity, as I’ve recently used it only to publish slides from conferences I presented at, with many months-long breaks in between. I am planning to change things up and start posting again in the upcoming weeks, starting with this blog post, which I originally wrote in early 2014. I haven’t […]

Sources of ReRe, a Python RE500 challenge

By Gynvael Coldwind | Fri, 15 Apr 2016 00:10:02 +0200 | @domain: favicongynvael.coldwind.pl
ReRe
The CONFidence Teaser CTF 2016 by Dragon Sector is now over and the results are in (congratz 9447!). Therefore I decided to share the sources of my task called ReRe, which was a Python rainbow-heavy obfuscation-heavy bytecode-all-around challenge. I won't spoil too much in case you would like to try to solve it (crackme/rere.py in the archive), but if you would like to read more on it, just see the SOLUTION.md file in the zip file. I'll add, that the obfuscation used self-modifying bytecode, some bytecode-level obfuscation and minor string obfuscation as well, so if you would like to learn more about Python 2.7 internal code representation, try your luck with ReRe :) It was solved 5 times btw.

Download: confidence-teaser-2016-ds-gynvael-rere.zip
"Video": rere_anim.gif (a 3 MB gif, you have been warned)
Have fun, good luck!

System Image Server Explained

By sil2100 | Wed, 23 Dec 2015 19:34:00 GMT | @domain: faviconsil2100.vexillium.org
Ubuntu System Image is the name of the client/server infrastructure used for Ubuntu Touch and (currently also) Ubuntu Core for image-based upgrades. In certain scenarios the standard apt/dpkg package-based upgrade mechanism is simply not good enough, so the whole Ubuntu System Image initiative was started a few years back. The purpose of an s-i server is to generate and export new images for selected use-cases whenever needed, allowing the client to notice and do the upgrade. Anyone can setup their own system-image server, in which case it's good to know what is what and how to prepare your configuration - along with some useful tricks.

Video: Python in a hacker's toolbox (PyConPl'15)

By Gynvael Coldwind | Fri, 23 Oct 2015 00:09:32 +0200 | @domain: favicongynvael.coldwind.pl
PyConPl'15 logoJust a short note that the video from my talk "Python in a hacker's toolbox" (PyConPl'15) is already available on youtube. The slides can be found here.

Abstract:
A classical language set used by a security specialist included assembly and C, sometimes joined by C++ and usually quite a lot of Bash as well. A few years ago it seemed that Perl, and later Ruby, will become the scripting language of choice in the security field, however another contender - Python - was gaining user base too. Today it's rather obvious that Python won its place in the hacker's toolbox, especially given that a great deal of important tools of trade allow to be instrumented/scripted using it - examples include even the most basic utensils - IDA, GDB and Burp. Furthermore, Python with its set of standard libraries makes it extremely easy to create ad-hoc tools whenever they're needed. At the same time, due to rich introspection mechanisms, the language itself is an object of fascination from the security scene. The talk will focus on a few selected cases of Python intertwining with the security world.

The talk is basically a mix of Python related topics I've touched during other talks I gave (commonly with j00ru) - this includes:
■ "Data, data, data..." (English, blog post + video)
■ "On the battlefield with the dragons" (English, blog post + video)
■ "Ataki na systemy i sieci komputerowe" (Polish, slides)
■ "Pwning (sometimes) with style - Dragons' notes on CTFs" (English, slides)

Video:


Cheers,

44CON slides and details about further Windows kernel font vulnerabilities are out

By j00ru | Thu, 17 Sep 2015 10:17:25 +0000 | @domain: faviconj00ru.vexillium.org
Since my last blog post and the REcon conference in June, I have continued working on font security, especially in the area of Windows kernel and font engines derived from the Adobe Type Manager Font Driver. More specifically, I moved from manually auditing PostScript Charstring implementations to running automated fuzz-testing of the overall font-handling code; after […]

Status update. LP API, release process and Qt5 QPA shortcuts

By sil2100 | Mon, 31 Aug 2015 13:01:00 GMT | @domain: faviconsil2100.vexillium.org
Things are very busy as always - not having enough time to write a full-content article I decided to at least post a quick update on what I'm working on currently. Most of it is of course Ubuntu related: besides preparations for the next Ubuntu Touch update (OTA-7) and dealing with the finalization of the previous one (OTA-6), I'm also working on two Ubuntu User articles and, additionally, working my way to becoming an Ubuntu Core Developer. I'm also investigating a bug related to Qt5 QPlatformTheme keyboard shortcut handling.

Results of my recent PostScript Charstring security research unveiled

By j00ru | Tue, 23 Jun 2015 18:38:51 +0000 | @domain: faviconj00ru.vexillium.org
Some months ago, I started reverse engineering and investigating the security posture of the Adobe Type Manager Font Driver (ATMFD.DLL) module, which provides support for Type 1 and OpenType fonts in the Windows kernel since Windows NT 4.0, and remains there up to this day in Windows 8.1. Specifically, I focused on the handling of […]

Binary to source name. The Launchpad API

By sil2100 | Sat, 06 Jun 2015 15:22:00 GMT | @domain: faviconsil2100.vexillium.org
It's been a while since I wrote a programming-related post. Today I'd like to share with you a very simple, but useful, thing in the 'devel' version of the Launchpad API. When using LP or writing Python tools that need to deal with Ubuntu repositories, packages and their versions, frequently the need appears to get the source package name from its resulting binary package name as published in the selected archive (usually the main archive). It's a rather new addition, but really really useful.

Open Source Days. DWO 2015

By sil2100 | Mon, 27 Apr 2015 22:19:00 GMT | @domain: faviconsil2100.vexillium.org
A quick post this time. A week ago I have briefly attended an open-source conference in Bielsko-Biala, Poland. Due to a Canonical sprint overlapping, I was only able to arrive for the last day - Sunday, so I missed out on many interesting presentations. But at least I was able to meet some very interesting people and do a short talk about the Ubuntu Touch release process, quickly overviewing what tools we use and what processes we follow.

When in Wroclaw - Piwnica Quest

By Gynvael Coldwind | Fri, 27 Mar 2015 00:09:22 +0100 | @domain: favicongynvael.coldwind.pl
A couple of hours ago I found myself, together with a couple of friends, locked in a small vault in a basement of an old tenement house in Wrocław/Poland. Objective: escape the room in 60 minutes (+ complete a side quest). To do this we had to look for clues, solve riddles, break codes (not unlike some crypto challenges I've seen on CTFs, though much simpler) and do quite a lot of creative thinking. In the end we failed (we were so close it's painful!). But we had A LOT of fun on the way anyway :). This kind of game is called "Live Escape Room" and the one we went to, which I strongly recommend, was the room "Vault" by Piwnica Quest.

While I shouldn't write anything about the room (it would just spoil the fun for others and that's definitely an anti-objective of this post), I'll mention that our group was 5 people (which is the max. for Piwnica Quest as far as I know) and that I was really amazed by some of the riddles they created there.

And yes, the riddles are in English as well, so you don't have to know encryptedPolish.

So again, a link to their site:
http://www.piwnica-quest.com/

And I wish you the usual HF GL!

P.S. Full-disclosure: No, this is NOT a sponsored post - there are no sponsored posts on this blog. I really had fun and that's why I'm recommending it :)
P.S.2. I've been told there are more Live Escape Rooms in Wrocław as well - seems to be a good city for fans of this kind of activity.

Insomni’hack 2015, presentation slide deck and CTF results

By j00ru | Tue, 24 Mar 2015 18:48:28 +0000 | @domain: faviconj00ru.vexillium.org
(Collaborative post by Gynvael Coldwind and Mateusz “j00ru” Jurczyk) Just three days ago another edition of the great Insomni’hack conference held in Geneva came to an end. While the event was quite short, lasting for just one day, it featured three tracks of security talks, including some very interesting ones such as Automotive security by […]

Insomni'hack 2015, presentation slide deck and CTF results

By Gynvael Coldwind | Tue, 24 Mar 2015 00:09:21 +0100 | @domain: favicongynvael.coldwind.pl
(Collaborative post by Gynvael Coldwind and Mateusz “j00ru” Jurczyk)
Just three days ago another edition of the great Insomni'hack conference held in Geneva came to an end. While the event was quite short, lasting for just one day, it featured three tracks of security talks, including some very interesting ones such as Automotive security by Chris Valasek, or Copy & Pest – A case-study on the clipboard, blind trust and invisible cross-application XSS by Mario Heiderich. This year we were also invited to the conference to talk about CTF techniques, experiences and entertaining tasks encountered by the Dragon Sector team we lead and actively play in. We thus gave a presentation called Pwning (sometimes) with style – Dragons’ notes on CTFs, and are now making the slide deck publicly available for your enjoyment:

Pwning (sometimes) with style – Dragons’ notes on CTFs (3.86MB, PDF)

While the conference was very well organized and had many interesting talks, the main event of the evening was only about to start at 18:00: the CTF competition organized by the Insomni'hack crew, which attracted hundreds of players from all around the world, including many top teams from the CTF scene (e.g. StratumAuhuur, int3pids, dcua, penthackon, 0x8F). Since we really liked the finals from last year, Dragon Sector also came back in a large squad of 9 players; one of whom played in a different team due to a strict 8-person limit. We did our best to defend last year's title (top 1) and eventually succeeded, but it was not an easy task for sure. The most intense moment was when the StratumAuhuur team submitted a flag 4 minutes before the end of the CTF (at 3:56:23 AM), closing our point advantage to only ~20 points, which was so close that it could have easily changed in favor of Stratum regardless of our actions (due to this year's variable nature of tasks scoring, which accounted for the total number of teams solving each challenge). Fortunately, Gynvael and I were on a verge of solving another networking task at the time and barely managed to get it a little more than a minute before the end of the competition, consequently securing a win. The situation is well illustrated in the photo of the final ranking below.

The organizers, SCRT, have also published their own summary of the CTF with a full ranking and some interesting stats: Insomni’hack finals – CTF results.


How to automatically extract all raw bitmaps from a memory dump?

By Gynvael Coldwind | Fri, 27 Feb 2015 00:09:19 +0100 | @domain: favicongynvael.coldwind.pl
That's actually a real question with no solution (though some links) posted in this blog post. And the keyword here is "automatically" ;>

Let's starts by me making sure that the problem is stated clearly: we assume, that we have a large memory blob (anything between 500 MB to 1 TB) and we want to find all raw bitmaps and their width in it. Furthermore, since this is kinda ambiguous, by "raw bitmaps" I mean neither camera RAW formats used in digital photography (NEF, ORF, CR2 and the like) nor "image files" (like PNG, BMP, JPG, GIF, TIFF and the like) - how to find most of these things is of course common knowledge that can be summarized by "find magic or pattern that's commonly at the beginning". This approach has been used by many old school ripper programs like Multi Ripper (seen around in late '90, though I remember such apps from at least a few years earlier) or other similar though older apps, as well as newer stuff like binwalk or PhotoRec. What we're looking for is just plain bitmap data (8/24/32 bpp for starters) without any magic values, headers, compression or other strange encodings.



Where would this be useful? In analyzing various memory dumps or disk dumps where you can't make any smart calls about kernel/FS/heap/app memory structures or if parts of said are missing/have been wiped (so volatility/Slueth Kit are useless).

Usually the way I did this (and still do) was to open the file in IrfanView as .raw, set width to something around 1024, height to a large value, offset to whatever part I was analyzing and then I scrolled through the huge bitmap counting on my brain to spot any patterns. I'm not going to describe the exact details of this method, since Bernardo beat me to it and I have really nothing to add (though his GIMP method seems more friendly as you have a scroll bar to set the width which looks waaay better than putting the number manually in IrfanView). The thing I found surprising about his post is that the CTF task he gives as an example - coor coor from 9447 - is the exact task I had in mind when spawning the discussion with Ange (which later moved to twitter and made Bernardo write his post). Here are three of my findings from that task:







The discussion at twitter included several interesting links/ideas:
- @doegox pointed to his tool https://doegox.github.io/ElectronicColoringBook/
- @jchillerup pointed to the cantor dust talk/tool which doesn't solve the problem, but is (i.e. looks like) probably the best non-automatic tool for this purpose; some patterns remind me of one of my previous blog posts, which spawns an idea I guess on how to find candidate bitmaps in the binary blob.
- @scanlime pointed to the autocorrelation problem, which names the problem I was thinking about and points to the solution
- @hanno pointed to JPEG compression tested on various widths/offsets, which would be another idea to find candidate bitmaps
- @sqaxomonophonen pointed to FFT and looking for spikes, which would be a way to determine the width
- @CrazyLogLad suggested something similar
- @aeliasen said this:

I'd calculate the autocorrelation of the bytes; period with strongest autocorr. should give width. http://futureboy.us/fsp/colorize.fsp?f=correlation.frink You might have to throw out small periods (like 1-3) and divide by pixel depth.
Seems I need to do some reading on autocorrelation/FFT to move this forward.

If someone would like to try his luck with any of the two problems ([1] finding bitmap candidates in a LARGE binary blob and [2] automatically determining width of the candidate), the coor coor dump is here (link shamelessly taken from Bernardo's blog):
coorcoor.tar.bz2

If you have any other ideas, comments or links, feel free to add them in the comment section.

Cheers,

Ubuntu proposed migration. update_output.txt

By sil2100 | Mon, 02 Feb 2015 11:47:00 GMT | @domain: faviconsil2100.vexillium.org
Those of you that know a thing or two about the Ubuntu archives also most probably know about the proposed pocket for every distribution series. In a quick overview, every upload made to the main archives first goes to -proposed and then migrates (in case of the development series) to the release pocket once the so called proposed migration is happy with it. Most of the time it just migrates fine on its own, but sometimes a package can fail to "move on". And this is where update_excuses.html and update_output.txt come in handy.

SECURE 2014 slide deck and Hex-Rays IDA Pro advisories published

By j00ru | Thu, 23 Oct 2014 12:32:55 +0000 | @domain: faviconj00ru.vexillium.org
Yesterday I gave a talk at a Polish security conference held in Warsaw, Poland, called “Ucieczka z Matrixa: (nie)bezpieczna analiza malware” (eng. “Escaping the Matrix: (in)secure malware analysis”). The presentation was lightly technical and concerned the different threats of using popular software to aid in interacting with and analyzing malware samples. While the talk was […]

CONFidence 2014 video from our talk on CTFs

By Gynvael Coldwind | Sat, 19 Jul 2014 00:09:01 +0200 | @domain: favicongynvael.coldwind.pl
Just a quick note: the video from j00ru's and my talk from this year's CONFidence edition is now online. As mentioned in the previous post on the topic, the talk was called "On the battlefield with the Dragons" and consisted of a selection of interesting CTF task solutions with some useful tips and trick near the end.

Links: video, slides.



Let us know what you think!

Slides from Ange's and my talk about Schizophrenic files, Area41

By Gynvael Coldwind | Tue, 03 Jun 2014 00:08:59 +0200 | @domain: favicongynvael.coldwind.pl
Yesterday I had the pleasure to co-present with Ange Albertini (@angealbertini) - if you are into binary stuff, you probably know his website - corkami, which has all sorts of cool stuff, from posters detailing binary format (e.g PE 101) to binary polyglots, etc. We talked about "schizophrenic files", i.e. various file formats which get interpreted differently depending on what program you use (e.g. a BMP image which, when viewed in one viewer, shows a cat but when using a different one shows a flying shark). Basically the story goes that we both did (separately) some more or less random digging on (or more accurately in my case: randomly stumbling on) behaviors which allow one to create a file which is open to creative interpretation by the software, or (more commonly) parser authors just decide to not follow the specs or understand them in a different way; we decided to gather all this in one place and hence the talk. We presented it at Area41 in Zurich (which btw turned out to be really well organized and awesome conference). Slides and PoCs are available below.

Slides: Schizophrenic files (Ange Albertini, Gynvael Coldwind)
PoCs: Schizophrens (PoC) ("All PoCs.zip" contains all the files from the directories)

As usual, feedback is most welcome!

Cheers,

CONFidence 2014 slides from Dragon Sector are now available

By j00ru | Thu, 29 May 2014 10:07:24 +0000 | @domain: faviconj00ru.vexillium.org
(Collaborative post by Gynvael Coldwind and Mateusz “j00ru” Jurczyk) Just yesterday another edition of the largest and most successful IT security conference held in Poland – CONFidence – ended. The Dragon Sector CTF team (which we founded and are running) actively participated in the organization of the event by hosting an onsite, individual CTF for […]

CONFidence 2014 slides from Dragon Sector are now available

By Gynvael Coldwind | Thu, 29 May 2014 00:08:57 +0200 | @domain: favicongynvael.coldwind.pl
(Collaborative post by Gynvael Coldwind and Mateusz "j00ru" Jurczyk)

Just yesterday another edition of the largest and most successful IT security conference held in Poland - CONFidence - ended. The Dragon Sector CTF team (which we founded and are running) actively participated in the organization of the event by hosting an onsite, individual CTF for the conference attendees and giving a talk about the most interesting challenges we have solved so far in our not too long CTF career.

The final standings of the CONFidence 2014 CTF can be found below. We will also publish a more detailed summary, together with some or all of the challenges, on our official Dragon Sector blog within a few days.

1. liub, 2. dcua, 3. 4c...fd sector

The slide deck from our presentation can be found below:
On the battlefield with the Dragons - the interesting and surprising CTF challenges (3.93MB, PDF)

Enjoy!

A case of a curious LibTIFF 4.0.3 + zlib 1.2.8 memory disclosure

By j00ru | Wed, 30 Apr 2014 14:23:21 +0000 | @domain: faviconj00ru.vexillium.org
As part of my daily routine, I tend to fuzz different popular open-source projects (such as FFmpeg, Libav or FreeType2) under numerous memory safety instrumentation tools developed at Google, such as AddressSanitizer, MemorySanitizer or ThreadSanitizer. Every now and then, I encounter an interesting report and spend the afternoon diving into the internals of a specific […]

The perfect int == float comparison

By Gynvael Coldwind | Sun, 27 Apr 2014 00:08:55 +0200 | @domain: favicongynvael.coldwind.pl
Just to be clear, this post is not going to be about the float vs. float comparison. Instead, it will be about trying to compare a floating point value with an integer value in an accurate, precise way. It will also be about why just doing int_value == float_value in some languages (C, C++, PHP, and some other) doesn't give you the result you would expect - a problem which I recently stumbled on when trying to fix a certain library I was using.

UPDATE: Just to make sure we see it in the same way: this post is about playing with bits and floats just for the sake of playing with bits and floats; it's not something you could or should use in anything serious though :)

UPDATE 2: There were two undefined behaviours pointed out in my code (one, two) - these are now fixed.

The problem explained

Let's start by demonstrating a the problem by running the following code that compares subsequent integers with a floating point value:

float a = 100000000.0f;
printf("...99 --> %i\n", a == 99999999);
printf("...00 --> %i\n", a == 100000000);
printf("...01 --> %i\n", a == 100000001);
printf("...02 --> %i\n", a == 100000002);
printf("...03 --> %i\n", a == 100000003);
printf("...04 --> %i\n", a == 100000004);
printf("...05 --> %i\n", a == 100000005);

The result:

...99 --> 1
...00 --> 1
...01 --> 1
...02 --> 1
...03 --> 1
...04 --> 1
...05 --> 0

Sadly this was to be expected in the floating point realm. However, while in this world both 99999999 and 100000004 might be equal to 100000000, this is sooo not true for common sense nor standard arithmetic.

Let's look at another example - an attempt to sort a collection of numbers by value in PHP:

<?php
$x = array(
20000000000000002,
20000000000000003,
20000000000000000.0,
);
sort($x);

foreach ($x as $i) {
if (is_float($i)) {
printf("%.0f\n", $i);
} else {
printf("%i\n", $i);
}
}

The "sorted" result (64-bit PHP):

> php test.php
20000000000000002
20000000000000000
20000000000000003

Side note: The code above must be executed using 64-bit PHP. The 32-bit PHP has integers limited to 32-bit, so the numbers I used in the example would exceed their limit and would get silently converted to doubles. This results in the following output:

20000000000000000
20000000000000000
20000000000000004


So, what's going on?

It all boils down to floats having to little precision for larger integers (this is a good time to look at this and this). For example, the 32-bit float has only 23 bits dedicated to the significand - this means that if an integer value that is getting converted to float needs more than 24 bits (sic!; keep in mind that in floats there is a hardcoded "1" at the top position, which is not present in the bit-level representation) to be represented, it will get truncated - i.e. the least significant bits will be treated as zeroes.

In the C-code case above the decimal value 100000001 actually requires 27 bits to be properly represented:

0b101111101011110000100000001

However, since only the leading "1" and following 23-bits will fit inside a float, the "1" at the very end gets truncated. Therefore, this number actually becomes another number:

0b101111101011110000100000000

Which in decimal is 100000000 and therefore is equal to the float constant of 100000000.0f.

Same problem exists between 64-bit integers and 64-bit doubles - the latter have only 52 bits dedicated for storing the value.

A somewhat amusing side note

Actually, it gets even better. Let's re-write the first code shown above (the C one) to use a loop:

float a = 100000000.0f;
int i;
for(i = 100000000 - 5; i <= 100000000 + 5; i++) {
printf("%11.1f == %9u --> %i\n", a, i, a == i);
}

As you can see, there are no big changes. Now let's compile it and run it:

>gcc test.c
> a
100000000.0 == 99999995 --> 0
100000000.0 == 99999996 --> 0
100000000.0 == 99999997 --> 0
100000000.0 == 99999998 --> 0
100000000.0 == 99999999 --> 0
100000000.0 == 100000000 --> 1
100000000.0 == 100000001 --> 0
100000000.0 == 100000002 --> 0
100000000.0 == 100000003 --> 0
100000000.0 == 100000004 --> 0
100000000.0 == 100000005 --> 0

The result is magically correct! How about we compile it with optimization then?

>gcc test.c -O3
> a
100000000.0 == 99999995 --> 0
100000000.0 == 99999996 --> 1
100000000.0 == 99999997 --> 1
100000000.0 == 99999998 --> 1
100000000.0 == 99999999 --> 1
100000000.0 == 100000000 --> 1
100000000.0 == 100000001 --> 1
100000000.0 == 100000002 --> 1
100000000.0 == 100000003 --> 1
100000000.0 == 100000004 --> 1
100000000.0 == 100000005 --> 0

Why is that? Well, in both cases the compiler needs to convert the integer to a float and then compare it with the second float value. This however can be done in two different ways:

Option 1: The integer is converted to a floating point value, then is stored in memory as a 32-bit float and then loaded into the FPU for the comparison OR (in case of constants) the integer constant can be converted to a 32-bit float constant at compilation time and then it will be loaded into the FPU for comparison at runtime.
Option 2: The integer is directly loaded into the FPU for comparison (using fild FPU instruction or similar).

The difference here is related to the FPU internally operating on larger floating point values with more precision (by default it's 80-bits, though you can change this) - so the 32-bit integer isn't truncated on load, as it would happen if it gets converted explicitly to a 32-bit float (which, again, has only 24-bits for the actual value).

Which option is selected depends strictly on the compiler - it's mood, version, options used at compilation, etc.

The perfect comparison

Of course, it's possible to do a perfect comparison.

The simplest and most straightforward way is to cast both the int value and the float value to a double before comparing them - double has large enough significand to store all possible 32-bit int values. And for the 64-bit integers you can use the 80-bit long double which has exactly 64 bits dedicated for storing the value (plus the ever-present "1").

But that's too easy. Let's try to do the actual comparison without converting to larger types.

This can be done in two ways: the "mathematical" way (or: value-specific way) and the encoding-specific way. Both are presented below.

UPDATE 3: Actually there seems to be another way, as pointed out in the comments below and in this reddit post. It does make sense, but I still wonder if there is any counterexample (please note that I'm not saying there is; I'm just saying it never hurts to look for one ;>).

The mathematical way

We basically do it the other way around - i.e. we try to convert the float to an integer. There are a couple of problems here which we need to deal with:

1. The float value might be bigger than INT_MAX or smaller than INT_MIN. In such case this might happen and we wouldn't be able to catch it after the conversion, so we need to deal with it sooner.

2. The float value might have a non-zero fractional part. This would get truncated when converted to an int (e.g. (int)1.1f is equal to 1) - we don't want this to happen either.

The implementation of this method (with some comments) is presented below:

bool IntFloatCompare(int i, float f) {
// Simple case.
if ((float)i != f)
return false;

// Note: The constant used here CAN be represented as a float. Normally
// you would want to use INT_MAX here instead, but that value
// *cannot* be represented as a float.
const float TooBigForInt = (float)0x80000000u;

if (f >= TooBigForInt) {
return false;
}

if (f < -TooBigForInt) {
return false;
}

float ft = truncf(f);
if (ft != f) {
// Not an integer.
return false;
}

// It should be safe to cast float to integer now.
int fi = (int)f;
return fi == i;
}

The encoding-specific way

This method relies on decoding the float value from the bit-level representation, checking if it's an integer, checking if it is in range and finally comparing the bits with the integer value. I'll just leave you with the code. If in doubt - refer to this wikipedia page.

bool IntFloatCompareBinary(int i, float f) {
uint32_t fu32;
memcpy(&fu32, &f, 4);

uint32_t sign = fu32 >> 31;
uint32_t exp = (fu32 >>23) & 0xff;
uint32_t frac = fu32 & 0x7fffff;

// NaN? Inf?
if (exp == 0xff) {
return false;
}

// Subnormal representation?
if (exp == 0) {
// Check if fraction is 0. If so, it's true if "i" is 0 as well.
// Otherwise it's false in all cases.
return (frac == 0 && i == 0);
}

int exp_decoded = (int)exp - 127;

// If exponent is negative, the number has a fraction part, which means it's not equal.
if (exp_decoded < 0) {
return false;
}

// If exponenta is above or equal to 31, int cannot represent so big numbers.
if (exp_decoded > 31) {
return false;
}

// There is one case where exp_decoded equal to 31 makes sens - when float is
// equal to INT_MIN, i.e. sign is - and fraction part is 0.
if (exp_decoded == 31 && (sign != 1 || frac != 0)) {
return false;
}

// What is left is in range of integer, but still can have a fraction part.

// Check if any fraction part will be left.
uint32_t value_frac = (frac << exp_decoded) & 0x7fffff;

if (value_frac != 0) {
return false;
}

// Check the value.
int value = (1 << 23) | frac;
int shift_diff = exp_decoded - 23;
if (shift_diff <0) {
value >>= -shift_diff;
} else {
value <<= shift_diff;
}

if (sign) {
value = -value;
}

return i == value;
}

Summary

The above functions can be used for a perfect comparison and they SeemToWork™ (at least on little endian x86). With some more work both functions could be converted to be perfect "less than" comparators which then could be used to fix the PHP sorting example.

But... seriously, just cast the integer and float to something that has more precision ;>

P.S. Did you know that there are exactly 75'497'471 positive integer values that can be precisely represented as a float? Not a lot for the total of 2'147'483'647 positive integers.

Articles

Comic