When True Is Not True Anymore

July 18, 2010

We all know that accessing uninitialized variables in C and C++ leads to undefined behavior, which one usually wants to avoid. What I didn’t know until recently is that uninitialized bool values can be especially malicious beasts. To see what they can do to you, take a look at the following program and try to predict its output:

#include <string>
#include <iostream>

using namespace std;

namespace {

    inline string stringify(const bool value)
    {
        return (value ? "true" : "false");
    }

    struct Struct
    {
        long l;
        bool u;
    };
}

int main()
{
    Struct s;

    if(true != s.u)
        cout << stringify(true) << " != " << stringify(s.u) << endl;
}

Now type g++ stringify.cc -o stringify and run the generated executable. Here is what you might see on some platforms:

$ ./stringify 
true != true

Yes, you got that right: true != true! I get this behaviour with g++-4.3 and g++-4.4 on Gentoo (x86 and x86_64) as well as with g++-4.1 and g++-4.4 on Ubuntu 9.10 (x86_64). Before attempting to explain what happened here, I want to summarize a few additional facts:

  • Several attempts to make this program shorter without ending up with something boring failed.
  • Turning on optimization causes g++ to optimize all if statements away (especially the implicit one in stringify) under the assumption that s.u is false, which again leads to a much more sane output.
  • I could not reproduce this with icc-11.1.

So, what happened? Is g++ broken? Actually the answer is no. Accessing uninitialized memory leads to undefined behavior, and undefined means undefined. In fact the C++0x Final Committee Draft contains a footnote that explicitly mentions the oddity we have just seen:

47) Using a bool value in ways described by this International Standard as “undefined,” such as by examining the value of an
uninitialized automatic object, might cause it to behave as if it is neither true nor false.

This is not that surprising if one considers that at the assembler level, a bool is not represented by a single bit, but by at least a byte. An uninitialized byte can hold any of 256 different values, not just two. One could of course consistently map 0 to false and everything else to true, but this is not what g++ does. To see what I mean, take a look at the following assembler snippet, which g++ generated for the if statement in line 24 of the program:

movzbl  -40(%rbp), %eax  # move s.u to eax.
xorl    $1, %eax         # xor eax with 1.
testb   %al, %al         # check if the low byte of eax is 0.
je      .L8              # jump to .L8 if so.

If the jump is taken, the body of the if statement in line 24 is skipped; otherwise it is executed. The xorl in line 2 of the snippet flips the lowest bit of eax and leaves all other bits unchanged, so the low byte is 0 afterwards only if it was 0x01 before. Therefore s.u is considered equal to true if and only if it has the byte value 0x01.

Now let’s take a look at the assembler that represents the ternary operator in stringify:

cmpb $0, -36(%rbp)    # compare the argument with 0.
je   .L2              # jump to .L2 if the argument is 0.
movl $.LC0, %eax      # store the address of "true" in eax.
jmp  .L3              # jump out.
.L2:
movl $.LC1, %eax      # store the address of "false" in eax.
.L3:

Here g++ maps 0 to false and every other value to true. This means that if the actual byte value of s.u is, for example, 0xFF (which for some reason is what cgdb keeps telling me locally), the body of the if in line 24 is executed as if s.u were false, but stringify behaves as if s.u were true.
