The pitfall of using rand() for simulation


This was probably the first simulation I did. How naive I was!

Today I was doing some simulation and needed a uniform stream of random bits. The first thing I tried was the standard C function rand(), which I naively assumed returned a uniformly distributed 32-bit integer, so that a single call would give me 32 uniform random bits. But then I noticed that the behavior of my simulation was a bit off from my expectations. In the end I wrote the following program to count how many 0s and 1s I get when I sample bits from rand() many times.

{% highlight c %}
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define INT_BITS (8 * sizeof(int))


int main(void)
{
    srand(time(0)); //use current time as seed for random generator

    int random_variable;

    unsigned char bit;

    unsigned long int k, n, total_bits, zeros;
    double zero_ratio;

    zeros = 0;

    for(n = 0; n < 1e7; n++)
    {
        for(k = 0; k < INT_BITS; k++)
        {
            random_variable = rand();            /* fresh draw for every bit position */
            bit = (random_variable >> k) & 1;    /* extract bit k of that draw */
            if(bit == 0)
            {
                zeros += 1;
            }
        }
    }

    total_bits = INT_BITS * n;
    zero_ratio = (double)zeros/total_bits;

    //printf("RAND_MAX %d\n", RAND_MAX);
    printf("total zeros %lu total bits %lu average %f\n", zeros, total_bits, zero_ratio);
    return 0;
}

{% endhighlight %}

The result was pretty clear on my machine:

total zeros 164992607 total bits 320000000 average 0.515602

I consistently got more zeros than ones, and the bias did not vanish as I increased the sample size. In fact, the ratio of about 0.5156 is suspiciously close to 16.5/32 = 0.515625, which is exactly what you would expect if RAND_MAX were 2^31 - 1 on my machine: rand() would then never set the most significant bit of an int, so one out of every 32 bits I extracted was always zero. In the end I switched to the Mersenne Twister pseudo-random generator, which works much better!
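I will not paste the Mersenne Twister code here, but for completeness, here is a minimal sketch of a workaround that stays with rand(): only use the low-order bits that are actually guaranteed to exist. It assumes RAND_MAX + 1 is a power of two (typical in practice, although ISO C only guarantees RAND_MAX >= 32767), so the low 15 bits of every rand() call are uniformly distributed. The next_bit() helper below is my own illustration, not part of any library.

{% highlight c %}
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Number of low-order bits of rand() we trust. ISO C guarantees
   RAND_MAX >= 32767, i.e. at least 15 bits; this sketch also assumes
   RAND_MAX + 1 is a power of two, so those 15 bits are uniform. */
#define SAFE_BITS 15

/* Return the next bit of a stream built from the low SAFE_BITS bits
   of successive rand() calls, buffering one call at a time. */
static unsigned char next_bit(void)
{
    static int value;
    static int bits_left = 0;

    if(bits_left == 0)
    {
        value = rand();
        bits_left = SAFE_BITS;
    }

    unsigned char bit = value & 1;
    value >>= 1;
    bits_left--;
    return bit;
}

int main(void)
{
    srand(time(0)); //use current time as seed for random generator

    unsigned long n, zeros = 0;
    unsigned long total_bits = 100000000UL;

    for(n = 0; n < total_bits; n++)
    {
        if(next_bit() == 0)
        {
            zeros += 1;
        }
    }

    printf("total zeros %lu total bits %lu average %f\n",
           zeros, total_bits, (double)zeros/total_bits);
    return 0;
}
{% endhighlight %}

The same buffering idea works with any generator; the advantage of something like the Mersenne Twister is simply that its outputs are full 32-bit words, so all 32 bits of every output can be used directly.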