[vox-tech] random number question

Mark K. Kim vox-tech@lists.lugod.org
Wed, 15 May 2002 11:44:21 -0700 (PDT)


On Wed, 15 May 2002 msimons@moria.simons-clan.com wrote:

> > and i assume it's not true across different computers?
>
>   False, sequences will be the same across machines with the same CPU (1)
> using the same algorithm.
>
> 1: The main question is that the algorithm be the same, for integer
>    based algorithms (which I think rand is) the CPU may not mater at all.
>    Just don't count on being able to generate the same sequence between
>    a sparc, alpha, and intel line of processor until you test it.

It might be worth while to explicitely mention that if you compile a
program using two different compilers on the same system, you may get
different results.  For example, if you compile a program with CC vs. gcc
on SGI.  This, of course, is because you may end up using different
libraries using different algorithms, as Mike says.

>   The c library rand and random functions produce a sequence of
> pseudo-random numbers based on an initial seed value.  This means
> that while the sequence itself is not random (every single time
> the algorithm is started with a seed of X it will spit out certain
> numbers in a certain order).

I always thought this was very clever, because if you're running a
simulation you can always repeat the last simulation simply by
remembering a single number (the seed).

On a slightly different note, I think seeding with time should be done
automatically in scripting languages.  In almost all programs utilizing
random numbers require this step anyway, and since scripting languages are
supposed to make your life easier it shouldn't require you to lookup how
to generate the current time in seconds since the epoch everytime you want
to use the random number.  But sometimes you want manual seeding, so it
should still provide a way to set the seed manually if necessary.

But that's just me.

>   These algorithms also tend to "cycle" back upon themselves, so
> if they spit out (10, 5, 8, 20, ...) sometime millions of reads
> later the same millions of numbers will be pulled in the same order.

If you look at the textbook random number generator (which reminds me of a
story... I'll tell you in a bit), you realize you can actually make a very
good use of this idea.  For example, if you want to play all the songs on
your CD (or MP3 library, or whatever) all exactly once in random order
before playing the same song again (so you don't get the same song playing
twice in a row), you can simply manipulate the random number generator
slightly to do that.  It can be very useful, but unfortunately not many
programs use this technique.

As for that textbook random number generator, I can't quite recall what it
is, and I don't have any algorithms book with me.  But it reminds me of
this one time I had ECS110 with Nicole and one day she frantically(?)
asked people in the class if anyone remembered the random generator
algorithm and nobody did.  Heh...  It's such a simple algorithm, too; it's
a shame not being able to remember it...

>   I think each sequence is completely independent of the others...
> in that the order of numbers in one cycle of all sequences will be
> different.

It generates the numbers in the same sequence with the algorithm I'm
thinking about, but I'm sure there are better algorithms implemented in
the C library.  It's simple enough to detect a cycle and reseed the random
number generator, anyway.

>   As I said to Mark last time stuff about rand/random came up.  If
> you want lots of non-reproducible random numbers you should open
> up the /dev/urandom device and read bytes from there... from read to
> read you will get randomized stuff.  If you are doing something
> cryptographically sensitive then open /dev/random, but don't try to
> read much data from that device it will block readers until it
> has enough "collected entropy"...

Yeah, thank you for pointing that out.  It was a very good information to
have.

BTW, some people like to seed the algorithm before every generation of
random number.  Something about being more random...  It can be
problematic if you generate a new random number in less than a second (if
you reseed using the time), but there are ways to deal with that, too (ie
- some sort of hash using the last random number and the current time,
for example).

-Mark

--
Mark K. Kim
http://www.cbreak.org/
PGP key available upon request.