Newsgroups: comp.sys.cbm
Subject: Re: C64 Random Generator?
References: <6hqisi$qg6@examiner.concentric.net> <6ht6s2$327@news.acns.nwu.edu> <6huaed$g06@examiner.concentric.net>
Reply-To: sjudd@nwu.edu (Stephen Judd)
Organization: Northwestern University, Evanston, IL
I will make one more shot at this :). Note that there is some code at
the end of this message.
In article <6huaed$g06@examiner.concentric.net>,
Cameron Kaiser wrote:
>judd@merle.acns.nwu.edu (Stephen Judd) writes:
>
>>>after only a few cycles. Apparently there are many such seeds, so this
>>>generator is effectively useless.
>
>>I read about this method in the paper "Equation of State Calculations by
>>Fast Computing Machines", by Nick Metropolis, Edward Teller, Marshall
>>Rosenbluth, Arianna Rosenbluth, and Augusta Teller (Journal Chem Phys.,
>>v21 No. 6, June 1953). "Useless" is a very strong adjective; therefore
>>I strongly disagree with it.
To be clear: I've no doubt that this method is sensitive to the initial
seed, and can fail in a number of situations. The argument is with
the statement that it is "useless". By my reckoning, if three of the
greatest scientists of the century can use it to invent Monte Carlo
methods, to run hydro codes, and to build great big bombs, well...
it's not useless! It just demands a lot of care.
Since you quoted from a book, I too located a book quote:
"...On the other hand, working with 38-bit numbers, Metropolis obtained a
sequence of about 750,000 numbers before degeneracy occurred, and the
resulting 750,000x38 bits satisfactorily passed statistical tests for
randomness. This shows that the middle-squares method _can_ give usable
results, but it is rather dangerous to put much faith in it until after
elaborate computations have been performed."
- Knuth, "The Art of Computer Programming" vol.2
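For anyone who hasn't seen the middle-square method: square the current
value and take the middle digits of the square as the next value. Here is
a minimal decimal sketch in Python (an illustration only; Metropolis of
course worked with 38-bit binary words, not 4 decimal digits):

```python
def middle_square(x, digits=4):
    """One step of the middle-square method: square x, zero-pad the
    square to 2*digits decimal digits, and return the middle digits."""
    sq = str(x * x).zfill(2 * digits)
    mid = (len(sq) - digits) // 2
    return int(sq[mid:mid + digits])

# A short run from the seed 1234:
x = 1234
for _ in range(4):
    x = middle_square(x)
    print(x)          # prints 5227, 3215, 3362, 3030
```

The seed sensitivity is easy to see: 2500 squares to 06250000, whose
middle four digits are 2500 again, so that seed is stuck forever --
exactly the sort of degenerate behavior under discussion.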
Incidentally, Nick once told me that most people who knew his generation
of scientists -- including Feynman, Einstein, Fermi, etc. -- agreed that
von Neumann was the top of the heap.
>What I meant was not that x=1 is a good generator, but that I would consider
>an iterator that returned 1 over and over from some opaque function random,
>as long as you cannot prove apriori that the generator will return 1 the
>next time you run it. I would call that random because the algorithm is
>arbitrary even though the distribution sucks and it doesn't cover any sort
>of interval except 1. I think we agree basically though, which is good
>since you have the math degree and all. *smirk*
As to a degree, it's only a piece of paper and I never went to statistics
class anyways (that was the Semester of Netrek). This stuff just
comes from reading a few books.
Now, let's talk about random numbers. I first want to demonstrate the
importance of talking about a sequence. Here is my demonstration:
3
Is the above number random?
It should be clear that it only makes sense to talk about sequences of
numbers. So now let's say I hand you a list of numbers -- is this a
_sequence_ of random numbers? That is a question we can answer, or at
least make progress on. We need a quantitative measure of randomness
(or else degenerate into a bunch of Aristotelian "this is what I think
random means" mishmash).
Basically, no number should have anything to do with any other number
(they should be totally uncorrelated), and each number should occur with
a certain probability. For example, consider a good old 6-sided die,
which is a pretty decent random number generator. In repeated tosses
you wouldn't expect the numbers to depend on each other, and after a
whole lot of tosses you would expect that each number would appear in
the sequence about the same number of times. (If you don't like dice,
then try roulette).
To further emphasize the importance of the distribution, consider the
types of things random numbers are used for: sampling/algorithm testing,
simulation, numerical analysis, generating numbers for the lottery
commission. If the random numbers don't have the expected distribution,
these tasks will all fail (and that last one would generate one happy
person and a whole lot of mad ones).
So, what makes a good random number generator on the computer? Clearly
one which generates sequences which adequately pass these tests.
(After all, I only handed you a sequence of numbers to test for
randomness -- I never said where they came from). To be of any
practical value, a computer random number generator should generate
an uncorrelated sequence of numbers with a uniform probability
distribution (other probability distributions are easy to form from
a uniform distribution). No deterministic algorithm can produce a
truly ideal sequence, of course; rather, it should generate a sequence
which is "close enough" to an ideal sequence.
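As an aside on forming other distributions from a uniform one: if F is
the desired cumulative distribution, then F^{-1}(u), applied to uniform
deviates u, has distribution F. A Python sketch of this inverse-transform
trick, using the exponential distribution (the function name and interface
here are mine, for illustration):

```python
import math
import random

def exponential_deviate(lam, u=None):
    """Exponential deviate with rate lam via inverse-transform sampling.

    The exponential CDF is F(x) = 1 - exp(-lam*x), so its inverse
    F^{-1}(u) = -ln(1 - u)/lam maps a uniform u in [0,1) to an
    exponential deviate.
    """
    if u is None:
        u = random.random()
    return -math.log(1.0 - u) / lam

# Feeding in u = 0.5 returns the median of the distribution, ln(2)/lam.
```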
If you are really interested in pursuing the subject further, I suggest
reading Knuth, "The Art of Computer Programming" vol. 2. It has 178
pages devoted to random numbers, including a chapter on the
philosophical considerations ("Precisely what do we mean by 'random
behavior'?"). In addition, "Numerical Recipes" has some more
up-to-date references.
--
Now, after all this talk, and claims from myself that things like swapping
bytes wreck a sequence, I thought it might be fun to test the BASIC RND
generator. So I wrote a little program, which is included below, which
performs three statistical tests of RND, as taken from Knuth vol. 2.
It is in BASIC, requires BLARG for the graphs, and sure is helped out
by a SuperCPU; BLARG is available at http://stratus.esam.nwu.edu/~judd/fridge/
The first two tests measure the distribution of the numbers. The
first test is the "Equidistribution test". It measures how well the
actual number distribution compares with a uniform distribution, by
using a Kolmogorov-Smirnov test. The K-S test is described below.
This test generates K-S statistics on a sequence of N (=1024 below)
random numbers. This is then repeated REP (=128) times. Finally,
a K-S analysis is performed on the REP K-S statistics. The short
1024-number sequences try to capture _locally_ nonrandom behavior;
analyzing the compiled statistics captures any additional
_globally_ nonrandom behavior.
The second test is the "Gap test". This test measures how the
numbers are spread through the interval, and complements the
equidistribution/frequency
test. The idea of the gap test is as follows: run through the
sequence, and count how many elements go by before you hit a
number between a and b. In my case, I chose a=0.25 and b=0.75;
the gap test just counts how long it takes to hit a number between
.25 and .75. Once a specified number of gaps have been counted,
a chi^2 statistic is formed from the gap lengths.
The gap test measures the "spread" of the numbers. If you had
a sequence like .9 .99 .91 .92 .95 .93 .5 or some such, it wouldn't
be very good -- although the numbers might cover the full range 0..1,
they bunch up in locations.
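The gap counting and the chi^2 tally can be sketched in Python (an
illustration of the idea, not the BASIC program below; the expected
probabilities follow Knuth: with p = b-a, P(gap = r) = p(1-p)^r for
r < t, and P(gap >= t) = (1-p)^t):

```python
def gap_test(u, a=0.25, b=0.75, t=4, ngaps=128):
    """Tally gap lengths 0..t-1 and >=t, then form a chi^2 statistic.

    A gap of length r means r successive values fall outside [a,b)
    before one finally lands inside it.
    """
    p = b - a
    counts = [0] * (t + 1)        # bins for gaps of length 0..t-1, >=t
    r = 0                         # length of the current gap
    found = 0
    for x in u:
        if a <= x < b:
            counts[min(r, t)] += 1
            found += 1
            r = 0
            if found == ngaps:
                break
        else:
            r += 1
    # Expected bin probabilities for the chi^2 statistic:
    probs = [p * (1 - p) ** k for k in range(t)] + [(1 - p) ** t]
    chi2 = sum((counts[k] - ngaps * probs[k]) ** 2 / (ngaps * probs[k])
               for k in range(t + 1))
    return counts, chi2
```

With a=.25 and b=.75 the bin probabilities come out 1/2, 1/4, 1/8,
1/16, 1/16, which for 128 gaps gives the expected counts 64 32 16 8 8
quoted further down.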
The third test is the "Correlation test". The correlation test
measures how the element u_{n+1} depends on the element u_n.
Correlation values near zero represent totally uncorrelated
numbers. Values near +/-1 imply linear dependence,
i.e. u_{n+1} = a + b*u_n.
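In symbols this is the serial correlation coefficient
C = (n*sum(u_j*u_{j+1}) - (sum u_j)^2) / (n*sum(u_j^2) - (sum u_j)^2),
with the sequence taken cyclically so that u_n pairs with u_1; this is
what lines 130-150 of the program below compute. A Python rendering:

```python
def serial_correlation(u):
    """Serial correlation between each u_j and its cyclic successor.

    C = (n*sum(u_j*u_{j+1}) - (sum u_j)^2)
        / (n*sum(u_j^2) - (sum u_j)^2)
    Values near 0 mean neighbouring elements are linearly uncorrelated;
    values near +/-1 mean u_{j+1} is nearly a + b*u_j.  (A constant
    sequence makes the denominator zero.)
    """
    n = len(u)
    s = sum(u)
    ss = sum(x * x for x in u)
    cross = sum(u[j] * u[(j + 1) % n] for j in range(n))
    return (n * cross - s * s) / (n * ss - s * s)

# An alternating sequence is perfectly (negatively) correlated:
# serial_correlation([0, 1, 0, 1]) returns -1.0
```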
K-S analysis: The K-S test compares a measured distribution Fn(x)
with an expected distribution F(x). The distribution in this
case is F(x) = the probability that u_n < x. A uniform probability
distribution has F(x) = x: all numbers will be less than 1,
we expect half of the numbers to be less than .5, a quarter to
be less than .25, etc. The K-S test measures how far the
measured distribution deviates from the expected distribution,
using two statistics:
K+ = max (Fn(x) - F(x)) -- Measures the largest overshoots
K- = max (F(x) - Fn(x)) -- Measures the largest undershoots
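Computed from a sorted sample, these two statistics look like this in
Python (the BASIC program instead tallies the numbers into d=128 bins,
which amounts to the same thing up to the bin width):

```python
import math

def ks_uniform(u):
    """K-S statistics K+ and K- of a sample against F(x) = x.

    The empirical CDF Fn steps up to j/n at the j-th smallest value,
    so K+ = sqrt(n)*max(j/n - u_(j)) measures the largest overshoot
    and K- = sqrt(n)*max(u_(j) - (j-1)/n) the largest undershoot.
    """
    n = len(u)
    s = sorted(u)
    kplus = max(j / n - s[j - 1] for j in range(1, n + 1))
    kminus = max(s[j - 1] - (j - 1) / n for j in range(1, n + 1))
    return math.sqrt(n) * kplus, math.sqrt(n) * kminus
```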
So, how does RND do? Well, it does well on the correlation test.
Since it swaps bytes at every iterate, this is not at all surprising.
It fails the equidistribution test, though, pretty badly: the
measured K values fall below the expected distribution, and do so
quite consistently.
The real crusher, though, is the gap test. It really chokes on
this one. The chi^2 values not only have a large average value,
but they have an *enormous* standard deviation. For the gap test,
I measured gaps of length 0 1 2 3 and >=4, until 128 gaps were
counted. The expected numbers for the gaps are
gap length       0    1    2    3  >=4
number of gaps  64   32   16    8    8
The expected probability is just the number of gaps/128. The measured
gaps are all over the place. In the chi^2 distribution, a value
of 3.357 is the 50% mark for nu=4 -- that is, the chi^2 value will
be less than 3.357 about 50% of the time. Although it has such
a large deviation as to be meaningless, the average chi^2 value I
measured was 8.48 -- up near the 85% mark. That is, values should
get this high only 15% of the time.
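For nu=4 the chi^2 distribution function happens to have a simple closed
form, P(chi^2 <= x) = 1 - e^(-x/2)*(1 + x/2), so the 50% figure is easy
to check in Python:

```python
import math

def chi2_cdf_nu4(x):
    """Chi-square CDF for nu = 4 degrees of freedom.

    For even nu the chi-square CDF is elementary; at nu = 4 it is
    P(chi^2 <= x) = 1 - exp(-x/2)*(1 + x/2).
    """
    return 1.0 - math.exp(-x / 2.0) * (1.0 + x / 2.0)

# chi2_cdf_nu4(3.357) comes out very close to 0.50
```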
Well, I think I computed and interpreted these statistics correctly.
Perhaps there is a statistician out there who can provide further
illumination.
evetS-
---
Source code and binaries follow.
Line 10 of the code contains the variables to modify. N is the length
of each mini-test. REP is the number of times to repeat each of these
tests. SEED is the random number seed, which I chose to be PI.
It requires BLARG for the graphics, available at
http://stratus.esam.nwu.edu/~judd/fridge/
and without a SuperCPU you would be well advised to take n and rep much
much smaller!
5 rem rnd test -- slj 4/98
7 poke 53374,.:poke53366,.:poke53375,.
9 mode17:print"compiling statistics..."
10 n=1024:rep=128:d=128:n2=1/n:seed={pi}
15 t=4
20 dim u(n),y%(n),f%(d),kp(rep),km(rep),c(rep),co(10),gap(rep)
30 x=rnd(-seed):n3=sqr(n)
35 print"i corr k+ k- gap"
40 for i=1 to rep
45 for j=1 to d:f%(j)=0:next:a1=0:a2=0:a3=0
50 for j=1 to n
60 u(j)=rnd(1):x=int(d*u(j)):y%(j)=x
65 for k=x to d:f%(k)=f%(k)+1:next k
67 a2=a2+u(j):a3=a3+u(j)*u(j)
70 next j
75 rem kolmogorov-smirnov test
76 :
80 m1=0:m2=0:for j=1 to d
90 x=f%(j)/n-j/d:if x>m1 then m1=x
100 if -x>m2 then m2=-x
110 next j
120 kp(i)=m1*n3:km(i)=m2*n3
125 rem serial correlation test
126 :
130 for j=1 to n-1:a1=a1+u(j)*u(j+1):next
140 a1=a1+u(n)*u(1)
150 c(i)=(n*a1-a2*a2)/(n*a3-a2*a2)
158 rem gap test, alpha=.25, beta=.75
159 :
160 gt=4:gn=128
161 j=-1:s=0:for k=0 to gt:co(k)=0:next
163 r=0
165 j=j+1:if (u(j)<0.25) or (u(j)>=0.75) and (j<n) then r=r+1:goto 165
170 if r>=gt then r=gt
175 co(r)=co(r)+1
180 s=s+1:if s<gn then 163
430 if x>m1 then m1=x
440 if -x>m2 then m2=-x
450 next k
460 print m1*sqr(rep)" kinf-="m2*sqr(rep)
470 print "press key to see k+ graph"
480 get a$:if a$="" then 480
490 mode16:gron20
500 m1=0:for i=0 to 319:x=2*i/319
510 m2=m1:m1=0:for j=1 to rep
1245 if c(j)>m2 then m4=m4+1
1247 next
1250 x=rep-m3-m4
1260 print"no. of c<m- ="m3"="100*m3/rep"%"
1270 print"no. of m-<c<m+ ="x"="100*x/rep"%"
1280 print"no. of c>m+ ="m4"="100*m4/rep"%"
1290 print"press l to list correlations,"
1300 print"any other key to continue"
1310 geta$:if a$="" then 1310
1320 if a$="l" then fork=1torep:printk,c(k):next
1330 return
ready.
begin 640 rand6
M`0@<"`4`CR!23D0@5$535"`M+2!33$H@-"\Y.``\"`<`ER`U,S,W-"PN.I__`!+4"A)*;)-,:Q.,SI+32A)*;)-,JQ.,P!?"GT`CR!3
M15))04P@0T]24D5,051)3TX@5$535`!E"GX`.@")"H(`@2!*LC$@I"!.JS$Z
M03&R03&J52A**:Q5*$JJ,2DZ@@"="HP`03&R03&J52A.*:Q5*#$I`,`*E@!#
M*$DILBA.K$$QJT$RK$$R*:TH3JQ!,ZM!,JQ!,BD`Y`J>`(\@1T%0(%1%4U0L
M($%,4$A!/2XR-2P@0D5403TN-S4`Z@J?`#H`^@J@`$=4LC0Z1TZR,3(X`!P+
MH0!*LJLQ.E.R,#J!($NR,""D($=4.D-/*$LILC`Z@@`D"Z,`4K(P`&$+I0!*
MLDJJ,3J+("A5*$HILS`N,C4I(+`@*%4H2BFQLC`N-S4I(*\@*$JS3BD@IR!2
MLE*J,3J)(#$V-0!T"ZH`BR!2L;)'5""G(%*R1U0`A@NO`$-/*%(ILD-/*%(I
MJC$`G0NT`%.R4ZHQ.HL@4[-'3B"G(#$V,P"V"[<`4+(P+C4Z6+(P.H$@2[(P
M(*0@1U0`U@NY`%*R0T\H2RFK1TZL4#I8LEBJ4JQ2K2A'3JQ0*0#K"[L`4+)0
MK3(Z@CI'05`H22FR6``+#+X`F2!).T,H22D[2U`H22D[2TTH22D[1T%0*$DI
M`!,,PP""($D`'0S(`.4Q-SKD`#D,S0"9.IDB4U1!5$E35$E#04P@5$535%,Z
M(@!;#-(`F2(@($$I($5154E$25-44DE"551)3TX@5$535"(`A`S<`)DB("!#
M*2!'05`@5$535"`H5$A)4R!)4R!33T]/($5!4UDI(@"C#.8`F2(@($HI(%-%
M4DE!3"!#3U)214Q!5$E/3B(`OPSP`)DB("`_*2!04DE.5"!#3TY35$%.5%,B
M`,\,^@"9(D-(3T]313HB.P#B#`0!H4$D.HM!)+(B(J!*U!$TQ
MLDU5JS*L4TE'34$Z33*R356J,JQ324=-00"0$K<$F2)-*R`](DTQ(B!-+2`]
M(")-,@"Z$KH$F2)%6%!%0U0@32T@/"!#(#P@32L@.34E($]&(%1(12!424U%
M(@#+$L0$F2)!0U1504PZ("([`.42S@1-,[(P.DTTLC`Z@2!*LC$@I"!215``
M_Q+8!(L@0RA**2"S($TQ(*<@33.R33.J,0`9$]T$BR!#*$HI(+$@33(@IR!-
M-+)--*HQ`!\3WP2"`"\3X@18LE)%4*M-,ZM--`!6$^P$F2).3RX@3T8@0SQ-
M+2`](DTS(CTB,3`PK$TSK5)%4"(E(@!^$_8$F2).3RX@3T8@32T\0SQ-*R`]
M(E@B/2(Q,#"L6*U215`B)2(`I1,`!9DB3D\N($]&($,^32L@/2)--"(](C$P
M,*Q--*U215`B)2(`RA,*!9DB4%)%4U,@3"!43R!,25-4($-/4E)%3$%424].
M4RPB`.L3%`69(D%.62!/5$A%4B!+15D@5$\@0T].5$E.544B``(4'@6A020Z
MBR!!)+(B(B"G(#$S,3``)!0H!8L@022R(DPB(*<@@4NR,:1215`ZF4LL0RA+
,*3J"`"H4,@6.````
`
end
__