Newsgroups: comp.sys.cbm
Subject: Re: C64 Random Generator?
References: <6hqisi$qg6@examiner.concentric.net> <6ht6s2$327@news.acns.nwu.edu> <6huaed$g06@examiner.concentric.net>
Reply-To: sjudd@nwu.edu (Stephen Judd)
Organization: Northwestern University, Evanston, IL

I will make one more shot at this :). Note that there is some code at the
end of this message.

In article <6huaed$g06@examiner.concentric.net>, Cameron Kaiser wrote:
>judd@merle.acns.nwu.edu (Stephen Judd) writes:
>
>>>after only a few cycles. Apparently there are many such seeds, so this
>>>generator is effectively useless.
>
>>I read about this method in the paper "Equation of State Calculations by
>>Fast Computing Machines", by Nick Metropolis, Edward Teller, Marshall
>>Rosenbluth, Arianna Rosenbluth, and Augusta Teller (Journal Chem Phys.,
>>v21 No. 6, June 1953). "Useless" is a very strong adjective; therefore
>>I strongly disagree with it.

To be clear: I have no doubt that this method is sensitive to the initial
seed, and can fail in a number of situations. The argument is with the
statement that it is "useless". By my reckoning, if three of the greatest
scientists of the century can use it to invent Monte Carlo methods, to run
hydro codes, and to build great big bombs, well... it's not useless! It
just demands a lot of care.

Since you quoted from a book, I too located a book quote:

"...On the other hand, working with 38-bit numbers, Metropolis obtained a
sequence of about 750,000 numbers before degeneracy occurred, and the
resulting 750,000x38 bits satisfactorily passed statistical tests for
randomness. This shows that the middle-squares method _can_ give usable
results, but it is rather dangerous to put much faith in it until after
elaborate computations have been performed."
	- Knuth, "The Art of Computer Programming" vol. 2

Incidentally, Nick once told me that most people who knew his generation of
scientists -- including Feynman, Einstein, Fermi, etc. -- agreed that von
Neumann was the top of the heap.

>What I meant was not that x=1 is a good generator, but that I would consider
>an iterator that returned 1 over and over from some opaque function random,
>as long as you cannot prove apriori that the generator will return 1 the
>next time you run it. I would call that random because the algorithm is
>arbitrary even though the distribution sucks and it doesn't cover any sort
>of interval except 1. I think we agree basically though, which is good
>since you have the math degree and all. *smirk*

As to a degree, it's only a piece of paper, and I never went to statistics
class anyway (that was the Semester of Netrek). This stuff just comes from
reading a few books.

Now, let's talk about random numbers. I first want to demonstrate the
importance of talking about a _sequence_ of numbers. Here is my
demonstration:

	3

Is the above number random? It should be clear that it only makes sense to
talk about sequences of numbers.

So now let's say I hand you a list of numbers -- is this a _sequence_ of
random numbers? That is a question we can answer, or at least make progress
on. We need a quantitative measure of randomness (or else we degenerate into
a bunch of Aristotelian "this is what I think random means" mishmash).
Basically, no number should have anything to do with any other number (they
should be totally uncorrelated), and each number should occur with a
certain probability.

For example, consider a good old 6-sided die, which is a pretty decent
random number generator. In repeated tosses you wouldn't expect the numbers
to depend on each other, and after a whole lot of tosses you would expect
each number to appear in the sequence about the same number of times. (If
you don't like dice, then try roulette.)
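As an aside, the middle-squares method itself is easy to sketch in modern
terms. Here is a minimal Python illustration -- note the 8-decimal-digit
width and the seeds are my own choices for the example (Metropolis worked
with 38-bit binary numbers):

```python
# Sketch of the middle-squares method: square the current value and
# keep the middle digits as the next value.  Widths/seeds are
# illustrative only, not the historical parameters.
def middle_square(seed, n, digits=8):
    """Return a list of n middle-square iterates starting from seed."""
    x = seed
    out = []
    for _ in range(n):
        sq = str(x * x).zfill(2 * digits)   # square, zero-pad to 2*digits
        mid = (len(sq) - digits) // 2
        x = int(sq[mid:mid + digits])       # keep the middle 'digits' digits
        out.append(x)
    return out

print(middle_square(12345678, 5))
print(middle_square(0, 3))   # a degenerate seed sticks forever: [0, 0, 0]
```

A bad seed degenerates immediately, which is exactly the sensitivity under
discussion; with care in choosing seeds, the sequence can nevertheless pass
statistical tests for a long while, as the Knuth quote above says.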
To further emphasize the importance of the distribution, consider the kinds
of things random numbers are used for: sampling/algorithm testing,
simulation, numerical analysis, generating numbers for the lottery
commission. If the random numbers don't have the expected distribution,
these tasks will all fail (and that last one would generate one happy
person and a whole lot of mad ones).

So, what makes a good random number generator on the computer? Clearly one
which generates sequences that adequately pass these tests. (After all, I
only handed you a sequence of numbers to test for randomness -- I never
said where they came from.) To be of any practical value, a computer random
number generator should generate an uncorrelated sequence of numbers with a
uniform probability distribution (other probability distributions are easy
to form from a uniform distribution). More precisely, it should generate a
sequence which is "close enough" to an ideal sequence.

If you are really interested in pursuing the subject further, I suggest
reading Knuth, "The Art of Computer Programming" vol. 2. It has 178 pages
devoted to random numbers, including a chapter on the philosophical
considerations ("Precisely what do we mean by 'random behavior'?"). In
addition, "Numerical Recipes" has some more up-to-date references.

--

Now, after all this talk, and claims from myself that things like swapping
bytes wreck a sequence, I thought it might be fun to test the BASIC RND
generator. So I wrote a little program, included below, which performs
three statistical tests of RND, as taken from Knuth vol. 2. It is in BASIC,
requires BLARG for the graphs, and sure is helped out by a SuperCPU; BLARG
is available at

	http://stratus.esam.nwu.edu/~judd/fridge/

The first two tests measure the distribution of the numbers.

The first test is the "Equidistribution test". It measures how well the
actual number distribution compares with a uniform distribution, using a
Kolmogorov-Smirnov (K-S) test.
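To make the equidistribution test concrete before describing it in detail,
here is a sketch of the K+/K- computation in Python. The function name,
seed, and sample size are mine, for illustration only; the BASIC program
below does the same thing, binned into D cells:

```python
import math
import random

def ks_statistics(sample):
    """K+ and K- of a sample against the uniform CDF F(x) = x,
    scaled by sqrt(n) as in Knuth vol. 2."""
    xs = sorted(sample)
    n = len(xs)
    # The empirical CDF Fn steps up by 1/n at each sorted sample point.
    kplus = max((j + 1) / n - x for j, x in enumerate(xs))   # largest overshoot
    kminus = max(x - j / n for j, x in enumerate(xs))        # largest undershoot
    return math.sqrt(n) * kplus, math.sqrt(n) * kminus

random.seed(1)   # arbitrary seed, for reproducibility of the sketch
kp, km = ks_statistics([random.random() for _ in range(1024)])
print(kp, km)    # both should be O(1) for a good generator
```

Repeating this over many short sequences, then running K-S on the
collected statistics, gives the local/global structure of the test
described next.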
The K-S test is described below. This test generates K-S statistics on a
sequence of N (=1024 below) random numbers. This is then repeated REP
(=128) times. Finally, a K-S analysis is performed on the REP K-S
statistics. The short 1024-number sequences try to capture _locally_
nonrandom behavior; analyzing the compiled statistics captures any
additional _globally_ nonrandom behavior.

The second test is the "Gap test". This test measures how the numbers are
spread out as they are generated, and complements the
equidistribution/frequency test. The idea of the gap test is as follows:
run through the sequence, and count how many elements go by before you hit
a number between a and b. In my case, I chose a=0.25 and b=0.75; the gap
test just counts how long it takes to hit a number between .25 and .75.
Once a specified number of gaps have been counted, a chi^2 statistic is
formed from the gap lengths.

The gap test measures the "spread" of the numbers. If you had a sequence
like

	.9 .99 .91 .92 .95 .93 .5

or some such, it wouldn't be very good -- although the numbers might cover
the full range 0..1, they bunch up in locations.

The third test is the "Correlation test". The correlation test measures how
the element u_{n+1} depends on the element u_n. Correlation values near
zero represent totally uncorrelated numbers. Values near +/-1 imply linear
dependence, i.e. u_{n+1} = a + b*u_n.

K-S analysis: The K-S test compares a measured distribution Fn(x) with an
expected distribution F(x). The distribution F(x) in this case is the
probability that a number u_n is less than x. A uniform probability
distribution has F(x) = x: all numbers will be less than 1, we expect half
of the numbers to be less than .5, a quarter to be less than .25, etc. The
K-S test measures how far the measured distribution deviates from the
expected distribution, using two statistics:

	K+ = sqrt(n) * max (Fn(x) - F(x)) -- measures the largest overshoots
	K- = sqrt(n) * max (F(x) - Fn(x)) -- measures the largest undershoots

(The sqrt(n) factor is the normalization Knuth uses, and is what the
program below computes.)

So, how does RND do?
Well, it does well on the correlation test. Since it swaps bytes at every
iterate, this is not at all surprising.

It fails the equidistribution test, though, pretty badly. The measured K
values fall below the expected distribution, and do so fairly consistently.

The real crusher, though, is the gap test. It really chokes on this one.
The chi^2 values not only have a large average value, but they have an
*enormous* standard deviation. For the gap test, I measured gaps of length
0 1 2 3 and >=4, until 128 gaps were counted. The expected numbers for the
gaps are

	gap length	 0	 1	 2	 3	>=4
	number of gaps	64	32	16	 8	 8

The expected probability is just the number of gaps/128. The measured gaps
are all over the place. In the chi^2 distribution, a value of 3.357 is the
50% mark for nu=4 degrees of freedom -- that is, the chi^2 value will be
less than 3.357 about 50% of the time. Although it has such a large
deviation as to be nearly meaningless, the average chi^2 value I measured
was 8.48 -- up near the 85% mark. That is, values should get this high
only 15% of the time.

Well, I think I computed and interpreted these statistics correctly.
Perhaps there is a statistician out there who can provide further
illumination.

evetS-

---

Source code and binaries follow. Line 10 of the code contains the
variables to modify. N is the length of each mini-test. REP is the number
of times to repeat each of these tests. SEED is the random number seed,
which I chose to be PI. It requires BLARG for the graphics, available at

	http://stratus.esam.nwu.edu/~judd/fridge/

and without a SuperCPU you would be well advised to take N and REP much,
much smaller!

5 rem rnd test -- slj 4/98
7 poke 53374,.:poke53366,.:poke53375,.
9 mode17:print"compiling statistics..."
10 n=1024:rep=128:d=128:n2=1/n:seed={pi}
15 t=4
20 dim u(n),y%(n),f%(d),kp(rep),km(rep),c(rep),co(10),gap(rep)
30 x=rnd(-seed):n3=sqr(n)
35 print"i corr k+ k- gap"
40 for i=1 to rep
45 for j=1 to d:f%(j)=0:next:a1=0:a2=0:a3=0
50 for j=1 to n
60 u(j)=rnd(1):x=int(d*u(j)):y%(j)=x
65 for k=x to d:f%(k)=f%(k)+1:next k
67 a2=a2+u(j):a3=a3+u(j)*u(j)
70 next j
75 rem kolmogorov-smirnov test
76 :
80 m1=0:m2=0:for j=1 to d
90 x=f%(j)/n-j/d:if x>m1 then m1=x
100 if -x>m2 then m2=-x
110 next j
120 kp(i)=m1*n3:km(i)=m2*n3
125 rem serial correlation test
126 :
130 for j=1 to n-1:a1=a1+u(j)*u(j+1):next
140 a1=a1+u(n)*u(1)
150 c(i)=(n*a1-a2*a2)/(n*a3-a2*a2)
158 rem gap test, alpha=.25, beta=.75
159 :
160 gt=4:gn=128
161 j=-1:s=0:for k=0 to gt:co(k)=0:next
163 r=0
165 j=j+1:if (u(j)<0.25) or (u(j)>=0.75) and (j<n) then r=r+1:goto 165
170 if r>=gt then r=gt
175 co(r)=co(r)+1
180 s=s+1:if s<gn then 163
1247 next
1250 x=rep-m3-m4
1260 print"no. of c>m+ ="m4"="100*m4/rep"%"
1290 print"press l to list correlations,"
1300 print"any other key to continue"
1310 geta$:if a$="" then 1310
1320 if a$="l" then fork=1torep:printk,c(k):next
1330 return
ready.
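For comparison, the gap test is easy to phrase in a modern language. Here
is a Python sketch using the same parameters as the listing (a=.25, b=.75,
gap lengths 0..3 and >=4, 128 gaps); the function and variable names are
mine, and this mirrors the idea rather than being a line-for-line port:

```python
import random

def gap_test(u, a=0.25, b=0.75, max_gap=4, n_gaps=128):
    """Chi^2 statistic for the gap test: lengths of the runs of numbers
    falling outside [a, b), binned as 0, 1, ..., max_gap-1, >=max_gap."""
    counts = [0] * (max_gap + 1)
    gap = found = 0
    for x in u:
        if a <= x < b:                    # hit: record the gap length
            counts[min(gap, max_gap)] += 1
            gap = 0
            found += 1
            if found == n_gaps:
                break
        else:                             # miss: the gap grows
            gap += 1
    p = b - a
    # Expected counts: n_gaps*p*(1-p)^r for r < max_gap, tail for >= max_gap.
    # For p = 0.5 and 128 gaps this gives 64, 32, 16, 8, 8 as in the table.
    expected = [n_gaps * p * (1 - p) ** r for r in range(max_gap)]
    expected.append(n_gaps * (1 - p) ** max_gap)
    return sum((c - e) ** 2 / e for c, e in zip(counts, expected))

random.seed(3)   # arbitrary reproducible seed for the sketch
chi2 = gap_test([random.random() for _ in range(10000)])
print(chi2)      # compare against the nu=4 chi^2 table, e.g. 3.357 at 50%
```

A bunched-up sequence like the .9 .99 .91 ... example above produces a huge
chi^2, which is exactly what this test is meant to catch.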