The recent mass distribution of cards containing pictures of Pokémon characters presents some interesting probability questions.
In a recent SMILE class Pat Phillip [Arai Middle School] posed the question of how many Pokémon Cards [given randomly and with equal probability] one would have to get to have "reasonable assurance" of having a complete set of 6 cards. We drew a few cards during the class, and we able to confirm the claim that the "average" number drawn would be about 14, and that you would generally have to get several of the same kind before acquiring a full set.
The initial enthusiasm of getting the cards wore off after a few trials.
I decided to "automate the process" by writing a computer program
to do a significant number of "random trials". Here, for your potential
amusement, is the FORTRAN program to make and record these trials:
c pokemon cards
c monte carlo simulation
c number of "draws" to get a complete set.
dimension icard(6),ntimes(100), ncum(0:100)
open(unit=8,file='poke.dat',status='unknown')
c set up random number generator
nrand = 11
call seed(nrand)
c initializing counter of games to zero
do 10 n = 1,100
10 ntimes(n) = 0
c input: number of games [complete sets] to get
write(6,*)' how many games do you wish to play?'
read(5,*) kgames
write(8,88) kgames
88 format(' num games = ', i10)
c playing the games
do 100 k = 1,kgames
c k is the game number
do 40 i = 1,6
40 icard(i) = 0
c playing a particular game
do 60 n = 1,100
c n is the number of cards in a given game
call random(x)
xx = 6.*x +1.
j = int(xx)
c j is the card number received on nth draw
icard(j) = icard(j) +1
c icard(j) is number of times jth card has appeared
t = icard(1)*icard(2)*icard(3)*icard(4)*icard(5)*icard(6)
c testing for a complete set: t is not zero
if (t.gt.0) then
c complete set is obtained; go to next game
ntimes(n) = ntimes(n)+1
goto 100
end if
c end of a particular game
60 continue
c if more than 100 games are required to get a match, we stop
stop
100 continue
c recording of
write(8,*) ' number of draws for a match'
write(8,99) (ntimes(n), n = 1,100)
99 format(10i8)
c computing mean and standard deviation of number of draws
sum1 = 0.
sum2 = 0.
do 200 n = 1,100
fn = float(n)
fnum = float(ntimes(n))
sum1 = sum1 + fnum*fn
sum2 = sum2 + fnum*fn**2
200 continue
fgames = float(kgames)
fmean = sum1/fgames
fmeansq = sum2/fgames
stddev = sqrt(fmeansq-fmean**2)
write(8,999) fmean, fmeansq, stddev
999 format(' mean =', f10.6, /, ' meansq =', f10.6, /,
> ' stddev =', f10.6)
c ncum(j) is probability of getting a complete set after n games
ncum(0) = 0
do 300 n = 1,100
300 ncum(n) = ncum(n-1)+ntimes(n)
c print of ncum(j)
write(8,*) ' cumulative hit number'
write(8,99) (ncum(n), n = 1,100)
c quit and go home
stop
end
num games = 10000000
number of draws for a match
0 0 0 0 0 154776 385006 600803 750090 825776
843029 816335 762255 690521 613102 538025 466376 400611 342242 290821
245911 206988 174815 147081 122821 102982 85649 71363 60185 50199
41792 35076 29128 24394 20305 16716 14039 11830 9921 8163
6622 5673 4692 3953 3353 2869 2282 1943 1591 1315
1073 924 750 634 548 488 368 298 230 184
184 150 132 99 88 66 54 51 51 35
20 24 26 11 16 12 14 12 9 2
3 3 2 8 1 4 1 0 2 2
0 0 0 0 1 0 0 0 0 1
mean = 14.701120
meansq =255.126600
stddev = 6.245292
cumulative hit number
0 0 0 0 0 154776 539782 1140585 1890675 2716451
3559480 4375815 5138070 5828591 6441693 6979718 7446094 7846705 8188947 8479768
8725679 8932667 9107482 9254563 9377384 9480366 9566015 9637378 9697563 9747762
9789554 9824630 9853758 9878152 9898457 9915173 9929212 9941042 9950963 9959126
9965748 9971421 9976113 9980066 9983419 9986288 9988570 9990513 9992104 9993419
9994492 9995416 9996166 9996800 9997348 9997836 9998204 9998502 9998732 9998916
9999100 9999250 9999382 9999481 9999569 9999635 9999689 9999740 9999791 9999826
9999846 9999870 9999896 9999907 9999923 9999935 9999949 9999961 9999970 9999972
9999975 9999978 9999980 9999988 9999989 9999993 9999994 9999994 9999996 9999998
9999998 9999998 9999998 9999998 9999999 9999999 9999999 9999999 999999910000000
The Monte Carlo approach to solving was first employed in large-scale digital problems shortly after World War II by an applied mathematician active in the Manhattan Project Stanislaw Ulam, shortly after he recovered from a very serious illness. There is no better introduction to the subject than this paragraph, taken from his autobiography, Adventures of a Mathematician, [U California Press 1991; ISBN 0-520-07154-9] pp 196-197.
The idea for what was later called the Monte Carlo method occurred to me when I was playing Solitaire during my illness. I noticed that it may be much more practical to get an idea of the probability of the successful outcome of a Solitaire game (like Canfield or some other where the skill of the player is not important) by laying down the cards, or experimenting with the process, and merely noticing what proportion comes out successfully; rather than to try to compute all the combinatorial possibilities which are an exponentially increasing number so great that, except in very elementary cases, there is no way to estimate it. This is intellectually surprising, and if not exactly humiliating, it gives one a feeling of modesty about the limits of rational or traditional thinking. In a sufficiently complicated problem, actual sampling is better than examining all the chains of possibilities.
How do we understand these numbers, and how do we begin to compute the odds when the chances of drawing cards are not the same, as is surely the case for Pokémon cards given with Happy Meals at McDonalds?
Let us suppose that the Pokémon cards #1, #2, #3, #4, #5, #6 are generated with probabilities p1, p2, p3, p4, p5, p6, respectively. Each probability pi is positive, and the total probabilities must add to 1:
It is also true that
It is remarkable, but true, that if you write out the algebraic expression for
as a sum of monomials in pi, the probability that, in N draws, you get card #1 n1 times; card #2 n2 times; card #3 n3 times; card #4 n4 times; card #5 n5 times; and card #6 n6 times, respectively, is given by the monomial term occurring in the above expression; ie
When the dust clears, we obtain the following expression for the probability of having a complete set after N trials:
In this expression the first line is complete; in the second line there are 6 terms that involve any five of the pi; in the third line there are 10 terms that involve any four of the pi; in the fourth line there are 15 terms that involve any three of the pi; in the fifth line there are 10 terms that involve any two of the pi; and in the sixth line there are 6 terms involving each of the pi. That is the answer, no matter what N is and what the individual pi happen to be.
We can simplify the answer considerably for the case in which the 6 cards are equally probable, by setting pi = 1/6. We obtain that, for any N,
I have written an EXCEL program to determine these numbers and to draw a graph, which is shown below: The numbers agree with those obtained in the Monte Carlo Simulation, to about four significant figures. Here is a listing of the numbers for small values of N:
| N | P(N) | N | P(N) | N | P(N) | N | P(N) | |||
| 1 | .0000 | 11 | .3562 | 21 | .8726 | 31 | .9790 | |||
| 2 | .0000 | 12 | .4378 | 22 | .8933 | 32 | .9825 | |||
| 3 | .0000 | 13 | .5139 | 23 | .9108 | 33 | .9854 | |||
| 4 | .0000 | 14 | .5828 | 24 | .9254 | 34 | .9878 | |||
| 5 | .0000 | 11 | .6442 | 25 | .9388 | 35 | .9899 | |||
| 6 | .0154 | 16 | .6980 | 26 | .9480 | 36 | .9915 | |||
| 7 | .0540 | 17 | .7446 | 27 | .9566 | 37 | .9930 | |||
| 8 | .1140 | 18 | .7847 | 28 | .9638 | 38 | .9941 | |||
| 9 | .1890 | 19 | .8189 | 29 | .9698 | 39 | .9950 | |||
| 10 | .2718 | 20 | .8480 | 30 | .9748 | 40 | .9959 |