Porter Johnson

The recent mass distribution of cards containing pictures of Pokémon characters presents some interesting probability questions.

In a recent SMILE class **Pat Phillip [Arai Middle School]** posed
the question of how many __Pok__é__mon Cards__ [given randomly and with
equal probability] one would have to get to have "reasonable assurance"
of having a complete set of 6 cards. We drew a few cards during the class,
and we able to confirm the claim that the "average" number drawn would be
about 14, and that you would generally have to get several of the same
kind before acquiring a full set.

The initial enthusiasm of getting the cards wore off after a few trials.
I decided to "automate the process" by writing a computer program
to do a significant number of "random trials". Here, for your potential
amusement, is the FORTRAN program to make and record these trials:

c pokemon cards c monte carlo simulation c number of "draws" to get a complete set. dimension icard(6),ntimes(100), ncum(0:100) open(unit=8,file='poke.dat',status='unknown') c set up random number generator nrand = 11 call seed(nrand) c initializing counter of games to zero do 10 n = 1,100 10 ntimes(n) = 0 c input: number of games [complete sets] to get write(6,*)' how many games do you wish to play?' read(5,*) kgames write(8,88) kgames 88 format(' num games = ', i10) c playing the games do 100 k = 1,kgames c k is the game number do 40 i = 1,6 40 icard(i) = 0 c playing a particular game do 60 n = 1,100 c n is the number of cards in a given game call random(x) xx = 6.*x +1. j = int(xx) c j is the card number received on nth draw icard(j) = icard(j) +1 c icard(j) is number of times jth card has appeared t = icard(1)*icard(2)*icard(3)*icard(4)*icard(5)*icard(6) c testing for a complete set: t is not zero if (t.gt.0) then c complete set is obtained; go to next game ntimes(n) = ntimes(n)+1 goto 100 end if c end of a particular game 60 continue c if more than 100 games are required to get a match, we stop stop 100 continue c recording of write(8,*) ' number of draws for a match' write(8,99) (ntimes(n), n = 1,100) 99 format(10i8) c computing mean and standard deviation of number of draws sum1 = 0. sum2 = 0. do 200 n = 1,100 fn = float(n) fnum = float(ntimes(n)) sum1 = sum1 + fnum*fn sum2 = sum2 + fnum*fn**2 200 continue fgames = float(kgames) fmean = sum1/fgames fmeansq = sum2/fgames stddev = sqrt(fmeansq-fmean**2) write(8,999) fmean, fmeansq, stddev 999 format(' mean =', f10.6, /, ' meansq =', f10.6, /, > ' stddev =', f10.6) c ncum(j) is probability of getting a complete set after n games ncum(0) = 0 do 300 n = 1,100 300 ncum(n) = ncum(n-1)+ntimes(n) c print of ncum(j) write(8,*) ' cumulative hit number' write(8,99) (ncum(n), n = 1,100) c quit and go home stop end

Here is output for a run with ten million "games". In each game I told the computer to keep drawing until it got a complete set. A record of the number N of draws needed for each set is given, as well as the cumulative hit numbers [number of times that N draws or fewer is required to obtain a complete set]:

num games = 10000000 number of draws for a match 0 0 0 0 0 154776 385006 600803 750090 825776 843029 816335 762255 690521 613102 538025 466376 400611 342242 290821 245911 206988 174815 147081 122821 102982 85649 71363 60185 50199 41792 35076 29128 24394 20305 16716 14039 11830 9921 8163 6622 5673 4692 3953 3353 2869 2282 1943 1591 1315 1073 924 750 634 548 488 368 298 230 184 184 150 132 99 88 66 54 51 51 35 20 24 26 11 16 12 14 12 9 2 3 3 2 8 1 4 1 0 2 2 0 0 0 0 1 0 0 0 0 1 mean = 14.701120 meansq =255.126600 stddev = 6.245292 cumulative hit number 0 0 0 0 0 154776 539782 1140585 1890675 2716451 3559480 4375815 5138070 5828591 6441693 6979718 7446094 7846705 8188947 8479768 8725679 8932667 9107482 9254563 9377384 9480366 9566015 9637378 9697563 9747762 9789554 9824630 9853758 9878152 9898457 9915173 9929212 9941042 9950963 9959126 9965748 9971421 9976113 9980066 9983419 9986288 9988570 9990513 9992104 9993419 9994492 9995416 9996166 9996800 9997348 9997836 9998204 9998502 9998732 9998916 9999100 9999250 9999382 9999481 9999569 9999635 9999689 9999740 9999791 9999826 9999846 9999870 9999896 9999907 9999923 9999935 9999949 9999961 9999970 9999972 9999975 9999978 9999980 9999988 9999989 9999993 9999994 9999994 9999996 9999998 9999998 9999998 9999998 9999998 9999999 9999999 9999999 9999999 999999910000000

We see that the mean number of draws is about 14.70, whereas the spread in that mean is about 6.25, so that between 7 and 21 draws are required to get a full set most of the time. In fact, it is about 5.4 % probable to have a full set after 7 draws, about 51.4 % probable to have a full set after 13 draws, and about 87.3 % probable to have the set after 21 draws. However, the distribution of draws is quite "skewed", and the most likely number of draws required is 11, which occurs about 8.4% of the time. Of the ten million computer-generated games, one of the games required 100 draws to get a full set of cards.

The Monte Carlo approach to solving was first employed in large-scale
digital problems shortly after World War II by an applied mathematician
active in the Manhattan Project
Stanislaw Ulam, shortly after he
recovered from a very serious illness. There is no better introduction
to the subject than this paragraph, taken from his autobiography,
__Adventures of a Mathematician__, [U California Press 1991;
ISBN 0-520-07154-9] pp 196-197.

**
The idea for what was later called the Monte Carlo method occurred to
me when I was playing Solitaire during my illness. I noticed that it
may be much more practical to get an idea of the probability of the
successful outcome of a Solitaire game (like Canfield or some other where
the skill of the player is not important) by laying down the cards, or
experimenting with the process, and merely noticing what proportion comes
out successfully; rather than to try to compute all the combinatorial
possibilities which are an exponentially increasing number so great that,
except in very elementary cases, there is no way to estimate it. This is
intellectually surprising, and if not exactly humiliating, it gives one a
feeling of modesty about the limits of rational or traditional thinking.
In a sufficiently complicated problem, actual sampling is better than
examining all the chains of possibilities.
**

How do we understand these numbers, and how do we begin to compute the odds
when the chances of drawing cards are not the same, as is surely the case
for __Pok__é__mon__ cards given with Happy Meals at McDonalds?

Let us suppose that the __Pok__é__mon__ cards #1, #2, #3, #4, #5, #6 are generated
with probabilities p_{1}, p_{2}, p_{3},
p_{4}, p_{5}, p_{6}, respectively. Each probability
p_{i} is positive, and the total probabilities must add to 1:

It is also true that

It is remarkable, but true, that if you write out the algebraic expression for

as a sum of monomials in p_{i}, the probability that,
in N draws, you get card #1 n_{1} times; card #2 n_{2} times;
card #3 n_{3} times; card #4 n_{4} times; card #5
n_{5} times; and card #6 n_{6} times, respectively, is
given by the monomial term occurring in the above expression; ie

where

n

in that polynomial, G

When the dust clears, we obtain the following expression for the probability of having a complete set after N trials:

- [p

+ [p

- [p

+ [p

-[p

In this expression the first line is complete; in the second line
there are 6 terms that involve any five of the p_{i}; in the
third line there are 10 terms that involve any four of the p_{i};
in the fourth line there are 15 terms that involve any three of the
p_{i}; in the fifth line there are 10 terms that involve any two
of the p_{i}; and in the sixth line there are 6 terms involving
each of the p_{i}. That is the answer, no matter what N is and
what the individual p_{i} happen to be.

We can simplify the answer considerably for the case in which the 6 cards
are equally probable, by setting p_{i} = 1/6. We obtain
that, for any N,

I have written an __EXCEL__ program to determine these numbers and to
draw a graph, which is shown below: The numbers agree with those obtained
in the Monte Carlo Simulation, to about four significant figures. Here
is a listing of the numbers for small values of N:

N | P(N) | N | P(N) | N | P(N) | N | P(N) | |||

1 | .0000 | 11 | .3562 | 21 | .8726 | 31 | .9790 | |||

2 | .0000 | 12 | .4378 | 22 | .8933 | 32 | .9825 | |||

3 | .0000 | 13 | .5139 | 23 | .9108 | 33 | .9854 | |||

4 | .0000 | 14 | .5828 | 24 | .9254 | 34 | .9878 | |||

5 | .0000 | 11 | .6442 | 25 | .9388 | 35 | .9899 | |||

6 | .0154 | 16 | .6980 | 26 | .9480 | 36 | .9915 | |||

7 | .0540 | 17 | .7446 | 27 | .9566 | 37 | .9930 | |||

8 | .1140 | 18 | .7847 | 28 | .9638 | 38 | .9941 | |||

9 | .1890 | 19 | .8189 | 29 | .9698 | 39 | .9950 | |||

10 | .2718 | 20 | .8480 | 30 | .9748 | 40 | .9959 |