THE BINOMIAL DISTRIBUTION AND ITS APPROXIMATIONS

Write up detailed answers to the following questions. Show all of your mathematical work, and where applicable, provide your Maple code with your report.

We begin by looking at the probability distribution function (pdf) for the Binomial(n, p) distribution for various choices of n and p. Recall the form of the pdf is $f(x) = \binom{n}{x} p^x (1-p)^{n-x}$ for $x = 0, 1, \ldots, n$.

Let's look at the probability histogram for the Binomial(4, 0.5) distribution.

> restart; with(plots, display):

> pdf := BinomialPDF(4, 0.5, x);

[Maple Math]

> ProbHist(pdf, 0..4);

[Maple Plot]
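
As an optional cross-check outside Maple, the bar heights of this histogram can be computed straight from the pdf formula. The sketch below uses only the Python standard library; the name `binom_pmf` is an illustrative choice and is not one of the worksheet's Maple commands.

```python
from math import comb

def binom_pmf(n: int, p: float, x: int) -> float:
    """P(X = x) for X ~ Binomial(n, p): C(n, x) * p^x * (1 - p)^(n - x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Bar heights of the Binomial(4, 0.5) probability histogram for x = 0..4.
heights = [binom_pmf(4, 0.5, x) for x in range(5)]
print(heights)       # [0.0625, 0.25, 0.375, 0.25, 0.0625]
print(sum(heights))  # 1.0
```

The heights are symmetric about x = 2 because p = 0.5, and they sum to 1 as any pdf must.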

>

1. Change the Maple code above where appropriate and look at the probability histograms for the Binomial(n, 0.5) distributions where n = 10, 15, 20, 50. Comment on the appearance of these histograms as n gets larger. In particular, does a normal model seem to be a good approximation to the shape of the histograms?

2. The following Maple code will create an animation showing the probability histograms for the Binomial(n, p) distributions for n = 4..50 and your choice of p. Superimposed on each probability histogram is an approximating normal density curve. For a fixed p, comment on the appearance of these histograms as n gets larger. In particular, does a normal model seem to be a good approximation to the shape of the histograms? What effect does varying p (0 < p < 0.5) have on these animations?

> p:=0.1;

[Maple Math]

> for n from 4 to 50 do

> H[n]:=ProbHist(BinomialPDF(n,p,x),-0.5..50.5, 51):

> N[n]:=plot(NormalPDF(n*p,n*p*(1-p),x),x = -0.5..50.5):

> P[n]:=display( {N[n],H[n]} ):

> od:

> display([seq(P[n], n=4..50)], insequence=true);

[Maple Plot]
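
One possible way to put a number on what the animation shows is to compare each bar height with the area under the normal curve over the same bar. The Python sketch below (standard library only) does this; the helper names and the `mismatch` measure are assumptions made for illustration, not part of the worksheet's Maple package.

```python
from math import comb, erf, sqrt

def binom_pmf(n, p, x):
    """P(X = x) for X ~ Binomial(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def normal_cdf(x, mean, var):
    """Cdf of the Normal(mean, var) distribution; var is the variance."""
    return 0.5 * (1 + erf((x - mean) / sqrt(2 * var)))

def mismatch(n, p):
    """Half the summed |bar height - normal area over [x - 0.5, x + 0.5]| for x = 0..n."""
    mean, var = n * p, n * p * (1 - p)
    total = 0.0
    for x in range(n + 1):
        area = normal_cdf(x + 0.5, mean, var) - normal_cdf(x - 0.5, mean, var)
        total += abs(binom_pmf(n, p, x) - area)
    return total / 2

p = 0.1  # same p as the animation above
for n in (4, 10, 20, 50):
    print(n, round(mismatch(n, p), 4))
```

A value near 0 means the bars and the curve nearly coincide. Rerunning with other values of p gives a numerical companion to the visual comparison.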

3. The following Maple code will create an animation showing the probability histograms for the Binomial(n, p) distributions for p = 0.01..0.5 and your choice of n. Superimposed on each probability histogram is an approximating normal density curve. Comment on the appearance of these histograms as p varies. In particular, does a normal model seem to be a good approximation to the shape of the histograms?

> n:=40;

[Maple Math]

> for i from 1 to 50 do

> prob[i]:=i/100;

> Hp[i]:=ProbHist(BinomialPDF(n,prob[i],x),-0.5..40.5, 41):

> Np[i]:=plot(NormalPDF(n*prob[i],n*prob[i]*(1-prob[i]),x),x = -0.5..40.5):

> Pp[i]:=display( {Np[i],Hp[i]} ):

> od:

> display([seq(Pp[i], i=1..50)], insequence=true);

[Maple Plot]
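
The asymmetry visible in the animation for small p can be summarized by the skewness (third standardized moment) of the distribution, which is 0 for any normal curve. The following Python sketch computes it directly from the pdf; it is a supplementary illustration with hypothetical helper names, not something the worksheet itself defines.

```python
from math import comb, sqrt

def binom_pmf(n, p, x):
    """P(X = x) for X ~ Binomial(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def skewness(n, p):
    """Third standardized moment of Binomial(n, p), summed directly from the pdf."""
    mean = n * p
    sd = sqrt(n * p * (1 - p))
    return sum(((x - mean) / sd) ** 3 * binom_pmf(n, p, x) for x in range(n + 1))

n = 40  # same n as the animation above
for p in (0.01, 0.05, 0.1, 0.25, 0.5):
    print(p, round(skewness(n, p), 3))
```

The computed values agree with the known closed form (1 - 2p)/sqrt(np(1 - p)), so the skewness shrinks toward 0 as p approaches 0.5.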

>

4. Based on your explorations in questions 1-3, write a summary explaining when (i.e. for what ranges of n and p ) the Binomial( n, p ) distribution is well approximated by a normal distribution. In particular, is the size of n the only deciding factor, or is there a dependence on p also?

>

5. Recall that the mean and variance of the Binomial(n, p) distribution are np and np(1 - p), respectively. Therefore, if a normal approximation is appropriate, the best approximation is the Normal($\mu = np$, $\sigma^2 = np(1-p)$) distribution.

The following Maple code will plot the probability histogram for the Binomial(30, 0.5) distribution from x = 12 to 20. The density curve f(x) for the Normal($\mu = 15$, $\sigma^2 = 7.5$) distribution is overlaid on this histogram.

> n:=30;

[Maple Math]

> p:=0.5;

[Maple Math]

> Bin:=ProbHist(BinomialPDF(n,p,x),12..20,9):

> Nor:=plot(NormalPDF(n*p,n*p*(1-p),x),x=10..22):

> display({Bin,Nor});

[Maple Plot]

Let f(x) denote the probability density function for the Normal($\mu = 15$, $\sigma^2 = 7.5$) distribution. Present an argument justifying the claim that if X is a Binomial(30, 0.5) random variable, then

$$P(12 \le X \le 20) \approx \int_{11.5}^{20.5} f(x)\,dx = \Phi\left(\frac{20.5 - 15}{\sqrt{7.5}}\right) - \Phi\left(\frac{11.5 - 15}{\sqrt{7.5}}\right)$$

where $\Phi$ is the cumulative distribution function for the Normal(0,1) distribution.
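
A quick numerical check can make the role of the half-unit shift concrete. The Python sketch below (standard library only, with illustrative variable names) compares the exact binomial probability with the normal area taken over the full width of the bars, from 11.5 to 20.5, and with the area taken only from 12 to 20.

```python
from math import comb, erf, sqrt

n, p = 30, 0.5
mean, sd = n * p, sqrt(n * p * (1 - p))  # 15 and sqrt(7.5)

def phi(z):
    """Standard normal cdf, written with the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

# Exact probability: add the nine bar heights for x = 12..20.
exact = sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(12, 21))

# Normal area over the full width of the bars (11.5 to 20.5).
corrected = phi((20.5 - mean) / sd) - phi((11.5 - mean) / sd)

# Normal area from 12 to 20 only, which drops half a bar at each end.
uncorrected = phi((20 - mean) / sd) - phi((12 - mean) / sd)

print(exact, corrected, uncorrected)
```

Each bar of the histogram has width 1 and is centered on an integer, so the bars for x = 12..20 cover the interval from 11.5 to 20.5. That is the interval whose normal area should be compared with the exact sum.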

6. Generalize your argument in problem 5 by showing that if X is a Binomial(n, p) random variable, and if $\mu = np$ and $\sigma^2 = np(1-p)$, then for relatively large n and [Maple Math],

$$P(a \le X \le b) \approx \int_{a - 0.5}^{b + 0.5} f(x)\,dx = \Phi\left(\frac{b + 0.5 - \mu}{\sigma}\right) - \Phi\left(\frac{a - 0.5 - \mu}{\sigma}\right)$$

where f(x) is the density curve for the Normal($\mu$, $\sigma^2$) distribution. Hint: A picture might prove very helpful.

7. Suppose 40% of the U.S. population opposes the death penalty. In a random sample of 900 U.S. citizens, let X denote the number who oppose the death penalty. Work each of the problems below in three ways: using the BinomialPDF command, using the BinomialCDF command, and using the appropriate normal approximation and NormalCDF command. Compare the answers obtained by the three methods.

(a) What is the probability that X is 355 or 356?

(b) What is the probability that X is between 345 and 362, inclusive?

(c) What is the probability that X is greater than 380?
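
The three approaches can also be mirrored outside Maple as a way of checking your answers. The Python sketch below is one possible arrangement using only the standard library; `pmf`, `cdf`, `phi`, and `three_ways` are hypothetical helpers standing in for the roles of BinomialPDF, BinomialCDF, and NormalCDF, and they are not those commands.

```python
from math import comb, erf, sqrt

n, p = 900, 0.4
mean, sd = n * p, sqrt(n * p * (1 - p))  # 360 and sqrt(216)

def pmf(x):
    """Binomial(900, 0.4) probability of exactly x."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def cdf(x):
    """Binomial(900, 0.4) probability of at most x."""
    return sum(pmf(k) for k in range(x + 1))

def phi(z):
    """Standard normal cdf."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def three_ways(lo, hi):
    """P(lo <= X <= hi) by summing the pdf, by differencing the cdf, and by the normal area."""
    by_pmf = sum(pmf(x) for x in range(lo, hi + 1))
    by_cdf = cdf(hi) - cdf(lo - 1)
    by_normal = phi((hi + 0.5 - mean) / sd) - phi((lo - 0.5 - mean) / sd)
    return by_pmf, by_cdf, by_normal

# Part (c) is written as 381..900 because "greater than 380" excludes 380.
for label, lo, hi in (("a", 355, 356), ("b", 345, 362), ("c", 381, 900)):
    print(label, three_ways(lo, hi))
```

The first two columns should agree to within rounding error, since they compute the same exact quantity; the third is the approximation.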

8. The following Maple code will create an animation showing the probability histograms for the Binomial( n, p ) distributions for n = 5..50 and p = 1/ n . Superimposed on each probability histogram is an approximating normal density curve. Comment on the appearance of the histograms as n and p change. In particular, does a normal model seem to be a good approximation to the shape of the histograms?

> for i from 5 to 50 do

> num[i]:=i;

> prob[i]:=1/num[i];

> Hp[i]:=ProbHist(BinomialPDF(num[i],prob[i],x),-1..11, 12):

> Np[i]:=plot(NormalPDF(num[i]*prob[i],num[i]*prob[i]*(1-prob[i]),x),x = -5..11):

> Pp[i]:=display( {Np[i],Hp[i]} ):

> od:

> display([seq(Pp[i], i=5..50)], insequence=true);

[Maple Plot]
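
To see numerically what happens at the left edge of these histograms, one can compare the bar at x = 0 with the normal curve there. The Python sketch below is an illustration under the same setup (p = 1/n, normal curve with mean np and variance np(1 - p)); the function names are assumptions for this example only.

```python
from math import erf, exp, sqrt

def normal_cdf(x, mean, var):
    """Cdf of the Normal(mean, var) distribution; var is the variance."""
    return 0.5 * (1 + erf((x - mean) / sqrt(2 * var)))

def compare_at_zero(n):
    """Return (binomial P(X = 0), normal area over [-0.5, 0.5], normal area below -0.5)."""
    p = 1 / n
    mean, var = n * p, n * p * (1 - p)  # the mean is 1 for every n
    binom_zero = (1 - p) ** n
    bar_area = normal_cdf(0.5, mean, var) - normal_cdf(-0.5, mean, var)
    below = normal_cdf(-0.5, mean, var)
    return binom_zero, bar_area, below

for n in (5, 10, 50):
    print(n, [round(v, 4) for v in compare_at_zero(n)])
print("exp(-1) =", round(exp(-1), 4))
```

The third number is probability that the normal curve assigns to values below the first bar, where the binomial distribution has no mass at all.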

9. You probably noticed in the last example that a normal model was not a good approximation to the shapes of the binomial histograms. Let's investigate why. Recall that in the previous animation the success probability was assigned a value of 1/n, where n was the number of Bernoulli trials. Generalizing a bit further, for a fixed n suppose we take $p = \lambda/n$ as our success probability, rather than $p = 1/n$.

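a) Show that f(x), the pdf for the Binomial(n, $\lambda/n$) distribution, can be expressed as:

$$f(x) = \frac{\lambda^x}{x!} \cdot \frac{n!}{(n-x)!\,n^x} \cdot \left(1 - \frac{\lambda}{n}\right)^{n-x}$$

b) Now verify the following limits (assuming [Maple Math])

$$\lim_{n \to \infty} \frac{n!}{(n-x)!\,n^x} = 1$$

$$\lim_{n \to \infty} \left(1 - \frac{\lambda}{n}\right)^{n-x} = e^{-\lambda}$$

And therefore, show

$$\lim_{n \to \infty} f(x) = \frac{\lambda^x e^{-\lambda}}{x!}$$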

The probability distribution function g(x) defined as

$g(x) = \dfrac{\lambda^x e^{-\lambda}}{x!}$, $\lambda > 0$, x = 0, 1, 2, ..., is called the Poisson($\lambda$) distribution function.
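
Before proving the limit, it can help to watch it happen numerically. The Python sketch below fixes one value of x and one value of $\lambda$ (both arbitrary choices for illustration) and tracks how far the Binomial(n, $\lambda/n$) probability is from the Poisson($\lambda$) probability as n grows.

```python
from math import comb, exp, factorial

def binom_pmf(n, p, x):
    """P(X = x) for X ~ Binomial(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def poisson_pmf(lam, x):
    """g(x) = lam^x * exp(-lam) / x! for x = 0, 1, 2, ..."""
    return lam**x * exp(-lam) / factorial(x)

lam, x = 2.0, 3  # illustrative choices; any lam > 0 and fixed x behave similarly
errors = []
for n in (10, 100, 1000, 10000):
    gap = abs(binom_pmf(n, lam / n, x) - poisson_pmf(lam, x))
    errors.append(gap)
    print(n, gap)
```

For these choices the gap shrinks by roughly a factor of 10 each time n grows by a factor of 10.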

10. Show that the Poisson($\lambda$) distribution is indeed a valid probability distribution by verifying that:

(a) $g(x) \ge 0$ for all x = 0, 1, 2, ...

AND

(b) $\sum_{x=0}^{\infty} g(x) = 1$.
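
A numerical sanity check is no substitute for the proof, but it shows what the proof must establish. The Python sketch below computes partial sums of g(x) for one illustrative value of $\lambda$; the function name is an assumption for this example.

```python
from math import exp, factorial

def poisson_pmf(lam, x):
    """g(x) = lam^x * exp(-lam) / x! for x = 0, 1, 2, ..."""
    return lam**x * exp(-lam) / factorial(x)

lam = 2.0  # illustrative choice
for upper in (2, 5, 10, 20):
    partial = sum(poisson_pmf(lam, x) for x in range(upper + 1))
    print(upper, partial)
```

The partial sums increase toward 1, which is what the series expansion $e^{\lambda} = \sum_{x=0}^{\infty} \lambda^x / x!$ predicts.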

11. Let's see how well the Poisson($\lambda$) distribution approximates the Binomial(n, $\lambda/n$) distribution. The following Maple code will plot the probability histograms for these two distributions for your choice of n and $\lambda$. How well does the Poisson approximate the binomial for your choice of n and $\lambda$?

>

> n:=20;

> lambda:=2;

[Maple Math]

[Maple Math]

> Bin:=ProbHist(BinomialPDF(n,lambda/n,x),0..n, n+1):

> Poi:=ProbHist(PoissonPDF(lambda,x),0..n,n+1):

> display([Bin,Poi]);

[Maple Plot]

>

12. Suppose 1% of the ceramic propane light mantles produced by a process have pitted surfaces. A random sample of 200 ceramic mantles is taken from the process. Let X denote the number of ceramic mantles in the sample that have pitted surfaces. Work each of the problems below in three ways: using the BinomialPDF command, the PoissonPDF command, and the PoissonCDF command. Compare the answers obtained by the three methods.

(a) What is the probability that X is 0 or 1?

(b) What is the probability that X is between 3 and 5, inclusive?
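
As a way of checking your Maple results, the same three computations can be sketched in Python with the standard library. The helpers below are hypothetical stand-ins for the roles of BinomialPDF, PoissonPDF, and PoissonCDF, and the Poisson rate is assumed to be matched to the binomial mean np.

```python
from math import comb, exp, factorial

n, p = 200, 0.01
lam = n * p  # Poisson rate matched to the binomial mean, here 2

def binom_pmf(x):
    """Binomial(200, 0.01) probability of exactly x."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def poisson_pmf(x):
    """Poisson(lam) probability of exactly x."""
    return lam**x * exp(-lam) / factorial(x)

def poisson_cdf(x):
    """Poisson(lam) probability of at most x; an empty sum gives 0 when x < 0."""
    return sum(poisson_pmf(k) for k in range(x + 1))

def three_ways(lo, hi):
    """P(lo <= X <= hi): exact binomial sum, Poisson pdf sum, Poisson cdf difference."""
    by_binom = sum(binom_pmf(x) for x in range(lo, hi + 1))
    by_pois_pmf = sum(poisson_pmf(x) for x in range(lo, hi + 1))
    by_pois_cdf = poisson_cdf(hi) - poisson_cdf(lo - 1)
    return by_binom, by_pois_pmf, by_pois_cdf

for label, lo, hi in (("a", 0, 1), ("b", 3, 5)):
    print(label, three_ways(lo, hi))
```

The two Poisson columns compute the same quantity and should match to rounding error; the difference between them and the binomial column is the approximation error.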

>

13. The following Maple code will simulate N independent observations from a Binomial(n, $\lambda/n$) distribution, where N, n, and $\lambda$ are of your choosing.

> N:=1000;

> n:=20;

> lambda:=2;

>

> Sample:= BinomialS(n, lambda/n, N):

[Maple Math]

[Maple Math]

[Maple Math]

Now that we have a sample from the Binomial(n, $\lambda/n$) distribution, let's see how well the Poisson($\lambda$) distribution approximates it. The following Maple code will plot a histogram of the Sample data and a probability histogram of the Poisson($\lambda$) distribution together for comparison.

> Histogram(Sample, -0.5..10.5, 11);

[Maple Plot]

> pdf := PoissonPDF(lambda,x);

[Maple Math]

> ProbHist(pdf, 0..10);

[Maple Plot]

> Hist := Histogram(Sample, -0.5..10.5, 11):

> Poi := ProbHist(pdf, 0..10):

> display({Poi,Hist});

[Maple Plot]

For your fixed values of n and $\lambda$, try N = 30, 100, and 5000. Compare the data histograms and the Poisson($\lambda$) probability histograms in each case. For which value of N does the data histogram most closely resemble the Poisson($\lambda$) probability histogram? Does this agree with the interpretation of probability of events as a long run frequency? Explain.
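
The same experiment can be sketched outside Maple. The Python code below is a minimal simulation under stated assumptions: each binomial observation is generated as a count of successes in n Bernoulli trials, the seed is fixed only so that runs are repeatable, and `simulate` and `largest_gap` are hypothetical helpers rather than equivalents of BinomialS or Histogram.

```python
import random
from collections import Counter
from math import exp, factorial

def poisson_pmf(lam, x):
    """Poisson(lam) probability of exactly x."""
    return lam**x * exp(-lam) / factorial(x)

def simulate(N, n, lam, seed=0):
    """Draw N Binomial(n, lam/n) values, each a count of successes in n Bernoulli trials."""
    rng = random.Random(seed)
    p = lam / n
    return [sum(rng.random() < p for _ in range(n)) for _ in range(N)]

def largest_gap(sample, lam):
    """Largest |relative frequency - Poisson probability| over x = 0..10."""
    counts = Counter(sample)
    return max(abs(counts[x] / len(sample) - poisson_pmf(lam, x)) for x in range(11))

n, lam = 20, 2
for N in (30, 100, 5000):
    print(N, round(largest_gap(simulate(N, n, lam), lam), 4))
```

Even for very large N the gap does not shrink to 0, because the data come from a Binomial(n, $\lambda/n$) distribution rather than from the Poisson($\lambda$) distribution itself; what remains is the approximation error for your chosen n.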

>