THE BINOMIAL DISTRIBUTION AND
ITS APPROXIMATIONS
Write up detailed answers to the following questions. Show all of your mathematical work, and where applicable, provide your Maple code with your report.
We begin by looking at the probability distribution function (pdf) for the Binomial( n, p ) distribution for various choices of n and p . Recall that the pdf is $f(x) = \binom{n}{x} p^x (1-p)^{n-x}$ for $x = 0, 1, \ldots, n$.
Let's look at the probability histogram for the Binomial(4, 0.5) distribution.
> restart; with(plots, display):
> pdf := BinomialPDF(4, 0.5, x);
> ProbHist(pdf, 0..4);
>
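As a cross-check outside Maple, the bar heights of a binomial probability histogram can be computed directly from the pdf $\binom{n}{x} p^x (1-p)^{n-x}$. The following is a minimal Python sketch using only the standard library; the name `binomial_pmf` is an illustrative assumption and is not part of the Maple package used in this worksheet.

```python
from math import comb

def binomial_pmf(n, p, x):
    """P(X = x) for X ~ Binomial(n, p); zero outside 0..n.

    Hypothetical helper for illustration, not a Maple command.
    """
    if x < 0 or x > n:
        return 0.0
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Bar heights of the Binomial(4, 0.5) probability histogram, x = 0..4.
heights = [binomial_pmf(4, 0.5, x) for x in range(5)]
print(heights)  # [0.0625, 0.25, 0.375, 0.25, 0.0625]
```

The heights are symmetric about x = 2 because p = 0.5, and they sum to 1.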
1. Change the Maple code above where appropriate and look at the probability histograms for the Binomial( n , 0.5) distributions where n = 10, 15, 20, and 50. Comment on the appearance of these histograms as n gets larger. In particular, does a normal model seem to be a good approximation to the shape of the histograms?
2. The following Maple code will create an animation showing the probability histograms for the Binomial( n, p ) distributions for n = 4..50 and your choice of p . Superimposed on each probability histogram is an approximating normal density curve. For a fixed p , comment on the appearance of these histograms as n gets larger. In particular, does a normal model seem to be a good approximation to the shape of the histograms? What effect does varying
p (0 < p < 0.5) have on these animations?
> p:=0.1;
> for n from 4 to 50 do
> H[n]:=ProbHist(BinomialPDF(n,p,x),-0.5..50.5, 51):
> N[n]:=plot(NormalPDF(n*p,n*p*(1-p),x),x = -0.5..50.5):
> P[n]:=display( {N[n],H[n]} ):
> od:
> display([seq(P[n], n=4..50)], insequence=true);
3. The following Maple code will create an animation showing the probability histograms for the Binomial( n, p ) distributions for p = 0.01..0.5 and your choice of n . Superimposed on each probability histogram is an approximating normal
density curve. Comment on the appearance of these histograms as p varies. In particular, does a normal model seem to be a good approximation to the shape of the histograms?
> n:=40;
> for i from 1 to 50 do
> prob[i]:=i/100;
> Hp[i]:=ProbHist(BinomialPDF(n,prob[i],x),-0.5..40.5, 41):
> Np[i]:=plot(NormalPDF(n*prob[i],n*prob[i]*(1-prob[i]),x),x = -0.5..40.5):
> Pp[i]:=display( {Np[i],Hp[i]} ):
> od:
> display([seq(Pp[i], i=1..50)], insequence=true);
>
4. Based on your explorations in questions 1-3, write a summary explaining when (i.e. for what ranges of n and p ) the Binomial( n, p ) distribution is well approximated by a normal distribution. In particular, is the size of n the only deciding factor, or is there a dependence on p also?
>
5. Recall that the mean and variance of the Binomial( n, p ) distribution are np and np(1 - p), respectively. Therefore, if a normal approximation is appropriate, the best approximation is the Normal($\mu = np$, $\sigma^2 = np(1-p)$) distribution.
The following Maple code will plot the probability histogram for the Binomial(30, 0.5) distribution from x = 12 to 20. The density curve f( x ) for the Normal($\mu = 15$, $\sigma^2 = 7.5$) distribution is overlaid on this histogram.
> n:=30;
> p:=0.5;
> Bin:=ProbHist(BinomialPDF(n,p,x),12..20,9):
> Nor:=plot(NormalPDF(n*p,n*p*(1-p),x),x=10..22):
> display({Bin,Nor});
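Let f( x ) denote the probability density function for the Normal($\mu = 15$, $\sigma^2 = 7.5$) distribution. Present an argument justifying the claim that if X is a Binomial(30, 0.5) random variable, then
$P(12 \le X \le 20) \approx \int_{11.5}^{20.5} f(x)\,dx = \Phi\left(\frac{20.5 - 15}{\sqrt{7.5}}\right) - \Phi\left(\frac{11.5 - 15}{\sqrt{7.5}}\right)$,
where $\Phi$ is the cumulative distribution function for the Normal(0,1) distribution.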
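6. Generalize your argument in problem 5 by showing that if X is a Binomial( n, p ) random variable, and if $\mu = np$ and $\sigma^2 = np(1-p)$, then for relatively large n and integers $a \le b$,
$P(a \le X \le b) \approx \int_{a - 0.5}^{b + 0.5} f(x)\,dx = \Phi\left(\frac{b + 0.5 - \mu}{\sigma}\right) - \Phi\left(\frac{a - 0.5 - \mu}{\sigma}\right)$,
where f( x ) is the density curve for the Normal($\mu$, $\sigma^2$) distribution. Hint: A picture might prove very helpful.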
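The quality of this approximation can be checked numerically. The sketch below, in Python with only the standard library, compares the exact binomial sum with the normal area taken from a - 0.5 to b + 0.5. That half-unit adjustment is an assumption here: it matches histogram bars of width 1 centered on the integers. All function names are hypothetical helpers, not Maple commands.

```python
from math import comb, erf, sqrt

def std_normal_cdf(z):
    """CDF of the Normal(0,1) distribution, written with the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def binomial_interval_exact(n, p, a, b):
    """Exact P(a <= X <= b) for X ~ Binomial(n, p), summing the pdf."""
    return sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(a, b + 1))

def binomial_interval_normal(n, p, a, b):
    """Normal approximation using the area from a - 0.5 to b + 0.5."""
    mu = n * p
    sigma = sqrt(n * p * (1 - p))  # standard deviation, not variance
    upper = std_normal_cdf((b + 0.5 - mu) / sigma)
    lower = std_normal_cdf((a - 0.5 - mu) / sigma)
    return upper - lower

# The Binomial(30, 0.5) case from problem 5.
exact = binomial_interval_exact(30, 0.5, 12, 20)
approx = binomial_interval_normal(30, 0.5, 12, 20)
print(exact, approx, abs(exact - approx))
```

Note that the helper standardizes with the standard deviation $\sqrt{np(1-p)}$, whereas the worksheet's `NormalPDF` calls are passed the variance np(1 - p).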
7. Suppose 40% of the U.S. population opposes the death penalty. In a random sample of 900 U.S. citizens, let X denote the number that oppose the death penalty. Work each of the problems below in three ways --- using the
BinomialPDF command, using the BinomialCDF command, and using the appropriate normal approximation and NormalCDF command. Compare the answers obtained by the three methods.
(a) What is the probability that X is 355 or 356?
(b) What is the probability that X is between 345 and 362, inclusive?
(c) What is the probability that X is greater than 380?
8. The following Maple code will create an animation showing the probability histograms for the Binomial( n, p ) distributions for n = 5..50 and p = 1/ n . Superimposed on each probability histogram is an approximating normal density curve. Comment on the appearance of the histograms as n and p change. In particular, does a normal model seem to be a good approximation to the shape of the histograms?
> for i from 5 to 50 do
> num[i]:=i;
> prob[i]:=1/num[i];
> Hp[i]:=ProbHist(BinomialPDF(num[i],prob[i],x),-1..11, 12):
> Np[i]:=plot(NormalPDF(num[i]*prob[i],num[i]*prob[i]*(1-prob[i]),x),x = -5..11):
> Pp[i]:=display( {Np[i],Hp[i]} ):
> od:
> display([seq(Pp[i], i=5..50)], insequence=true);
9. You probably noticed in the last example that a normal model was not a good approximation to the shapes of the binomial histograms. Let's investigate why. Recall that in the previous animation the success probability was assigned a value of 1/ n , where n was the number of Bernoulli trials. Generalizing a bit further, for a fixed n suppose we take $\lambda/n$ as our success probability, rather than $1/n$.
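a) Show that f( x ), the pdf for the Binomial($n$, $\lambda/n$) distribution, can be expressed as:
$f(x) = \frac{\lambda^x}{x!} \cdot \frac{n(n-1)\cdots(n-x+1)}{n^x} \cdot \left(1 - \frac{\lambda}{n}\right)^{n} \cdot \left(1 - \frac{\lambda}{n}\right)^{-x}$
b) Now verify the following limits (assuming $\lambda > 0$ and $x$ are fixed):
$\lim_{n \to \infty} \frac{n(n-1)\cdots(n-x+1)}{n^x} = 1$, $\lim_{n \to \infty} \left(1 - \frac{\lambda}{n}\right)^{n} = e^{-\lambda}$, $\lim_{n \to \infty} \left(1 - \frac{\lambda}{n}\right)^{-x} = 1$
And therefore, show
$\lim_{n \to \infty} f(x) = \frac{\lambda^x e^{-\lambda}}{x!}$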
The probability distribution function g( x ) defined as
$g(x) = \frac{\lambda^x e^{-\lambda}}{x!}$, $\lambda > 0$, x = 0, 1, 2, ..., is called the
Poisson($\lambda$) distribution function.
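The limit can also be observed numerically. The sketch below, in Python with only the standard library, compares the Binomial(n, $\lambda/n$) pdf with the Poisson pdf $\lambda^x e^{-\lambda}/x!$ for increasing n and reports the largest pointwise difference over x = 0..10. The value $\lambda = 2$, the grid of n values, and the helper names are illustrative assumptions.

```python
from math import comb, exp, factorial

def binomial_pmf(n, p, x):
    """P(X = x) for X ~ Binomial(n, p). Hypothetical helper."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def poisson_pmf(lam, x):
    """P(X = x) for X ~ Poisson(lam). Hypothetical helper."""
    return lam**x * exp(-lam) / factorial(x)

lam = 2.0  # illustrative choice of lambda
gaps = []
for n in (10, 100, 1000, 10000):
    # Largest pointwise gap between the two pdfs on x = 0..10.
    gap = max(abs(binomial_pmf(n, lam / n, x) - poisson_pmf(lam, x))
              for x in range(11))
    gaps.append(gap)
    print(n, gap)
```

The printed gaps shrink as n grows with $\lambda$ held fixed. This is numerical evidence consistent with the limit, not a substitute for the argument requested in parts a) and b).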
10. Show that the Poisson($\lambda$) distribution is indeed a valid probability distribution by verifying that:
(a) $g(x) \ge 0$ for all x = 0, 1, 2, ...
AND
(b) $\sum_{x=0}^{\infty} g(x) = 1$.
11. Let's see how well the Poisson($\lambda$) distribution approximates the
Binomial($n$, $\lambda/n$) distribution. The following Maple code will plot the probability histograms for these two distributions for your choice of n and $\lambda$. How well does the Poisson approximate the binomial for your choice of n
and $\lambda$?
>
> n:=20;
> lambda:=2;
> Bin:=ProbHist(BinomialPDF(n,lambda/n,x),0..n, n+1):
> Poi:=ProbHist(PoissonPDF(lambda,x),0..n,n+1):
> display([Bin,Poi]);
>
12. Suppose 1% of the ceramic propane light mantles produced by a process have pitted surfaces. A random sample of 200 ceramic mantles is taken from the process. Let X denote the number of ceramic mantles in the sample that have pitted surfaces. Work each of the problems below in three ways --- using the BinomialPDF command, PoissonPDF command, and the PoissonCDF command. Compare the answers obtained by the three methods.
(a) What is the probability that X is 0 or 1?
(b) What is the probability that X is between 3 and 5, inclusive?
>
13. The following Maple code will simulate N independent observations from a Binomial($n$, $\lambda/n$) distribution, where N, n, and $\lambda$ are of your choosing.
> N:=1000;
> n:=20;
> lambda:=2;
>
> Sample:= BinomialS(n, lambda/n, N):
Now that we have a sample from the Binomial($n$, $\lambda/n$) distribution, let's
see how well the Poisson($\lambda$) distribution approximates it. The following
Maple code will plot a histogram of the Sample data and a probability
histogram of the Poisson($\lambda$) distribution together for comparison.
> Histogram(Sample, -0.5..10.5, 11);
> pdf := PoissonPDF(lambda,x);
> ProbHist(pdf, 0..10);
> Hist := Histogram(Sample, -0.5..10.5, 11):
> Poi := ProbHist(pdf, 0..10):
> display({Poi,Hist});
For your fixed values of n and $\lambda$, try N = 30, 100, and 5000. Compare the data histograms and the Poisson($\lambda$) probability histograms in each case. For which value of N does the data histogram most closely resemble the Poisson($\lambda$) probability histogram? Does this agree with the interpretation of the probability of an event as a long-run relative frequency? Explain.
>
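The same experiment can be sketched outside Maple. The Python code below uses only the standard library. It draws each Binomial(n, $\lambda/n$) observation as a sum of n Bernoulli trials, then measures how far the sample's relative frequencies are from the Poisson($\lambda$) probabilities. The seed, the summary measure `histogram_gap`, and the other helper names are assumptions made for illustration; they do not correspond to `BinomialS` or any other Maple command.

```python
import random
from collections import Counter
from math import exp, factorial

def poisson_pmf(lam, x):
    """P(X = x) for X ~ Poisson(lam). Hypothetical helper."""
    return lam**x * exp(-lam) / factorial(x)

def simulate_binomial(n, p, size, rng):
    """Draw `size` Binomial(n, p) values, each a sum of n Bernoulli(p) trials."""
    return [sum(rng.random() < p for _ in range(n)) for _ in range(size)]

def histogram_gap(sample, lam, max_x):
    """Half the summed |relative frequency - Poisson pdf| over x = 0..max_x."""
    counts = Counter(sample)
    size = len(sample)
    return 0.5 * sum(abs(counts.get(x, 0) / size - poisson_pmf(lam, x))
                     for x in range(max_x + 1))

rng = random.Random(0)  # fixed seed (arbitrary) so the run is repeatable
n, lam = 20, 2.0        # same illustrative values as the Maple code
gaps = {}
for size in (30, 100, 5000):
    sample = simulate_binomial(n, lam / n, size, rng)
    gaps[size] = histogram_gap(sample, lam, n)
    print(size, round(gaps[size], 4))
```

A smaller gap means the data histogram sits closer to the Poisson probability histogram. Individual runs with small N can vary considerably, so it is worth repeating the run with several seeds before drawing conclusions.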