Elements of Statistics (Math 106) - Exam 1
Fall 2001 - Brad Hartlaub

Directions: Please answer all of the questions below. The point values for each problem are indicated in parentheses. Partial credit will be awarded if you show your work.

1. The ages and salaries (in thousands of dollars) for the CEOs of America's best small companies in 1993 are listed in the file p:\data\math\stats\ceodata.mtw.

a. Identify and compute two measures of center for the age distribution. (5)

b. Identify and compute the measure of spread that is most appropriate for the salary distribution. Explain your rationale for choosing this measure of spread. (5)

c. Do you think normal models are appropriate for the age and salary data? Explain your responses by referring to the shape of the normal probability plots. (10)

d. Identify the linear transformation that can be used to standardize the ages so they have a mean of zero and a standard deviation of one. (5)
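
A minimal Python sketch of the summaries in parts (a), (b), and (d). It assumes the ages (or salaries) have already been exported from ceodata.mtw into a plain Python list; the function names are hypothetical and the data file is not reproduced here. Quartiles from `statistics.quantiles` may differ slightly from Minitab's, depending on the quartile convention used.

```python
import statistics

def center_and_spread(values):
    """Return the mean, median, and interquartile range (IQR) of a list of numbers."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # default 'exclusive' method
    return statistics.mean(values), statistics.median(values), q3 - q1

def standardize(values):
    """Apply the linear transformation z = (x - mean) / sd.

    Equivalently z = (1/sd) * x - mean/sd, so the result has mean 0 and sd 1.
    """
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation
    return [(x - mean) / sd for x in values]
```

With a hypothetical list `ages` loaded from the file, `center_and_spread(ages)` gives the mean and median, and `standardize(ages)` returns the z-scores.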

2. A 1996 paper in Civil Engineering describes the compressive strength of concrete for freshwater exhibition tanks as having a mean of 6,000 psi and a standard deviation of 240 psi. Assuming the compressive strength is normally distributed,

a. What is the chance that the compressive strength of a sample of concrete is below 6,340 psi? (5)

b. What is the chance that the compressive strength of a sample of concrete is between 5,000 and 5,900 psi? (5)

c. What is the chance that the compressive strength of a sample of concrete is above 5,800 psi? (5)

d. Above what value do 70 percent of the compressive strengths of concrete samples fall? (5)

e. Below what value do 85 percent of the compressive strengths of concrete samples fall? (5)
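
A short Python sketch for checking parts (a) through (e) with the standard library's `statistics.NormalDist`, using only the mean and standard deviation stated in the problem. Part (d) asks for the value exceeded by 70 percent of samples, which is the 30th percentile.

```python
from statistics import NormalDist

# Compressive strength model from the problem: mean 6,000 psi, sd 240 psi.
strength = NormalDist(mu=6000, sigma=240)

p_below_6340 = strength.cdf(6340)                         # part (a)
p_5000_to_5900 = strength.cdf(5900) - strength.cdf(5000)  # part (b)
p_above_5800 = 1 - strength.cdf(5800)                     # part (c)
cutoff_70_above = strength.inv_cdf(0.30)                  # part (d): 30th percentile
cutoff_85_below = strength.inv_cdf(0.85)                  # part (e): 85th percentile

results = {
    "(a) P(strength < 6340)": p_below_6340,
    "(b) P(5000 < strength < 5900)": p_5000_to_5900,
    "(c) P(strength > 5800)": p_above_5800,
    "(d) 30th percentile (psi)": cutoff_70_above,
    "(e) 85th percentile (psi)": cutoff_85_below,
}
for label, value in results.items():
    print(f"{label}: {value:.4f}")
```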

3. An experiment results in observations on a group of subjects for three variables: an explanatory variable (X), a response variable (Y), and gender (male or female). Is it possible for X and Y to be positively associated within both the males and the females, yet negatively associated overall when gender is ignored? If so, sketch a possible scatterplot to illustrate this situation. If not, explain why not. (15)

4. The Per Capita Gross Domestic Product (GDP) and the Per Capita Health Care Spending (HCS) for 22 countries are provided in p:\data\math\stats\health.mtw.

a. Is there any association between HCS and GDP for these countries? (5)

b. Find and interpret the value of the correlation coefficient (r) for HCS and GDP. (5)

c. What is the equation of the least squares regression line for predicting HCS using GDP? (5)

d. Interpret the value of the slope parameter in the least squares regression line. (5)

e. The country of Auschtabeckwinstille has a GDP of $12,000, but no data are available for its HCS. What would you predict as the value of Auschtabeckwinstille's HCS? Show how you arrived at this value. (3)

f. What is the value of the residual for the country with GDP of 11,500 and HCS of 1,050? (5)

g. Plot the residuals against GDP. Comment on the appearance of the plot and any implications this may have for the adequacy of the linear model. (5)

5. The winning time in the Olympic men's 500-meter speed skating race over the years 1924 to 1992 can be described by the regression equation: Winning Time = 255 - 0.1094*(Year).

a. Is the correlation between winning time and year positive or negative? Explain. (5)

b. Would you be willing to use this regression equation to predict the winning time in the 2050 Winter Olympics? Explain. (5)
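
A minimal Python sketch that evaluates the given equation. Because the least squares slope equals r times s_y/s_x, the negative slope (-0.1094) implies a negative correlation. The helper names and the flag for years outside 1924 to 1992 are illustrative assumptions; the output is in whatever units the original winning times were recorded in.

```python
FIRST_YEAR, LAST_YEAR = 1924, 1992  # range of years the equation was fitted to

def predicted_winning_time(year):
    """Winning Time = 255 - 0.1094 * Year, as given in the problem."""
    return 255 - 0.1094 * year

def is_extrapolation(year):
    """True when the year lies outside the range of the fitted data."""
    return not (FIRST_YEAR <= year <= LAST_YEAR)

for year in (1924, 1992, 2050):
    note = "extrapolation" if is_extrapolation(year) else "within data range"
    print(f"{year}: {predicted_winning_time(year):.2f} ({note})")
```

The 2050 prediction is computed the same way as the others, but it lies well outside the fitted years, so nothing in the data supports the assumption that the linear trend continues that far.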

6. The faculty senate at a large university wanted to know what proportion of the students thought a foreign language should be required for everyone. The statistics department offered to cooperate in conducting a survey, and a simple random sample of 500 students was selected from all students enrolled in statistics classes. A survey form was sent by e-mail to these 500 students. Discuss the extent to which each of the three types of bias would be likely to occur in this survey:

Noncoverage Bias (5)

Nonresponse Bias (5)

Response Bias (5)

7. A study published in the Journal of the American Medical Association described a sample of 46,355 postmenopausal women who were studied for 15 years. These women were asked about their use of hormone therapy and whether or not they had breast cancer; 2,082 of them developed the disease. One striking result was that "for each year of combined [estrogen and progestin] therapy, a woman's risk of breast cancer was found to increase by 8% compared to a 1% increase in women taking only estrogen."

a. Do you think this research was an observational study or an experiment? Explain. (5)

b. What are the explanatory and response variables for this study? Are they quantitative or categorical? (10)