1/25/2013
Minute Card
- What is extrapolation?
- Sketch a "typical" residual plot.
- Sketch a residual plot where the variability about the regression line decreases as the explanatory variable increases.
- Provide feedback on the course and instructor.
Connections between correlation and regression
- r-squared, the square of the correlation coefficient, represents the fraction of variation in the values of Y that is explained by using least squares regression with the explanatory variable X.
- The slope of the least squares regression line can be written as b = r * (sy/sx), where sx and sy are the standard deviations of X and Y.
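The two connections above can be checked numerically. The following is a minimal sketch in Python with NumPy, not part of the Minitab workflow used in class; the data values are made up purely for illustration and do not come from any course worksheet.

```python
import numpy as np

# Hypothetical illustrative data (assumption: not from any course worksheet).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.9])

# Correlation coefficient and sample standard deviations.
r = np.corrcoef(x, y)[0, 1]
sx = np.std(x, ddof=1)
sy = np.std(y, ddof=1)

# Slope from b = r * (sy/sx); the least squares line passes through
# the point of means, so the intercept is a = ybar - b * xbar.
b = r * sy / sx
a = y.mean() - b * x.mean()

# Compare with a direct least squares fit of a degree-1 polynomial.
slope_fit, intercept_fit = np.polyfit(x, y, 1)

# Fraction of the variation in y explained by the regression on x.
y_hat = a + b * x
ss_res = np.sum((y - y_hat) ** 2)      # variation left in the residuals
ss_tot = np.sum((y - y.mean()) ** 2)   # total variation in y
explained = 1 - ss_res / ss_tot

print("slope from r:", b, " slope from fit:", slope_fit)
print("r squared:", r ** 2, " fraction explained:", explained)
```

The sample standard deviations use ddof=1, but the ratio sy/sx is the same with either divisor, so the slope does not depend on that choice.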
In Class Activities
- Regression Effect and Regression Fallacy
- Kenyon Tuition Data (p:\data\math\stats\tuition.mtw)
- Anscombe's Data (p:\data\math\stats\Anscombe.mtw)
Class Exercises
- 5.69, 5.76, 5.77 - check the Minitab calculations by computing the regression equation "by hand" for at least one exercise.
- The federal government maintains a large facility in Hanford, WA, at which
various activities related to nuclear reactors and bombs are carried out.
The facility occupies land adjacent to the Columbia River. Over the years
there have been leaks from open pit storage of radioactive wastes into the
Columbia River. Data concerning cancer downstream from Hanford in Oregon
has been collected from nine counties in Oregon, and can be compared with
data on radioactivity levels. Some of the data from a study conducted in
1959-64 is contained in the Minitab worksheet p:\data\math\stats\hanford.mtw.
The index of exposure (C1) was based on several factors, including distance
from Hanford and the average distance of the population from water frontage
on the Columbia. Column C2 gives the Cancer Mortality per 100,000 person-years
and column C3 gives the name of the county.
a. Make a scatterplot of the data. Which variable is the explanatory
variable?
b. Is the association between the variables positive or negative?
c. Find the least squares regression line for predicting cancer deaths
from the index of exposure. For each of the exposure indexes, compute the
predicted value of cancer mortality and the associated residual.
d. What percentage of the variation in cancer deaths is explained by
using the index of exposure?
e. Interpret the value of the slope in the least squares line. That is,
explain what this slope says about the change in cancer death rates for
different exposure indexes.
f. Plot the residuals versus the index of exposure. What does the plot
indicate about the adequacy of the linear fit?
g. Make another scatterplot of the data and include the least squares
line on the plot.
h. Suppose you lived in a county whose index of exposure to radioactive
contamination is equal to 5. Use the least squares line to predict the cancer
mortality in your home county.
i. Compute the correlation coefficient r between index of exposure
and cancer mortality.
j. Create two new variables x* = 10x and y* = y/10.
This can be done easily by using Calc > Calculator to create:
c4=10*c1
c5=c2/10
k. Make a scatterplot of the transformed indexes and mortality rates. Does
this plot have the same appearance as the plot you constructed in part
a?
l. Is the correlation coefficient for the transformed values the same
as the correlation coefficient for the original values?
m. Is the slope of the least squares line of y* on x* the same as
the slope of the regression line of y on x?
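For anyone who wants to cross-check the Minitab output, the computations in parts c, i, and j can be sketched in Python with NumPy. This is only an illustration under stated assumptions: the arrays below are hypothetical stand-ins for columns C1 and C2, not the values in hanford.mtw, and least_squares is a helper name invented for this sketch.

```python
import numpy as np

def least_squares(x, y):
    """Return (intercept, slope) of the least squares line of y on x."""
    slope, intercept = np.polyfit(x, y, 1)
    return intercept, slope

# Hypothetical stand-ins for C1 (index of exposure) and C2 (cancer mortality).
# Assumption: these are NOT the Hanford values; substitute the worksheet data.
exposure = np.array([1.0, 2.5, 3.0, 4.5, 6.0, 8.0])
mortality = np.array([120.0, 135.0, 150.0, 160.0, 175.0, 200.0])

# Part c: fitted line, predicted values, and residuals.
a, b = least_squares(exposure, mortality)
predicted = a + b * exposure
residuals = mortality - predicted   # observed minus predicted

# Part i: correlation coefficient.
r = np.corrcoef(exposure, mortality)[0, 1]

# Part j: transformed variables, mirroring c4=10*c1 and c5=c2/10.
x_star = 10 * exposure
y_star = mortality / 10
a_star, b_star = least_squares(x_star, y_star)
r_star = np.corrcoef(x_star, y_star)[0, 1]

print("original:    slope =", b, " r =", r)
print("transformed: slope =", b_star, " r =", r_star)
```

Running the same steps on the actual worksheet columns gives numbers to compare against the Minitab results for parts l and m.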
Please read Section 5.6 and Chapter 2 for class on Monday.