# 4/19/2023

## Minute Card

• What is extrapolation?
• Sketch a "typical" residual plot.
• Sketch a residual plot where the variability about the regression line decreases as the explanatory variable increases.

## Connections between correlation and regression

• r-squared, the square of the correlation coefficient, is the fraction of the variation in the values of Y that is explained by the least squares regression of Y on the explanatory variable X.
• The slope of the least squares regression line can be written as b = r * (sy/sx), where sy and sx are the standard deviations of Y and X.
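Both connections can be checked numerically. The sketch below is only an illustration: the five data points are made up and do not come from any class file, and Python stands in for R because the arithmetic is the same in either language. It computes the slope directly from the sums of squares, again as r * (sy/sx), and compares r-squared with the fraction of variation explained, 1 - SSE/SST.

```python
import math

# Made-up illustrative data (an assumption; not from any class data set).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Sums of squares and cross-products about the means.
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Sample standard deviations (divisor n - 1) and the correlation coefficient.
s_x = math.sqrt(sxx / (n - 1))
s_y = math.sqrt(syy / (n - 1))
r = sxy / math.sqrt(sxx * syy)

# Least squares slope computed directly, and again via b = r * (sy/sx).
b_direct = sxy / sxx
b_from_r = r * (s_y / s_x)
intercept = y_bar - b_direct * x_bar

# Fraction of the variation in y explained by the line: 1 - SSE/SST.
sse = sum((yi - (intercept + b_direct * xi)) ** 2 for xi, yi in zip(x, y))
explained = 1 - sse / syy

print(f"slope directly: {b_direct:.4f}   slope from r: {b_from_r:.4f}")
print(f"r squared:      {r ** 2:.4f}   fraction explained: {explained:.4f}")
```

The two slopes agree, and so do r-squared and the explained fraction, up to floating-point rounding.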

## In Class Activities

• Regression Effect and Regression Fallacy
• Kenyon Tuition Data (Google Drive\!Class-csv-Rscripts\tuition.csv)
• Anscombe's Data (Google Drive\!Class-csv-Rscripts\Anscombe.csv) - Fit separate regression models for each of the four pairs of x and y variables. Check the R calculations by computing the regression equation "by hand" for at least one pair.
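As a rough guide to the "by hand" check, the sketch below works through the first of Anscombe's four pairs using the values from Anscombe's published data set. It assumes the class csv file holds those same values; the file's column layout is not shown here, so the numbers are typed in directly rather than read from the file. Python is used only as a calculator, and the same steps can be carried out in R or on paper.

```python
# Anscombe's first (x, y) pair, from the published data set.
# Assumption: the class csv file contains these same values.
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Slope = sum of cross-products / sum of squares of x (both about the means).
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
slope = sxy / sxx

# The least squares line passes through (x_bar, y_bar).
intercept = y_bar - slope * x_bar

# Predicted values and residuals (observed minus predicted).
fitted = [intercept + slope * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

print(f"y-hat = {intercept:.4f} + {slope:.4f} x")
print(f"sum of residuals = {sum(residuals):.2e}")
```

The intercept and slope printed here should match the coefficients R reports for the same pair, and the residuals from a least squares fit should sum to zero apart from rounding.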

## Class Exercises

• The federal government maintains a large facility in Hanford, WA, at which various activities related to nuclear reactors and bombs are carried out. The facility occupies land adjacent to the Columbia River. Over the years there have been leaks from open pit storage of radioactive wastes into the Columbia River. Data concerning cancer downstream from Hanford has been collected from nine counties in Oregon and can be compared with data on radioactivity levels. Some of the data from a study conducted in 1959-64 is contained in the csv file Google Drive\!Class-csv-Rscripts\hanford.csv. The index of exposure (C1) was based on several factors, including distance from Hanford and the average distance of the population from water frontage on the Columbia. Column C2 gives the cancer mortality per 100,000 person-years, and column C3 gives the name of the county.

a. Make a scatterplot of the data. Which variable is the explanatory variable?
b. Is the association between the variables positive or negative?
c. Find the least squares regression line for predicting cancer deaths from the index of exposure. For each of the exposure indexes, compute the predicted value of cancer mortality and the associated residual.
d. What percentage of the variation in cancer deaths is explained by using the index of exposure?
e. Interpret the value of the slope in the least squares line. That is, explain what this slope says about how cancer death rates change as the index of exposure changes.
f. Plot the residuals versus the index of exposure. What does the plot indicate about the adequacy of the linear fit?
g. Make another scatterplot of the data and include the least squares line on the plot.
h. Suppose you lived in a county with an index of exposure equal to 5. Use the least squares line to predict the cancer mortality in your home county.
i. Compute the correlation coefficient r between index of exposure and cancer mortality.
j. Create two new variables, x* = 10x and y* = y/10.

k. Make a scatterplot of the transformed indexes and mortality rates. Does this plot have the same appearance as the plot you constructed in part a?
l. Is the correlation coefficient for the transformed values the same as the correlation coefficient for the original values?
m. Does the least squares line of y* on x* have the same slope as the regression line of y on x?
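For parts j through m, it can help to think about a general change of units before looking at the numbers. The derivation below is a sketch under stated assumptions: the positive constants a and c are introduced here for illustration and are not part of the exercise (a is a scale factor, not an intercept), and the last line uses the slope formula b = r * (sy/sx) from the section on correlation and regression.

```latex
\begin{align*}
x^* &= a x, \quad y^* = c y \qquad (a > 0,\ c > 0) \\
s_{x^*} &= a\, s_x, \quad s_{y^*} = c\, s_y \\
r^* &= \frac{\sum (a x_i - a\bar{x})(c y_i - c\bar{y})}{(n-1)\, s_{x^*} s_{y^*}}
     = \frac{a c \sum (x_i - \bar{x})(y_i - \bar{y})}{(n-1)\, a c\, s_x s_y} = r \\
b^* &= r^* \, \frac{s_{y^*}}{s_{x^*}} = r \cdot \frac{c\, s_y}{a\, s_x} = \frac{c}{a}\, b
\end{align*}
```

Comparing the constants in part j with a and c gives a way to check the answers computed from the data.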

• 8.5, 8.9, 8.11, 8.15, 8.21, 8.23, 8.25