Math 106 (Statistics) Daily Schedule, Fall 2019

The following is a tentative syllabus for the course. This page will be updated and kept current, so please check back periodically.

Daily Schedule for Math 106
(Last modified 29 October 2019)
Date  Reading Assignment
Topics
Textbook Practice Problems
Written Homework Due
Fri 30 August
What is Statistics?

Mon 2 September 1.1, 1.2
The Basics, Introduction to R
1.1, 3, 5, 7, 9

Wed 4 September 1.3-1.4
Data collection in observational studies and experiments
1.6, 15, 17, 19, 21, 25, 27, 29, 37
1.2, 1.10
Fri 6 September 2.1.1-2.1.4
Mean, dot plots, histograms, and standard deviation
1.16, 18, 24, 28, 35, 43; 2.2, 5, 9
1.12, 20, 38, 40

Mon 9 September 2.1.5-2.1.7, 2.2-2.3
Median and associated measures, box plots,
categorical data
2.4, 10, 11, 13, 15, 25, 29, 33.
1.32. Also, for both Age and TextMessages in our class data, do the following:
-- Calculate the mean and standard deviation for the whole group.
-- Calculate the mean and standard deviation separately for varsity athletes and non-athletes.
-- Create histograms with two different binnings and comment on
   which you think is more useful or informative.
-- Comment on any interesting data/trends/observations.
Wed 11 September
Finding data for graph and caption-writing project.
2.12, 2.16, 2.32. Also, for both Age and TextMessages in our class data:
-- Calculate the median and IQR.
-- Create a boxplot.
-- For each variable, comment on the strengths and weaknesses of mean/SD
   vs. median/IQR, and of histogram vs. boxplot.
Fri 13 September 3.1
Probability Basics
3.1, 3, 5, 9, 11, 39
A citation for the data you plan to use for the graph and caption-writing
project, the variable or variables from that data set you plan to graph,
and some evidence that you've been able to load that data into R.
(Maybe a favstats() for the variable(s) you plan to use, or something similar.)

Mon 16 September 3.5
Continuous Probability Models
3.37
2.22, 3.6, 3.8; Also, create a two-way table for academic division by section.
Comment on any interesting trends.
Wed 18 September 4.1
The Normal distribution
4.1, 3, 5, 25, 29
3.12, 38bcd, 40; Also, turn in just the graph (plus the R code you used to produce it)
from your writing project, which is due in full next Monday.
Fri 20 September 4.2
Evaluating Normality
4.2, 8
4.4, 6, 10.

Mon 23 September
Climate Teach-in. Meet in Hayes as usual.

Graph and Caption-writing project due, including citation and code and
(printed separately) a nicely presented graph with caption. Bring at least
two copies of the graph and caption, one to submit and one to take out to
the teach-in.
Wed 25 September
Midterm #1: Covers Chapters 1-3
Fri 27 September 4.3
The Binomial Distribution
4.11, 13, 23
This very short exercise

Mon 30 September 5.1
Variability in Estimates
5.1, 3
4.12; Also, make a histogram and a QQ plot for both wingspan
and number of text messages in our class data set. Comment on any
interesting features/correspondences between the histogram and the
QQ plot.
Wed 2 October 5.2
Confidence Intervals
5.2, 7, 11, 13, 27
4.14, 5.4, 6; Problems 3bcd, 4, 5 from Wednesday's in-class worksheet
Fri 4 October 5.3-5.3.3
Hypothesis Testing
5.9, 15, 17
5.8, 12, 14; Also, calculate and interpret both a 90% and a 99% confidence
interval for the proportion of Kenyon students who are right handed,
under the assumption that the 97 people in our class constitute a
random-enough sample of Kenyon students for this particular variable.
(That is, use our class data set and pretend it's a simple random sample.)
Calculate the 90% interval "by hand" using the methods on pages 175-179 of our
textbook, without using prop.test(); then calculate the 99% interval using prop.test().
Also, as always, include the R code that you used in arriving at your answer.
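For reference, here is a summary of the standard large-sample ("by hand") confidence interval for a single proportion, written in common textbook notation that may differ slightly from our book's. Here $\hat{p}$ is the sample proportion of right-handers and $n = 97$. The critical value $z^\star$ is about 1.645 for 90% confidence (qnorm(0.95) in R) and about 2.576 for 99% (qnorm(0.995)).

$$\hat{p} \pm z^\star \sqrt{\frac{\hat{p}\,(1-\hat{p})}{n}}$$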

Mon 7 October 5.3.4-5.3.6
More on Hypothesis Testing
5.21, 23, 33
5.10, 16, 18, 34
Wed 9 October 6.1
Inference for a Single Proportion
6.1, 2, 5, 7, 13, 43, 45
5.22, 24, 30, 36; Also, run a hypothesis test on the hypothesis that 80% of
Kenyon students are right handed. (Again, ignore the biased sample issue.)
Find the p-value in two ways: once "by hand" as outlined in Examples 5.29
and 5.30 of our textbook, then with the prop.test() command. You'll get
two different p-values. Interpret each in the context of the problem and
write just a few words about why they're different.
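As a reminder, here is the usual normal-approximation test statistic for the "by hand" calculation, in common textbook notation. Under $H_0: p = p_0$ with $p_0 = 0.80$, the standard error is computed from $p_0$ rather than from $\hat{p}$. For a two-sided alternative, the p-value is $2P(Z > |z|)$ for a standard normal $Z$.

$$z = \frac{\hat{p} - p_0}{\sqrt{p_0\,(1-p_0)/n}}$$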

Mon 14 October 6.2
Inference for the Difference of Two Proportions
6.17, 19, 21, 23, 25, 39
6.8, 10, 11, 48
Wed 16 October 6.3
Goodness of Fit
6.31, 33
6.16, 20, 22, 24, 42, 44
Fri 18 October
Midterm #2: Covers Chapters 4-5, Section 6.1

Mon 21 October 6.4
Chi-Squared Testing
6.47

Wed 23 October 6.4
Homogeneity and Independence
6.35, 6.37
6.28, 30, 32, 34, 36. Do #6.34d both by hand and using xchisq.test().
Fri 25 October
The Central Limit Theorem
6.40, 50; Also, conduct a hypothesis test to see whether being
a varsity athlete is independent of academic division in our class data set.
I've added a column to our class data set for this, so you may need to re-download
the data set. The column is "DivisionCombined", in which I've combined Fine
Arts and Humanities into the designation FAH. With only 3 of our 97 people
interested in Fine Arts, that group was too small to run statistical tests on.
Use this new column in your test.
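For reference, a chi-squared test on a two-way table (for homogeneity or for independence, as in the athlete/division question) compares each observed count $O$ with the count $E$ expected under the null hypothesis. This summary uses common notation, with $R$ rows and $C$ columns:

$$E = \frac{(\text{row total})(\text{column total})}{\text{table total}}, \qquad X^2 = \sum_{\text{all cells}} \frac{(O - E)^2}{E}, \qquad df = (R-1)(C-1)$$

The chi-squared approximation is generally trusted only when every expected count is at least 5. This is one reason very small groups cause trouble.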

Mon 28 October 7.1
The t-distribution
7.1, 3, 11b
These problems.
Wed 30 October 7.1, 7.2
The t-distribution, Paired Data
7.7, 15, 17, 19, 21
7.4, 7.10
Fri 1 November 7.3
Difference in Means
7.16, 18bc
  • 7.12, 14
  • Use the NFL100Sample dataset to run a hypothesis test of whether the
    average BMI for NFL players differs from 30.
    Be sure to use good prose in your writeup and go
    through each step in the hypothesis testing procedure.
  • Use the same NFL100Sample dataset to calculate a 90% confidence
    interval for the average age for all NFL players,
    and interpret this confidence interval in context.
  • Use our class data (and the totally unjustified assumption
    that the 97 of you are a representative sample of Kenyon
    students) to test the hypothesis that the average Kenyon
    student's height is different from their wingspan.
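For reference, the one-sample t procedures for a mean use the statistic and interval below, in common notation. For the BMI test, $\mu_0 = 30$. The critical value $t^\star_{n-1}$ comes from a t-distribution with $n-1$ degrees of freedom; for 90% confidence it is qt(0.95, df = n - 1) in R. Height versus wingspan is paired data, so the same formulas apply to the differences $d = \text{height} - \text{wingspan}$, with $\mu_0 = 0$.

$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}, \qquad \bar{x} \pm t^\star_{n-1}\,\frac{s}{\sqrt{n}}$$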

Mon 4 November 7.5-7.5.2
ANOVA
7.35
  • 7.24, 26, 49
  • Use our class data (and our assumption that y'all are a
    representative sample) to test the hypothesis that
    self-identified cat people and dog people at Kenyon have different
    average pulse rates. Report and interpret your p-value.
  • Find a 90% confidence interval for the difference in average
    pulse rate between Kenyon cat people and dog people and
    interpret this confidence interval in context.
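For reference, the two-sample t procedures for cat people (group 1) versus dog people (group 2) are built on the standard error below, written in common notation. Done by hand, a conservative choice of degrees of freedom is the smaller of $n_1 - 1$ and $n_2 - 1$. R's t.test() uses the Welch approximation by default, so its degrees of freedom and p-value may differ slightly.

$$SE = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}, \qquad t = \frac{\bar{x}_1 - \bar{x}_2}{SE}, \qquad (\bar{x}_1 - \bar{x}_2) \pm t^\star_{df}\, SE$$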
Wed 6 November 7.5.3-7.5.5
ANOVA continued
7.43
Study for Friday's midterm.
Fri 8 November
Midterm #3: Covers Sections 6.2-6.4, 7.1-7.3

Mon 11 November
ANOVA continued
Wed 13 November 8.1
Correlation and Regression Intro
8.1, 2, 3, 7, 9, 37
  • 7.39c, 7.40
  • Of these three possible ANOVA tests: (1) Height by class year,
    (2) Exercise minutes by (combined) division, and (3) Text messages
    by class year, one fails the constant variance condition, one meets
    constant variance but fails normal residuals, and one meets both criteria.
    Which is which? (Note: One person failed to report usable data for
    exercise minutes; favstats() ignores this automatically. If you use sd()
    instead, include the argument na.rm = TRUE.)
  • Also, for the one of the above that meets both technical criteria for
    ANOVA, conduct a full ANOVA hypothesis test, reporting and
    interpreting your p-value.
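For reference, the ANOVA F statistic compares variability between the group means with variability within the groups. This summary uses common notation: $k$ groups, $n$ observations in total, group sizes $n_i$, group means $\bar{x}_i$, group standard deviations $s_i$, and overall mean $\bar{x}$.

$$MSG = \frac{1}{k-1}\sum_{i=1}^{k} n_i(\bar{x}_i - \bar{x})^2, \qquad MSE = \frac{1}{n-k}\sum_{i=1}^{k}(n_i - 1)s_i^2, \qquad F = \frac{MSG}{MSE}$$

The p-value comes from an F distribution with $df_1 = k - 1$ and $df_2 = n - k$.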

Fri 15 November 8.2
Least Squares Regression Line
8.17, 19, 22, 25
8.4, 6, 10, 38; Also, conduct an ANOVA test to see if there is a difference in
wingspan across the (combined) divisions, reporting and interpreting your
p-value in context. Conduct a Tukey HSD test if it is appropriate.

Mon 18 November 8.4
Inference for Regression
8.18, 20, 26; Also, load the data set sparrows.csv from the P drive. It
gives the weight (in grams) and wing length (in mm) for sparrows
from nests that received various treatments. (Thanks to Prof. Bob Mauck
for the data.) Do the following:
  • Make a scatter plot of weight as a function of wing length,
    including the least-squares regression line.
  • Find the equation for the regression line, and interpret what
    the slope and intercept mean in context. (And say whether
    the intercept is meaningful for this data.)
  • Make a residual-vs-wing-length plot as well as a QQ plot
    of the residuals and comment on what these tell us about
    how appropriate a linear regression model was in this situation.
  • Find both R (the correlation) and R^2 and explain what they
    tell us in this situation.
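For reference, the least-squares line for a simple linear regression can be computed from summary statistics. In common notation, with $x$ = wing length and $y$ = weight:

$$b_1 = R\,\frac{s_y}{s_x}, \qquad b_0 = \bar{y} - b_1\,\bar{x}, \qquad \hat{y} = b_0 + b_1 x$$

$R^2$ is the square of the correlation. For a least-squares line, it gives the fraction of the variability in $y$ explained by the line.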
Wed 20 November 8.4
Inference for Regression
8.31, 33
Load the data set pines.csv from the P drive.
The variables Hgt90, Hgt96, and Hgt97 represent the heights of pine
trees in the BFEC pine grove at the time of planting (1990) and then
in 1996 and 1997 respectively. Do the following:
  • Construct scatter plots with regression lines to examine the
    relationship between the heights of trees in 1997 vs 1990
    and 1997 vs 1996. Comment on any relationships you see.
  • Find the least-squares regression lines for predicting heights
    in 1997 from the height in 1990 and from the height in 1996.
  • Interpret the meanings of the slope and intercept in these lines.
  • Answer the following: Are you satisfied with the linear fit in
    each situation? Why or why not? Are there any other statistics
    about the relationship between the variables in these two
    situations that you might want to use to help explain why one
    fit might be better than the other? Are there real-world factors
    to help explain the differences you see?
Fri 22 November 8.3
Outliers and Transformations
8.27
8.32, 39; also, revisit the regression you did on Monday with
Prof. Mauck's sparrow data. Find and interpret a 98% confidence interval
for the slope of that regression line.
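For reference, a confidence interval for the slope of a simple linear regression has the form below. $SE_{b_1}$ is the standard error of the slope (the "Std. Error" column in summary() of an lm() fit). For 98% confidence, $t^\star$ is qt(0.99, df = n - 2) in R. Calling confint() on the fitted model with level = 0.98 should reproduce the same interval.

$$b_1 \pm t^\star_{n-2}\; SE_{b_1}$$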

Mon 2 December Supplement
Confidence and Prediction Intervals

Wed 4 December 9.1
Introduction to Multiple Regression
8.28, 30; Also, load the data set MetabolicRate.csv from the P
drive. The variables BodySize and Mrate are the weight (in grams)
and metabolic rate for Manduca sexta caterpillars from Prof.
Itagaki's lab. Also included are LogBodySize and LogMrate, the
logarithms of these two variables. Our goal is to build a linear model
to predict metabolic rate (either directly or on a log scale) using
body size (either directly or on a log scale). There are four possible
combinations of Mrate/LogMrate (vertical) versus
BodySize/LogBodySize (horizontal). Make all four scatterplots,
including regression lines, and comment on which you think is best.
Then, find the equation for the regression line for that combination
of variables.
Fri 6 December
Midterm #4: Covers Sections 7.5, 8.1, 8.2, 8.4.

Mon 9 December 9.2
Choosing Predictors
Use Prof. Mauck's sparrow data yet again to find and interpret 90%
confidence and prediction intervals for the weight of a sparrow with
a 30 mm wing length.

Also, use the data set neckwaist.csv to find and interpret 98%
confidence and prediction intervals for the waist size (in inches) for
someone with a 15.25-inch neck size.
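For reference, these are the standard formulas for both intervals at a given $x_0$ (30 mm or 15.25 inches here), in common notation. Here $s$ is the residual standard error and $\hat{y}_0 = b_0 + b_1 x_0$. The only difference between them is the extra 1 under the square root, which makes the prediction interval wider. In R, predict() with interval = "confidence" or interval = "prediction" (and the appropriate level) computes them.

$$\hat{y}_0 \pm t^\star_{n-2}\, s\sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum_i (x_i - \bar{x})^2}} \qquad \text{(confidence interval for the mean response)}$$

$$\hat{y}_0 \pm t^\star_{n-2}\, s\sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum_i (x_i - \bar{x})^2}} \qquad \text{(prediction interval for a single new observation)}$$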
Wed 11 December
Power

Paper #2 Due!
Fri 13 December Chapters 1-8
Q&A Review Session

None. Enjoy the day off! (I'm sure you have nothing else due today.)