9/3/2025
Introduce Kate Bogan and Long Tran - your MSSC tutors for Data Analysis
Chapter 0
- Introduction to a randomization/permutation test
- Open Day1-F2025.csv (data from the first day of a data analysis course). The variables are:
- Restingpulse
- Activepulse - pulse after 1 minute of exercise
- Varsity - yes=varsity athlete, no=not a varsity athlete
- DistanceHome - distance traveled from home to campus
- Is the active pulse different for athletes and nonathletes?
- To answer this question we will consider a completely different type of inference, based on randomization.
- Simple demonstration of the idea with cards
- Open StatKey and click Test for Difference in Means
- Upload the file Day1-F2025.csv
- Generate 1 sample
- Generate 1000 samples
- Make Inference
- Compare inference based on this simulation technique with the inference from t.test (see Day1.R)
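The StatKey steps above (generate shuffled samples, then compare them with the observed difference) can be sketched in base R. This is a minimal illustration, not the contents of Day1.R: the data frame below is simulated as a stand-in for Day1-F2025.csv, and the helper `diff_means` is a hypothetical name introduced here.

```r
# Simulated stand-in for Day1-F2025.csv: these are NOT the class data.
set.seed(206)
mydata <- data.frame(
  Activepulse = round(c(rnorm(12, mean = 88, sd = 10),
                        rnorm(20, mean = 95, sd = 10))),
  Varsity     = rep(c("yes", "no"), times = c(12, 20))
)

# Difference in mean active pulse (nonathletes minus athletes).
# diff_means is a hypothetical helper, not a function from any package.
diff_means <- function(pulse, group) {
  mean(pulse[group == "no"]) - mean(pulse[group == "yes"])
}
observed <- diff_means(mydata$Activepulse, mydata$Varsity)

# Shuffle the Varsity labels 1000 times and recompute the difference each time.
n_perm <- 1000
perm_diffs <- replicate(
  n_perm,
  diff_means(mydata$Activepulse, sample(mydata$Varsity))
)

# Two-sided p-value: share of shuffled differences at least as extreme
# as the observed one.
p_perm <- mean(abs(perm_diffs) >= abs(observed))

# Two-sample t-test on the same data, for comparison.
t_result <- t.test(Activepulse ~ Varsity, data = mydata)

print(p_perm)
print(t_result$p.value)
```

With the real file, `mydata` would instead come from `read.csv("Day1-F2025.csv")`, assuming the column names match the variable list above.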
Continue Introduction to R
- Importing scripts - open Day1.R
- Subsetting in R - try the subset command
- athlete<-subset(mydata, Varsity=='yes')
- Try to access your HW folder and save a file to it: GoogleDrive\Stat206-DataAnalysis-F2025\yourname
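A short sketch of the subsetting and saving steps, assuming a small made-up data frame in place of the class file. The values are illustrative only, and the output goes to a temporary directory rather than the course folder.

```r
# Made-up illustrative rows; the real data come from Day1-F2025.csv.
mydata <- data.frame(
  Restingpulse = c(62, 70, 58, 75),
  Activepulse  = c(84, 98, 79, 104),
  Varsity      = c("yes", "no", "yes", "no")
)

# subset() keeps the rows where the condition is TRUE.
athlete    <- subset(mydata, Varsity == "yes")
nonathlete <- subset(mydata, Varsity == "no")

# Equivalent selection with bracket indexing.
athlete2 <- mydata[mydata$Varsity == "yes", ]

# Save the subset as a CSV file. tempdir() is used here only so the
# sketch runs anywhere; replace it with the path to your own HW folder.
out_file <- file.path(tempdir(), "athlete.csv")
write.csv(athlete, out_file, row.names = FALSE)
```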
Suggested Class Exercises
- Compare the resting pulses of athletes and nonathletes
- Compare the active pulses of athletes and nonathletes
- Compare the distance traveled to Gambier for athletes and nonathletes
- Create a new variable, say Diff, to measure the increase in pulse after one minute of exercise. Do the heart rates for nonathletes increase more than the heart rates for athletes after one minute of exercise?
- Exercise 0.26 on p. 17
- Exercise 0.23 with a randomization/permutation test
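One possible starting point for the Diff exercise above, again on made-up values rather than the class data. It only shows how the new variable could be built and compared by group; it is a sketch of one approach, not the intended solution.

```r
# Made-up illustrative rows; the real data come from Day1-F2025.csv.
mydata <- data.frame(
  Restingpulse = c(60, 64, 58, 66, 72, 70, 75, 68),
  Activepulse  = c(80, 86, 77, 88, 100, 97, 105, 94),
  Varsity      = rep(c("yes", "no"), each = 4)
)

# New variable: increase in pulse after one minute of exercise.
mydata$Diff <- mydata$Activepulse - mydata$Restingpulse

# Mean increase for each group.
group_means <- tapply(mydata$Diff, mydata$Varsity, mean)
print(group_means)

# One-sided t-test. The groups are ordered alphabetically ("no", "yes"),
# so alternative = "greater" asks whether the mean increase for
# nonathletes exceeds the mean increase for athletes.
diff_test <- t.test(Diff ~ Varsity, data = mydata, alternative = "greater")
print(diff_test$p.value)
```

The same question could also be approached with a randomization test by shuffling the Varsity labels and recomputing the difference in mean Diff.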
Chapter 1
- Open Stat206-DataAnalysis-F2025\2ePowerPoint\Sec1.1R.pptx in our Google Drive folder
- Review simple linear regression
- Discuss and illustrate the R Markdown file for Chapter 1 - see Stat206-DataAnalysis-F2025\2eMarkdown\Sec1.1.RMD
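As a reminder of what a simple linear regression looks like in R, here is a minimal sketch. The choice of variables (active pulse predicted from resting pulse) and the values are assumptions made for illustration. They are not taken from Sec1.1.RMD or the textbook.

```r
# Made-up illustrative values; not the class data or a textbook dataset.
mydata <- data.frame(
  Restingpulse = c(60, 64, 58, 66, 72, 70, 75, 68),
  Activepulse  = c(80, 86, 77, 88, 100, 97, 105, 94)
)

# Fit the least-squares line: Activepulse = b0 + b1 * Restingpulse + error.
fit <- lm(Activepulse ~ Restingpulse, data = mydata)

# Estimated intercept and slope.
print(coef(fit))

# Full summary: standard errors, t-tests, and R-squared.
print(summary(fit))

# Scatterplot with the fitted line added.
plot(Activepulse ~ Restingpulse, data = mydata)
abline(fit)
```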
Please read Sections 1.2 and 1.3 for class on Monday. We will have our first problem session on Friday for the suggested class exercises above.