Sports Analytics
Stat 291
Brad Hartlaub
Spring 2018
R links
Weekly Agendas
- January 16
- January 23
- January 30
- February 6
- February 13 - work with a partner on time series models
- February 20
- February 27 - Midterm project updates (approximately 10 min. per person)
- March 20
- March 27
- Q&A Session with Dennis Lock, Director of Analytics, Miami Dolphins (Meet in Bailey House Conference Room)
- Relationship between height and weight of NFL players over time
- Presentation of Model and Analysis for Diamond Dollars Case Competition - Jack and Grant
- Additional models for spin rate, launch angle, exit velocity, and numerous other statistical measures in MLB.
- April 3
- April 10
- April 17
- April 24 - Mock Draft
- May 1
- Oral reports on small group projects
- Progress report on final projects
- May 9 - Final Projects are due by 11:30 am
Weekly Assignments
You may use R scripts or R notebooks for your analysis, but your position paper should be in the form of a one-page executive summary. You should appropriately cite any sources that you use for data or to support your rationale. File names for all articles and position papers should start with your name, followed by the title. For example, hartlaub-CTE.pdf or hartlaub-PP1.pdf.
- #1 Due on Tuesday, January 23
- In a tweet, President Trump claimed that "NFL attendance and ratings are WAY DOWN." For your first position paper, collect and analyze data to address the claim by others that the decreases are simply due to chance. Is the decrease simply chance variation or a significant decrease?
- Post at least one article on CTE, concussions, or head injuries in sports for all of us to read for class next Tuesday. A PDF version of your article should be copied into a folder on Google Drive.
- Sunday Swoon?
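One possible way to frame the "chance variation" question in paper #1 is a permutation test. The sketch below is only an illustration: the attendance figures are made up (they are not real NFL data), and the function name `perm_test_decrease` is hypothetical. It shuffles the season labels many times and asks how often a decrease at least as large as the observed one appears by chance alone.

```r
# Hypothetical per-game attendance (in thousands). NOT real NFL data.
season_prev <- c(69.2, 70.1, 68.5, 71.3, 69.8, 70.4, 68.9, 70.7)
season_curr <- c(68.1, 69.0, 67.9, 69.5, 68.3, 69.9, 67.2, 68.8)

# One-sided permutation test for a decrease in the mean.
# (Hypothetical helper written for this illustration.)
perm_test_decrease <- function(before, after, n_perm = 10000, seed = 291) {
  set.seed(seed)
  observed <- mean(before) - mean(after)
  pooled <- c(before, after)
  n_before <- length(before)
  perm_diffs <- replicate(n_perm, {
    # Randomly reassign the season labels
    idx <- sample(length(pooled), n_before)
    mean(pooled[idx]) - mean(pooled[-idx])
  })
  # Share of shuffles with a decrease at least as large as the observed one
  p_value <- (sum(perm_diffs >= observed) + 1) / (n_perm + 1)
  list(observed = observed, p_value = p_value)
}

result <- perm_test_decrease(season_prev, season_curr)
print(result)
```

A small p-value would suggest the decrease is larger than label shuffling typically produces. With real data you would also need to think about whether games within a season can be treated as exchangeable.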
- #2 Due on Tuesday, January 30
- Who is the best? Trying to decide the best team or athlete leads to interesting debates and often controversies. However, there are interesting methods for comparing teams and players from different time periods. For example, FiveThirtyEight published an article titled "Vegas Has The Best Expansion Team in the History of Pro Sports, and It's Not Close" that clearly illustrated how z-scores can be helpful. Make a data-driven decision regarding the best swimmer: option 1 is Janet Evans versus Katie Ledecky and option 2 is Michael Phelps versus Mark Spitz.
- Post at least one article in the Google Drive folder Elo on how the Elo rating system has been used to rate players, teams, or leagues. A PDF version of your article should be posted before the end of the day on Sunday, January 28.
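For paper #2, the z-score idea can be sketched as follows. This is a minimal illustration, not a required method: the times are invented (not real swimming results), and `era_z` is a hypothetical helper. Each athlete is standardized against the field from his or her own era, so that athletes from different eras end up on a common scale.

```r
# z-score of an athlete's time relative to contemporaries.
# For timed events, a more negative z means further ahead of the field.
# (Hypothetical helper written for this illustration.)
era_z <- function(athlete_time, field_times) {
  (athlete_time - mean(field_times)) / sd(field_times)
}

# Invented finishing times in seconds. NOT real results.
field_a <- c(250, 252, 254, 256, 258)  # rivals of athlete A
field_b <- c(240, 241, 242, 243, 244)  # rivals of athlete B

z_a <- era_z(246, field_a)
z_b <- era_z(237, field_b)

round(c(athlete_a = z_a, athlete_b = z_b), 2)
```

Here the athlete is compared with a field that excludes his or her own time. Whether to include the athlete in the reference group, and which rivals count as the field, are choices you should state and defend.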
- #3 Due on Tuesday, February 6
- Has the gender gap narrowed over the last two decades? This week we will dive into the controversial area of gender differences in performance and pay. As the Olympics are about to begin, estimate the improvement in times for male versus female skaters, skiers, bobsledders, etc. Use your model to predict the winning time for at least one event in the 2018 Winter Olympics. Is the rate of improvement the same for men and women? Expanding beyond Olympic competition, are there sports where you believe that women will outperform men? Collect appropriate data to make data-driven decisions for sports of your choice. Provide your R code and a one-page executive summary in the folder !Position Paper#3.
- Post at least one article into the Google Drive folder on wage, performance, or rating differences for athletes or coaches. A PDF version of your article should be posted before the end of the day on Sunday, February 4 so that we can all read the articles before class.
- What if men and women skied against each other in the Olympics?
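One way to ask whether the rate of improvement is the same for men and women in paper #3 is a linear model with a year-by-sex interaction. The sketch below uses simulated winning times (not real Olympic results), and all variable names are assumptions made for illustration. The interaction coefficient estimates the difference in slopes, and `predict` gives a 2018 prediction interval for each group.

```r
set.seed(291)

# Simulated winning times (seconds) for six Winter Games. NOT real results.
games <- seq(1994, 2014, by = 4)
yrs <- games - 1994  # years since 1994, to keep the intercept interpretable
men <- 110 - 0.15 * yrs + rnorm(length(yrs), sd = 0.3)
women <- 120 - 0.25 * yrs + rnorm(length(yrs), sd = 0.3)

results <- data.frame(
  yrs = rep(yrs, 2),
  sex = factor(rep(c("men", "women"), each = length(yrs)),
               levels = c("men", "women")),
  time = c(men, women)
)

# The yrs:sexwomen term is the women's slope minus the men's slope
gap_fit <- lm(time ~ yrs * sex, data = results)
slope_diff <- unname(coef(gap_fit)["yrs:sexwomen"])

# Predicted 2018 winning times with 95% prediction intervals
new_games <- data.frame(
  yrs = 2018 - 1994,
  sex = factor(c("men", "women"), levels = c("men", "women"))
)
pred_2018 <- predict(gap_fit, newdata = new_games, interval = "prediction")

summary(gap_fit)$coefficients
pred_2018
```

A straight line is itself an assumption. Improvement often levels off, so check residual plots before trusting an extrapolation to 2018.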
- #4 Due on Tuesday, February 13
- This week we begin looking at performance over time. In particular, I would like you to see if you can find evidence to support or refute the claim that player and team performance tends to follow a quadratic trend over time. For example, some experts believe that hitters will improve until they hit peak performance (hopefully near the end of a long career) and then gradually fall back to where they started. When considering particular players, you need to make sure that the career lengths are long enough to provide reasonable estimates of a career profile. Are the quadratic models better in some sports than others? Your analysis should include player profiles from at least two different sports. Post your R code and a one-page executive summary to !Position Papers>Position Paper #4.
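A quadratic career profile for paper #4 can be fit with `lm` and compared against a straight line. The sketch below uses a simulated career (not a real player), with a true peak built in at age 28 so you can see whether the fit recovers it. The variable names and the performance scale are assumptions for illustration only.

```r
set.seed(291)

# Simulated season-by-season performance with a true peak at age 28.
# NOT real player data.
age <- 21:38
perf <- 0.850 - 0.002 * (age - 28)^2 + rnorm(length(age), sd = 0.01)
career <- data.frame(age = age, perf = perf)

linear_fit <- lm(perf ~ age, data = career)
quad_fit <- lm(perf ~ age + I(age^2), data = career)

# For perf = b0 + b1 * age + b2 * age^2, the vertex is at age = -b1 / (2 * b2)
b <- coef(quad_fit)
peak_age <- unname(-b["age"] / (2 * b["I(age^2)"]))

# Nested F-test: does the squared term improve on the straight line?
anova(linear_fit, quad_fit)

c(peak_age = peak_age,
  adj_r2_linear = summary(linear_fit)$adj.r.squared,
  adj_r2_quad = summary(quad_fit)$adj.r.squared)
```

The vertex is a peak only when the coefficient on the squared term is negative. With short careers the estimated peak can land outside the observed ages, which is one reason career length matters.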
- #5 Due on Tuesday, February 20
- Can ARIMA time series models be used to explain the variability in player or team performance from game to game and to forecast performance? For example, look at the points scored by your favorite NBA player over time. Create and comment on ACF plots. Does an autoregressive model provide a good fit? Does a moving average model provide a good fit? Is the player's performance stationary over time or does it drift over time? Now consider other performance statistics and other players, perhaps in different sports. Find a player and performance statistic where an autoregressive model provides a good fit and use the model to forecast performance. Find a different player and performance statistic where a moving average model provides a good fit and use your model to make forecasts. Ideally you will pick settings where you can "check your model" by seeing if the actual value falls into your prediction interval. (For this assignment only, you may switch over to Minitab if you wish. I'm curious to hear your thoughts about comparing and contrasting Minitab and R for time series.)
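The workflow for paper #5 might look roughly like the sketch below, which uses only base R (`acf`, `arima`, `predict`). The series is simulated from an AR(1) process (it is not real NBA data), and holding out the last five games is an arbitrary choice made here so the forecasts can be checked against values the model has not seen.

```r
set.seed(291)

# Simulated game-by-game points: AR(1) around a mean of 25. NOT real NBA data.
points <- as.numeric(25 + arima.sim(model = list(ar = 0.6), n = 82, sd = 4))

# Hold out the last 5 games to check the forecasts
train <- points[1:77]
held_out <- points[78:82]

# Sample autocorrelations (use plot = TRUE to draw the ACF plot)
acf_vals <- acf(train, plot = FALSE)

# Candidate models: AR(1) and MA(1), each with a mean term
ar_fit <- arima(train, order = c(1, 0, 0))
ma_fit <- arima(train, order = c(0, 0, 1))

# Forecast 5 games ahead with approximate 95% prediction intervals
fc <- predict(ar_fit, n.ahead = 5)
pred <- as.numeric(fc$pred)
lower <- pred - 1.96 * as.numeric(fc$se)
upper <- pred + 1.96 * as.numeric(fc$se)
covered <- held_out >= lower & held_out <= upper

c(aic_ar1 = ar_fit$aic, aic_ma1 = ma_fit$aic)
data.frame(actual = held_out, lower = lower, upper = upper, covered = covered)
```

The intervals assume roughly normal errors and a stationary series. If the ACF decays very slowly, consider differencing (the middle term of `order`) before fitting.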
- #6 Due on Tuesday, March 27
- Are data from the NFL Combine useful predictors of success in the NFL? See NFL Combine Results for some useful data. Post your R code and a short executive summary to !Position Papers>Position Paper #6. You may work with one or two partners on this analysis and summary.
- Open-Ended Project due on Friday, April 27 - This 15 to 20 page paper will address an open question in a sport of interest to you. You will collect and analyze appropriate data using statistical methods and models. Your report must use statistical simulation to address at least one aspect of the open problem.
Data Sources
Interesting Articles
Interesting Links