9/26/2022
Chapter 7 - Iteration - see Iteration.R
- Our goal is to automate iterative operations
- For example, which player had the most hits for each team in MLB?
- Which pitcher had the most strikeouts for each team in MLB?
- Which student has the highest GPA for each team/club at Kenyon?
- Vectorized operations
- Many people use for() loops to iterate calculations.
- However, R is highly optimized for vector operations, and loops do not take advantage of this optimization.
- Try to avoid writing for() loops whenever possible with R.
- Many functions, e.g., exp(), will be applied to every element of a vector by default.
- Note that summary functions, like mean(), return only a single value.
- Functions differ in this respect, so check whether a function is vectorized and take advantage of it when it is!
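A minimal sketch of the difference between a loop and a vectorized call, using only base R. The vector x is made up for illustration and is not from Iteration.R.

```r
# Made-up input vector (an assumption for illustration)
x <- c(0, 1, 2)

# Loop version: visits each element explicitly
loop_result <- numeric(length(x))
for (i in seq_along(x)) {
  loop_result[i] <- exp(x[i])
}

# Vectorized version: exp() is applied to every element in one call
vec_result <- exp(x)

# Both approaches give the same values
all.equal(loop_result, vec_result)

# A summary function collapses the whole vector to a single value
mean(x)
```

The vectorized call is shorter and leaves the element-by-element work to R's internal code.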
- Using across() with dplyr functions can be extremely helpful
- The function across() applies operations programmatically.
- Using across() helps us avoid "magic numbers"/references that are often used in loops.
- The across() function provides an easy way to perform an operation on a set of variables without having to type or copy-and-paste the name of each variable.
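One way across() might be used, sketched on a tiny invented data frame (the batting tibble and its column names are hypothetical, not the MLB data from class). Selecting columns with where(is.numeric) avoids both typing each name and referring to columns by position.

```r
library(dplyr)

# Hypothetical toy data, invented for illustration
batting <- tibble(
  team = c("A", "A", "B", "B"),
  hits = c(10, 20, 30, 50),
  runs = c(1, 3, 5, 7)
)

# Compute the mean of every numeric column within each team.
# The grouping column (team) is left out of across() automatically.
team_means <- batting %>%
  group_by(team) %>%
  summarize(across(where(is.numeric), mean))

team_means
```

If a new numeric column is added to the data later, this code picks it up without any changes.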
- The map() family of functions
- Use map() to apply a function to each item in a list or vector or the columns of a data frame.
- map() is the main function from the purrr package.
- The map() function will always return a list.
- The map_dbl() function forces the output to be a vector of type double.
- The map_int() function forces the output to be a vector of integers.
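A short sketch of how the three variants differ in what they return. The list values is made up for illustration.

```r
library(purrr)

# Made-up named list of numeric vectors (an assumption for illustration)
values <- list(a = c(1, 2, 3), b = c(10, 20), c = 5)

# map() always returns a list, one element per input element
as_list <- map(values, mean)

# map_dbl() returns a double vector; each result must be a single number
as_dbl <- map_dbl(values, mean)

# map_int() returns an integer vector; length() returns an integer
as_int <- map_int(values, length)

str(as_list)
as_dbl
as_int
```

The typed variants fail with an error if the function returns something of the wrong type or length, which can catch mistakes early.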
- Iteration over a one-dimensional vector
- Iterating a known function
- Example - the map_int() function can be used with nchar() to find the length of all names for the Angels in MLB - see Iterate.R
- The nchar() function can also be used directly because it is vectorized
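A sketch of the name-length example. The vector angels_names is typed in by hand here as a stand-in; the class script presumably builds it from the MLB data, so treat the exact contents as an assumption.

```r
library(purrr)

# Hand-typed stand-in for the franchise names (an assumption; the class
# script may derive these from a data set instead)
angels_names <- c(
  "Los Angeles Angels",
  "California Angels",
  "Anaheim Angels",
  "Los Angeles Angels of Anaheim"
)

# Iterate explicitly with map_int()
lengths_map <- map_int(angels_names, nchar)

# nchar() is already vectorized, so a direct call gives the same counts
lengths_direct <- nchar(angels_names)

lengths_map
lengths_direct
```

When a function is already vectorized, as nchar() is, the direct call is the simpler choice.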
- Iterating an arbitrary function
- You can apply any function, including user-defined functions.
- Example - Top-5 seasons for MLB teams - see Iterate.R
- map_dfr() provides another way to collect the results into a data frame.
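A sketch of iterating a user-defined function over team names. The seasons data and the helper top_seasons() are hypothetical and invented for illustration; they are not the code in Iterate.R. Note that recent purrr releases mark map_dfr() as superseded, although it still works.

```r
library(dplyr)
library(purrr)

# Hypothetical toy data, invented for illustration
seasons <- tibble(
  team = c("A", "A", "A", "B", "B", "B"),
  year = c(2001, 2002, 2003, 2001, 2002, 2003),
  wins = c(90, 85, 99, 70, 95, 80)
)

# Hypothetical helper: the n best seasons (by wins) for one team
top_seasons <- function(data, team_name, n = 2) {
  data %>%
    filter(team == team_name) %>%
    arrange(desc(wins)) %>%
    head(n)
}

team_names <- unique(seasons$team)

# map() returns a list with one data frame per team
top_list <- map(team_names, function(nm) top_seasons(seasons, nm))

# map_dfr() row-binds those data frames into a single data frame
top_df <- map_dfr(team_names, function(nm) top_seasons(seasons, nm))

top_df
```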
- Iteration over subgroups
- The group_modify() function in dplyr allows you to apply an arbitrary function that returns a data frame to each group of a grouped data frame.
- You can use the group_by() function to define a grouping.
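A sketch of group_by() followed by group_modify(), again on a hypothetical seasons data frame invented for illustration. Inside the formula, .x is the subset of rows for the current group.

```r
library(dplyr)

# Hypothetical toy data, invented for illustration
seasons <- tibble(
  team = c("A", "A", "A", "B", "B", "B"),
  year = c(2001, 2002, 2003, 2001, 2002, 2003),
  wins = c(90, 85, 99, 70, 95, 80)
)

# For each team, keep the two seasons with the most wins.
# The function passed to group_modify() must return a data frame.
top_by_team <- seasons %>%
  group_by(team) %>%
  group_modify(~ head(arrange(.x, desc(wins)), 2))

top_by_team
```

Compared with mapping over team names by hand, this keeps the grouping column attached to the result automatically.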
- Simulation
- Using distributions to understand randomness
- Preliminary comments about Monte Carlo simulation studies
- Bootstrap sample
- Sampling with replacement - see Chapter 9
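A minimal sketch of a bootstrap, assuming a small made-up sample x and an arbitrary seed and number of resamples. It resamples x with replacement many times and looks at how the sample mean varies.

```r
library(purrr)

set.seed(2022)  # arbitrary seed so the sketch is reproducible

# Made-up observed sample (an assumption for illustration)
x <- c(21.5, 24.0, 27.3, 30.1, 22.8, 26.4, 33.2, 25.0)

# One bootstrap sample: same size as x, drawn with replacement
boot_sample <- sample(x, size = length(x), replace = TRUE)

# Repeat many times and record the mean of each resample
n_boot <- 1000
boot_means <- map_dbl(
  seq_len(n_boot),
  function(i) mean(sample(x, size = length(x), replace = TRUE))
)

# The spread of the resampled means estimates the standard error of the mean
sd(boot_means)
```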
- Extended example - looking at possible predictors of BMI (body mass index)
Activity 9
- Replicate the analyses in Examples 7.5.1 (Expected winning percentage) and 7.5.2 (Annual leaders). Discuss with your partner what the nls() function does.
- Modify the R code provided in Example 7.7 to examine the relationship between possible predictor variables and BMI. That is, see if you can create plots like those in Figure 7.6 on page 156 to explore relationships between BMI and those possible predictors for participants in the NHANES study.
Please read Chapter 8 for class on Wednesday.