9/14/2022
Problem session for Activities 5 and 6 will be on Wednesday
- Note that for Activity 6, the Master table in the Lahman package has been replaced by People.
- Volunteers will drop their solutions into the "!Student R Code - Solutions for Activities and Exercises" folder on our Google Drive. Please title them Activity5-yourname.pdf or Activity6-yourname.R.
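
For anyone adapting older Activity 6 code, the swap from Master to People can be sketched as follows. This is only an illustration, assuming the Lahman and dplyr packages are installed; the choice of columns is arbitrary and not part of the activity.

```r
library(Lahman)
library(dplyr)

# Older code written as `Master %>% ...` should now start from People.
# (Depending on the installed Lahman version, Master may no longer exist.)
players <- People %>%
  select(playerID, nameFirst, nameLast, birthYear)  # illustrative columns only

head(players)
```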
Chapter 6 - Tidy data
- Gapminder was co-founded by Hans Rosling in 2005
- Use the googlesheets4 package to show prevalence of HIV and other indicators - see TidyData.R in our Google Drive folder
- Short (wide) form versus long (narrow) form of data
- Reading different types of data into R
- Tidy data
- The rows, called cases or observations, each refer to a specific, unique, and similar sort of thing.
- The columns, called variables, each have the same sort of value recorded for each row.
- Tidy form may not be attractive or appealing, but it is much more useful to an analyst.
- Spreadsheets are often NOT in tidy form because they frequently mix data with summaries, graphs, tables, models, etc.
- Suggestions
- Keep your data and analysis separate
- Be able to uniquely identify your cases
- Codebooks may be needed to understand cases and variables
- Reshaping data
- Two different formats - wide and narrow
- It is easy to read both formats into R
- pivot_longer() and pivot_wider() convert from one format to the other
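
The reshaping step can be sketched with a tiny made-up table. The country labels, years, and values below are invented purely for illustration (they are not Gapminder data), and the column names `year` and `value` are arbitrary choices.

```r
library(tidyr)
library(tibble)

# Wide (short) form: one row per country, one column per year.
# All values are made up for illustration.
wide <- tribble(
  ~country, ~`2000`, ~`2005`,
  "A",      1.0,     1.5,
  "B",      0.2,     0.3
)

# Long (narrow) form: one row per country-year case, which is the tidy layout.
long <- pivot_longer(
  wide,
  cols = -country,        # every column except country holds a year
  names_to = "year",      # old column names become values of `year`
  values_to = "value"     # old cell contents become values of `value`
)

# pivot_wider() reverses the operation.
back <- pivot_wider(long, names_from = year, values_from = value)

long
back
```

In the long form each row is a single case (one country in one year) and each column is a single variable, which matches the definition of tidy data in the notes above.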
Activity 7
- Step through the R code associated with List-columns in Section 6.2.4 with a peer and make sure that you understand how nest(), pull(), pluck(), and unnest() work.
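
As a warm-up before working through the book's code, here is a minimal sketch of the four functions on a toy table. The `scores` data and its column names are hypothetical and are not taken from Section 6.2.4; the sketch assumes dplyr, tidyr, and purrr are installed.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Hypothetical toy data: five scores in two groups.
scores <- tibble(
  group = c("a", "a", "b", "b", "b"),
  score = c(1, 2, 3, 4, 5)
)

# nest(): collapse the score column into a list-column named `data`,
# leaving one row per group.
nested <- scores %>% nest(data = score)

# pull(): extract the list-column itself (a list of tibbles).
all_groups <- nested %>% pull(data)

# pluck(): reach into the structure by name and position;
# here, the tibble stored for the second group.
second_group <- nested %>% pluck("data", 2)

# unnest(): expand the list-column back into ordinary rows.
flat <- nested %>% unnest(cols = data)
```

Printing `nested`, `all_groups`, `second_group`, and `flat` one at a time is a quick way to see what each step returns.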
Please read Section 6.3 and start reading Section 6.4 for class on Friday. We will have a problem session for the Chapter 4 and Chapter 5 exercises on Friday.