9/19/2022
Chapter 6 - Tidy data
- Importing data with the path - see TidyData-path.R
- List-columns
- Variables in data frames that have type list are called list-columns
- Working with list-columns requires your careful attention
- Example with gender-neutral names
- Starting with one name - e.g. "A Boy Named Sue"
- A more thorough analysis with all baby names
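A minimal sketch of what a list-column looks like, using base R only. The `students` data frame and its values are made up for illustration and are not from the chapter's baby-names example.

```r
# Hypothetical data: each student has a different number of quiz scores.
students <- data.frame(id = 1:3)
students$scores <- list(c(90, 85), 70, c(60, 75, 80))

# The column is a list, so each cell can hold a vector of any length.
is_list_column <- is.list(students$scores)

# Summary functions such as mean() do not work on the list as a whole;
# apply them to each element instead.
n_scores <- vapply(students$scores, length, integer(1))
mean_score <- vapply(students$scores, mean, numeric(1))
```

This is why list-columns need extra care: most column-wise operations assume one atomic value per row, so the work usually has to be done element by element.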
- Naming conventions with objects
- A name cannot start with a number
- A name cannot contain punctuation, other than . or _
- Case (R or r) matters in names - e.g., NCHS, nchs, Nchs, nChs, etc. are all different to R
- Consider using the tidyverse style guide, but recognize that deviations are inevitable when using code from lots of different people
- The styler package can be used to reformat your code to follow the tidyverse style guide
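The naming rules above can be checked in a short sketch. The object names and candidate strings below are illustrative only; `make.names()` is the base R function that repairs strings into syntactically valid names.

```r
# Case matters: these are two different objects.
NCHS <- "upper"
nchs <- "lower"
same_object <- identical(NCHS, nchs)

# make.names() repairs invalid names, so a candidate is already
# valid only if it comes back unchanged.
candidates <- c("2nd_try", "my-var", "ok_name.1")
repaired <- make.names(candidates)
is_valid <- candidates == repaired
```

Note that `make.names()` also alters reserved words such as `if`, so an unchanged result means the string is usable as an ordinary object name.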
- Data Intake - i.e., getting data into R
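- Native formats for R are .rds (a single object) and .rda or .RData (one or more named objects)
- saveRDS() writes a single object to a file (conventionally .rds), and readRDS() reads it back into R
- save() writes .rda or .RData files, and load() reads them into R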
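A small sketch of a round trip through R's native formats, assuming the usual conventions that `saveRDS()`/`readRDS()` handle one object per file (often named `.rds`) while `save()`/`load()` handle named objects (often `.rda` or `.RData`). The `scores` data frame and the temporary file paths are hypothetical.

```r
# Hypothetical data frame to store.
scores <- data.frame(name = c("Sue", "Pat"), n = c(10L, 20L))

# saveRDS() stores one object; readRDS() returns it, so you choose its name.
rds_path <- tempfile(fileext = ".rds")
saveRDS(scores, rds_path)
restored <- readRDS(rds_path)

# save() stores objects together with their names; load() recreates them
# under those names and returns the names it loaded.
rda_path <- tempfile(fileext = ".rda")
save(scores, file = rda_path)
rm(scores)
loaded_names <- load(rda_path)
```

A file written by `save()` cannot be read with `readRDS()`, and vice versa, so it helps to keep the file extension consistent with the function that wrote it.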
- Data-table friendly formats
- CSV - comma separated values
- Software-package-specific formats - e.g., Octave, Stata, SPSS, Minitab, SAS, Epi, etc.
- Relational databases - see Chapter 15 for SQL
- Excel
- Web related - e.g., HTML, XML, JSON, Google Sheets, API
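As an illustration of the simplest data-table friendly format, here is a sketch that writes and reads a CSV file with base R. The `toy` data frame is made up; `readr::read_csv()` is the tidyverse counterpart of `read.csv()` and could be used instead if readr is installed.

```r
# Hypothetical table written to a temporary CSV file.
toy <- data.frame(site = c("A", "B"), n = c(2L, 4L))
csv_path <- tempfile(fileext = ".csv")
write.csv(toy, csv_path, row.names = FALSE)

# Read it back; column types are guessed from the text in the file.
toy_in <- read.csv(csv_path, stringsAsFactors = FALSE)
```

Because CSV is plain text, column types are not stored in the file, so it is worth checking them after import.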
- R packages for reading data
- readxl - for Excel files
- googlesheets4 - for Google Sheets
- haven - for files from other statistical software (SPSS, Stata, SAS)
- dplyr and DBI - for relational databases
- readr (delimited files, including from a URL) and rvest (web scraping) - for data from the web - see TidyData-Read-from-web.R
- APIs - application programming interfaces
- twitteR
- aRxiv
- Rfacebook
- instaR
- FlickrAPI
- tumblR
- Rlinkedin
- RSocrata
- Data cleaning
- Transforming data in a variable to a useable form
- Recoding
- Translate integers (or other abbreviations) to more informative codes
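Recoding can be sketched with a named lookup vector in base R. The integer codes and their labels below are invented for illustration; in the tidyverse, `dplyr::case_when()` is a common alternative.

```r
# Hypothetical integer codes as they might arrive in a raw file.
status_code <- c(1L, 2L, 1L, 3L, 9L)

# Named lookup vector: names are the raw codes, values are the labels.
lookup <- c("1" = "operating", "2" = "shut down", "3" = "planned")

# Index by the code (as character); codes missing from the lookup become NA.
status <- unname(lookup[as.character(status_code)])
```

Leaving unknown codes as `NA` rather than guessing a label makes unexpected values easy to spot during cleaning.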
Activity 8
- Replicate the analysis in Example 6.4.4 on nuclear reactors in Japan.
Please complete your reading of Chapter 6 for class on Wednesday.