10/26/2020

Comments on Minute Cards

Thank you for taking the time to respond and share your opinions. I know that you are very busy and this is a very unusual semester. Feel free to share your views any time - we are all working through this together.
Textbook - I know that you don't like the text. I am thinking very seriously about incorporating a few chapters from another text over the next few weeks. There will be no cost to you because this text is online.
Pacing of the course - I know that the pace of work has been "uneven" and that was intentional. The class time for group presentations was intended to give you time to pull things together and examine code that your peers are creating.
Probability - This is not a probability course. The only reason I started with RVs and probability distributions is that they are so useful for the simulation projects that we are building towards. If you are interested in learning many more details about these distributions, please consider taking Math 336. However, that course will never be a prerequisite for this course because calculus is needed to examine continuous RVs and distributions.
Problem Sessions - I need to do a better job keeping quiet during these sessions. These sessions are not intended to be solution sessions. When you work together and help one another that is an ideal session. That is, it is OK to make mistakes and then others can chime in and help.

Chapter 6 - Sophisticated Data Structures

data frames - like a matrix, but can contain different object modes in different columns
- Recall that we first talked about reading data into R in Chapter 4 (see ReadingData.R)
lists - general data storage object
factor - special kind of variable that is used to represent categorical objects (ordinal or qualitative)
- factor(x) - creates a factor
  - labels option can be used to change the name of levels
- is.factor(x) - checks to see if object is a factor
- table(x) - tallies individual values
Be careful
- R represents factors internally as integers
- Subsetting of factors can produce surprises
- Tip: Run a test to make sure that you know what you are getting before plowing ahead
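As a minimal sketch of the factor commands and the two warnings above, the example below uses invented values (the vectors size and g are hypothetical and do not come from the textbook). It shows the integer codes behind a factor and two common surprises: unused levels surviving a subset, and numeric-looking factors converting to their codes.

```r
# Toy data (invented for illustration, not from the textbook)
size <- c("small", "large", "medium", "small", "large")

# levels sets the order of the categories; labels renames them
f <- factor(size, levels = c("small", "medium", "large"),
            labels = c("S", "M", "L"))

is.factor(f)     # TRUE
table(f)         # tallies the values in each level
as.integer(f)    # the internal integer codes: 1 3 2 1 3

# Surprise 1: subsetting keeps levels that no longer occur
sub <- f[f != "M"]
levels(sub)               # still "S" "M" "L"
table(sub)                # "M" is listed with a count of 0
levels(droplevels(sub))   # "S" "L"

# Surprise 2: converting a numeric-looking factor returns the codes
g <- factor(c("10", "20", "10"))
as.numeric(g)                  # 1 2 1, not 10 20 10
as.numeric(as.character(g))    # 10 20 10
```

Running a quick check like table(sub) before moving on is one way to follow the tip above.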
See DataStructures.R for a series of examples from the textbook, many of which deal with data from Upper Flat Creek at the University of Idaho Experimental Forest
Some useful commands for data frames
- head(x) - shows the first 6 lines of the data frame
- tail(x) - shows the last 6 lines of the data frame
- ufc$height.m - extracts a column from a data frame (there are other ways to do this, e.g., ufc[["height.m"]])
- ufc[1:5, 5] - extracts rows 1 to 5 from column 5
- A data frame can be created from a collection of vectors
  - e.g., data.frame(col1=x1, col2=x2, df1, df2)
- New variables can be added to a data frame by naming them and assigning values to them
  - e.g., ufc$volume.m3<-pi*(ufc$dbh.cm/200)^2*ufc$height.m/2
- names(x) - provides the names in a data frame
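The commands above can be tried on a small stand-in data frame. This is only a sketch: the data frame toy and all of its values are invented, and its column names are assumptions that loosely imitate the ufc data, with the diameter assumed to be in centimeters.

```r
# Hypothetical stand-in for the ufc data; every value here is invented
toy <- data.frame(
  plot     = 1:8,
  species  = c("DF", "WL", "GF", "DF", "WC", "GF", "DF", "WL"),
  dbh.cm   = c(39, 48, 15, 52, 31, 22, 27, 60),
  height.m = c(20.5, 33.0, 13.8, 30.0, 20.7, 15.5, 18.0, 36.2)
)

head(toy)        # first 6 rows (the default)
tail(toy, 3)     # last 3 rows; the second argument changes the default of 6
toy$height.m     # one column, returned as a vector
toy[1:5, 4]      # rows 1 to 5 of column 4
names(toy)       # column names

# Add a new variable by naming it and assigning values to it.
# Dividing the diameter in cm by 200 gives the radius in meters.
toy$volume.m3 <- pi * (toy$dbh.cm / 200)^2 * toy$height.m / 2
names(toy)       # now includes "volume.m3"
```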
Be careful:
- Using [ ] with a single index (e.g., ufc[5] or ufc["height.m"]) results in a new data frame rather than a vector, which can cause confusion, especially when you extract only one variable.
- Using %in% operator can be helpful but again, make sure you know what you are getting
- Extreme caution is needed when using the attach() function. I do not recommend using this function.
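The sketch below illustrates these cautions on an invented data frame (toy and its values are hypothetical). The use of with() at the end is my suggestion of one common way to avoid attach(), not something taken from the textbook examples.

```r
# Invented data for illustration
toy <- data.frame(species  = c("DF", "WL", "GF", "DF"),
                  height.m = c(20.5, 33.0, 13.8, 30.0))

# What you get back depends on how you extract
class(toy["height.m"])                   # "data.frame"
class(toy[["height.m"]])                 # "numeric"
class(toy[, "height.m"])                 # "numeric"
class(toy[, "height.m", drop = FALSE])   # "data.frame"

# %in% returns one TRUE/FALSE per row; check it before using it to subset
keep <- toy$species %in% c("DF", "GF")
keep
toy[keep, ]

# with() evaluates an expression using the columns of toy,
# without adding anything to the search path the way attach() does
with(toy, mean(height.m))
```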
Some brief remarks about lists, which I don't use very often
- list() - creates a list
- [[ ]] - indicates list elements
- [ ] - indicates vector element within a list
- Elements of lists do NOT have to be named, but they can be.
- unlist() - flattens a list
- Many functions (e.g., lm() which is used for regression) produce list objects as their output
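Here is a short sketch of the list remarks above. The list x is invented for illustration; the regression uses the cars data set that ships with R, chosen only because it is always available.

```r
# A list can mix types; elements may be named or unnamed
x <- list(1:3, name = "tree", flag = TRUE)

x[[1]]       # the first element itself (an integer vector)
x[1]         # a list of length 1 that contains the first element
x$name       # named elements can also be extracted with $
x[[1]][2]    # the second value of the vector stored in element 1

# unlist() flattens a list into a single vector
flat <- unlist(list(a = 1, b = 2:3))
flat         # a named vector with names a, b1, b2

# lm() returns a list with a class attribute
fit <- lm(dist ~ speed, data = cars)
is.list(fit)        # TRUE
names(fit)          # the components stored in the fit
fit$coefficients    # extract one component like any other list element
```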
The apply family of functions can be EXTREMELY helpful
- tapply(X, INDEX, FUN) - applies the function FUN to the target vector X using categories in INDEX
- lapply(X, FUN) - applies the function FUN to each element of the list or vector X and returns a list
- sapply(X, FUN) - applies the function FUN to each element of the list or vector X and tries to return a vector or matrix
See DataStructures.R for lots of examples
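A small sketch of the three functions on invented data (height, species, and heights are hypothetical). Note that lapply() and sapply() are called as lapply(X, FUN) and sapply(X, FUN): they work element by element and take no INDEX argument.

```r
# Invented data for illustration
height  <- c(20.5, 33.0, 13.8, 30.0, 15.5)
species <- factor(c("DF", "WL", "GF", "DF", "GF"))

# tapply: apply FUN to height within each category of species
means <- tapply(height, species, mean)
means

# A list with one vector of heights per species
heights <- list(DF = c(20.5, 30.0), GF = c(13.8, 15.5), WL = 33.0)

# lapply: apply FUN to each element and always return a list
lapply(heights, range)

# sapply: same idea, but simplify the result when possible
sapply(heights, mean)     # a named vector, one value per element
sapply(heights, range)    # a matrix with one column per element
```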

Breakout room activity

Read in treegrowth.csv
Create a larger list called trees, as described on p. 105
Verify that the structure of your first two trees matches the output on p. 106
Use the split() function to store each tree as a data frame.
Create a plot that includes a curve of height versus age for each tree.
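If you get stuck on the last two steps, the following sketch shows the general pattern on invented data. The data frame growth, its values, and the column names tree.ID, age, and height.ft are all assumptions for illustration; check names() on your own data after reading in treegrowth.csv.

```r
# Invented data; the column names are assumptions, not taken from the file
growth <- data.frame(
  tree.ID   = rep(1:2, each = 3),
  age       = c(10, 20, 30, 10, 20, 30),
  height.ft = c(12, 30, 45, 9, 25, 41)
)

# split() returns a list with one data frame per tree
by_tree <- split(growth, growth$tree.ID)
str(by_tree[[1]])

# Draw empty axes that cover all trees, then add one curve per tree
plot(range(growth$age), range(growth$height.ft), type = "n",
     xlab = "Age", ylab = "Height (ft)")
for (tr in by_tree) {
  lines(tr$age, tr$height.ft)
}
```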

Please read Chapter 6 and the article "Use of R as a Toolbox for Mathematical Statistics Exploration" (see horton-tas in our !ClassNotes folder on Google Drive) for class on Wednesday. We will have a problem session on Friday.