9 Data frames
9.1 Data frames
A data frame is essentially a named list whose elements (the columns) are all vectors of the same length.
employees <- data.frame(
Name = c("Maria", "Pete", "Sarah"),
Age = c(47, 34, 32),
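Role = c("Professor", "Researcher", "Researcher"),
stringsAsFactors = TRUE) # keep strings as factors, as in the printed "Levels:" lines (the default before R 4.0.0)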
employees
## Name Age Role
## 1 Maria 47 Professor
## 2 Pete 34 Researcher
## 3 Sarah 32 Researcher
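Because a data frame is a list of equal-length columns, ordinary list functions work on it directly. The sketch below recreates the `employees` example so it runs on its own; the second data frame (`a`, `b`) is a hypothetical illustration of the equal-length requirement.

```r
# Recreate the example data frame
employees <- data.frame(
  Name = c("Maria", "Pete", "Sarah"),
  Age = c(47, 34, 32),
  Role = c("Professor", "Researcher", "Researcher"))

typeof(employees)     # "list": underneath, a data frame is a list of columns
length(employees)     # 3, the number of columns (list elements)
names(employees)      # "Name" "Age" "Role"
employees[["Age"]]    # list-style extraction of a single column
nrow(employees)       # 3, the length shared by every column

# Lengths 3 and 2 cannot be recycled to a common length, so data.frame() stops with an error
res <- tryCatch(data.frame(a = 1:3, b = 1:2), error = function(e) "error")
res
```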
Data frames are the most common way to represent tabular data in R. Matrices and lists can be converted to data frames with as.data.frame().
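A minimal sketch of both conversions with `as.data.frame()`; the matrix `m` and list `l` are hypothetical examples. A matrix without column names gets the default names `V1`, `V2`, …, while the names of a list become the column names.

```r
# A 2 x 3 matrix (filled column by column) without column names
m <- matrix(1:6, nrow = 2)
df_m <- as.data.frame(m)   # columns are named V1, V2, V3
df_m

# A named list of equal-length vectors: the element names become column names
l <- list(Name = c("Maria", "Pete"), Age = c(47, 34))
df_l <- as.data.frame(l)
df_l
```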
9.2 Selection
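Selection works as for vectors and lists: [ ] takes a row index and a column index (by position or name), while $ and [[ ]] extract a single column.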
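For example, calls like the following give the output shown below. The data frame is recreated with `stringsAsFactors = TRUE` (an assumption, and the default before R 4.0.0) so that `Name` is a factor and prints with a `Levels:` line; in R 4.0.0 and later, without that argument, the column prints as a quoted character vector instead.

```r
employees <- data.frame(
  Name = c("Maria", "Pete", "Sarah"),
  Age = c(47, 34, 32),
  Role = c("Professor", "Researcher", "Researcher"),
  stringsAsFactors = TRUE)

employees[1, ]     # first row, all columns (still a data frame)
employees$Name     # the Name column as a vector (here a factor)
# employees[, "Name"] and employees[["Name"]] return the same column
```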
## Name Age Role
## 1 Maria 47 Professor
## [1] Maria Pete Sarah
## Levels: Maria Pete Sarah
9.3 Selection (continued)
Columns can be extracted in several ways, and a single cell is selected by giving both a row index and a column index.
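One way to produce output like that below, again assuming `stringsAsFactors = TRUE` so that `Name` is a factor: `[[1]]` extracts the first column by position, and `[1, "Name"]` picks a single cell. A single element of a factor keeps all the original levels, which is why the `Levels:` line still lists every name.

```r
employees <- data.frame(
  Name = c("Maria", "Pete", "Sarah"),
  Age = c(47, 34, 32),
  Role = c("Professor", "Researcher", "Researcher"),
  stringsAsFactors = TRUE)

employees[[1]]          # first column, selected by position
employees[1, "Name"]    # single cell: row 1 of the Name column
# employees$Name[1] gives the same single value
```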
## [1] Maria Pete Sarah
## Levels: Maria Pete Sarah
## [1] Maria
## Levels: Maria Pete Sarah
9.4 Value assignment
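Values can be assigned to individual cells by combining indexing (for example, a logical filter on the rows) with the assignment operator <-. Here Sarah's age is updated from 32 to 33.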
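A minimal sketch of such an assignment, assuming the row is found by filtering on `Name`: the logical condition selects Sarah's row, `"Age"` selects the column, and `<-` overwrites that cell. The data frame is recreated so the snippet runs on its own, and its final line prints the updated table.

```r
employees <- data.frame(
  Name = c("Maria", "Pete", "Sarah"),
  Age = c(47, 34, 32),
  Role = c("Professor", "Researcher", "Researcher"))

# Rows where Name is "Sarah", column "Age": overwrite with 33
employees[employees$Name == "Sarah", "Age"] <- 33
# Equivalent form that indexes the Age column vector directly:
# employees$Age[employees$Name == "Sarah"] <- 33
employees
```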
## Name Age Role
## 1 Maria 47 Professor
## 2 Pete 34 Researcher
## 3 Sarah 33 Researcher
9.5 Column processing
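Arithmetic on a column is applied to every element at once (it is vectorised), and assigning the result to a new name with $ creates a new column.
# Year of birth is approximate: it ignores whether the birthday has already passed this year.
# The output below was produced in 2020; the values change with the current date.
current_year <- as.integer(format(Sys.Date(), "%Y"))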
employees$Year_of_birth <- current_year - employees$Age
employees
## Name Age Role Year_of_birth
## 1 Maria 47 Professor 1973
## 2 Pete 34 Researcher 1986
## 3 Sarah 33 Researcher 1987
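The same pattern covers other column operations. In this sketch the column names `Over_40` and `Level` are hypothetical: a comparison yields a logical column, a summary function such as `mean()` reduces a column to one value, and `ifelse()` derives a new column element by element.

```r
employees <- data.frame(
  Name = c("Maria", "Pete", "Sarah"),
  Age = c(47, 34, 33),
  Role = c("Professor", "Researcher", "Researcher"))

# Logical column from a comparison on the whole Age column
employees$Over_40 <- employees$Age > 40
# Summary of one column: (47 + 34 + 33) / 3 = 38
mean(employees$Age)
# New character column derived from Role, one element at a time
employees$Level <- ifelse(employees$Role == "Professor", "Senior", "Junior")
employees
```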
9.6 tibble
A tibble (from the tibble package, part of the tidyverse) is a modern reimagining of the data.frame. Compared with data frames, tibbles:
- do less:
  - don’t change variable names or types
  - don’t do partial matching
- complain more, e.g. when a variable does not exist
This forces you to confront problems earlier, typically leading to cleaner, more expressive code.
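The differences can be seen side by side in this sketch, which uses a small hypothetical data frame `df` and tibble `tb`. The tibble part runs only if the tibble package is installed. `data.frame()` repairs the name `first name` to `first.name` and `$` silently partially matches `Na` to `Name`; the tibble keeps the name as given, and `tb$Na` returns `NULL` with a warning instead of guessing.

```r
# Base R data frame
df <- data.frame(Name = c("Maria", "Pete"), Age = c(47, 34),
                 stringsAsFactors = FALSE)

# data.frame() makes names syntactic: "first name" becomes "first.name"
names(data.frame(`first name` = "Maria"))

# $ on a data frame partially matches: df$Na silently returns the Name column
df$Na
# A nonexistent column silently gives NULL
df$Salary

# The same operations on a tibble, only if the tibble package is available
if (requireNamespace("tibble", quietly = TRUE)) {
  tb <- tibble::tibble(Name = c("Maria", "Pete"), Age = c(47, 34))
  print(names(tibble::tibble(`first name` = "Maria")))  # kept as "first name"
  print(tb$Na)  # no partial matching: NULL, plus a warning about an unknown column
}
```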