22 Re-shape
22.1 Wide data
Wide format is the most common way to organise a table:
- each real-world entity is represented by one single row
- its attributes are represented through different columns
City | Population | Area | Density |
---|---|---|---|
Leicester | 329,839 | 73.3 | 4,500 |
Nottingham | 321,500 | 74.6 | 4,412 |
22.2 Long data
Long format is probably less common, but it is still necessary in many cases:
- each real-world entity is represented by multiple rows
- each one reporting only one of its attributes
- one column indicates which attribute each row represents
- another column is used to report the value
City | Attribute | Value |
---|---|---|
Leicester | Population | 329,839 |
Leicester | Area | 73.3 |
Leicester | Density | 4,500 |
Nottingham | Population | 321,500 |
Nottingham | Area | 74.6 |
Nottingham | Density | 4,412 |
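As a hedged illustration of why the long layout can be useful, the sketch below types the table above in by hand and summarises every attribute in one grouped step. It assumes the dplyr library is available; the names `city_info`, `attribute_means` and `Mean_value` are made up for this example.

```r
library(dplyr)

# Long-format table, typed in by hand from the example above
city_info <- data.frame(
  City = rep(c("Leicester", "Nottingham"), each = 3),
  Attribute = rep(c("Population", "Area", "Density"), times = 2),
  Value = c(329839, 73.3, 4500, 321500, 74.6, 4412)
)

# Because the attribute name is stored as data, a single grouped
# summary covers all attributes without naming each column
attribute_means <- city_info %>%
  group_by(Attribute) %>%
  summarise(Mean_value = mean(Value))
```

With the wide layout, the same summary would need one expression per column.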
22.3 tidyr
The tidyr (pronounced tidy-er) library is part of the tidyverse and provides functions to re-shape your data
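library(tidyverse) # loads tidyr and the %>% pipe
library(knitr)     # provides kable()

city_info_wide <- data.frame(
  City = c("Leicester", "Nottingham"),
  Population = c(329839, 321500),
  Area = c(73.3, 74.6),
  Density = c(4500, 4412)
)
kable(city_info_wide) # print the data frame as a table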
City | Population | Area | Density |
---|---|---|---|
Leicester | 329839 | 73.3 | 4500 |
Nottingham | 321500 | 74.6 | 4412 |
22.4 tidyr::gather
Re-shape from wide to long format
city_info_long <- city_info_wide %>%
  gather(
    -City,             # exclude city names from gathering
    key = "Attribute", # name for the new key column
    value = "Value"    # name for the new value column
  )
City | Attribute | Value |
---|---|---|
Leicester | Population | 329839.0 |
Nottingham | Population | 321500.0 |
Leicester | Area | 73.3 |
Nottingham | Area | 74.6 |
Leicester | Density | 4500.0 |
Nottingham | Density | 4412.0 |
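In tidyr 1.0.0 and later, `gather` is superseded by `pivot_longer`. The sketch below shows what the same wide-to-long step might look like with the newer function; the name `city_info_long_pivot` is made up for this example, and the wide table is rebuilt so that the block runs on its own.

```r
library(tidyr)

# Same wide table as in the tidyr section
city_info_wide <- data.frame(
  City = c("Leicester", "Nottingham"),
  Population = c(329839, 321500),
  Area = c(73.3, 74.6),
  Density = c(4500, 4412)
)

city_info_long_pivot <- city_info_wide %>%
  pivot_longer(
    cols = -City,           # pivot every column except City
    names_to = "Attribute", # new column holding the old column names
    values_to = "Value"     # new column holding the values
  )
```

Unlike `gather`, `pivot_longer` keeps the rows of each city together, so the three Leicester rows come first, followed by the three Nottingham rows.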
22.5 tidyr::spread
Re-shape from long to wide format
city_info_back_to_wide <- city_info_long %>%
  spread(
    key = "Attribute", # specify key column
    value = "Value"    # specify value column
  )
City | Area | Density | Population |
---|---|---|---|
Leicester | 73.3 | 4500 | 329839 |
Nottingham | 74.6 | 4412 | 321500 |
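In tidyr 1.0.0 and later, `spread` is likewise superseded by `pivot_wider`. The following is a minimal sketch of the long-to-wide step with the newer function; the name `city_info_wide_pivot` is made up for this example, and the long table is typed in by hand so that the block runs on its own.

```r
library(tidyr)

# Long table, typed in by hand from the Long data section
city_info_long <- data.frame(
  City = rep(c("Leicester", "Nottingham"), each = 3),
  Attribute = rep(c("Population", "Area", "Density"), times = 2),
  Value = c(329839, 73.3, 4500, 321500, 74.6, 4412)
)

city_info_wide_pivot <- city_info_long %>%
  pivot_wider(
    names_from = Attribute, # column whose values become column names
    values_from = Value     # column whose values fill the new columns
  )
```

The `spread` output above lists the attribute columns alphabetically, whereas `pivot_wider` keeps them in order of first appearance in the data (here Population, Area, Density).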