22 Re-shape

22.1 Wide data

This is the most common way of organising tabular data:

  • each real-world entity is represented by one single row
  • its attributes are represented through different columns
City        Population  Area  Density
Leicester   329,839     73.3  4,500
Nottingham  321,500     74.6  4,412

22.2 Long data

This approach is probably less common, but it is still necessary in many cases:

  • each real-world entity is represented by multiple rows
    • each one reporting only one of its attributes
  • one column indicates which attribute each row represents
  • another column is used to report the value
City        Attribute   Value
Leicester   Population  329,839
Leicester   Area        73.3
Leicester   Density     4,500
Nottingham  Population  321,500
Nottingham  Area        74.6
Nottingham  Density     4,412

22.3 tidyr

The tidyr (pronounced tidy-er) library is part of the tidyverse and provides functions to re-shape data. The examples in the following sections start from this wide-format table:

City        Population  Area  Density
Leicester   329839      73.3  4500
Nottingham  321500      74.6  4412
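
The code that produced this table is not shown here. As a minimal sketch, it could be built as a base R data frame along the following lines. The object name city_info_wide is an assumption made for illustration; the values are copied from the table above.

```r
# Hypothetical object name; values copied from the wide table above
city_info_wide <- data.frame(
  City       = c("Leicester", "Nottingham"),
  Population = c(329839, 321500),
  Area       = c(73.3, 74.6),
  Density    = c(4500, 4412)
)

# One row per city, one column per attribute
print(city_info_wide)
```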

22.4 tidyr::gather

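Re-shape from wide to long format: the attribute columns are collapsed into one column holding the attribute name and one column holding the value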

City        Attribute   Value
Leicester   Population  329839.0
Nottingham  Population  321500.0
Leicester   Area        73.3
Nottingham  Area        74.6
Leicester   Density     4500.0
Nottingham  Density     4412.0
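
A call that should produce a table like the one above can be sketched as follows. The object names city_info_wide and city_info_long are assumptions made for illustration, and the data frame is rebuilt here so that the example is self-contained.

```r
library(tidyr)

# Hypothetical object name; values copied from the wide table in 22.3
city_info_wide <- data.frame(
  City       = c("Leicester", "Nottingham"),
  Population = c(329839, 321500),
  Area       = c(73.3, 74.6),
  Density    = c(4500, 4412)
)

# Collapse every column except City into key/value pairs:
# "Attribute" receives the old column names, "Value" the cell contents
city_info_long <- gather(
  city_info_wide,
  key = "Attribute",
  value = "Value",
  -City
)

print(city_info_long)
```

In tidyr 1.0.0 and later, gather is superseded by pivot_longer, which performs the same kind of re-shaping but may return the rows in a different order.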

22.5 tidyr::spread

Re-shape from long to wide format: each distinct attribute name becomes a column, filled with the corresponding values

City        Area  Density  Population
Leicester   73.3  4500     329839
Nottingham  74.6  4412     321500
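
The reverse step can be sketched in the same way. The object names are again assumptions made for illustration, and the long table is typed in directly (values copied from 22.4) so that the example runs on its own.

```r
library(tidyr)

# Hypothetical object name; values copied from the long table in 22.4
city_info_long <- data.frame(
  City      = rep(c("Leicester", "Nottingham"), times = 3),
  Attribute = rep(c("Population", "Area", "Density"), each = 2),
  Value     = c(329839, 321500, 73.3, 74.6, 4500, 4412)
)

# Turn each distinct value of Attribute into its own column,
# filled with the matching entries of Value
city_info_wide <- spread(city_info_long, key = Attribute, value = Value)

print(city_info_wide)
```

The new columns come out in alphabetical order (Area, Density, Population), which is why the column order differs from the original wide table. In tidyr 1.0.0 and later, spread is superseded by pivot_wider.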