21 Join

21.1 Joining data

Data frames can be joined (or ‘merged’)

  • information from two data frames can be combined
  • specifying a column with common values
    • usually one with a unique identifier of an entity
  • rows having the same value are joined
  • depending on parameters
    • a row from one data frame can be merged with multiple rows from the other data frame
    • rows with no matching values in the other data frame can be retained
  • merge base function or join functions in dplyr

21.2 Join types


by C.L. Moffatt, licensed under The Code Project Open License (CPOL)

21.4 Example

Name Age Role
Maria 47 Professor
Pete 34 Researcher
Sarah 32 Researcher
Jo 25 Postgrad


Name City
Maria Barcelona
Pete London
Sarah Boston
Mel Los Angeles

21.5 dplyr::full_join

dplyr provides a series of join functions

  • full_join combines all the available data
Name Age Role City
Maria 47 Professor Barcelona
Pete 34 Researcher London
Sarah 32 Researcher Boston
Jo 25 Postgrad NA
Mel NA NA Los Angeles

21.6 dplyr::left_join

  • left_join keeps all the data from the left table
    • using %>%, that’s the data “coming down the pipe”
  • rows from the right table without a match are dropped
Name Age Role City
Maria 47 Professor Barcelona
Pete 34 Researcher London
Sarah 32 Researcher Boston
Jo 25 Postgrad NA

21.7 dplyr::right_join

  • right_join keeps all the data from the right table
    • using %>%, that’s the data provided as an argument
  • rows from the left table without a match are dropped
Name Age Role City
Maria 47 Professor Barcelona
Pete 34 Researcher London
Sarah 32 Researcher Boston
Mel NA NA Los Angeles

21.8 dplyr::inner_join

  • inner_join keeps only rows that have a match in both tables
  • rows without a match are dropped
Name Age Role City
Maria 47 Professor Barcelona
Pete 34 Researcher London
Sarah 32 Researcher Boston