Lecture 203

class: center, middle, inverse, title-slide

.title[
# Lecture 203
]
.author[
### Dr Stefano De Sabbata<br /><small>School of Geography, Geology, and the Env., University of Leicester<br /><a href="https://github.com/sdesabbata/r-for-geographic-data-science" style="color: white">github.com/sdesabbata/r-for-geographic-data-science</a><br /><a href="mailto:s.desabbata@le.ac.uk" style="color: white">s.desabbata@le.ac.uk</a> | <a href="https://twitter.com/maps4thought" style="color: white">@maps4thought</a><br />text licensed under <a href="https://creativecommons.org/licenses/by-sa/4.0/" style="color: white">CC BY-SA 4.0</a>, code licensed under <a href="https://www.gnu.org/licenses/gpl-3.0.html" style="color: white">GNU GPL v3.0</a></small>
]

---

class: inverse, center, middle

# Comparing data

---
## Recap

.pull-left[

<br/>

**Prev**: Exploratory statistics

- Descriptive statistics
- Exploring assumptions
    - Normality
    - Skewness and kurtosis
    - Homogeneity of variance

**Today**: Comparing data

- Comparing distributions through mean
- Correlation analysis
- Variable transformation

<br/>

]
.pull-right[

![](data:image/png;base64,#/home/rstudio/r-for-geographic-data-science/docs/slides/203-slides-comparing-data_files/figure-html/unnamed-chunk-2-1.png)

]

---
## Comparing means

.pull-left[

When are (two) group means  different?

Independent T-test:

- null hypothesis
  - there is **no difference** between the groups
- if *p-value* (significance) below threshold (e.g., `0.05` or `0.01`)
  - **group means are different**
- assumptions
    - normally distributed values in groups
    - homogeneity of variance of values in groups
      - if groups have different sizes
    - independence of groups
      - e.g. different conditions of an experiment

]
.pull-right[

![](data:image/png;base64,#/home/rstudio/r-for-geographic-data-science/docs/slides/203-slides-comparing-data_files/figure-html/unnamed-chunk-3-1.png)

]

---
## General linear model

<br>

**General linear model**

- observation `$i$` can be predicted by a `$model$` (predictors)
- accounting for some amount of error

$$outcome_i = (model) + error_i $$

<br>

**Independent T-test** as a general linear model

- groups is the predictor (categorical variable)
- single observation value as group mean plus error

$$outcome_i = (group\ mean) + error_i $$

---
## Example: bill depth

.pull-left[

<br/>

Is the bill depth of *Adelie* and *Gentoo* penguins different?

1. Check assumptions
    1. Indipendent groups: ok
    2. normal distribution: check using Shapiro-Wilk test
    3. homogeneity of variance: not necessary
2. Run T-test
    1. `stats::t.test`

]
.pull-right[

Values are normally distributed for both groups

```r
penguins %>% filter(species == "Adelie") %>% 
  pull(bill_depth_mm) %>% shapiro.test()
```

```
## 
## 	Shapiro-Wilk normality test
## 
## data:  .
## W = 0.98467, p-value = 0.09249
```

```r
penguins %>% filter(species == "Gentoo") %>% 
  pull(bill_depth_mm) %>% shapiro.test()
```

```
## 
## 	Shapiro-Wilk normality test
## 
## data:  .
## W = 0.97609, p-value = 0.0277
```

]

---
## stats::t.test

.pull-left[

The test is significant, the group means are different

```r
penguins %>%
  filter(species %in% c("Adelie", "Gentoo")) %$%
  t.test(bill_depth_mm ~ species)
```

```
## 
## 	Welch Two Sample t-test
## 
## data:  bill_depth_mm by species
## t = 25.337, df = 271.98, p-value < 2.2e-16
## alternative hypothesis: true difference in means between group Adelie and group Gentoo is not equal to 0
## 95 percent confidence interval:
##  3.102837 3.625651
## sample estimates:
## mean in group Adelie mean in group Gentoo 
##             18.34636             14.98211
```

How to report:

- *t*(271.98) = 25.34, *p* < .01

]
.pull-right[

![](data:image/png;base64,#/home/rstudio/r-for-geographic-data-science/docs/slides/203-slides-comparing-data_files/figure-html/unnamed-chunk-7-1.png)

]

---
## Example: bill depth (2)

.pull-left[

Is the bill depth of *Adelie* and *Chinstrap* penguins different?

We already checked normality for *Adelie* and *Gentoo*. Are values for *Chinstrap* normally distributed?

```r
penguins %>% filter(species == "Chinstrap") %>% 
  pull(bill_depth_mm) %>% shapiro.test()
```

```
## 
## 	Shapiro-Wilk normality test
## 
## data:  .
## W = 0.97274, p-value = 0.1418
```

Values are normally distributed for all three groups

]
.pull-right[

![](data:image/png;base64,#/home/rstudio/r-for-geographic-data-science/docs/slides/203-slides-comparing-data_files/figure-html/unnamed-chunk-9-1.png)

]

---
## stats::t.test

.pull-left[

The test is not significant, the group means are not different

```r
penguins %>%
  filter(species %in% c("Adelie", "Chinstrap")) %$%
  t.test(bill_depth_mm ~ species)
```

```
## 
## 	Welch Two Sample t-test
## 
## data:  bill_depth_mm by species
## t = -0.43771, df = 137.75, p-value = 0.6623
## alternative hypothesis: true difference in means between group Adelie and group Chinstrap is not equal to 0
## 95 percent confidence interval:
##  -0.4095657  0.2611044
## sample estimates:
##    mean in group Adelie mean in group Chinstrap 
##                18.34636                18.42059
```

How to report:

- *t*(137.75) = -0.44, *p* = 0.66

]
.pull-right[

![](data:image/png;base64,#/home/rstudio/r-for-geographic-data-science/docs/slides/203-slides-comparing-data_files/figure-html/unnamed-chunk-12-1.png)

]

---
## ANalysis Of VAriance (ANOVA)

.pull-left[

ANOVA is similar to the T-tests, but more than two groups

- null hypothesis
  - **no difference** between the groups
- if *p-value* (significance) below threshold (e.g., `0.05` or `0.01`)
  - **group means are different**
- assumptions
    - normally distributed values in groups
        - especially if groups have different sizes
    - homogeneity of variance of values in groups
      - if groups have different sizes
    - independence of groups
      - e.g. different conditions of an experiment
      
]
.pull-right[

**General linear model**

- observation `$i$` can be predicted by a `$model$` (predictors)
- accounting for some amount of error

$$outcome_i = (model) + error_i $$

<br/>

**ANOVA** as a general linear model

- groups is the predictor (categorical variable)
- single observation value as group mean plus error

$$outcome_i = (group\ mean) + error_i $$

]

---
## Example: bill depth (3)

.pull-left[

<br/>

Is the bill depth different between all three species?

1. Check assumptions
    1. Indipendent groups: ok
    2. normal distribution: check using Shapiro-Wilk test
    3. homogeneity of variance: not necessary
2. Run ANOVA
    1. `stats::aov`

]
.pull-right[

![](data:image/png;base64,#/home/rstudio/r-for-geographic-data-science/docs/slides/203-slides-comparing-data_files/figure-html/unnamed-chunk-13-1.png)

]

---
## stats::aov

<br/>

.pull-left[

The test is significant, the group means are different

```r
penguins %$%
  aov(bill_depth_mm ~ species) %>%
  summary()
```

```
##              Df Sum Sq Mean Sq F value Pr(>F)    
## species       2  904.0   452.0   359.8 <2e-16 ***
## Residuals   339  425.9     1.3                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 2 observations deleted due to missingness
```

How to report:

- *F*(2, 339) = 359.79, *p* < .01

]
.pull-right[

![](data:image/png;base64,#/home/rstudio/r-for-geographic-data-science/docs/slides/203-slides-comparing-data_files/figure-html/unnamed-chunk-16-1.png)

]

---
class: inverse, center, middle

# Correlation

---
## Correlation

<br/>

**Correlation** is a standardised measure of covariance

.pull-left[

Two continuous variables can be

- not related at all
- related
    - positively: 
      - entities with *high values* in one 
      - tend to have *high values* in the other
    - negatively: 
      - entities with *high values* in one 
      - tend to have *low values* in the other

]
.pull-right[

Three different approaches

- **Pearson's r**
  - if two variables are **normally distributed**
- **Spearman’s rho**
  - if two variables are **not normally distributed**
- **Kendall’s tau**
  - if **not normally distributed** 
  - and there are a **large number of ties**

]

---
## Example

Are flipper length and body mass related in Chinstrap penguins?

.center[

![](data:image/png;base64,#/home/rstudio/r-for-geographic-data-science/docs/slides/203-slides-comparing-data_files/figure-html/unnamed-chunk-17-1.png)

]

---
## Pearson’s r

.pull-left[

<br/>

If  two variables are **normally distributed**, use **Pearson's r**

- null hypothesis
  - there is no relationship between the variables
- assumptions
  - variables are normally distributed

The square of the correlation value indicates the percentage of shared variance

]
.pull-right[

Flipper length and body mass are normally distributed in Chinstrap penguins

```r
penguins %>%
  filter(species == "Chinstrap") %>%
  pull(flipper_length_mm) %>% shapiro.test()
```

```
## 
## 	Shapiro-Wilk normality test
## 
## data:  .
## W = 0.98891, p-value = 0.8106
```

```r
penguins %>%
  filter(species == "Chinstrap") %>%
  pull(body_mass_g) %>% shapiro.test()
```

```
## 
## 	Shapiro-Wilk normality test
## 
## data:  .
## W = 0.98449, p-value = 0.5605
```

]

---
## stats::cor.test

.pull-left[

<br/>

If  two variables are **normally distributed**, use **Pearson's r**

- null hypothesis
  - there is no relationship between the variables
- assumptions
  - variables are normally distributed

The square of the correlation value indicates the percentage of shared variance

]
.pull-right[

```r
penguins %>%
  filter(species == "Chinstrap") %$%
  cor.test(flipper_length_mm, body_mass_g)
```

```
## 
## 	Pearson's product-moment correlation
## 
## data:  flipper_length_mm and body_mass_g
## t = 6.7947, df = 66, p-value = 3.748e-09
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.4759352 0.7632368
## sample estimates:
##       cor 
## 0.6415594
```

- Flipper length and body mass are related
  - `p-value < 0.01`
- sharing 41.2% of variance
  - 0.642 `^ 2 =` 0.412

]

---
## Example: flipper length and body mass

.pull-left[

But, are flipper length and body mass related in penguins (without considering species as separated groups)?

.center[

![](data:image/png;base64,#/home/rstudio/r-for-geographic-data-science/docs/slides/203-slides-comparing-data_files/figure-html/unnamed-chunk-21-1.png)

]

]
.pull-right[

Flipper length and body mass are not normally distributed when all penguins are taken into account as a single group

```r
penguins %>%
  pull(flipper_length_mm) %>% shapiro.test()
```

```
## 
## 	Shapiro-Wilk normality test
## 
## data:  .
## W = 0.95155, p-value = 3.54e-09
```

```r
penguins %>%
  pull(body_mass_g) %>% shapiro.test()
```

```
## 
## 	Shapiro-Wilk normality test
## 
## data:  .
## W = 0.95921, p-value = 3.679e-08
```

]

---
## Spearman’s rho

.pull-left[

If  two variables are **not normally distrib.**, use **Spearman’s rho**

- null hypothesis
  - there is no relationship between the variables
- non-parametric
  - uses full dataset to calculate the statistics
  - rather than estimate key parameters of distributions from data
- based on rank difference
- assumptions
  - ties are uncommon

The square of the correlation value indicates the percentage of shared variance

]
.pull-right[

```r
penguins %$%
  cor.test(
    flipper_length_mm, body_mass_g, 
    method = "spearman"
  )
```

```
## 
## 	Spearman's rank correlation rho
## 
## data:  flipper_length_mm and body_mass_g
## S = 1066875, p-value < 2.2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##       rho 
## 0.8399741
```

- Flipper length and body mass are related
  - `p-value < 0.01`
- sharing 70.6% of variance
  - 0.84 `^ 2 =` 0.706

]

---
## Correlation with ties

Spearman's rho

- `Cannot compute exact p-value with ties`

```r
penguins %$%
  cor.test(flipper_length_mm, body_mass_g, method = "spearman")
```

```r
## Warning in cor.test.default(flipper_length_mm, body_mass_g, method = "spearman"):
##  Cannot compute exact p-value with ties
```

---
## Kendall’s tau

.pull-left[

If  two variables are **not normally distributed** and there are **many ties**, use **Kendall’s tau**

- null hypothesis
  - there is no relationship between the variables
- non-parametric
- based on rank difference
- no assumptions
- less *powerful*
  - even if there is a relationship, significance might be high

The square of the correlation value indicates the percentage of shared variance

]
.pull-right[

```r
penguins %$%
  cor.test(
    flipper_length_mm, body_mass_g, 
    method = "kendall"
  )
```

```
## 
## 	Kendall's rank correlation tau
## 
## data:  flipper_length_mm and body_mass_g
## z = 17.898, p-value < 2.2e-16
## alternative hypothesis: true tau is not equal to 0
## sample estimates:
##       tau 
## 0.6604675
```

- Flipper length and body mass are related
  - `p-value < 0.01`
- sharing 43.6% of variance
  - 0.66 `^ 2 =` 0.436

]

---
## psych::pairs.panels

.pull-left[

Combining:

- histograms
- scatter plots 
- correlations

```r
library(psych)

penguins %>%
  select(
    bill_length_mm,
    bill_depth_mm,
    flipper_length_mm, 
    body_mass_g
  ) %>%
  pairs.panels(
    method = "kendall",
    stars = TRUE
  )

## Signif.: 0 '***' 0.001 '**' 0.01
## 0.01 '*' 0.05 0.05 '.' 0.1 ' ' 1
```

]
.pull-right[

.center[

![](data:image/png;base64,#/home/rstudio/r-for-geographic-data-science/docs/slides/203-slides-comparing-data_files/figure-html/unnamed-chunk-31-1.png)

]

---
## GGally::ggpairs

More flexible and sophisticated pairs plots

.pull-left[

```r
library(GGally)

penguins %>%
  select(
    bill_length_mm, bill_depth_mm,
    flipper_length_mm, body_mass_g, species
  ) %>%
  ggpairs(
    mapping = aes(color = species),
    upper = list(continuous = 
        wrap(ggally_cor, method = "kendall")),
    lower = list(continuous = 
        wrap("points", alpha = 0.3, size=0.1))
  ) +
  scale_color_manual(values = 
    c("darkorange","darkorchid","cyan4")) +
  scale_fill_manual(values = 
    c("darkorange","darkorchid","cyan4")) +
  theme_bw()

## Signif.: 0 '***' 0.001 '**' 0.01
## 0.01 '*' 0.05 0.05 '.' 0.1 ' ' 1
```

]
.pull-right[

.center[

![](data:image/png;base64,#/home/rstudio/r-for-geographic-data-science/docs/slides/203-slides-comparing-data_files/figure-html/unnamed-chunk-33-1.png)

]

---
## Chi-squre

.pull-left[

How to test a relationship between two **categorical** variables?

Chi-square test:

- null hypothesis
  - no relationship between variables
- non-parametric
- based on cross-tabulated expected counts
- no assumptions

```r
library(gmodels)

penguins %$%
  CrossTable(
    island, species, chisq = TRUE, 
    expected = TRUE, prop.c = FALSE, 
    prop.t = FALSE, prop.chisq = FALSE, 
    sresid = TRUE, format = "SPSS"
  )
```

]
.pull-right[

.small[

```
## 
##    Cell Contents
## |-------------------------|
## |                   Count |
## |         Expected Values |
## |             Row Percent |
## |            Std Residual |
## |-------------------------|
## 
## Total Observations in Table:  344 
## 
##              | species 
##       island |    Adelie  | Chinstrap  |    Gentoo  | Row Total | 
## -------------|-----------|-----------|-----------|-----------|
##       Biscoe |       44  |        0  |      124  |      168  | 
##              |   74.233  |   33.209  |   60.558  |           | 
##              |   26.190% |    0.000% |   73.810% |   48.837% | 
##              |   -3.509  |   -5.763  |    8.152  |           | 
## -------------|-----------|-----------|-----------|-----------|
##        Dream |       56  |       68  |        0  |      124  | 
##              |   54.791  |   24.512  |   44.698  |           | 
##              |   45.161% |   54.839% |    0.000% |   36.047% | 
##              |    0.163  |    8.784  |   -6.686  |           | 
## -------------|-----------|-----------|-----------|-----------|
##    Torgersen |       52  |        0  |        0  |       52  | 
##              |   22.977  |   10.279  |   18.744  |           | 
##              |  100.000% |    0.000% |    0.000% |   15.116% | 
##              |    6.055  |   -3.206  |   -4.329  |           | 
## -------------|-----------|-----------|-----------|-----------|
## Column Total |      152  |       68  |      124  |      344  | 
## -------------|-----------|-----------|-----------|-----------|
## 
##  
## Statistics for All Table Factors
## 
## 
## Pearson's Chi-squared test 
## ------------------------------------------------------------
## Chi^2 =  299.5503     d.f. =  4     p =  1.354574e-63 
## 
## 
##  
##        Minimum expected frequency: 10.27907
```

]

???

Different islands do have different amounts of penguins from different species

---
class: inverse, center, middle

# Data transformation

---
## Z-scores

.pull-left[

Transform the values as relative to

- distribution's mean `$\mu$`
- standard deviation `$\sigma$`

$$ z_i = \frac{x_i - \mu}{\sigma} $$

]
.pull-right[

```r
penguins %>%
  mutate(
    flipper_length_zscore = 
      scale(flipper_length_mm)
  ) %>%
  ggplot(aes(x = flipper_length_zscore)) +
  geom_histogram() + 
  theme_bw()
```

]

.pull-left[

![](data:image/png;base64,#/home/rstudio/r-for-geographic-data-science/docs/slides/203-slides-comparing-data_files/figure-html/unnamed-chunk-37-1.png)

]
.pull-right[

![](data:image/png;base64,#/home/rstudio/r-for-geographic-data-science/docs/slides/203-slides-comparing-data_files/figure-html/unnamed-chunk-38-1.png)

]

???

Same distribution (bar some aggregation in the histogram)

Commonly used to render two variables easier to compare

---
## Common transformations

.pull-left[

**Logarithmic** transformations are useful to *"un-skew"* variables

Common approaches include:

- natural logarithm (`log`)
- binary logarithm (`log2`)
- logarithm base 10 (`log10`)

Only possible on values `> 0`

]
.pull-right[

**Inverse hyperbolic sine** transformations are useful to *"un-skew"* variables

- similar to logarithmic transformations
- defined on all values
- in R: `asinh`

]

.center[
Not always sufficient to *"un-skew"* variables
]

---
## Example: residents aged 20 to 24

.pull-left[

The number of residents aged 20 to 24 (`u011`) in the areas of Leicester described as *"Cosmopolitans"* by the [2011 Output Area Classification](https://www.gov.uk/government/statistics/2011-area-classification-for-super-output-areas) is skewed

![](data:image/png;base64,#/home/rstudio/r-for-geographic-data-science/docs/slides/203-slides-comparing-data_files/figure-html/unnamed-chunk-41-1.png)

]
.pull-right[

```r
leicester_2011OAC %>% 
  filter(supgrpname == "Cosmopolitans") %>% 
  select(u011) %>% 
  stat.desc(
    basic = FALSE, desc = FALSE, 
    norm = TRUE
  ) %>%
  kable("html", digits = 3)
```

<table>
 <thead>
  <tr>
   <th style="text-align:left;">   </th>
   <th style="text-align:right;"> u011 </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> skewness </td>
   <td style="text-align:right;"> 1.521 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> skew.2SE </td>
   <td style="text-align:right;"> 2.879 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> kurtosis </td>
   <td style="text-align:right;"> 2.089 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> kurt.2SE </td>
   <td style="text-align:right;"> 1.999 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> normtest.W </td>
   <td style="text-align:right;"> 0.847 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> normtest.p </td>
   <td style="text-align:right;"> 0.000 </td>
  </tr>
</tbody>
</table>

]

---
## Example: log10

.pull-left[

However, it's logarithm base 10 is normally distributed, thus it can be used with tests requiring normally distributed values

```r
mutate(log10_u011 = log10(u011))
```

![](data:image/png;base64,#/home/rstudio/r-for-geographic-data-science/docs/slides/203-slides-comparing-data_files/figure-html/unnamed-chunk-44-1.png)

]
.pull-right[

```r
leicester_2011OAC %>% 
  filter(supgrpname == "Cosmopolitans") %>%
  mutate(log10_u011 = log10(u011)) %>% 
  select(log10_u011) %>% 
  stat.desc(
    basic = FALSE, desc = FALSE, 
    norm = TRUE
  ) %>%
  kable("html", digits = 3)
```

<table>
 <thead>
  <tr>
   <th style="text-align:left;">   </th>
   <th style="text-align:right;"> log10_u011 </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> skewness </td>
   <td style="text-align:right;"> -0.504 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> skew.2SE </td>
   <td style="text-align:right;"> -0.953 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> kurtosis </td>
   <td style="text-align:right;"> 0.872 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> kurt.2SE </td>
   <td style="text-align:right;"> 0.834 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> normtest.W </td>
   <td style="text-align:right;"> 0.976 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> normtest.p </td>
   <td style="text-align:right;"> 0.118 </td>
  </tr>
</tbody>
</table>

]

---
## Example: asinh

.pull-left[

The Inverse hyperbolic sine is also normally distributed

```r
mutate(ihs_u011 = asinh(u011))
```

![](data:image/png;base64,#/home/rstudio/r-for-geographic-data-science/docs/slides/203-slides-comparing-data_files/figure-html/unnamed-chunk-47-1.png)

]
.pull-right[

```r
leicester_2011OAC %>% 
  filter(supgrpname == "Cosmopolitans") %>%
  mutate(ihs_u011 = asinh(u011)) %>%
  select(ihs_u011) %>% 
  stat.desc(
    basic = FALSE, desc = FALSE, 
    norm = TRUE
  ) %>%
  kable("html", digits = 3)
```

<table>
 <thead>
  <tr>
   <th style="text-align:left;">   </th>
   <th style="text-align:right;"> ihs_u011 </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> skewness </td>
   <td style="text-align:right;"> -0.500 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> skew.2SE </td>
   <td style="text-align:right;"> -0.947 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> kurtosis </td>
   <td style="text-align:right;"> 0.860 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> kurt.2SE </td>
   <td style="text-align:right;"> 0.823 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> normtest.W </td>
   <td style="text-align:right;"> 0.976 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> normtest.p </td>
   <td style="text-align:right;"> 0.122 </td>
  </tr>
</tbody>
</table>

]

---
## Summary

.pull-left[

<br/>

**Today**: Comparing data

- Comparing distributions through mean
- Correlation analysis
- Variable transformation

**Next time**: Regression analysis

- Simple regression
- Multiple regression
- Comparing models

<br/>

.referencenote[
Slides created via the R package [**xaringan**](https://github.com/yihui/xaringan). The chakra comes from [remark.js](https://remarkjs.com), [**knitr**](https://yihui.org/knitr), and [R Markdown](https://rmarkdown.rstudio.com).
]

]
.pull-right[

![](data:image/png;base64,#/home/rstudio/r-for-geographic-data-science/docs/slides/203-slides-comparing-data_files/figure-html/unnamed-chunk-49-1.png)

]