class: center, middle, inverse, title-slide # Digital urban geographies ## The quantitative, the qualitative
and the convolutional ### Stefano De Sabbata |
sdesabbata.github.io
###
2021-02-25
---
class: center, middle

# the digital

<br/>

***Information has always had geography***. *It is from somewhere; about somewhere; it evolves and is transformed somewhere; it is mediated by networks, infrastructures, and technologies: all of which exist in physical, material places.*

.referencenote[
Graham, M., De Sabbata, S., and Zook, M. A. (2015) [Towards a study of information geographies: (im)mutable augmentations and a mapping of the geographies of information](https://rgs-ibg.onlinelibrary.wiley.com/action/showCitFormats?doi=10.1002%2Fgeo2.8). Geo: Geography and Environment, 2: 1, 88–105, doi: 10.1002/geo2.8.
]

<br/>

*"It is now somehow obvious to state that the digital phenomena have radically transformed every aspect of human life. [...] **Digital platforms** are changing what constitutes **"the field"**: the rise of digital content comprises new forms of evidence with which to approach long-standing geographical concerns"*

.referencenote[
Ash, J., et al. (2018). [Digital Geographies](https://uk.sagepub.com/en-gb/eur/digital-geographies/book258271), SAGE Publications.
]

---

# Digital (urban) geographies

.pull-left[

<br/>

### the quantitative

- access
- participation
- representativeness
- operationalisation

{{content}}

]

.pull-right[

![](data:image/png;base64,#../assets/images/wikipedia-edits-world-map-bw.png)

]

--

### the qualitative

- everyday multiculture

{{content}}

--

### the convolutional

- Graph Convolutional Neural Networks
- Encoding spatio-temporal information

---

# Special thanks to...
.pull-left[ .large[ - [Dr Andrea Ballatore](https://aballatore.space/), Birkbeck, University of London - [Dr Katy Bennett](https://www2.le.ac.uk/departments/geography/people/kjb33), University of Leicester - [Dr Jonathan Bright](https://www.oii.ox.ac.uk/people/jonathan-bright/), Oxford Internet Institute, University of Oxford - [Dr Zoe Gardner](https://www2.le.ac.uk/departments/geography/people/dr-zoe-gardner), University of Leicester - [Prof Mark Graham](https://www.oii.ox.ac.uk/people/mark-graham/), Oxford Internet Institute, University of Oxford - [Pengyuan Liu](https://geography.digital/), University of Leicester and [University of Helsinki](https://www.helsinki.fi/en/people/people-finder/pengyuan-liu-9426324) ] ] .pull-right[  ] --- class: inverse, center, middle # the quantitative --- # Access Understanding **geographies of access and enablement** provides important insights into the distribution of technologies and services that are essential for digital communication, participation, and representation. <br/> .left-column-large[  ] .right-column-small[ <br/><br/><br/> .referencenote[ Graham, M., De Sabbata, S., and Zook, M. A. (2015) [Towards a study of information geographies: (im)mutable augmentations and a mapping of the geographies of information](https://rgs-ibg.onlinelibrary.wiley.com/action/showCitFormats?doi=10.1002%2Fgeo2.8). Geo: Geography and Environment, 2: 1, 88– 105, doi: 10.1002/geo2.8. ] ] --- # Participation Access to the internet is only one aspect of the complex network of factors that drive participation <br/> .left-column-large[  ] .right-column-small[ <br/><br/><br/> .referencenote[ Graham, M., De Sabbata, S., and Zook, M. A. (2015) [Towards a study of information geographies: (im)mutable augmentations and a mapping of the geographies of information](https://rgs-ibg.onlinelibrary.wiley.com/action/showCitFormats?doi=10.1002%2Fgeo2.8). Geo: Geography and Environment, 2: 1, 88– 105, doi: 10.1002/geo2.8. 
]
]

---

# Participation

.pull-left[

Participation in knowledge production is also affected by non-geographic biases, which have an effect on geographic data

#### OpenStreetMap

- 95–98% of all contributions to OSM are produced by men
- differences in modes of contribution between men and women

]

.pull-right[

![](data:image/png;base64,#../assets/images/10708_2019_10035_Fig1_HTML.png)

]

<br/>

.referencenote[
Gardner, Z., Mooney, P., De Sabbata, S. et al. [Quantifying gendered participation in OpenStreetMap: responding to theories of female (under) representation in crowdsourced mapping](https://link.springer.com/article/10.1007/s10708-019-10035-z). GeoJournal 85, 1603–1620 (2020). doi: 10.1007/s10708-019-10035-z
]

---

# Representativeness

.pull-left[

.large[Representation shows biases similar to those in participation]

- Higher qualifications are the strongest factor
- Wealth (house prices) is a strong factor for both Twitter and Wikipedia, more so for Wikipedia
- Twitter is strongly influenced by the percentage of people aged 30–44 (positively) and of households with dependent children (negatively)
- Models account for only about 44–55% of variability
- Need for more explanatory factors, e.g., tourism-related activities
- Ethnic composition is not a factor in the UK

]

.pull-right[

![](data:image/png;base64,#../assets/images/Ballatore_DeSabbata_2018_InfoGeoLondon_Fig1b.png)

.referencenote[
Ballatore A., De Sabbata S. (2018) [Charting the Geographies of Crowdsourced Information in Greater London](https://link.springer.com/chapter/10.1007/978-3-319-78208-9_8). In Technologies for All. AGILE 2018. Lecture Notes in Geoinformation and Cartography. Springer, Cham. doi: 10.1007/978-3-319-78208-9_8
]

]

---

# Representativeness

.left-column-large[

![](data:image/png;base64,#../assets/images/Ballatore_DeSabbata_2018_InfoGeoLondon_Fig1a.png)
![](data:image/png;base64,#../assets/images/Ballatore_DeSabbata_2018_InfoGeoLondon_Fig1b.png)
![](data:image/png;base64,#../assets/images/LOAC_supergroups.png)
![](data:image/png;base64,#../assets/images/LOAC_supergroups_legend.png)

]

.right-column-small[

<br/>

.referencenote[
Ballatore A., De Sabbata S. (2018) [Charting the Geographies of Crowdsourced Information in Greater London](https://link.springer.com/chapter/10.1007/978-3-319-78208-9_8). In Technologies for All. AGILE 2018. Lecture Notes in Geoinformation and Cartography. Springer, Cham.
doi: 10.1007/978-3-319-78208-9_8 <br/> <br/> [London Output Area Classification](https://data.london.gov.uk/dataset/london-area-classification), see also: Singleton, A. D. and Longley, P. (2015). [The internal structure of Greater London: a comparison of national and regional geodemographic models](https://rgs-ibg.onlinelibrary.wiley.com/doi/full/10.1002/geo2.7). Geo: Geography and Environment, 2(1):69–87. doi: 10.1002/geo2.7 ] ] --- # Representativeness Twitter and Wikipedia similar but distinct geographies only representative of themselves .pull-left[  ] .pull-right[  ] .referencenote[ Ballatore A., De Sabbata S. (2018) [Charting the Geographies of Crowdsourced Information in Greater London](https://link.springer.com/chapter/10.1007/978-3-319-78208-9_8). In Technologies for All. AGILE 2018. Lecture Notes in Geoinformation and Cartography. Springer, Cham. doi: 10.1007/978-3-319-78208-9_8 ] --- # Representativeness .pull-left[ Comparing London and L.A., broadly similar, but each place and platform has its own idiosyncrasies - Affluence has seemingly opposite effects in London and L.A. - Ethnic composition has no explanatory power in London, while presence of white and Asian residents is associated with more data in L.A. - The 30–44 age group makes a clear contribution to data variability in London, but it is not a factor in L.A. - In London, the variability in Wikipedia is linked to up to 49% of that in Twitter, but only up to 6% in L.A. ] .pull-right[  ] .referencenote[ Ballatore A., De Sabbata S. (2018) [Charting the Geographies of Crowdsourced Information in Greater London](https://link.springer.com/chapter/10.1007/978-3-319-78208-9_8). In Technologies for All. AGILE 2018. Lecture Notes in Geoinformation and Cartography. Springer, Cham. doi: 10.1007/978-3-319-78208-9_8 ] --- # Operationalisation .pull-left[ Content created by users on digital platforms is biased and varying in quality - (how) can we use it for geographic research? - can we “exploit” the bias?  
.referencenote[ Bright, J., De Sabbata, S., Lee, S., Ganesh, B. and Humphreys, D.K., 2018. [OpenStreetMap data for alcohol research: Reliability assessment and quality indicators](https://www.sciencedirect.com/science/article/pii/S1353829217305804). Health & place, 50, pp.130-136. doi: 10.1016/j.healthplace.2018.01.009 ] ] .pull-right[ .center[  ] ] --- class: inverse, center, middle # the qualitative --- # Mapping multiculture An *(on-going)* mixed-methods exploration of the digital geographies of Leicester .pull-left[  ] .pull-right[  ] --- # Mapping multiculture .pull-left[ <br/> An integrated approach to mixed qualitative and quantitative methods - Digital qualitative methods - Interviews - Qualitative social media analysis - Results from qualitative analysis as a base for quantitative social media analysis - A critical approach to quantitative analysis - A self-reflexive analysis of the process .referencenote[ See also: [Leverhulme Trust Newsletter, May 2019](https://www.leverhulme.ac.uk/sites/default/files/LT%20Newsletter%20May19%20Lo-res.pdf) *"Mapping multiculture: disrupting representations of an ethnically diverse city"* ] ] .pull-right[  ] --- class: inverse, center, middle # the convolutional --- # Deep learning in digital geographies When analysing social media data .left-column-smallish[ - **Qualitative** methods are nuanced but resource-intensive - Can only be reasonably applied to small samples - **Quantitative** approaches can be applied to vast amounts of data, but they are blunt instruments - Difficult to adapt to specific cases, areas and topics .referencenote[ Liu, P. and De Sabbata, S., 2021. [A graph-based semi-supervised approach to classification learning in digital geographies](https://www.sciencedirect.com/science/article/pii/S0198971520303161). Computers, Environment and Urban Systems, 86, p.101583. 
doi: 10.1016/j.compenvurbsys.2020.101583
]
]

.right-column-largish[

![](data:image/png;base64,#../assets/images/liu_desabbata_2021_fig01_02.png)

]

---

# Deep learning in digital geographies

Can we combine the nuance of qualitative analysis with the scalability of quantitative analysis in a single mixed-method approach?

A semi-supervised neural network might be the way forward...

.center[
![](data:image/png;base64,#../assets/images/liu_desabbata_2021_fig03.png)
]

.referencenote[
Liu, P. and De Sabbata, S., 2021. [A graph-based semi-supervised approach to classification learning in digital geographies](https://www.sciencedirect.com/science/article/pii/S0198971520303161). Computers, Environment and Urban Systems, 86, p.101583. doi: 10.1016/j.compenvurbsys.2020.101583
]

---

# Multimodal autoencoder

Combine

- image representations
  - similar to a Residual Neural Network (ResNet, see Mao et al., 2016)
- text representations
  - Long Short-Term Memory Neural Network (LSTM)
- to a combined representation
  - similar to a Correlational Neural Network (Corrnet, see Chandar et al., 2016)
  - minimise self-reconstruction error
  - minimise cross-reconstruction error from images and text
  - maximise correlation between hidden representations of both components

`$$\mathcal{J}_{\mathcal{Z}} = \sum^{N}_{i=1}\left(L(z_{i},g(h(z_{i})))+L(z_{i},g(h(x_{i})))+L(z_{i},g(h(y_{i})))\right)-\lambda \, corr(h(X),h(Y))$$`

`$$corr(h(X),h(Y)) = \frac{\sum^{N}_{i=1}(h(x_{i})-\overline{h(X)})(h(y_{i})-\overline{h(Y)})}{\sqrt{\sum^{N}_{i=1}(h(x_{i})-\overline{h(X)})^{2}\,\sum^{N}_{i=1}(h(y_{i})-\overline{h(Y)})^{2}}}$$`

---

# Graph Convolutional Network

Node-level output: `\(Z = f(X,A) = \textit{softmax}(H^{(L)})\)`

`\(X\)` is the representation produced by the autoencoder for each post, `\(A\)` is the graph adjacency matrix

Layer-wise propagation rule for GCN: `\(H^{(L+1)} = \sigma(\hat{D}^{-\frac{1}{2}}\hat{A}\hat{D}^{-\frac{1}{2}}H^{(L)}W^{(L)})\)`

- `\(\hat{A}=A+I_N\)`, where `\(I_N\)` is the `\(N \times N\)` identity matrix (adding self-loops to `\(A\)`)
- `\(W^{(L)}\)` is the trainable weight matrix of the `\(L\)`th layer of the neural network
- `\(\hat{D}_{ii} = \sum_j\hat{A}_{ij}\)` is the diagonal degree matrix of `\(\hat{A}\)`
- `\(\sigma(\cdot)\)` is a non-linear activation, using `\(ReLU(\cdot) = \max(0,\cdot)\)`
- `\(H^{(L)}\)` is the activation matrix for the `\(L\)`th layer
  - `\(H^{(0)}=X\)`
  - `\(H^{(L)}=ReLU(\hat{D}^{-\frac{1}{2}}\hat{A}\hat{D}^{-\frac{1}{2}}H^{(L-1)}W^{(L-1)})\)`

Cross-entropy error: `\(\mathcal{L}=-\sum_{l\in \mathcal{Y}_{L}}\sum_{f=1}^{F}\mathcal{Y}_{lf}\ln{Z_{lf}}\)`

- `\(\mathcal{Y}_L\)` is the set of nodes that have labels.

---

# Deep learning, spatio-temporally

.pull-left[

<br/>

![](data:image/png;base64,#../assets/images/liu_desabbata_2021_example.png)

]

.pull-right[

<br/>

![](data:image/png;base64,#../assets/images/liu_desabbata_2021_fig05.png)

.referencenote[
Liu, P. and De Sabbata, S., 2021. [A graph-based semi-supervised approach to classification learning in digital geographies](https://www.sciencedirect.com/science/article/pii/S0198971520303161). Computers, Environment and Urban Systems, 86, p.101583. doi: 10.1016/j.compenvurbsys.2020.101583
]

]

---

# Deep learning, spatio-temporally

Results of the experiments using a Minimum Spanning Tree (3 km left, 4 km right)

.pull-left[

![](data:image/png;base64,#../assets/images/liu_desabbata_2021_fig06.png)

.referencenote[
Liu, P. and De Sabbata, S., 2021. [A graph-based semi-supervised approach to classification learning in digital geographies](https://www.sciencedirect.com/science/article/pii/S0198971520303161). Computers, Environment and Urban Systems, 86, p.101583. doi: 10.1016/j.compenvurbsys.2020.101583
]

]

.pull-right[

![](data:image/png;base64,#../assets/images/liu_desabbata_2021_fig07.png)

]

---

# Deep learning, spatio-temporally

Ultimately, our results illustrate the advantages (necessity?)
of understanding geo-located social media posts as geographic events

| Model input                          | Representation Extractor | Model                                     | Accuracy | Micro-F1 Score |
|--------------------------------------|--------------------------|-------------------------------------------|----------|----------------|
| A-spatial with Images and Text       | Multi-modal Autoencoder  | SVM (no graph structure)                  | 15.87%   | 9.13%          |
| A-spatial with Images and Text       | Multi-modal Autoencoder  | GCN (Cycle Graph)                         | 68.63%   | 65.94%         |
| Spatial with Images and Text         | Multi-modal Autoencoder  | GCN (Weighted MST (3 km))                 | 73.57%   | 72.89%         |
| Spatio-temporal with Images and Text | Multi-modal Autoencoder  | GCN (StN (temporally-weighted, 4 km))     | 78.98%   | 76.72%         |
| Spatio-temporal with Images and Text | Multi-modal Autoencoder  | GCN (StN (distance-temp.-weighted, 4 km)) | 80.08%   | 78.65%         |

---
class: inverse, center, middle

# More on geography and deep learning

---

# GeoConvolution

Adapting the idea of a convolutional neural network to statistical analysis of area units

- **GeoConvolution**: custom Lambda layer, computing a weighted average over the geographic neighbourhood
- **GeoBatch**: geographic selection of batches

.left-column-large[
.center[
![](data:image/png;base64,#../assets/images/geoconv-Ldn-geoconv_geoconv_5_classes-map-2019-02-25-v0_2_6.png)
]
]

.right-column-small[

.referencenote[
De Sabbata, S. and Liu, P., 2019. [Deep learning geodemographics with autoencoders and geographic convolution](https://agile-online.org/images/conference_2019/documents/short_papers/90_Upload_your_PDF_file.pdf). In Proceedings of the 22nd AGILE conference on Geographic Information Science, Limassol, Greece.
]

]

---

# Reproducing the LOAC 2011

.left-column-smallish[
.center[
![](data:image/png;base64,#../assets/images/geoconv-Ldn-LOAC__2011.png)
![](data:image/png;base64,#../assets/images/geoconv-Ldn-geoconv_geoconv__v0_5_7__v2.png)
]
]

.right-column-largish[

<br/>

- A chi-square test shows a significant association between the 2011 LOAC and the GCNN output, `\(\chi^2 (49) = 61881\)`, `\(p < 0.001\)`.
- Similar squared Euclidean distance (SED) scores: the 2011 LOAC scored 0.6999, the GCNN output scored 0.7005.
<br/> .center[  ] ] --- class: bottom background-image: url(../assets/images/mina-catching-the-snow.png) background-size: cover # Thanks! <br/> <br/> <br/> <br/> <br/> Slides created via the R package [**xaringan**](https://github.com/yihui/xaringan). The chakra comes from [remark.js](https://remarkjs.com), [**knitr**](https://yihui.org/knitr), and [R Markdown](https://rmarkdown.rstudio.com). --- # Contacts and acknowledgements <br/> .bottom[ .pull-left[ .large[Get in touch!] 👋😊 - Email me at [s.desabbata@le.ac.uk](mailto:s.desabbata@le.ac.uk) - [sdesabbata.github.io](http://sdesabbata.github.io/) is my website You can find me - [@maps4thought](https://twitter.com/maps4thought) on Twitter - [sdesabbata](https://github.com/sdesabbata) on GitHub - As well as on - [ResearchGate](https://www.researchgate.net/profile/Stefano-De-Sabbata) - [Academia.edu](https://leicester.academia.edu/StefanoDeSabbata) - [Google Scholar](https://scholar.google.com/citations?user=VcSXvCYAAAAJ&hl=en) - [LinkedIn](https://www.linkedin.com/in/stefanodesabbata/?originalSubdomain=uk) ] .pull-right[ .center[] .referencenote[ Images, maps and results included in these slides contain public sector information from Office for National Statistics and Ordnance Survey licensed under the [Open Government Licence v3.0](http://www.nationalarchives.gov.uk/doc/open-government-licence). Data from the [GH Archive](https://www.gharchive.org/) under [CC-BY-4.0](https://creativecommons.org/licenses/by/4.0/), [OpenStreetMap](OpenStreetMap), under [ODbL](http://www.openstreetmap.org/copyright), [Twitter](https://twitter.com/) under the [Developer Agreement](https://developer.twitter.com/en/developer-terms/agreement), [Wikipedia](https://en.wikipedia.org/wiki/Main_Page) under [CC BY 3.0](https://creativecommons.org/licenses/by/3.0/) and the the [World Bank Open Data](https://data.worldbank.org/) portal under [CC-BY-4.0](https://creativecommons.org/licenses/by/4.0/). 
Map tiles by [Stamen Design](http://maps.stamen.com/#toner/12/37.7706/-122.3782), under [CC BY 3.0](https://creativecommons.org/licenses/by/3.0/). ] ] ]
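
---

# Graph Convolutional Network: a minimal sketch

The propagation rule and the cross-entropy over labelled nodes from the Graph Convolutional Network slide can be illustrated with a small, self-contained Python sketch. It is not the implementation used in Liu and De Sabbata (2021): the toy graph, the feature values, the fixed weights `W0` and `W1`, and all function names are assumptions made for illustration, and no training step is included.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def normalised_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2, where D is the degree matrix of A + I."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def relu(m):
    """Element-wise max(0, value)."""
    return [[max(0.0, v) for v in row] for row in m]


def softmax_rows(m):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    out = []
    for row in m:
        top = max(row)
        exps = [math.exp(v - top) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def gcn_forward(adj, x, w0, w1):
    """Two-layer GCN: softmax(A_norm ReLU(A_norm X W0) W1)."""
    a_norm = normalised_adjacency(adj)
    h1 = relu(matmul(matmul(a_norm, x), w0))
    return softmax_rows(matmul(matmul(a_norm, h1), w1))


def masked_cross_entropy(z, labels):
    """Cross-entropy summed over labelled nodes only (labels: node -> class)."""
    return -sum(math.log(z[node][cls]) for node, cls in labels.items())


# Toy path graph of three posts (0 - 1 - 2), two features per post.
ADJ = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
X = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
W0 = [[1.0, -1.0], [-1.0, 1.0]]  # hypothetical fixed weights, not trained
W1 = [[2.0, 0.0], [0.0, 2.0]]    # hypothetical fixed weights, not trained

Z = gcn_forward(ADJ, X, W0, W1)
LOSS = masked_cross_entropy(Z, {0: 0, 2: 1})  # node 1 is left unlabelled
```

In this toy setting the two labelled end nodes receive mirrored class probabilities, the unlabelled middle node sits at equal probabilities for both classes, and only nodes 0 and 2 contribute to `LOSS`, which is the semi-supervised aspect of the approach.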