Deep learning approaches in GIScience at the RGS AIC 2021

RGS-IBG Annual International Conference 2021

The organisers and the GIScience Research Group of the Royal Geographical Society with IBG are eager to invite you to attend the Deep learning approaches in GIScience sessions at the RGS-IBG Annual International Conference 2021.

Organisers

Dr Stef De Sabbata, University of Leicester
Dr Andrea Ballatore, Birkbeck, University of London
Dr James Haworth, University College London
Dr Godwin Yeboah, University of Warwick

Sessions

Deep learning approaches in GIScience (1)

Conference page: ##conf1132 Deep learning approaches in GIScience (1)

Remote Sensing Image Captioning with Continuous Output Neural Models
- Rita Ramos (Portugal - IST and INESC-ID, University of Lisbon), Bruno Martins (Portugal - Universidade de Lisboa)
- Abstract: Remote sensing image captioning, the task of generating concise textual descriptions for input aerial images, has received significant attention. Most previous generation methods are based on neural encoder-decoder models which employ a softmax function in the final layer of the decoder to generate discrete word tokens by sampling over a probability distribution [1,2]. These methods match the reference sentences word by word, thereby optimizing the generation locally at the token level instead of globally at the sentence level. We instead explored an alternative generation method based on continuous outputs [3], which employs a final embedding layer to generate sequences of word vectors instead of directly choosing or sampling word tokens. We argue that continuous output models have the potential to capture the global semantic similarity between captions and images, by facilitating the use of loss functions that compare different views of the data beyond representations for individual tokens (i.e., direct comparisons against representations for the entire caption and/or the input image). Experimental results over the widely used UCM and RSICD captioning datasets [2] showed that our encoder-decoder framework with continuous outputs can indeed improve performance over traditional generation based on discrete tokens, while being competitive with the current state-of-the-art model in the area [1]. Moreover, this approach can easily be integrated into more advanced captioning models. However, our tests also revealed important limitations in the existing evaluation datasets (e.g., templated captions using a small vocabulary, and significant differences in term frequency distributions across the train/validation/test splits), which may hinder progress in the area.
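To make the contrast with softmax decoding concrete, here is a minimal Python sketch of how a continuous-output decoder might be scored and decoded. It assumes cosine distance as the loss and mean pooling as the caption-level representation; neither choice is taken from the abstract, and the three-dimensional embeddings are hypothetical toy values rather than pre-trained word vectors.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def mean_vector(vectors):
    """Element-wise mean of a list of vectors (a simple caption-level pooling)."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def token_loss(predicted, targets):
    """Average cosine distance between each predicted vector and its target word vector."""
    return sum(1 - cosine(p, t) for p, t in zip(predicted, targets)) / len(targets)

def caption_loss(predicted, targets):
    """Cosine distance between the pooled predicted and pooled reference caption vectors."""
    return 1 - cosine(mean_vector(predicted), mean_vector(targets))

def decode(predicted, embeddings):
    """Map each predicted vector to the vocabulary word with the most similar embedding."""
    return [max(embeddings, key=lambda w: cosine(p, embeddings[w])) for p in predicted]

# Hypothetical toy word embeddings; a real model would use pre-trained vectors.
EMBEDDINGS = {
    "many": [1.0, 0.1, 0.0],
    "buildings": [0.0, 1.0, 0.2],
    "river": [0.1, 0.0, 1.0],
}

predicted = [[0.9, 0.2, 0.0], [0.1, 0.8, 0.3]]  # vectors emitted by the decoder
reference = [EMBEDDINGS["many"], EMBEDDINGS["buildings"]]
print(decode(predicted, EMBEDDINGS))  # ['many', 'buildings']
print(token_loss(predicted, reference), caption_loss(predicted, reference))
```

The token-level loss compares each predicted vector with its target word vector, while the caption-level loss compares pooled representations, which is the kind of sentence-level comparison that continuous outputs make straightforward.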
Deep Learning for Population Mapping in Informal Settlements with High-Resolution Satellite Imagery and Equitable Ground-Truth
- Konstantin Klemmer (United Kingdom - University of Warwick), Godwin Yeboah (United Kingdom - University of Warwick), João Porto de Albuquerque (United Kingdom - University of Warwick), Stephen A. Jarvis (United Kingdom - University of Warwick)
- Abstract: We propose a generalizable framework for estimating the population of dense, informal settlements in low-income urban areas, so-called ‘slums’. These population estimates are particularly important for government authorities and NGOs: informal settlements house some of the most vulnerable people, who often urgently require external support. Leveraging high-resolution satellite imagery, our deep learning approach can support varying granularities, down to the building level. We evaluate our approach using a gridded population estimation model, enabling greater flexibility and a customizable spatial resolution. At the core of our analysis lies an “equitable ground-truth” approach to the mapping data: building-level shapes and labels are mapped and curated in collaboration with local universities and communities. Through this cooperative approach, residents take agency over mapping their own neighborhood, and we leverage their knowledge to generate high-quality spatial data [1]. Using data from four informal settlements, we test our method in a setting that allows us to comment on its generalizability. Due to the sparsity of the available data, we use pre-trained vision models (namely MobileNetV2 and ResNet50) and fine-tune them for our prediction task. Our models show promising preliminary results, even when faced with an informal settlement they have not seen before. This indicates that informal settlements share certain patterns, which can be exploited for tasks such as ours. While future work has to confirm and expand on this, our study provides a first step towards a global model for population estimation in informal settlements.
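The abstract does not detail the gridding step, so the following is only a hypothetical Python sketch of how building-level estimates could be summed into square cells whose size is chosen by the user, which is one way to obtain a customizable spatial resolution. Coordinates are assumed to be in a projected system in metres, and the building records are invented for illustration.

```python
import math
from collections import defaultdict

def grid_population(buildings, cell_size):
    """Sum building-level population estimates into square grid cells.

    buildings: iterable of (x, y, population) with x, y in projected metres.
    cell_size: cell edge length in metres; changing it changes the output resolution.
    Returns {(column, row): total population in that cell}.
    """
    cells = defaultdict(float)
    for x, y, population in buildings:
        key = (math.floor(x / cell_size), math.floor(y / cell_size))
        cells[key] += population
    return dict(cells)

# Hypothetical building centroids with per-building population estimates.
buildings = [(5, 5, 4.0), (15, 5, 6.0), (25, 25, 3.0), (8, 12, 5.0)]
print(grid_population(buildings, 10))   # four 10 m cells, one building each
print(grid_population(buildings, 100))  # {(0, 0): 18.0}
```

Whatever cell size is chosen, the cell totals add up to the same overall population, so outputs at different resolutions remain consistent with each other.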
Maps and Machines: Using Computer Vision to Analyze the Geography of Industrialization (1780-1920)
- Kasra Hosseini (United Kingdom - The Alan Turing Institute), Katherine McDonough (United Kingdom - The Alan Turing Institute), Daniel van Strien (United Kingdom - The British Library), Olivia Vane (United Kingdom - The British Library), Kaspar Beelen (United Kingdom - The Alan Turing Institute), Daniel Wilson (United Kingdom - The Alan Turing Institute)
- Abstract: The Living with Machines project seeks to create new histories of the lived experience of industrialisation in nineteenth-century Great Britain. Because the vast archives of this period have been challenging to interrogate at scale, we use computational methods to explore newspapers, maps, census records, and more. Here, we introduce our deep learning framework for processing visual information from a corpus of 130k nineteenth-century Ordnance Survey (OS) maps. This approach to image classification, which we call the ‘patchwork method’, combines the affordances of data science and machine learning with the advantages of close reading and critical attention to the features of historical cartography. Map sheets are divided into grids of ‘patches’, and the output is a patch-level overview of how a computer vision model sees maps with respect to a given set of labels. We can therefore quantify the presence of a particular feature across an entire OS map series at whatever scale we set for the patch size. This flexibility allows us to specify a set of properties which are relational or abstract in character, potentially going beyond existing cartographic categories, but which nonetheless contain distinctive visual patterns that allow new forms of socio-spatial analysis. For example, using this method, we ask: what percentage of OS maps contain rail infrastructure, or ‘railspace’? The ability to organise patch clusters according to, among other criteria, the intensity of railspace provides a new perspective on an important index of industrialisation, and creates a new typology of place based on shared spatial characteristics.
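A minimal Python sketch of the patch-based idea, assuming a map sheet represented as a 2-D grid of pixel values and a hypothetical `toy_classifier` standing in for a trained computer vision model. The function names are illustrative and are not part of the project's software.

```python
def split_into_patches(sheet, patch_size):
    """Divide a 2-D raster (list of rows) into non-overlapping square patches.

    Returns {(patch_row, patch_col): list of pixel rows}.
    Edge pixels that do not fill a whole patch are dropped in this sketch.
    """
    n_rows = len(sheet) // patch_size
    n_cols = len(sheet[0]) // patch_size
    patches = {}
    for i in range(n_rows):
        for j in range(n_cols):
            patches[(i, j)] = [
                row[j * patch_size:(j + 1) * patch_size]
                for row in sheet[i * patch_size:(i + 1) * patch_size]
            ]
    return patches

def label_share(patches, classify, label):
    """Percentage of patches to which `classify` assigns `label`."""
    labels = [classify(p) for p in patches.values()]
    return 100 * labels.count(label) / len(labels)

def toy_classifier(patch):
    """Hypothetical stand-in for a trained model: 'railspace' if any pixel equals 1."""
    return "railspace" if any(1 in row for row in patch) else "no"

sheet = [
    [0, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
patches = split_into_patches(sheet, 2)
print(label_share(patches, toy_classifier, "railspace"))  # 25.0
```

Changing `patch_size` changes the granularity at which a feature is counted, which mirrors the point above about setting the scale through the patch size.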
Using Computer Vision to Improve Cycle Safety
- Mohamed Ibrahim (United Kingdom - University College London), James Haworth (United Kingdom - University College London), Nicola Christie (United Kingdom - University College London), Tao Cheng (United Kingdom - University College London)
- Abstract: Increasing the uptake of cycling as a transportation mode is a priority for many transport authorities, due to its positive impact on health and the associated reduction in the use of polluting transport modes. However, the actual and perceived risks of cycling, particularly for commuting in cities, continue to be a barrier. In order to promote cycling as a viable transportation option, it is important to understand these risks so that they can inform mitigation, the design of safe infrastructure and education. Killed and Seriously Injured (KSI) events involving cyclists are thankfully rare. Much more frequent are near misses, which are events where a cyclist felt destabilised (e.g. by a close-passing vehicle) or had to take action to prevent a crash. Recognising this, research efforts have focussed on studying near misses through various means, one of which is the naturalistic study. Naturalistic studies involve observing people as they complete routine activities. This is achieved by fitting various sensors to a bicycle or rider, including video cameras, GPS devices, range sensors and inertial measurement units. The information generated using these devices has great potential for understanding the risks that cyclists are exposed to. However, the large amounts of video data generated are usually processed and interpreted manually, which is time-consuming and limits their use to small-scale studies. In this research, we apply computer vision techniques to the automated analysis of video streams related to near misses. This allows us to extract various risk factors, including the number and type of agents present in the scene, weather and visual conditions, and the actions of the agents involved. The purpose of this research is to enable automatic data generation for statistical analyses of cycling crash risk. Our future work involves deploying the algorithms in real time for proactive safety measures.
A deep learning approach towards identifying unhealthy advertisements in street view images
- Gregory Palmer (Germany - Leibniz University), Mark Green (United Kingdom - University of Liverpool), Alex Singleton (United Kingdom - University of Liverpool), Rahul Savani (United Kingdom - University of Liverpool), Yales Stefano Rios Vasconcelos (United Kingdom - University of Liverpool), Emma Boyland (United Kingdom - University of Liverpool)
- Abstract: While outdoor advertisements are common features within towns and cities, they may reinforce social inequalities in health. Vulnerable populations in deprived areas may have greater exposure to fast food, gambling and alcohol advertisements encouraging their consumption. Understanding who is exposed and evaluating potential policy restrictions requires a substantial manual data collection effort. To address this problem, we develop a deep learning workflow to automatically extract and classify unhealthy advertisements from street-level images. We introduce the Liverpool 360 Street View (LIV360SV) dataset for evaluating our workflow. The dataset contains 25,349 street-level 360-degree images collected by cycling with a GoPro Fusion camera between 14 and 18 January 2020. In total, 10,106 advertisements were identified and classified as food (1,335), alcohol (217), gambling (149) and other (8,405) (e.g., cars and broadband). We find evidence of social inequalities, with a larger proportion of food advertisements located within deprived areas and areas frequented by students. Our project presents a novel implementation for the incidental classification of street view images to identify unhealthy advertisements, providing a means of identifying areas that could benefit from tougher advertisement restriction policies for tackling social inequalities.

Deep learning approaches in GIScience (2)

Conference page: ##conf1132 Deep learning approaches in GIScience (2)

Geospatial Data Disaggregation With Encoder-Decoder Convolutional Neural Networks
- João Monteiro (Portugal - Universidade de Lisboa), Bruno Martins (Portugal - Universidade de Lisboa), Miguel Costa (Portugal - Vodafone), João Moura Pires (Portugal - Universidade Nova de Lisboa)
- Abstract: Demographic and socio-economic statistics are widely available on a variety of subjects. Still, the data are often collected or released for highly aggregated geospatial areas, masking important local hotspots. When conducting spatial analysis, one often needs to disaggregate the source data, transforming statistics for a set of source zones into values for a set of target zones with a different geometry and a higher spatial resolution. In this work, we report on a novel dasymetric disaggregation method that uses encoder-decoder convolutional neural networks similar to those used in image segmentation (i.e., models inspired by the popular U-Net) to combine different types of ancillary data when deriving the dasymetric weights. Model training constitutes a particular challenge, given that disaggregation tasks do not provide direct supervision signals in the form of training instances that map the low-resolution aggregated data to the corresponding high-resolution representations. We propose to address the problem through self-training or co-training, iteratively refining initial estimates from seminal disaggregation heuristics, either by training a single model over progressively better estimates or by using the results of one model to support the training of another. We conducted experiments on the disaggregation of socio-demographic variables for Continental Portugal, from the coarse-grained administrative divisions for which they were originally available into raster cells with a resolution of 200 m. The results show that the proposed approaches outperform baseline methods, including other regression models used to infer the dasymetric weights. Our experiments also highlight the impact of different training strategies, e.g. involving different loss functions and/or regularization schemes.
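At the heart of dasymetric disaggregation is a mass-preserving redistribution: each source zone's total is spread over its target cells in proportion to their weights. The Python sketch below shows only that step and assumes the weights are already available (in the approach described above they would be derived by the encoder-decoder network). The zone and cell identifiers are hypothetical, and the even-split fallback for all-zero weights is an assumption of this sketch.

```python
from collections import defaultdict

def dasymetric_disaggregate(zone_totals, cell_zone, cell_weight):
    """Redistribute source-zone totals to cells in proportion to dasymetric weights.

    zone_totals: {zone_id: aggregate count}
    cell_zone:   {cell_id: zone_id containing the cell}
    cell_weight: {cell_id: non-negative weight}, e.g. predicted by a model
    The estimates of the cells in each zone sum back to that zone's total.
    """
    weight_sum = defaultdict(float)
    for cell, zone in cell_zone.items():
        weight_sum[zone] += cell_weight[cell]
    estimates = {}
    for cell, zone in cell_zone.items():
        if weight_sum[zone] > 0:
            share = cell_weight[cell] / weight_sum[zone]
        else:
            # The weights carry no information for this zone: split it evenly.
            share = 1 / sum(1 for z in cell_zone.values() if z == zone)
        estimates[cell] = zone_totals[zone] * share
    return estimates

zone_totals = {"A": 100, "B": 30}
cell_zone = {"c1": "A", "c2": "A", "c3": "A", "c4": "B", "c5": "B"}
cell_weight = {"c1": 2.0, "c2": 1.0, "c3": 1.0, "c4": 0.0, "c5": 0.0}
print(dasymetric_disaggregate(zone_totals, cell_zone, cell_weight))
# {'c1': 50.0, 'c2': 25.0, 'c3': 25.0, 'c4': 15.0, 'c5': 15.0}
```

In a self-training set-up, the disaggregated estimates from one round could serve as targets for training the next model; that loop is not shown here.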
Creating a geodemographic classification for Greater London through a spatial graph neural network
- Pengyuan Liu (United Kingdom - University of Leicester), Stef De Sabbata (United Kingdom - University of Leicester)
- Abstract: Geodemographic classifications have been a fundamental tool in quantitative geography and geocomputation since the 1970s, with a wide range of commercial and academic applications, from policy-making and advertising to socio-demographic studies. However, while a variety of both commercial and open classifications have been developed, the core methodological approaches used in creating them are still rooted in classic machine learning methods, such as k-means. For instance, both the nationwide 2011 Output Area Classification (2011 OAC) for the United Kingdom and the 2011 London Output Area Classification (2011 LOAC) were developed using a k-means approach. Despite the success of k-means in understanding socio-demographic features within GIScience, we argue that such approaches often neglect the underlying geographic patterns in data representing area objects. In the past decades, deep neural networks have had a transformative impact on the fields of machine learning and artificial intelligence. Graph neural networks are a type of deep learning approach that can operate directly on a graph structure, aggregating values for each node from its connected neighbours to learn more spatially sensitive embeddings. They therefore have the potential to be integrated into geodemographic classifications to additionally account for the geographic patterns behind the data. In this research, we use GraphSAGE, a type of graph neural network, to create geographically sensitive embeddings for a spatial graph constructed from 2011 London census data, and demonstrate the usefulness of our approach for creating a new type of geodemographic classification.
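As an illustration of neighbourhood aggregation, here is a minimal pure-Python sketch of one GraphSAGE-style layer with a mean aggregator: each node's features are concatenated with the mean of its neighbours' features, multiplied by a weight matrix and passed through a ReLU. In this sketch the weight matrix is fixed by hand rather than learned, the normalisation step used in GraphSAGE is omitted, and the three-area chain is a toy example, not the authors' spatial graph.

```python
def mean(vectors):
    """Element-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def sage_layer(features, neighbours, weight):
    """One GraphSAGE-style layer with a mean aggregator.

    features:   {node: feature vector}
    neighbours: {node: list of adjacent nodes}, e.g. areas sharing a boundary
    weight:     matrix applied to [own features ++ mean of neighbour features]
    """
    out = {}
    for node, h in features.items():
        neigh = neighbours.get(node, [])
        agg = mean([features[u] for u in neigh]) if neigh else [0.0] * len(h)
        combined = h + agg  # list concatenation of own and aggregated features
        out[node] = [max(0.0, z) for z in matvec(weight, combined)]  # ReLU
    return out

# Toy chain of three areas a - b - c, each with a single census-derived feature.
features = {"a": [1.0], "b": [3.0], "c": [5.0]}
neighbours = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
weight = [[0.5, 0.5]]  # hand-picked: half own value, half neighbourhood mean
print(sage_layer(features, neighbours, weight))
# {'a': [2.0], 'b': [3.0], 'c': [4.0]}
```

After one layer each area's embedding already mixes in its neighbours' values, which is how spatial context enters the embeddings before any clustering step.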
Understanding how aerial image features extracted using CNNs capture socioeconomic characteristics
- Melanie Green (United Kingdom - University of Liverpool), Daniel Arribas-Bel (United Kingdom - University of Liverpool)
- Abstract: The use of convolutional neural networks (CNNs) has opened up new possibilities for analysing and understanding image data. However, CNNs have been under-used in remote sensing, urban geography and, in particular, the study of socioeconomic problems. Based on the idea that the appearance of an urban area is a reflection of the society that created it, remotely sensed satellite and aerial imagery has been used to investigate various social science problems, alongside or in place of traditional socioeconomic measures from surveys and censuses. At high resolution, these images contain a huge amount of detail on the built and natural environment, and their potential can be unlocked by applying CNNs. This study investigates how features extracted from aerial imagery using CNNs can complement our understanding of the socioeconomic characteristics of an area, by comparing an image-based classification with socioeconomic characteristics. A pre-trained CNN was used to extract features from aerial imagery of Great Britain. These features were used to cluster Output Areas into groups with a similar appearance, and these groups were compared to the Output Area Classification (OAC) geodemographic classification to understand how the image features capture socioeconomic differences between areas.
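The abstract does not specify how the image-based groups were compared with the OAC, so the Python sketch below uses one simple option as an assumption: a cross-tabulation of Output Areas by image cluster and OAC supergroup, summarised by cluster purity (the share of areas that belong to their cluster's majority OAC group). The Output Area identifiers and cluster assignments are invented.

```python
from collections import Counter, defaultdict

def crosstab(image_clusters, oac_groups):
    """Count Output Areas for each (image cluster, OAC group) pair."""
    table = defaultdict(Counter)
    for oa, cluster in image_clusters.items():
        table[cluster][oac_groups[oa]] += 1
    return table

def purity(table):
    """Share of Output Areas that belong to the majority OAC group of their image cluster."""
    total = sum(sum(counts.values()) for counts in table.values())
    majority = sum(max(counts.values()) for counts in table.values())
    return majority / total

# Hypothetical Output Areas with image-based clusters and OAC supergroups.
image_clusters = {"oa1": 0, "oa2": 0, "oa3": 0, "oa4": 1, "oa5": 1, "oa6": 1}
oac_groups = {
    "oa1": "Urbanites", "oa2": "Urbanites", "oa3": "Suburbanites",
    "oa4": "Rural Residents", "oa5": "Rural Residents", "oa6": "Rural Residents",
}
table = crosstab(image_clusters, oac_groups)
print(dict(table[0]))           # {'Urbanites': 2, 'Suburbanites': 1}
print(round(purity(table), 3))  # 0.833
```

A purity close to 1 would suggest that areas that look alike from the air also tend to share a geodemographic group; the cross-tabulation itself shows where the two classifications disagree.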
Deep learning for mapping and understanding urban scapes
- Karin Pfeffer (Netherlands - University of Twente), Alireza Ajami (Netherlands - HERE Technologies), Henri Debray (Netherlands - German Aerospace Center - Center for Earth Observation), Christien Klaufus (Netherlands - University of Amsterdam), Monika Kuffer (Netherlands - University of Twente), Claudio Persello (Netherlands - University of Twente), Robbin-Jan Van Duijne (Netherlands - University of Amsterdam), Xujiayi Yang (Netherlands - University of Twente)
- Abstract: A majority of Global South cities are facing rapid urbanisation. New human settlements are emerging on the urban peripheries, inner-city areas are densifying, and death scapes are competing with living scapes. Both kinds of scape often develop without planning guidance and in close proximity; they compete for space or even share integrated development dynamics, raising serious concerns about environmental and health consequences. To better understand and address these pressing issues, detailed spatial information on urban (development) patterns is required, but such information is often absent or out of date in Global South cities. Previous research has shown the potential of satellite imagery for extracting information on urban settlement patterns and dynamics at different moments in time using various methods, with deep learning methods outperforming more conventional approaches. In this presentation, we demonstrate the potential of deep learning, specifically convolutional neural networks (CNNs), for mapping urban scapes in very high resolution imagery. We apply deep learning based methods to three use cases, namely 1) mapping deprived settlements, i.e. living scapes (Bangalore, India); 2) identifying graveyards, i.e. death scapes (Lima, Peru); and 3) detecting ‘hidden’ urbanisation dynamics, i.e. living scapes (Baharia, India). We use different types of satellite imagery (WorldView-2, Pleiades, QuickBird) combined with conventional geospatial data (road maps, land use maps) and local knowledge. Based on these three use cases, we illustrate the capacity of our approach to learn complex features, draw lessons with regard to input data, training and the accuracies achieved, and reflect on the potential societal and policy implications of the results obtained.
A deep learning approach based on multisensory data for the official Swiss land use/cover statistics
- Adrian Meyer (Switzerland - University of Applied Sciences and Arts Northwestern Switzerland FHNW), Natalie Lack (Switzerland - University of Applied Sciences and Arts Northwestern Switzerland FHNW), Denis Jordan (Switzerland - University of Applied Sciences and Arts Northwestern Switzerland FHNW)
- Abstract: The Swiss area statistics, compiled every six years by the Federal Statistical Office, classify land use (LU) and land cover (LC) on a regular 100 m × 100 m grid of 4.2 million sample points. The arduous manual labelling process performed by experts relies mainly on aerial imagery but uses a catalogue of additional data to obtain high reliability across the 46 LU and 27 LC classes. This study investigates the potential of multisensory data fusion combined with modern artificial intelligence (AI) algorithms, such as deep convolutional neural networks (CNNs) and random forests (RF), to provide an automatic LU/LC classification. To this end, the approach combines spatiotemporal information from aerial RGB and FCIR orthophotos, Sentinel-2 derived multispectral time series indices, digital elevation data, vegetation canopy height models and cadastral information. RGB and FCIR imagery is processed by an Xception CNN architecture, and its output is complemented with the satellite and GIS data; the combined vector is then passed to an RF classifier. The accurately annotated sample points serve as ground truth in this challenging setting with a high number of LU/LC classes. The combined CNN-RF model achieved overall accuracies of 84% for LU and 89% for LC. Extensive classes such as coherent tree population, which so far required a lengthy manual process, are classified with high class-specific accuracies above 90%. We conclude that a cascade of AI algorithms based on multisensory data has the potential to support the costly expert-based classification.
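A rough Python sketch of the cascade's data flow and evaluation, under stated assumptions: per-point feature vectors from the CNN, the satellite time series and the GIS layers are concatenated into one vector (the values and class names shown are hypothetical), and predictions are scored by overall accuracy and by per-class accuracy, computed here as recall (producer's accuracy) since the abstract does not say which definition is used. The random forest itself is not implemented; a library classifier such as scikit-learn's `RandomForestClassifier` could consume the fused vectors.

```python
from collections import Counter

def fuse_features(cnn_features, satellite_features, gis_features):
    """Concatenate per-point feature vectors into the single vector given to the RF."""
    return list(cnn_features) + list(satellite_features) + list(gis_features)

def overall_accuracy(y_true, y_pred):
    """Fraction of sample points whose predicted class matches the reference label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def per_class_recall(y_true, y_pred):
    """Per-class accuracy as recall: correct predictions / reference points of that class."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: hits[c] / n for c, n in totals.items()}

# Hypothetical values: two CNN features, one vegetation index, canopy height, cadastral flag.
vector = fuse_features([0.2, 0.7], [0.45], [12.0, 1.0])
y_true = ["forest", "forest", "building", "grass"]
y_pred = ["forest", "forest", "grass", "grass"]
print(vector)                              # [0.2, 0.7, 0.45, 12.0, 1.0]
print(overall_accuracy(y_true, y_pred))    # 0.75
print(per_class_recall(y_true, y_pred))    # {'forest': 1.0, 'building': 0.0, 'grass': 1.0}
```

Reporting per-class figures alongside overall accuracy matters with many classes, since a high overall score can hide poorly classified rare classes.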

Abstract

In its broader definition, machine learning has long been part of GIScience and geocomputation approaches to data analysis. That is primarily due to research on unsupervised learning approaches to geographic data mining, such as geodemographic classification (Delmelle, 2016; Gale et al., 2016) and dimensionality reduction (Wolf and Knaap, 2019), and supervised methods of inference from a sample, such as spatial autocorrelation (Ord and Getis, 1995) and geographically weighted regression (Brunsdon et al., 1998; Yu et al., 2019). More recently, deep machine learning approaches have had a transformative impact in a variety of fields. For instance, the introduction of AlexNet (Krizhevsky et al., 2012) was a watershed moment in image processing, demonstrating the effectiveness of convolutional neural networks (CNNs) in the field.
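As a concrete example of measuring spatial autocorrelation, the Python sketch below computes the global Moran's I, I = (n / W) Σ_i Σ_j w_ij (x_i - x̄)(x_j - x̄) / Σ_i (x_i - x̄)², where n is the number of areas and W the sum of all weights. Note that Ord and Getis (1995) concern local statistics, which this sketch does not implement; the four-area example with binary contiguity weights is hypothetical.

```python
def morans_i(values, weights):
    """Global Moran's I for values x_i and a spatial weights matrix w_ij (list of lists)."""
    n = len(values)
    mean = sum(values) / n
    dev = [x - mean for x in values]
    w_total = sum(sum(row) for row in weights)
    num = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / w_total) * num / den

# Four areas in a row; neighbours share a boundary (binary, symmetric weights).
W = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
print(round(morans_i([1, 1, 5, 5], W), 3))  # 0.333 (similar values cluster together)
print(round(morans_i([1, 5, 1, 5], W), 3))  # -1.0 (neighbours alternate)
```

Positive values indicate that similar values sit next to each other, negative values that neighbours tend to differ.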

Deep machine learning approaches had been somewhat neglected in GIScience and quantitative human geography (Harris et al., 2017) until quite recently. However, there is now clear interest in exploring the applicability and effectiveness of deep learning for studying geographic phenomena, and a growing literature in the field. To mention a few examples: Chen et al. (2018) explored the use of CNNs to identify ground objects from satellite images; De Sabbata and Liu (2019) explored a geodemographic classification approach based on deep embedding clustering; and Xu et al. (2017) proposed the use of deep autoencoders to perform quality assessment of building footprints for OpenStreetMap.

This session aims to be a forum to discuss advances, opportunities, and limits of the use of deep machine learning approaches in the field of GIScience, showcasing both applications of deep learning methods to geography, in human and physical geography contexts alike, and geospatial extensions and variants of deep learning methods.

References

Brunsdon, C., Fotheringham, S. and Charlton, M. (1998). Geographically weighted regression. Journal of the Royal Statistical Society: Series D (The Statistician), 47(3), pp.431-443.

Chen, J., Zhou, Y., Zipf, A. and Fan, H. (2018). Deep learning from multiple crowds: A case study of humanitarian mapping. IEEE Transactions on Geoscience and Remote Sensing (TGRS), pp.1-10. https://doi.org/10.1109/TGRS.2018.2868748

De Sabbata, S. and Liu, P. (2019). Deep learning geodemographics with autoencoders and geographic convolution. In Proceedings of the 22nd AGILE conference on Geographic Information Science, Limassol, Greece.

Delmelle, E.C. (2016). Mapping the DNA of urban neighborhoods: clustering longitudinal sequences of neighborhood socioeconomic change. Annals of the American Association of Geographers, 106(1), pp.36-56.

Gale, C.G., Singleton, A., Bates, A.G. and Longley, P.A. (2016). Creating the 2011 area classification for output areas (2011 OAC). Journal of Spatial Information Science, 12, pp.1-27.

Harris, R., O’Sullivan, D., Gahegan, M., Charlton, M., Comber, L., Longley, P., Brunsdon, C., Malleson, N., Heppenstall, A., Singleton, A. and Arribas-Bel, D. (2017). More bark than bytes? Reflections on 21+ years of geocomputation. Environment and Planning B: Urban Analytics and City Science, 44(4), pp.598-617.

Krizhevsky, A., Sutskever, I. and Hinton, G.E. (2012). ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pp.1097-1105.

Ord, J.K. and Getis, A. (1995). Local spatial autocorrelation statistics: distributional issues and an application. Geographical Analysis, 27(4), pp.286-306.

Wolf, L.J. and Knaap, E. (2019). Learning geographical manifolds: A kernel trick for geographical machine learning. SocArXiv.

Xu, Y., Chen, Z., Xie, Z. and Wu, L. (2017). Quality assessment of building footprint data using a deep autoencoder network. International Journal of Geographical Information Science, 31(10), pp.1929-1951.

Yu, H., Fotheringham, A.S., Li, Z., Oshan, T., Kang, W. and Wolf, L.J. (2019). Inference in multiscale geographically weighted regression. Geographical Analysis.