Groupwork exercise

Introduction

One of the key pledges that the current government of the United Kingdom made in 2019 was a commitment to levelling up – that is, to address local and regional inequalities. The idea of levelling up the country has been at the centre of the political debate for the last two years, involving a wide range of socio-economic topics from education²² to rail investment²³.

The Centre for Cities (an independent charity and research centre) has identified²⁴ health and education as two of the key areas which should be at the core of a levelling up agenda. In this groupwork, you will focus on the relationship between health, education and occupation within Leicester. The aim is to better understand how health is linked to education and occupation at the local level, with a view to advising the local authority about key areas that might be prioritised for support.

Data

To explore the issues outlined above, this groupwork uses data from the 2011 Output Area Classification (2011 OAC) introduced in Chapter 3. The dataset includes public sector information licensed under the Open Government Licence v3.0 from the Office for National Statistics.

The 2011 OAC is a geodemographic classification of the census Output Areas (OA) of the UK, which was created by Gale et al. (2016) starting from an initial set of 167 prospective variables from the United Kingdom Census 2011: 86 were removed, 41 were retained as they are, and 40 were combined, leading to a final set of 60 variables. Gale et al. (2016) finally used the k-means clustering approach to create 8 clusters or supergroups (see map at datashine.org.uk), as well as 26 groups and 76 subgroups. The dataset in the file 2011_OAC_Raw_uVariables_Leicester.csv contains all the original 167 variables, as well as the resulting groups, for the city of Leicester.

Instructions

Before continuing with the remainder of the groupwork, create a new project named Leicester_health_education and make sure it is activated.

Download the 2011_OAC_Raw_uVariables_Leicester.csv file from Blackboard (or from the data folder of the repository) to your computer and, if you are working on the RStudio Server and have not already done so, upload it there. The full variable names can be found in the file 2011_OAC_Raw_uVariables_Lookup.csv. Write an RMarkdown document, to be compiled into a PDF or HTML file, presenting the answers to the questions listed below. Present the answers in the order in which they are listed, each in a separate section of the document, including the code, the output and the textual component as required.
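As a starting point, the loading step might look like the minimal base R sketch below. It assumes that the CSV file sits in the project's working directory and that it has one column per variable code (u104, u105 and so on); the helper name load_oac_variables is purely illustrative and not part of the course material.

```r
# Sketch: load the Leicester 2011 OAC data and keep the variables of interest.
# Assumption: the CSV has one column per variable code (u104, u105, ...).
part1_codes <- paste0("u", c(104:115, 159:167))

# Hypothetical helper: read the file and return only the requested columns,
# stopping with a clear message if any of them is missing.
load_oac_variables <- function(path, codes = part1_codes) {
  oac <- read.csv(path, stringsAsFactors = FALSE)
  missing_codes <- setdiff(codes, names(oac))
  if (length(missing_codes) > 0) {
    stop("Columns not found: ", paste(missing_codes, collapse = ", "))
  }
  oac[, codes, drop = FALSE]
}

# Only runs if the file is in the working directory (i.e. the project folder).
data_file <- "2011_OAC_Raw_uVariables_Leicester.csv"
if (file.exists(data_file)) {
  leicester_vars <- load_oac_variables(data_file)
  str(leicester_vars)
}
```

Failing early on missing columns makes a mistyped code or a wrong file easier to spot than a later error inside the analysis.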

Part 1

Conduct an exploratory and comparative analysis of the variables listed in Table 12.1. Include the code, the output (which can include graphics) and a description of the findings. The description should be up to 500 words and can be written as a final discussion after the analysis, as a description of each step of the analysis, or as a combination of the two.

Table 12.1: Variables to be used for Parts 1 and 2 of this groupwork exercise
Variable code	Variable description
u104	Day-to-day activities limited a lot or a little (Standardised Illness Ratio)
u105	Very good health
u106	Good health
u107	Fair health
u108	Bad health
u109	Very bad health
u110	Provides unpaid care
u111	No qualifications
u112	Highest level of qualification: Level 1, Level 2 or Apprenticeship
u113	Highest level of qualification: Level 3 qualifications
u114	Highest level of qualification: Level 4 qualifications and above
u115	Schoolchildren and full-time students: Age 16 and over
u159	Managers, directors and senior officials
u160	Professional occupations
u161	Associate professional and technical occupations
u162	Administrative and secretarial occupations
u163	Skilled trades occupations
u164	Caring, leisure and other service occupations
u165	Sales and customer service occupations
u166	Process, plant and machine operatives
u167	Elementary occupations
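
One possible first step for the exploratory analysis is sketched below in base R: descriptive statistics, a normality test per variable and a rank-based correlation matrix. The data frame toy is made-up data used only so that the example runs on its own, and describe_variables is a hypothetical helper name. The file name suggests that the variables are raw values, so you may also want to convert counts to percentages of an appropriate total before comparing areas; which total to use is left to you.

```r
# Sketch of an exploratory step on a data frame of numeric variables.
# 'toy' is made-up data for illustration only, NOT the Leicester values.
set.seed(1)
toy <- data.frame(
  u108 = rpois(50, lambda = 12),  # stand-in for "Bad health"
  u111 = rpois(50, lambda = 60),  # stand-in for "No qualifications"
  u167 = rpois(50, lambda = 30)   # stand-in for "Elementary occupations"
)

# Hypothetical helper: one row per variable with centre, spread and the
# p-value of a Shapiro-Wilk normality test.
describe_variables <- function(df) {
  data.frame(
    variable  = names(df),
    mean      = sapply(df, mean),
    median    = sapply(df, median),
    sd        = sapply(df, sd),
    shapiro_p = sapply(df, function(x) shapiro.test(x)$p.value),
    row.names = NULL
  )
}

describe_variables(toy)

# Rank-based correlations make no normality assumption.
cor(toy, method = "spearman")
```

If the normality tests suggest skewed distributions, a rank-based coefficient such as Spearman's or Kendall's is usually a safer choice than Pearson's for comparing pairs of variables.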

Part 2

Select two of the variables explored in Part 1 (see Table 12.1) to create a robust (where possible) simple linear regression model. The model should have as outcome (dependent) variable an indicator of the health of the population and as predictor (independent) variable a variable related to occupation. The variables should be selected based on the outcome of the analysis done for Part 1, in order to ensure that the model is as strong and robust as possible.

\[health = occupation + error \]

Remember “correlation does not imply causation”. 😊
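
The schematic model above could be fitted along the lines of the base R sketch below. The data are simulated purely so that the example is self-contained, and the column names perc_health and perc_occupation are hypothetical placeholders for whichever two variables you select. The thresholds in the comments are common rules of thumb, not fixed requirements.

```r
# Sketch of a simple linear regression with basic assumption checks.
# 'toy' is simulated for illustration only, NOT the Leicester data.
set.seed(42)
n <- 200
toy <- data.frame(perc_occupation = runif(n, min = 5, max = 40))
toy$perc_health <- 3 + 0.25 * toy$perc_occupation + rnorm(n, sd = 1.5)

simple_model <- lm(perc_health ~ perc_occupation, data = toy)
summary(simple_model)   # coefficients, R-squared, F-test

# Residuals should be approximately normally distributed.
shapiro.test(residuals(simple_model))

# Share of standardised residuals beyond +/- 2.58 (about 1% is expected).
mean(abs(rstandard(simple_model)) > 2.58)

# Largest Cook's distance (values above 1 may indicate an influential case).
max(cooks.distance(simple_model))
```

Calling plot(simple_model, which = 1) additionally draws residuals against fitted values, which is a quick visual check for non-constant variance.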

Part 3

Use the variables explored in Part 1 (see Table 12.1) to create a robust (where possible) multiple linear regression model. The model should have as outcome (dependent) variable an indicator of the health of the population. The indicator can be one of the variables explored in Part 1 (see Table 12.1) or a combination thereof. The model should have as predictor (independent) variables a relevant set of variables related to education and occupation.

\[health = (education + occupation) + error \]
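
A minimal base R sketch of the schematic model above follows, again on simulated data with hypothetical column names. It fits a model with two predictors, computes variance inflation factors by hand (the helper vif_by_hand is illustrative, not a library function) and shows AIC-based stepwise selection as one possible, not the only, way to search for a well-fitting model.

```r
# Sketch of a multiple linear regression with a multicollinearity check.
# 'toy' is simulated for illustration only, NOT the Leicester data.
set.seed(7)
n <- 200
toy <- data.frame(
  perc_education  = runif(n, 10, 50),
  perc_occupation = runif(n, 5, 40)
)
toy$perc_health <- 2 + 0.10 * toy$perc_education +
  0.20 * toy$perc_occupation + rnorm(n, sd = 1.5)

full_model <- lm(perc_health ~ perc_education + perc_occupation, data = toy)
summary(full_model)   # adjusted R-squared, coefficients, F-test

# Variance inflation factor computed by hand: 1 / (1 - R^2), where R^2 comes
# from regressing each predictor on the remaining ones.
vif_by_hand <- function(predictors) {
  sapply(names(predictors), function(v) {
    others <- setdiff(names(predictors), v)
    r2 <- summary(lm(reformulate(others, response = v),
                     data = predictors))$r.squared
    1 / (1 - r2)
  })
}
vif_by_hand(toy[, c("perc_education", "perc_occupation")])

# One possible selection process: AIC-based stepwise search from the null model.
null_model <- lm(perc_health ~ 1, data = toy)
best_model <- step(null_model, scope = formula(full_model),
                   direction = "both", trace = 0)
formula(best_model)
```

Large variance inflation factors (a common rule of thumb is above 10) suggest that predictors overlap too much for their coefficients to be interpreted separately, which is likely when several education or occupation categories enter the same model.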

Present the model that achieves the best fit and the process through which it has been identified. Include the code, the output (which can include graphics), a discussion of the process and an interpretation of the final model. The discussion and interpretation together should be up to 500 words and can be written as a final discussion after the analysis, as a description of each step of the analysis, or as a combination of the two.

Alternatively, if no robust model or no significant model can be created for Leicester, include the code and the output (can include graphics) that illustrate that finding, and a related discussion (still, up to 500 words). The latter could be written as a final discussion after the analysis, or as a description of each step of the analysis, or a combination of the two.

…again, remember “correlation does not imply causation”. 😊

by Stefano De Sabbata – text licensed under the CC BY-SA 4.0, contains public sector information licensed under the Open Government Licence v3.0, code licensed under the GNU GPL v3.0.
