Groupwork exercise
Introduction
One of the key pledges that the current government of the United Kingdom made in 2019 was a commitment to levelling up – that is, to address local and regional inequalities. The idea of levelling up the country has been at the centre of the political debate for the last two years, involving a wide range of socio-economic topics from education to rail investment.
The Centre for Cities (an independent charity and research centre) has identified health and education as two of the key areas which should be at the core of a levelling up agenda. In conducting this groupwork, you will focus on the relationship between health, education and occupation within Leicester. The aim is to understand better how health is linked to education and occupation at the local level, with a view to advising the local authority about key areas that might be prioritised for support.
Data
To explore the issues outlined above, this groupwork uses data from the 2011 Output Area Classification (2011 OAC) introduced in Chapter 3. The dataset includes public sector information licensed under the Open Government Licence v3.0 from the Office for National Statistics.
The 2011 OAC is a geodemographic classification of the census Output Areas (OA) of the UK, which was created by Gale et al. (2016) starting from an initial set of 167 prospective variables from the United Kingdom Census 2011: 86 were removed, 41 were retained as they are, and 40 were combined, leading to a final set of 60 variables. Gale et al. (2016) finally used the k-means clustering approach to create 8 clusters or supergroups (see map at datashine.org.uk), as well as 26 groups and 76 subgroups. The dataset in the file `2011_OAC_Raw_uVariables_Leicester.csv` contains all the original 167 variables, as well as the resulting groups, for the city of Leicester.
Instructions
Before continuing with the remainder of the groupwork, create a new project named Leicester_health_education and make sure it is activated.
Download the file `2011_OAC_Raw_uVariables_Leicester.csv` from Blackboard (or from the data folder of the repository) to your computer, and upload it to the RStudio Server if necessary and not done already. The full variable names can be found in the file `2011_OAC_Raw_uVariables_Lookup.csv`. Write an RMarkdown document to be compiled into a PDF or HTML file presenting the answers to the questions listed below. You should present the answers in the same order as they are listed, each in a separate section of the document, including the code, the output and the textual component as required.
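As a starting point, the data could be loaded along the following lines. This is only a minimal sketch in base R: it assumes the file has been saved in the project's working directory (adjust `leicester_path` otherwise) and that the columns are named after the variable codes listed in Table 1. The object name `leicester_2011OAC` is just a suggestion.

```r
# Assumed location of the file, relative to the project folder
leicester_path <- "2011_OAC_Raw_uVariables_Leicester.csv"

if (file.exists(leicester_path)) {
  leicester_2011OAC <- read.csv(leicester_path)

  # Quick checks: number of rows and columns
  print(dim(leicester_2011OAC))

  # Summarise a few Table 1 columns, if present under these (assumed) names
  cols <- intersect(c("u104", "u111", "u160"), names(leicester_2011OAC))
  print(summary(leicester_2011OAC[, cols, drop = FALSE]))
} else {
  message("File not found: ", leicester_path)
}
```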
Part 1
Conduct an exploratory and comparative analysis of the variables listed in Table 1. Include the code, the output (can include graphics) and a description of the findings. The latter should be up to 500 words and it can be written as a final discussion after the analysis, or as a description of each step of the analysis, or a combination of the two.
VariableCode | VariableDescription |
---|---|
u104 | Day-to-day activities limited a lot or a little Standardised Illness Ratio |
u105 | Very good health |
u106 | Good health |
u107 | Fair health |
u108 | Bad health |
u109 | Very bad health |
u110 | Provides unpaid care |
u111 | No qualifications |
u112 | Highest level of qualification: Level 1, Level 2 or Apprenticeship |
u113 | Highest level of qualification: Level 3 qualifications |
u114 | Highest level of qualification: Level 4 qualifications and above |
u115 | Schoolchildren and full-time students: Age 16 and over |
u159 | Managers, directors and senior officials |
u160 | Professional occupations |
u161 | Associate professional and technical occupations |
u162 | Administrative and secretarial occupations |
u163 | Skilled trades occupations |
u164 | Caring, leisure and other service occupations |
u165 | Sales and customer service occupations |
u166 | Process, plant and machine operatives |
u167 | Elementary occupations |
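An exploratory and comparative analysis of this kind often combines descriptive statistics, a check of the distribution of each variable, and pairwise correlations. The sketch below illustrates one possible sequence in base R. To keep it runnable on its own, it uses simulated values stored under three of the variable codes from Table 1; these are not the real Leicester data, and the choice of the Shapiro-Wilk test and Kendall's tau is an assumption rather than a requirement of the exercise.

```r
set.seed(42)

# Simulated stand-in for three Table 1 columns (NOT the real Leicester data)
n <- 200
sim <- data.frame(u111 = rpois(n, 60), u160 = rpois(n, 30))
sim$u104 <- 80 + 0.5 * sim$u111 - 0.4 * sim$u160 + rnorm(n, sd = 5)

# Descriptive statistics for each variable
print(summary(sim))

# Shapiro-Wilk test per variable: a small p-value suggests non-normality
normality_p <- sapply(sim, function(x) shapiro.test(x)$p.value)
print(normality_p)

# Rank-based correlation matrix, which does not assume normality
cor_matrix <- cor(sim, method = "kendall")
print(cor_matrix)

# Significance test for one pair (exact = FALSE because the data contain ties)
pair_test <- cor.test(sim$u104, sim$u111, method = "kendall", exact = FALSE)
print(pair_test)
```

With the real data, the same steps would be applied to the columns of the loaded data frame, and the results would inform which variables are promising candidates for Parts 2 and 3.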
Part 2
Select two among the variables explored in Part 1 (see Table 1) to create a robust (where possible), simple linear regression model. The model should have as outcome (dependent) variable an indicator of the health of the population and as predictor (independent) variable a variable related to occupation. The variables should be selected based on the outcome of the analysis done for Part 1, in order to ensure that the model is as strong and robust as possible.
\[health = occupation + error \]
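The general shape of such a model in R can be sketched as follows. The data are simulated and the column names `health` and `occupation` are placeholders standing in for whichever two Table 1 variables are selected; the residual checks shown are one common choice, not the only acceptable one.

```r
set.seed(1)

# Simulated placeholder data (NOT the real Leicester data)
n <- 200
sim <- data.frame(occupation = runif(n, 0, 40))
sim$health <- 120 - 1.2 * sim$occupation + rnorm(n, sd = 8)

# Simple linear regression: one outcome, one predictor
model <- lm(health ~ occupation, data = sim)
model_summary <- summary(model)
print(model_summary)

# Normality of the standardised residuals
resid_normality <- shapiro.test(rstandard(model))
print(resid_normality)

# Largest Cook's distance, to flag potentially influential cases
max_cook <- max(cooks.distance(model))
print(max_cook)
```

The summary reports the slope, its significance and the R-squared, while the residual checks help judge whether the model can be considered robust.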
Remember “correlation does not imply causation”. 😊
Part 3
Use the variables explored in Part 1 (see Table 1) to create a robust (where possible), multiple linear regression model. The model should have as outcome (dependent) variable an indicator of the health of the population. The indicator can be one of the variables explored in Part 1 (see Table 1) or a combination thereof. The model should have as predictor (independent) variables a relevant set of variables related to education and occupation.
\[health = (education + occupation) + error \]
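A multiple regression of this form might be sketched as below. Again, the data are simulated and `health`, `education` and `occupation` are placeholder names for the chosen Table 1 variables. The variance inflation factor (VIF) is computed by hand in base R as a simple multicollinearity check; with only two predictors it is the same for both.

```r
set.seed(2)

# Simulated placeholder data (NOT the real Leicester data)
n <- 200
sim <- data.frame(
  education = runif(n, 0, 50),
  occupation = runif(n, 0, 40)
)
sim$health <- 100 + 0.8 * sim$education - 0.6 * sim$occupation +
  rnorm(n, sd = 8)

# Multiple linear regression with two predictors
model <- lm(health ~ education + occupation, data = sim)
print(summary(model))

# VIF by hand: 1 / (1 - R^2) of one predictor regressed on the other(s)
r2_education <- summary(lm(education ~ occupation, data = sim))$r.squared
vif_education <- 1 / (1 - r2_education)
print(vif_education)
```

A VIF close to 1 indicates that the predictors carry largely independent information; large values suggest that some predictors overlap and the coefficient estimates may be unstable.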
Present the model that achieves the best fit and the process through which it has been identified. Include the code, the output (can include graphics), a discussion of the process and an interpretation of the final model. The latter two should be up to 500 words in total and they can be written as a final discussion after the analysis, or as a description of each step of the analysis, or a combination of the two.
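One possible way to document how a model was identified is to compare nested candidates. The sketch below uses simulated data with placeholder column names and compares a smaller and a larger model through an F-test, AIC and adjusted R-squared. These criteria are an assumption about what "best fit" could mean here; other approaches may be equally valid.

```r
set.seed(3)

# Simulated placeholder data (NOT the real Leicester data)
n <- 200
sim <- data.frame(
  education = runif(n, 0, 50),
  occupation = runif(n, 0, 40)
)
sim$health <- 100 + 0.8 * sim$education - 0.6 * sim$occupation +
  rnorm(n, sd = 8)

# Two nested candidate models
model_small <- lm(health ~ occupation, data = sim)
model_full <- lm(health ~ education + occupation, data = sim)

# F-test: does the added predictor significantly improve the fit?
comparison <- anova(model_small, model_full)
print(comparison)

# Lower AIC indicates a better trade-off between fit and complexity
aic_values <- AIC(model_small, model_full)
print(aic_values)

# Adjusted R-squared penalises additional predictors
adj_r2 <- c(
  small = summary(model_small)$adj.r.squared,
  full = summary(model_full)$adj.r.squared
)
print(adj_r2)
```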
Alternatively, if no robust model or no significant model can be created for Leicester, include the code and the output (can include graphics) that illustrate that finding, and a related discussion (still, up to 500 words). The latter could be written as a final discussion after the analysis, or as a description of each step of the analysis, or a combination of the two.
…again, remember “correlation does not imply causation”. 😊
by Stefano De Sabbata – text licensed under the CC BY-SA 4.0, contains public sector information licensed under the Open Government Licence v3.0, code licensed under the GNU GPL v3.0.