3 Control structures and Functions

Stefano De Sabbata

This work is licensed under the GNU General Public License v3.0. Contains public sector information licensed under the Open Government Licence v3.0.

3.1 Conditional structures

Conditional structures are fundamental in (procedural) programming, as they make it possible to execute or skip part of a procedure depending on whether a certain condition is true. The condition is tested, and the part of the procedure to be executed when the condition is true is enclosed in a code block.
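As a minimal sketch of a simple conditional structure (the variable name temperature, its value and the printed message are illustrative assumptions), the code block in curly brackets is executed only if the condition in parentheses is true:

```r
# Illustrative value: any number above 25 makes the condition true
temperature <- 30

# The code block is executed only when the condition is TRUE
if (temperature > 25) {
  cat("It is really warm today!\n")
}
```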

A simple conditional structure can be created using if alone. A more complex structure can be created using both if and else, to provide not only a procedure to be executed when the condition is true, but also an alternative procedure to be executed when the condition is false.
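A possible if and else structure is sketched here, assuming an illustrative temperature value of 12. Since the condition is false, the code block after else is the one executed:

```r
# Illustrative value: 12 is not above 25, so the condition is FALSE
temperature <- 12

if (temperature > 25) {
  # Executed only when the condition is TRUE
  cat("It is really warm today!\n")
} else {
  # Executed only when the condition is FALSE
  cat("Today is not warm\n")
}
```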

## Today is not warm

Finally, conditional structures can be nested. That is, a conditional structure can be included as part of the code block to be executed after the condition is tested. For instance, a second conditional structure can be included in the code block to be executed when the first condition is false.
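A nested structure of this kind might look like the following sketch. The value -5 and the first two messages are illustrative assumptions; the second condition is tested only because the first one is false:

```r
# Illustrative value: below both thresholds used in the conditions
temperature <- -5

if (temperature > 25) {
  cat("It is really warm today!\n")
} else {
  # This inner conditional structure is reached only when
  # the first condition is FALSE
  if (temperature > 0) {
    cat("The temperature is pleasant today\n")
  } else {
    cat("This is really cold!\n")
  }
}
```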

## This is really cold!

3.2 Loops

Loops are another core component of (procedural) programming and implement the idea of solving a problem or executing a task by performing the same set of steps a number of times. There are two main kinds of loops in R: deterministic and conditional loops. The former are executed a fixed number of times, specified at the beginning of the loop. The latter are executed until a specific condition is met. Both deterministic and conditional loops are extremely important when working with vectors.

3.2.1 Conditional Loops

In R, conditional loops can be implemented using while and repeat. The difference between the two is mostly syntactical: while tests the condition before each iteration and executes the related code block only if the condition is true; repeat executes the code block until a break command is given (usually through a conditional statement).
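The two forms can be compared in the sketch below, where the variable name a_value and the stopping value 2 are illustrative assumptions. Both loops print the same two values, one per line:

```r
# while: the condition is tested before each iteration
a_value <- 0
while (a_value < 2) {
  cat(a_value, "\n")
  a_value <- a_value + 1
}

# repeat: the code block runs until break is called,
# here through a conditional statement
a_value <- 0
repeat {
  cat(a_value, "\n")
  a_value <- a_value + 1
  if (a_value >= 2) {
    break
  }
}
```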

## 0 
## 1
## 0 
## 1

3.2.2 Deterministic Loops

The deterministic loop, written using for, executes the subsequent code block while iterating through the elements of a provided vector. During each iteration (i.e., each execution of the code block), the current element of the vector is assigned to the loop variable named in the for statement, which can then be used in the code block.

It is, for instance, possible to iterate over a vector and print each of its elements.
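A minimal sketch of such a loop follows. The vector name east_midlands_cities and the loop variable city are illustrative assumptions:

```r
# A character vector to iterate over
east_midlands_cities <- c("Derby", "Leicester", "Lincoln", "Nottingham")

# At each iteration, city holds the current element of the vector
for (city in east_midlands_cities) {
  cat(city, "\n")
}
```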

## Derby 
## Leicester 
## Lincoln 
## Nottingham

It is common practice to create a vector of integers on the spot (e.g., using the : operator) to execute a certain sequence of steps a pre-defined number of times.
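For example, the sketch below uses 1:3 to repeat the same two steps three times. The loop variable i and the printed messages are illustrative assumptions:

```r
# 1:3 creates the integer vector 1, 2, 3 on the spot
for (i in 1:3) {
  cat("Execution number", i, ":\n")
  cat("    Step1: Hi!\n")
  cat("    Step2: How is it going?\n")
}
```

When the upper bound is stored in a variable that might be zero, seq_len(n) is a safer choice than 1:n, because 1:0 produces the vector 1, 0 rather than an empty vector.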

## Execution number 1 :
##     Step1: Hi!
##     Step2: How is it going?
## Execution number 2 :
##     Step1: Hi!
##     Step2: How is it going?
## Execution number 3 :
##     Step1: Hi!
##     Step2: How is it going?

3.3 Function definition

Recall from the first lecture that an algorithm or effective procedure is a mechanical rule, or automatic method, or programme for performing some mathematical operation (Cutland, 1980). A program is a specific set of instructions that implement an abstract algorithm. The definition of an algorithm (and thus a program) can consist of one or more functions, which are sets of instructions that perform a task, possibly using an input, possibly returning an output value.

The function cube_root, used throughout this section, is a simple function with one parameter. The function simply calculates the cube root of a number.
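A definition of cube_root might look like the following sketch. The parameter name x and the body are assumptions based on the expression x^(1 / 3) quoted in the error message discussed in the Data Checking section:

```r
# A function with one parameter, x, that returns the cube root of x
cube_root <- function(x) {
  result <- x^(1 / 3)
  # The value of the last expression is returned by the function
  result
}
```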

Functions can be defined by typing the definition in the Console in RStudio. However, entering functions from the command line is not always very convenient. If you make a typing error in an early line of the definition, it isn’t possible to go back and correct it. You would also have to type in the definition every time you use R. A more sensible approach is to type the function definition into an R script.

Create a new R project for this practical, named Practical_111. Create a new R script named functions_Practical_111.R. Copy the definition of cube_root into the R script, and save the file. If you execute the script, the R interpreter creates the new function from its definition, which should then be visible in the Environment tab in RStudio.

If you type cube_root(27) in the Console, the function is called using 27 as an argument, thus returning 3.
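The call can be sketched as follows. The definition of cube_root is an assumed one, repeated here only so that the snippet runs on its own:

```r
# Assumed definition, repeated to keep the snippet self-contained
cube_root <- function(x) {
  x^(1 / 3)
}

# Call the function using 27 as the argument
cube_root(27)
```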

## [1] 3

It is furthermore possible to load the function(s) defined in one script from another script, in a fashion similar to when a library is loaded. Create a new R script as part of the Practical_111 project, named main_Practical_111.R. In that second script, add code that uses source to run functions_Practical_111.R and then calls cube_root(27), and save the file.
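The mechanism relies on the base R function source, which runs all the code in a given file. In the sketch below, a stand-in copy of functions_Practical_111.R is first written to a temporary folder (an assumption made only so that the snippet runs on its own); in the practical, the file already exists in the project folder and the script would simply call source("functions_Practical_111.R").

```r
# Stand-in for functions_Practical_111.R, written to a temporary folder
# only to make this sketch self-contained
functions_file <- file.path(tempdir(), "functions_Practical_111.R")
writeLines("cube_root <- function(x) { x^(1 / 3) }", functions_file)

# What main_Practical_111.R would do: run the other script,
# thus creating cube_root, and then call the function
source(functions_file)
cube_root(27)
```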

Executing main_Practical_111.R instructs the interpreter first to run the functions_Practical_111.R script, thus creating the cube_root function, and then to invoke the function using 27 as an argument, thus returning 3 again. That is a simple example, but this can be an extremely powerful tool to create your own library of functions to be used by different scripts.

3.4 Exercise 6.1

Extend the code in the script functions_Practical_111.R to include the code necessary to solve the questions below.

Question 6.1.1: Write a function that calculates the area of a circle, taking the radius as the first parameter.

Question 6.1.2: Write a function that calculates the volume of a cylinder, taking the radius of the base as the first parameter and the height as the second parameter. The function should call the function defined above and multiply the returned value by the height to calculate the result.

Question 6.1.3: Write a function with two parameters, a vector of numbers and a vector of characters (text). The function should check that the input has the correct data type. If all the numbers in the first vector are greater than zero, return the elements of the second vector from the first to the length of the first vector.

3.5 Data Checking

One issue when writing functions is making sure that the data given to the function is of the right kind. For example, what happens when you try to compute the cube root of a negative number?
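The problem can be reproduced with the sketch below, which assumes the simple definition of cube_root based on x^(1 / 3) and uses -343 as the negative input:

```r
# Assumed simple definition, without any check on the input
cube_root <- function(x) {
  x^(1 / 3)
}

# A negative base raised to a fractional exponent
cube_root(-343)
```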

## [1] NaN

That probably wasn’t the answer you wanted. As you might remember, NaN (Not a Number) is the value returned when a mathematical expression is numerically indeterminate. In this case, this is actually due to a shortcoming of the ^ operator in R, which returns NaN when a negative base is raised to a fractional exponent. In fact, -7 is a perfectly valid cube root of -343, since (-7) × (-7) × (-7) = -343.

To work around this limitation, we can state a conditional rule:

  • If x >= 0: calculate the cube root of x ‘normally’.
  • Otherwise: work out the cube root of the positive number -x, then change it to negative.

Situations of this kind can be dealt with in an R function by using an if statement. Note how the operator - (i.e., the minus symbol) is used to obtain the opposite (additive inverse) of a number, in the same way as -1 is the opposite of the number 1.
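The conditional rule can be sketched as follows, assuming the parameter is named x. Non-negative values are handled directly, whereas for negative values the cube root of -x is calculated first and its sign is then changed:

```r
cube_root <- function(x) {
  if (x >= 0) {
    # Non-negative input: the ^ operator works as expected
    result <- x^(1 / 3)
  } else {
    # Negative input: take the cube root of the positive number -x,
    # then change the sign of the result
    result <- -((-x)^(1 / 3))
  }
  result
}

cube_root(343)
cube_root(-343)
```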

Edit the code in functions_Practical_111.R accordingly and test the new function from the RStudio Console by calling cube_root(343) and cube_root(-343), which should return 7 and -7 respectively.

## [1] 7
## [1] -7

However, other things can go wrong. For example, cube_root("Leicester") would cause an error: Error in x^(1 / 3) : non-numeric argument to binary operator. That shouldn’t be surprising because cube roots only make sense for numbers, not character variables. Thus, it might be helpful if the cube root function could spot this and print a warning explaining the problem, rather than just crashing with a fairly cryptic error message such as the one above, as it does at the moment.

The function could be rewritten to make use of is.numeric in a second conditional statement. If the input value is not numeric, the function returns the value NA (Not Available) instead of a number. Note that here there is an if statement inside another if statement, as it is always possible to nest code blocks: an if within a for within a while within an if within … etc.

Finally, cat is a printing function that instructs R to display the provided argument (in this case, the phrase within quotes) as output in the console. The \n in cat tells R to add a newline when printing out the warning.
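A version of the function with both checks might look like the sketch below. The parameter name x and the wording of the warning are assumptions, and the sketch expects a single value rather than a longer vector as input:

```r
cube_root <- function(x) {
  if (is.numeric(x)) {
    # Nested conditional structure: handle the sign of a numeric input
    if (x >= 0) {
      result <- x^(1 / 3)
    } else {
      result <- -((-x)^(1 / 3))
    }
    result
  } else {
    # Print a warning, ending with a newline, and return NA
    cat("WARNING: Input must be numerical, not character\n")
    NA
  }
}

cube_root(-343)
cube_root("Leicester")
```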

3.6 Exercise 6.2

Create a new R script as part of Project_06, named Data_Wrangling_with_Functions.R. Copy from the script Data_Wrangling_Example.R created in Project_03 the first part that included loading both datasets, the part that created the tibble leicester_IMD2015_decile_wide and the part that left-joined it with leicester_2011OAC to create leicester_2011OAC_IMD2015.

Add a snippet of code that uses the function pull from the dplyr library to extract the column supgrpname from leicester_2011OAC as a vector, and the function unique to extract the unique values from that vector. That effectively creates the vector leicester_2011OAC_supergroups, listing all the names of the supergroups.
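The combination of pull and unique can be sketched as follows. The small data frame below is a hypothetical stand-in with made-up rows, used only so that the snippet runs on its own; in the exercise, leicester_2011OAC is the dataset loaded earlier in the script.

```r
library(dplyr)

# Hypothetical stand-in for the real dataset (illustrative rows only)
leicester_2011OAC <- data.frame(
  supgrpname = c("Urbanites", "Suburbanites", "Urbanites", "Cosmopolitans"),
  stringsAsFactors = FALSE
)

# pull extracts the column as a vector, unique removes repeated values
leicester_2011OAC_supergroups <- leicester_2011OAC %>%
  pull(supgrpname) %>%
  unique()

leicester_2011OAC_supergroups
```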

Extend the code in the script Data_Wrangling_with_Functions.R to include the code necessary to solve the questions below – which as you might notice are a variation on the questions seen in Practical 3.

Question 6.2.1: Write a piece of code that loops over the supergroups names in leicester_2011OAC_supergroups, and for each one of those generates a table showing the percentage of EU citizens over total population, calculated grouping OAs by the related decile of the Index of Multiple Deprivations. Tip: use the print function at the end of the pipe that generates the table to print each table.

Question 6.2.2: Write a piece of code that loops over the supergroups names in leicester_2011OAC_supergroups, and for each one of those calculates the overall percentage of EU citizens over total population, and if that percentage is over 5%, then it prints the name of the supergroup. Tip: use pull at the end of the pipe to extract the calculated percentage.

Question 6.2.3: Write a function named median_index with one input parameter, vector_of_numbers, which is a numeric vector. The function should implement the index shown below, where \(v\) is vector_of_numbers and \(index\) is the output value of the function. The index tends to -1 when the median is close to the minimum, and it tends to 1 when the median is close to the maximum. Write a piece of code that extracts a column of your choice from the leicester_2011OAC_IMD2015 dataset as a vector and applies the index to it.

\[index = \frac{median(v)-min(v)}{max(v)-min(v)} - \frac{max(v)-median(v)}{max(v)-min(v)}\]

Question 6.2.4: If implemented carelessly, the index above can encounter a problem when all values are the same. In that case, \(max(v)-min(v)\) is zero and thus a division by zero might return a NaN value. If you haven’t done so yet, edit the function to take that case into account and simply return the value 0 in that case. Furthermore, include a check to verify that the input is a numeric vector.

3.7 Solutions

3.7.1 Exercise 6.1

Question 6.1.3: Write a function with two parameters, a vector of numbers and a vector of characters (text). The function should check that the input has the correct data type. If all the numbers in the first vector are larger than zero, return the elements of the second vector from the first to the length of the first vector.
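One possible solution is sketched below. The function name select_if_positive and the parameter names are hypothetical, and returning NA when a check fails is an assumption, since the question does not specify what to return in those cases.

```r
select_if_positive <- function(numbers, labels) {
  # Check that both inputs have the expected data type
  if (is.numeric(numbers) && is.character(labels)) {
    if (all(numbers > 0)) {
      # Elements of the second vector, from the first
      # up to the length of the first vector
      labels[seq_along(numbers)]
    } else {
      # Assumption: NA when not all numbers are greater than zero
      NA
    }
  } else {
    cat("WARNING: expected a numeric vector and a character vector\n")
    NA
  }
}

select_if_positive(c(3, 1, 2), c("Derby", "Leicester", "Lincoln", "Nottingham"))
```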

3.7.2 Exercise 6.2

A full R Script is available in the Exercise folder of the repository (111_X_Data_Wrangling_with_Functions.R). Upload the prepared script to your Practical_111 project folder, click on the uploaded file to open it in a new editor tab and compare it to your script.