2020-01-15

Recap @ 111

Previous lectures

Moving from programming to data science

  • Basic types and variables
  • The pipe operator
  • Coding style
  • Complex data types
    • Vectors
    • Data Frames
  • Data selection and filtering
  • Join operations
  • Table re-shaping

This lecture

How to control the data processing flow

  • Conditional statements
  • Loops
    • While
    • For
  • Functions
  • Scope of a variable

Conditional statements

If

Format: if (condition) statement

  • condition: expression returning a logical value (TRUE or FALSE)
  • statement: any valid R statement
  • statement only executed if condition is TRUE
a_value <- -7
if (a_value < 0) cat("Negative")
## Negative
a_value <- 8
if (a_value < 0) cat("Negative")

Else

Format: if (condition) statement1 else statement2

  • condition: expression returning a logical value (TRUE or FALSE)
  • statement1 and statement2: any valid R statements
  • statement1 executed if condition is TRUE
  • statement2 executed if condition is FALSE
a_value <- -7
if (a_value < 0) cat("Negative") else cat("Positive")
## Negative
a_value <- 8
if (a_value < 0) cat("Negative") else cat("Positive")
## Positive
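
Because statement2 can be any valid R statement, it can itself be another if statement, which allows more than two cases to be distinguished. A minimal sketch of this chaining follows; the variable sign_label is introduced here purely for illustration and is not part of the lecture material.

```r
# Classify a number by chaining if / else if / else
a_value <- 0
if (a_value < 0) {
  sign_label <- "Negative"
} else if (a_value == 0) {
  sign_label <- "Zero"
} else {
  sign_label <- "Positive"
}
cat(sign_label, "\n")
## Zero 
```

Only the first branch whose condition is TRUE is executed; the final else acts as a fallback.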

Code blocks

Suppose you want to execute several statements within a function, or if a condition is true

  • Such a group of statements is called a code block
  • { and } contain code blocks
first_value <- 8
second_value <- 5
if (first_value > second_value) {
  cat("First is greater than second\n") 
  difference <- first_value - second_value
  cat("Their difference is ", difference)
}
## First is greater than second
## Their difference is  3

Loops

Loops are a fundamental component of (procedural) programming.

There are two main types of loops:

  • conditional loops are executed as long as a defined condition holds true
    • construct while
    • construct repeat
  • deterministic loops are executed a pre-determined number of times
    • construct for
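
The repeat construct listed above has no condition of its own, so the code block must decide when to stop. As a rough sketch (assuming a break statement placed inside an if is used to leave the loop), it could look like this:

```r
# repeat re-executes its code block until break is reached
current_value <- 0
repeat {
  cat("Current value is", current_value, "\n")
  current_value <- current_value + 1
  # Leave the loop once three values have been printed
  if (current_value >= 3) break
}
## Current value is 0 
## Current value is 1 
## Current value is 2 
```

Without the break, this loop would never terminate, which is why while is usually the safer choice for conditional loops.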

While

The while construct is defined using the reserved word while, followed by a condition in parentheses and a code block. The instructions in the code block are re-executed as long as the condition evaluates to TRUE.

current_value <- 0

while (current_value < 3) {
  cat("Current value is", current_value, "\n")
  current_value <- current_value + 1
}
## Current value is 0 
## Current value is 1 
## Current value is 2
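
A while loop is most useful when the number of repetitions is not known in advance. The sketch below is an illustrative example (the variable steps and the starting value 20 are assumptions chosen for the demonstration): it halves a value until it drops below 1 and counts how many halvings were needed.

```r
# Halve a value until it drops below 1, counting the steps needed
current_value <- 20
steps <- 0
while (current_value >= 1) {
  current_value <- current_value / 2
  steps <- steps + 1
}
cat("Steps needed:", steps, "\n")
## Steps needed: 5 
```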

For

The for construct is defined using the reserved word for, followed by the definition of an iterator in parentheses. The iterator is a variable that is assigned each element of a vector in turn, as the construct iterates through the vector. This definition is followed by a code block, whose instructions are executed once for each element of the vector.

cities <- c("Derby", "Leicester", "Lincoln", "Nottingham")
for (city in cities) {
  cat("Do you live in", city, "?\n")
}
## Do you live in Derby ?
## Do you live in Leicester ?
## Do you live in Lincoln ?
## Do you live in Nottingham ?
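
When the position of each element is needed as well as its value, one possible approach is to iterate over the positions rather than the elements. The sketch below assumes the base R function seq_along, which returns the vector of positions of its argument.

```r
cities <- c("Derby", "Leicester", "Lincoln", "Nottingham")
# seq_along(cities) is the vector of positions 1, 2, 3, 4
for (i in seq_along(cities)) {
  cat(i, ":", cities[i], "\n")
}
## 1 : Derby 
## 2 : Leicester 
## 3 : Lincoln 
## 4 : Nottingham 
```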

For

It is common practice to create a vector of integers on the spot in order to execute a certain sequence of steps a pre-defined number of times.

for (i in 1:3) {
  cat("This is execution number", i, ":\n")
  cat("    See you later!\n")
}
## This is execution number 1 :
##     See you later!
## This is execution number 2 :
##     See you later!
## This is execution number 3 :
##     See you later!
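
One caveat with creating the vector on the spot: if the upper limit is stored in a variable and happens to be 0, the expression 1:0 counts down and yields the two elements 1 and 0. The sketch below compares it with the base R function seq_len, which returns an empty vector in that case; the variable names are illustrative.

```r
n_times <- 0

# 1:n_times is the vector 1, 0, so the block runs twice
executions_colon <- 0
for (i in 1:n_times) {
  executions_colon <- executions_colon + 1
}

# seq_len(n_times) is empty, so the block never runs
executions_seq_len <- 0
for (i in seq_len(n_times)) {
  executions_seq_len <- executions_seq_len + 1
}

cat("1:n_times ran", executions_colon, "times\n")
cat("seq_len(n_times) ran", executions_seq_len, "times\n")
## 1:n_times ran 2 times
## seq_len(n_times) ran 0 times
```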

Loops with conditional statements

3:0
## [1] 3 2 1 0
# Example: countdown!
for (i in 3:0) {
  if (i == 0) {
    cat("Go!\n")
  } else {
    cat(i, "\n")
  }
}
## 3 
## 2 
## 1 
## Go!
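
The same combination can be used to accumulate a result while looping. As an illustrative sketch (the vector values and the counter negative_count are made up for this example), the loop below counts how many elements of a vector are negative.

```r
values <- c(4, -2, 7, 0, -9)
negative_count <- 0
for (value in values) {
  # Only negative elements increase the counter
  if (value < 0) {
    negative_count <- negative_count + 1
  }
}
cat("Negative values found:", negative_count, "\n")
## Negative values found: 2 
```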

Functions

Defining functions

A function can be defined

  • using an identifier (e.g., add_one)
  • on the left of an assignment operator <-
  • followed by the body of the function
add_one <- function (input_value) {
  output_value <- input_value + 1
  output_value
}

Defining functions

The body of the function

  • starts with the reserved word function
  • followed by the parameter(s) (e.g., input_value) in parentheses
  • and the instruction(s) to be executed in a code block
  • the value of the last evaluated expression is returned as output
add_one <- function (input_value) {
  output_value <- input_value + 1
  output_value
}

Defining functions

After being defined, a function can be invoked by specifying its identifier, followed by the input value(s) in parentheses

add_one(3)
## [1] 4
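
Since arithmetic operators in R work element by element on vectors, the same function can also be invoked on a whole vector, and its output can be stored in a variable. A small sketch, with the function repeated so that the example is self-contained (the variable incremented is illustrative):

```r
add_one <- function (input_value) {
  output_value <- input_value + 1
  output_value
}

# The addition is applied to each element of the vector
add_one(c(1, 5, 10))
## [1]  2  6 11

# The returned value can be assigned like any other value
incremented <- add_one(3)
incremented
## [1] 4
```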

More parameters

  • a function can be defined as having two or more parameters by specifying more than one parameter name (separated by commas) in the function definition
  • a function call must supply a value for every parameter that has no default value and is used in the function body
    • otherwise an error is generated
area_rectangle <- function (height, width) {
  area <- height * width
  area
}

area_rectangle(3, 2)
## [1] 6
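
To illustrate the points above, the sketch below invokes the function with named values (in which case the order does not matter) and then with one value missing. It assumes the base R function tryCatch is used to capture the error message, so that the script keeps running; message_text is an illustrative variable name.

```r
area_rectangle <- function (height, width) {
  area <- height * width
  area
}

# Values can be matched to parameters by name, in any order
area_rectangle(width = 2, height = 3)
## [1] 6

# Omitting width generates an error when the body tries to use it;
# tryCatch returns the error message instead of stopping the script
message_text <- tryCatch(
  area_rectangle(3),
  error = function (e) conditionMessage(e)
)
cat(message_text, "\n")
```

The printed message reports that the argument width is missing.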

Functions and control structures

Functions can contain both loops and conditional statements in their body

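# Note: this definition masks the base R function of the same name
factorial <- function (input_value) {
  result <- 1
  # seq_len(0) is empty, so an input of 0 correctly returns 1
  for (i in seq_len(input_value)) {
    cat("current:", result, " | i:", i, "\n")
    result <- result * i
  }
  result
}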
factorial(3)
## current: 1  | i: 1 
## current: 1  | i: 2 
## current: 2  | i: 3
## [1] 6
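
A function body can also combine a conditional statement with a loop. The sketch below is a hypothetical variant named factorial_checked (not part of the lecture material) that returns NA for negative input and otherwise computes the product with a for loop.

```r
factorial_checked <- function (input_value) {
  if (input_value < 0) {
    # The factorial is not defined for negative numbers
    result <- NA
  } else {
    result <- 1
    # seq_len(0) is empty, so an input of 0 returns 1
    for (i in seq_len(input_value)) {
      result <- result * i
    }
  }
  result
}

factorial_checked(3)
## [1] 6
factorial_checked(0)
## [1] 1
factorial_checked(-2)
## [1] NA
```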

Scope

The scope of a variable is the part of the code in which the variable is "visible"

In R, variables have a hierarchical scope:

  • a variable defined in a script can be referred to from within the definition of a function in the same script
  • a variable defined within the definition of a function cannot be referred to from outside the definition
  • if and loop constructs do not create a new scope: a variable defined within them remains visible after they end

Example

In the case below

  • x_value is global to the function times_x
  • new_value and input_value are local to the function times_x
    • referring to new_value or input_value from outside the definition of times_x would result in an error
x_value <- 10
times_x <- function (input_value) {
  new_value <- input_value * x_value
  new_value
}
times_x(2)
## [1] 20
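
The rules above can be checked with the base R function exists, which reports whether a variable with a given name is visible. This is only a sketch: loop_value is an illustrative variable, and the result for new_value assumes no variable with that name has been defined elsewhere in the session.

```r
x_value <- 10
times_x <- function (input_value) {
  new_value <- input_value * x_value
  new_value
}
times_x(2)
## [1] 20

# new_value is local to times_x, so it is not visible here
exists("new_value")
## [1] FALSE

# A variable defined inside a loop remains visible after the loop
for (i in 1:3) {
  loop_value <- i * x_value
}
exists("loop_value")
## [1] TRUE
loop_value
## [1] 30
```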

Summary

How to control the data processing flow

  • Conditional statements
  • Loops
    • While
    • For
  • Functions
  • Scope of a variable
  • Debugging

Practical session

In the practical session, we will see

  • Conditional statements
  • Loops
    • While
    • For
  • Functions
    • Loading functions from scripts
  • Debugging

Next lecture

  • Data visualisation
    • histograms
    • boxplots
    • scatterplots
  • Descriptive statistics
  • Exploring assumptions
    • Shapiro–Wilk test
    • skewness and kurtosis
    • Levene’s test