2020-01-15
Moving from programming to data science
How to control the data processing flow
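Format: if (condition) statement
The condition is an expression that returns a logical value (TRUE or FALSE).
The statement is any valid R statement, and it is executed only if the condition is TRUE.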
a_value <- -7
if (a_value < 0) cat("Negative")
## Negative
a_value <- 8
if (a_value < 0) cat("Negative")
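Format: if (condition) statement1 else statement2
The condition is an expression that returns a logical value (TRUE or FALSE).
statement1 is executed if the condition is TRUE.
statement2 is executed if the condition is FALSE.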
a_value <- -7
if (a_value < 0) cat("Negative") else cat("Positive")
## Negative
a_value <- 8
if (a_value < 0) cat("Negative") else cat("Positive")
## Positive
Suppose you want to execute several statements within a function, or if a condition is true. Such a group of statements is called a code block.
Curly brackets { and } contain code blocks.
first_value <- 8
second_value <- 5
if (first_value > second_value) {
  cat("First is greater than second\n")
  difference <- first_value - second_value
  cat("Their difference is", difference)
}
## First is greater than second
## Their difference is 3
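The if / else format and code blocks can be combined to test more than two cases by placing another if after else. The snippet below is a minimal sketch of that pattern; the variable name sign_label and the "Zero" case are illustrative assumptions, not part of the original examples.

```r
a_value <- 0

# Conditions are checked in order; only the first matching block is executed
if (a_value < 0) {
  sign_label <- "Negative"
} else if (a_value == 0) {
  sign_label <- "Zero"
} else {
  sign_label <- "Positive"
}

cat(sign_label, "\n")
## Zero
```

Note that the else keyword sits on the same line as the closing curly bracket of the previous block, which lets R recognise it as the continuation of the same if statement when the code is run at the top level of a script.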
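Loops are a fundamental component of (procedural) programming.
There are two main types of loops:
conditional loops, which are executed as long as a defined condition holds true (while and repeat constructs)
deterministic loops, which are executed a pre-determined number of times (for construct)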
The while construct can be defined using the while reserved word, followed by the conditional statement between round brackets, and a code block. The instructions in the code block are re-executed as long as the conditional statement evaluates to TRUE.
current_value <- 0
while (current_value < 3) {
  cat("Current value is", current_value, "\n")
  current_value <- current_value + 1
}
## Current value is 0
## Current value is 1
## Current value is 2
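The repeat construct has no condition of its own: its code block is re-executed until a break statement is reached. As a rough sketch, the while example above could be rewritten with repeat as follows, where the stopping condition is checked explicitly inside the block.

```r
current_value <- 0

repeat {
  # Leave the loop as soon as current_value is no longer below 3
  if (current_value >= 3) {
    break
  }
  cat("Current value is", current_value, "\n")
  current_value <- current_value + 1
}
## Current value is 0
## Current value is 1
## Current value is 2
```

Forgetting the break (or writing a condition that never becomes TRUE) produces an endless loop, which is one reason while is usually the more readable choice when the condition can be stated up front.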
The for construct can be defined using the for reserved word, followed by the definition of an iterator between round brackets. The iterator is a variable that is temporarily assigned the current element of a vector, as the construct iterates through all elements of that vector. This definition is followed by a code block, whose instructions are re-executed once for each element of the vector.
cities <- c("Derby", "Leicester", "Lincoln", "Nottingham")
for (city in cities) {
  cat("Do you live in", city, "?\n")
}
## Do you live in Derby ?
## Do you live in Leicester ?
## Do you live in Lincoln ?
## Do you live in Nottingham ?
It is common practice to create a vector of integers on the spot in order to execute a certain sequence of steps a pre-defined number of times.
for (i in 1:3) {
  cat("This is execution number", i, ":\n")
  cat(" See you later!\n")
}
## This is execution number 1 :
##  See you later!
## This is execution number 2 :
##  See you later!
## This is execution number 3 :
##  See you later!
3:0
## [1] 3 2 1 0
# Example: countdown!
for (i in 3:0) {
  if (i == 0) {
    cat("Go!\n")
  } else {
    cat(i, "\n")
  }
}
## 3
## 2
## 1
## Go!
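A function can be defined using an identifier (e.g., add_one) on the left of the assignment operator <-, followed by the body of the function.
add_one <- function (input_value) {
  output_value <- input_value + 1
  output_value
}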
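The body (corpus) starts with the reserved word function, followed by the parameter(s) (e.g., input_value) between round brackets, and the instruction(s) to be executed in a code block. The value of the last statement is returned as output.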
After being defined, a function can be invoked by specifying the identifier, followed by the necessary arguments between round brackets.
add_one(3)
## [1] 4
A function can have two or more parameters, separated by commas in the function definition.
area_rectangle <- function (height, width) {
  area <- height * width
  area
}
area_rectangle(3, 2)
## [1] 6
Functions can contain both loops and conditional statements in their body.
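factorial <- function (input_value) {
  result <- 1
  # seq_len() yields 1, 2, ..., input_value and an empty sequence for 0,
  # whereas 1:0 would count down (1, 0) and wrongly return 0
  for (i in seq_len(input_value)) {
    cat("current:", result, "| i:", i, "\n")
    result <- result * i
  }
  result
}
factorial(3)
## current: 1 | i: 1
## current: 1 | i: 2
## current: 2 | i: 3
## [1] 6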
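As a further sketch, a function body can combine a for loop with an if statement. The function below is a hypothetical example (the names count_negative, input_values and negative_count are assumptions made for illustration): it walks through a numeric vector and counts the elements that are below zero.

```r
# Hypothetical example: count the negative elements of a numeric vector
count_negative <- function (input_values) {
  negative_count <- 0
  for (value in input_values) {
    # The counter is increased only when the current element is negative
    if (value < 0) {
      negative_count <- negative_count + 1
    }
  }
  negative_count
}

count_negative(c(-7, 8, -2, 0))
## [1] 2
```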
The scope of a variable is the part of code in which the variable is "visible".
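In R, variables have a hierarchical scope:
a variable defined in a script can be used from within the definition of a function
a variable defined within the definition of a function cannot be referred to from outside that definition
scope does not apply to if or loop constructs
In the case below:
x_value is global to the function times_x
new_value and input_value are local to the function times_x
referring to new_value or input_value from outside the definition of times_x would result in an error
x_value <- 10
times_x <- function (input_value) {
  new_value <- input_value * x_value
  new_value
}
times_x(2)
## [1] 20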
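The visibility rules can be checked with a short sketch. It assumes a fresh R session (so that no other variable called new_value or inside_if already exists) and uses the base function exists() to test whether a name is visible from the top level of the script; the variable inside_if is a hypothetical name introduced only for this illustration.

```r
x_value <- 10

times_x <- function (input_value) {
  new_value <- input_value * x_value
  new_value
}

times_x(2)
## [1] 20

# x_value is defined in the script, so it is visible here
exists("x_value")
## [1] TRUE

# new_value only exists while times_x is running, so it is not visible here
exists("new_value")
## [1] FALSE

# An if block does not create a new scope:
# a variable assigned inside it remains visible afterwards
if (x_value > 5) {
  inside_if <- "created inside the if block"
}
exists("inside_if")
## [1] TRUE
```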
How to control the data processing flow
In the practical session, we will see