2020-01-15
Moving from programming to data science
How to control the data processing flow
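Format: if (condition) statement
The condition must evaluate to a logical value (TRUE or FALSE); the statement is executed only if the condition is TRUE.
a_value <- -7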
if (a_value < 0) cat("Negative")
## Negative
a_value <- 8
if (a_value < 0) cat("Negative")
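Format: if (condition) statement1 else statement2
The condition must evaluate to a logical value (TRUE or FALSE); statement1 is executed if the condition is TRUE, statement2 if it is FALSE.
a_value <- -7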
if (a_value < 0) cat("Negative") else cat("Positive")
## Negative
a_value <- 8
if (a_value < 0) cat("Negative") else cat("Positive")
## Positive
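In R, if/else is itself an expression: the value of whichever branch runs can be assigned or returned. The sketch below assumes a hypothetical helper, describe_sign(), that is not part of the original notes. Like the example above, it labels 0 as "Positive", because 0 < 0 is FALSE.

```r
# describe_sign() is a hypothetical helper used only for illustration
describe_sign <- function(a_value) {
  # The value of the branch that runs becomes the function's result
  if (a_value < 0) "Negative" else "Positive"
}
sign_label <- describe_sign(-7)
sign_label
## [1] "Negative"
describe_sign(8)
## [1] "Positive"
```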
Suppose you want to execute several statements within a function, or when a condition is true.
Curly brackets { and } enclose a code block, so that several statements are treated as one.
first_value <- 8
second_value <- 5
if (first_value > second_value) {
cat("First is greater than second\n")
difference <- first_value - second_value
cat("Their difference is", difference, "\n")
}
## First is greater than second
## Their difference is 3
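Code blocks can also follow else. The sketch below swaps the two values (an illustrative choice, not from the original) so that the condition is FALSE and the second block runs instead.

```r
first_value <- 5
second_value <- 8
if (first_value > second_value) {
  cat("First is greater than second\n")
  difference <- first_value - second_value
} else {
  # This block runs because 5 > 8 is FALSE
  cat("Second is greater than or equal to first\n")
  difference <- second_value - first_value
}
cat("Their difference is", difference, "\n")
## Second is greater than or equal to first
## Their difference is 3
```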
Loops are a fundamental component of (procedural) programming.
There are two main types of loops: conditional loops (while and repeat), which keep running until a condition stops them, and iterative loops (for), which run once for each element of a vector.
The while construct is defined using the while reserved word, followed by a condition between round brackets and a code block. The instructions in the code block are re-executed as long as the condition evaluates to TRUE.
current_value <- 0
while (current_value < 3) {
cat("Current value is", current_value, "\n")
current_value <- current_value + 1
}
## Current value is 0
## Current value is 1
## Current value is 2
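The repeat construct has no condition of its own: its code block runs until a break statement is reached. A sketch of the same counting loop written with repeat, assuming the stopping test is placed at the end of the block:

```r
current_value <- 0
repeat {
  cat("Current value is", current_value, "\n")
  current_value <- current_value + 1
  # Without an explicit break, repeat would loop forever
  if (current_value >= 3) break
}
## Current value is 0
## Current value is 1
## Current value is 2
```

Because the test comes after the body, a repeat loop like this one always executes its block at least once, unlike while.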
The for construct is defined using the for reserved word, followed by the definition of an iterator between round brackets. The iterator is a variable that is temporarily assigned the current element of a vector, as the construct iterates through all the elements of that vector. This definition is followed by a code block, whose instructions are executed once for each element of the vector.
cities <- c("Derby", "Leicester", "Lincoln", "Nottingham")
for (city in cities) {
cat("Do you live in", city, "?\n")
}
## Do you live in Derby ?
## Do you live in Leicester ?
## Do you live in Lincoln ?
## Do you live in Nottingham ?
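A common use of for is to accumulate a result across the elements of a vector. A minimal sketch, using an illustrative vector of values (not from the original):

```r
values <- c(4, 8, 15)
total <- 0
for (value in values) {
  # value takes 4, then 8, then 15
  total <- total + value
}
total
## [1] 27
```

In practice, the built-in sum(values) gives the same result; the loop only makes each step explicit.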
It is common practice to create a vector of integers on the spot in order to execute a certain sequence of steps a pre-defined number of times.
for (i in 1:3) {
cat("This is execution number", i, ":\n")
cat(" See you later!\n")
}
## This is execution number 1 :
##  See you later!
## This is execution number 2 :
##  See you later!
## This is execution number 3 :
##  See you later!
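When the loop needs the position of each element as well as its value, one option is to iterate over indices. The sketch below uses seq_along(), which yields 1, 2, ..., length(cities); the pre-allocated vector name_lengths is an illustrative assumption.

```r
cities <- c("Derby", "Leicester", "Lincoln", "Nottingham")
# Pre-allocate an integer vector with one slot per city
name_lengths <- integer(length(cities))
for (i in seq_along(cities)) {
  # Store the number of characters of the i-th city name
  name_lengths[i] <- nchar(cities[i])
}
name_lengths
## [1]  5  9  7 10
```

Unlike 1:length(cities), seq_along() produces an empty sequence for an empty vector, so the loop body is simply skipped instead of running for i = 1 and i = 0.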
3:0
## [1] 3 2 1 0
#Example: countdown!
for (i in 3:0) {
if (i == 0) {
cat("Go!\n")
} else {
cat(i, "\n")
}
}
## 3
## 2
## 1
## Go!
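A function can be defined using an identifier (e.g., add_one) on the left of the assignment operator <-, followed by the corpus of the function
add_one <- function (input_value) {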
output_value <- input_value + 1
output_value
}
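The corpus consists of the reserved word function, followed by the parameters (e.g., input_value) between round brackets, and a code block containing the instructions to execute. The value of the last expression in the code block (here, output_value) is returned as the result of the function.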
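In add_one, the result is the last line of the corpus (output_value). R also provides return(), which ends the function immediately with the given value; this is convenient for leaving early. The function safe_reciprocal() below is a hypothetical example, assuming NA is an acceptable result for an input of 0.

```r
# safe_reciprocal() is a hypothetical function used only for illustration
safe_reciprocal <- function (input_value) {
  if (input_value == 0) {
    # Leave early: 0 has no reciprocal
    return(NA)
  }
  # Otherwise, the last expression is the result
  1 / input_value
}
safe_reciprocal(4)
## [1] 0.25
safe_reciprocal(0)
## [1] NA
```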
After being defined, a function can be invoked by specifying its identifier, followed by the arguments between round brackets.
add_one (3)
## [1] 4
area_rectangle <- function (height, width) {
area <- height * width
area
}
area_rectangle(3, 2)
## [1] 6
Functions can contain both loops and conditional statements in their corpus
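# Note: this definition masks R's built-in factorial() for the rest of the session
factorial <- function (input_value) {
result <- 1
# seq_len(0) is empty, so factorial(0) returns 1 (1:0 would give c(1, 0) and a result of 0)
for (i in seq_len(input_value)) {
cat("current:", result, "| i:", i, "\n")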
result <- result * i
}
result
}
factorial(3)
## current: 1 | i: 1
## current: 1 | i: 2
## current: 2 | i: 3
## [1] 6
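The factorial example above uses only a loop. A sketch of a function whose corpus combines a loop with a conditional statement; count_negatives() and its input vector are hypothetical, chosen for illustration:

```r
# count_negatives() is a hypothetical function used only for illustration
count_negatives <- function (input_values) {
  count <- 0
  for (value in input_values) {
    # Conditional statement inside the loop's code block
    if (value < 0) {
      count <- count + 1
    }
  }
  count
}
count_negatives(c(-7, 8, -1, 0))
## [1] 2
```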
The scope of a variable is the part of code in which the variable is "visible".
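In R, variables have a hierarchical scope: a variable defined in a script is visible within the definitions of the functions in that script, while a variable defined within a function is not visible outside that function. The if and loop constructs do not create a new scope.
In the case below, x_value is global to the function times_x, whereas new_value and input_value are local to times_x. Referring to new_value or input_value from outside the definition of times_x would result in an error.
x_value <- 10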
times_x <- function (input_value) {
new_value <- input_value * x_value
new_value
}
times_x(2)
## [1] 20
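The rules above can be checked directly. This sketch assumes a fresh session in which no global new_value has been created; change_x() and loop_value are hypothetical names used for illustration.

```r
x_value <- 10
times_x <- function (input_value) {
  new_value <- input_value * x_value
  new_value
}
times_x(2)
## [1] 20
# new_value only existed while times_x was running
exists("new_value")
## [1] FALSE
# Assigning inside a function creates a local variable; the global x_value is unchanged
change_x <- function () {
  x_value <- 99
  x_value
}
change_x()
## [1] 99
x_value
## [1] 10
# Loops do not create a new scope: i and loop_value remain after the loop
for (i in 1:3) {
  loop_value <- i * 2
}
i
## [1] 3
loop_value
## [1] 6
```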
How to control the data processing flow
In the practical session, we will see