Appendix 1

Basic types

A value in R is an instance of one of three basic types, each encoding a fundamentally different type of information: numeric encoding numbers; logical encoding truth values (also known as Boolean values); and character encoding text. Each type has its own characteristics and related operations, as discussed below.

Numeric

The numeric type represents numbers (both integers and reals).

a_number <- 1.41
is.numeric(a_number)
## [1] TRUE
is.integer(a_number)
## [1] FALSE
is.double(a_number) # i.e., is real
## [1] TRUE
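As a sketch of the integer/real distinction mentioned above: a number typed without a suffix is stored as a double (real), while the L suffix or the function as.integer creates an integer. Both count as numeric, and typeof reports the underlying storage type. The variable names are illustrative.

```r
an_integer <- 2L             # the L suffix creates an integer
a_real <- 2                  # without the suffix, R stores a double
typeof(an_integer)           # "integer"
typeof(a_real)               # "double"
is.numeric(an_integer)       # TRUE: integers are numeric too
is.numeric(a_real)           # TRUE
as.integer(3.9)              # 3: conversion to integer drops the decimal part
```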

Base numeric operators.

Operator   Meaning              Example   Output
+          Plus                 5+2       7
-          Minus                5-2       3
*          Product              5*2       10
/          Division             5/2       2.5
%/%        Integer division     5%/%2     2
%%         Modulo (remainder)   5%%2      1
^          Power                5^2       25
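Integer division and modulo are two halves of the same division. The sketch below (variable names are illustrative) checks that quotient and remainder recombine into the original number, and shows how R handles a negative dividend: %/% rounds down, and the result of %% takes the sign of the divisor.

```r
dividend <- 17
divisor <- 5
quotient <- dividend %/% divisor    # 3
remainder <- dividend %% divisor    # 2
# Quotient and remainder always recombine into the original number
quotient * divisor + remainder      # 17
# With a negative dividend, %/% rounds down and %% is non-negative here
(-7) %/% 3                          # -3
(-7) %% 3                           # 2
```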

Some pre-defined functions in R:

abs(-2) # Absolute value
## [1] 2
ceiling(3.475) # Round up
## [1] 4
floor(3.475) # Round down
## [1] 3
trunc(5.99) # Truncate (drop the decimal part)
## [1] 5
log10(100) # Base-10 logarithm
## [1] 2
log(exp(2)) # Natural logarithm of e squared
## [1] 2
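The rounding functions above only differ in behaviour for numbers below zero, where rounding up and rounding down move in opposite directions and truncation moves towards zero. The sketch below also shows round, whose optional second argument sets the number of decimal digits, and the base argument of log.

```r
ceiling(-3.475)    # -3: rounds up, towards positive infinity
floor(-3.475)      # -4: rounds down, towards negative infinity
trunc(-3.475)      # -3: drops the decimal part, towards zero
round(3.475)       # 3: nearest whole number
round(3.475, 1)    # 3.5: nearest value with one decimal digit
log(8, base = 2)   # 3: logarithm with an arbitrary base
```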

Use round brackets (parentheses) to specify the order of execution. If they are not specified, the default order is: exponentiation first, then multiplication and division, and addition and subtraction last.

a_number <- 1
(a_number + 2) * 3
## [1] 9
a_number + (2 * 3)
## [1] 7
a_number + 2 * 3
## [1] 7
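The same precedence rule applies to powers, which are computed before products and before a leading minus sign. A few cases where brackets change the result:

```r
2 * 3 ^ 2     # 18: the power is computed first, then the product
(2 * 3) ^ 2   # 36
-2 ^ 2        # -4: the power is computed before the minus sign is applied
(-2) ^ 2      # 4
2 ^ 3 ^ 2     # 512: powers are evaluated right to left, i.e., 2 ^ (3 ^ 2)
```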

The object NaN (Not a Number) is returned by R when the result of an operation is not a number.

0 / 0
## [1] NaN
is.nan(0 / 0)
## [1] TRUE

NaN should not be confused with NA (Not Available), which R uses to represent missing data.
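The two objects behave differently when tested and when used in calculations. In the sketch below, the incomes vector is made-up example data; na.rm is the standard argument of mean (and of other summary functions) that drops missing values before computing.

```r
is.na(NaN)      # TRUE: is.na also treats NaN as missing
is.nan(NA)      # FALSE: a missing value is not the result of an undefined operation
NA + 1          # NA: arithmetic on missing data yields missing data
incomes <- c(1200, NA, 1800)   # hypothetical data with one missing value
mean(incomes)                  # NA
mean(incomes, na.rm = TRUE)    # 1500: the missing value is removed first
```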

Logical

The logical type encodes two truth values: TRUE and FALSE.

logical_var <- TRUE
is.logical(logical_var)
## [1] TRUE
isTRUE(logical_var)
## [1] TRUE
as.logical(0) # FALSE for zero, TRUE for any other number
## [1] FALSE
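Conversion works in both directions: numbers and some strings can be turned into logical values, and logical values behave as 1 and 0 in arithmetic, which is handy for counting. A short sketch:

```r
as.logical(c(0, 1, -2.5))   # FALSE TRUE TRUE: any non-zero number is TRUE
as.logical("TRUE")          # TRUE
as.logical("yes")           # NA: only strings such as "TRUE", "true" or "T" are recognised
as.numeric(TRUE)            # 1
sum(c(TRUE, FALSE, TRUE))   # 2: TRUE counts as 1 and FALSE as 0
```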

R provides a series of basic logic operators you can use to evaluate conditions. For instance, you can use the logic operator == to evaluate the condition 5==2, which tests whether the value 5 is equal to the value 2. Conditions can be tested on values as well as on variables.

5==2
## [1] FALSE
first_value <- 5
second_value <- 2
first_value == 5
## [1] TRUE
first_value == 2
## [1] FALSE
second_value == 5
## [1] FALSE
second_value == 2
## [1] TRUE
first_value == second_value
## [1] FALSE

Basic logic operators.

Operator   Meaning            Example        Output
==         Equal              5==2           FALSE
!=         Not equal          5!=2           TRUE
>          Greater than       5>2            TRUE
<          Less than          5<2            FALSE
>=         Greater or equal   5>=2           TRUE
<=         Less or equal      5<=2           FALSE
!          Not                !TRUE          FALSE
&          And                TRUE & FALSE   FALSE
|          Or                 TRUE | FALSE   TRUE
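The operators in the table can be combined to express more complex conditions. A short sketch, reusing first_value from the example above (values is an illustrative vector):

```r
first_value <- 5
# Both conditions must hold
first_value > 2 & first_value < 10    # TRUE
# At least one condition must hold
first_value < 2 | first_value == 5    # TRUE
# Negate a whole condition using brackets
!(first_value == 5)                   # FALSE
# On vectors, the operators are applied element by element
values <- c(1, 6, 12)
values > 2 & values < 10              # FALSE TRUE FALSE
```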

Character

The character type represents text objects, including single characters and character strings (that is, text objects longer than one character, commonly referred to simply as strings in computer science).

a_string <- "Hello world!"
is.character(a_string)
## [1] TRUE
is.numeric(a_string)
## [1] FALSE
as.character(2) # type conversion (a.k.a. casting)
## [1] "2"
as.numeric("2")
## [1] 2
as.numeric("Ciao")
## Warning: NAs introduced by coercion
## [1] NA
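A string that looks like a number can be converted and then used in calculations, whereas R refuses to do arithmetic on text directly. In the sketch below, tryCatch is used only to capture the error message instead of stopping.

```r
as.numeric("3.14") + 1   # 4.14: the string is converted before the sum
as.integer("42")         # 42, stored as an integer
as.character(TRUE)       # "TRUE"
# Text cannot be used directly in arithmetic: "2" + 1 raises an error
result <- tryCatch("2" + 1, error = function(e) conditionMessage(e))
result                   # the error message, e.g., "non-numeric argument to binary operator"
```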

Types and variables

A variable storing a value of a given type is said to have the same type. However, variables in R don’t have an assigned type themselves. That means that a variable can be assigned a numeric value first and then changed to a character value.

a_variable <- 1.41
a_variable
## [1] 1.41
is.numeric(a_variable)
## [1] TRUE
a_variable <- "Hello world!"
a_variable
## [1] "Hello world!"
is.character(a_variable)
## [1] TRUE
is.numeric(a_variable)
## [1] FALSE

By contrast, many programming languages require you to declare a variable before it can be used, that is, to state its type. Languages such as C and Java require variable declarations. R does not require you to declare variable types.

More on vectors and factors

Vectors

The operator : creates a vector of consecutive integers, starting from the number specified before the operator and ending with the number specified after it (both included).

# Create a vector containing integers between 2 and 4
two_to_four <- 2:4
two_to_four
## [1] 2 3 4
# Retrieve cities between the second and the fourth
east_midlands_cities <- c("Derby", "Leicester", "Lincoln", "Nottingham")
east_midlands_cities[two_to_four]
## [1] "Leicester"  "Lincoln"    "Nottingham"
# As the second element of two_to_four is 3...
two_to_four[2]
## [1] 3
# the following command will retrieve the third city
east_midlands_cities[two_to_four[2]]
## [1] "Lincoln"
# Create a vector with the first, third and fourth cities
selected_cities <- c(east_midlands_cities[1], east_midlands_cities[3:4])
selected_cities
## [1] "Derby"      "Lincoln"    "Nottingham"
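A few further ways of selecting elements, sketched on the same vector: the : operator also counts downwards, a negative index excludes an element, and a logical vector of the same length keeps only the positions marked TRUE.

```r
east_midlands_cities <- c("Derby", "Leicester", "Lincoln", "Nottingham")
4:2                            # 4 3 2: the operator also counts downwards
east_midlands_cities[4:2]      # "Nottingham" "Lincoln" "Leicester"
east_midlands_cities[-1]       # all cities except the first
length(east_midlands_cities)   # 4
east_midlands_cities[c(TRUE, FALSE, TRUE, FALSE)]   # "Derby" "Lincoln"
```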

The functions seq and rep can also be used to create vectors, as illustrated below.

seq(1, 10, by = 0.5)
##  [1]  1.0  1.5  2.0  2.5  3.0  3.5  4.0  4.5  5.0  5.5  6.0  6.5  7.0  7.5  8.0
## [16]  8.5  9.0  9.5 10.0
seq(1, 10, length.out = 6)
## [1]  1.0  2.8  4.6  6.4  8.2 10.0
rep("Ciao", 4)
## [1] "Ciao" "Ciao" "Ciao" "Ciao"
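Both functions take further arguments. In the sketch below, a negative by makes seq count down, rep can repeat a whole vector (times) or each of its elements (each), and seq_len creates a sequence from 1 to a given length.

```r
seq(10, 1, by = -3)            # 10 7 4 1: a negative step counts down
rep(c("a", "b"), times = 2)    # "a" "b" "a" "b": repeat the whole vector
rep(c("a", "b"), each = 2)     # "a" "a" "b" "b": repeat each element
seq_len(4)                     # 1 2 3 4, same as 1:4
```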

The functions any and all can be used to test conditions on a vector. The former returns TRUE if at least one element satisfies the condition, and the latter returns TRUE only if all elements satisfy it.

any(east_midlands_cities == "Leicester")
## [1] TRUE
my_sequence <- seq(1, 10, length.out = 7)
my_sequence
## [1]  1.0  2.5  4.0  5.5  7.0  8.5 10.0
any(my_sequence > 5)
## [1] TRUE
all(my_sequence > 5)
## [1] FALSE
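Where any and all summarise a condition into a single value, two related base functions give more detail: which returns the positions of the elements satisfying a condition, and sum counts them, since TRUE is treated as 1. A sketch on the same sequence:

```r
my_sequence <- seq(1, 10, length.out = 7)
which(my_sequence > 5)          # 4 5 6 7: positions of the elements satisfying the condition
sum(my_sequence > 5)            # 4: number of elements satisfying the condition
my_sequence[my_sequence > 5]    # 5.5 7.0 8.5 10.0: the elements themselves
```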

Factors

A factor is a data type similar to a vector. However, the values contained in a factor can only be selected from a set of levels.

houses_vector <- c("Bungalow", "Flat", "Flat",
  "Detached", "Flat", "Terrace", "Terrace")
houses_vector
## [1] "Bungalow" "Flat"     "Flat"     "Detached" "Flat"     "Terrace"  "Terrace"
houses_factor <- factor(c("Bungalow", "Flat", "Flat",
  "Detached", "Flat", "Terrace", "Terrace"))
houses_factor
## [1] Bungalow Flat     Flat     Detached Flat     Terrace  Terrace 
## Levels: Bungalow Detached Flat Terrace
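Internally, a factor stores each value as an integer index into its set of levels. The sketch below inspects the levels and the underlying codes, and shows what happens when a value outside the levels is assigned (modified and "Castle" are illustrative; R also emits a warning in that case).

```r
houses_factor <- factor(c("Bungalow", "Flat", "Flat",
  "Detached", "Flat", "Terrace", "Terrace"))
levels(houses_factor)       # "Bungalow" "Detached" "Flat" "Terrace" (alphabetical)
nlevels(houses_factor)      # 4
as.integer(houses_factor)   # 1 3 3 2 3 4 4: each value is stored as a level index
# Assigning a value that is not a level produces NA, with a warning
modified <- houses_factor
modified[1] <- "Castle"
modified[1]                 # NA
```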

The function table can be used to obtain a tabulated count for each level.

table(houses_factor)
## houses_factor
## Bungalow Detached     Flat  Terrace 
##        1        1        3        2
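The result of table can be processed further. As a sketch, prop.table converts the counts into proportions, a single count can be retrieved by level name, and which.max identifies the most frequent level.

```r
houses_factor <- factor(c("Bungalow", "Flat", "Flat",
  "Detached", "Flat", "Terrace", "Terrace"))
counts <- table(houses_factor)
prop.table(counts)         # share of each level: 1/7, 1/7, 3/7 and 2/7
counts[["Flat"]]           # 3: the count for a single level
names(which.max(counts))   # "Flat": the most frequent level
```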

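The set of levels can be fixed when creating a factor by providing a levels argument. Values that are not among the levels (here, "People Carrier" and "Hatchback") become NA, so they are not included in the counts.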

houses_factor_spec <- factor(
  c("People Carrier", "Flat", "Flat", "Hatchback",
      "Flat", "Terrace", "Terrace"),
  levels = c("Bungalow", "Flat", "Detached",
       "Semi", "Terrace"))

table(houses_factor_spec)
## houses_factor_spec
## Bungalow     Flat Detached     Semi  Terrace 
##        0        3        0        0        2
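A sketch of how to inspect the outcome: is.na counts the values that fell outside the levels, the standard useNA argument makes table report them, and droplevels removes the levels that have no values.

```r
houses_factor_spec <- factor(
  c("People Carrier", "Flat", "Flat", "Hatchback",
      "Flat", "Terrace", "Terrace"),
  levels = c("Bungalow", "Flat", "Detached",
       "Semi", "Terrace"))
sum(is.na(houses_factor_spec))               # 2
table(houses_factor_spec, useNA = "ifany")   # also reports the 2 NA values
droplevels(houses_factor_spec)               # keeps only the levels in use: Flat, Terrace
```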

In statistics terminology, (unordered) factors are categorical (i.e., binary or nominal) variables: their levels have no inherent order.

income_nominal <- factor(
  c("High", "High", "Low", "Low", "Low",
      "Medium", "Low", "Medium"),
  levels = c("Low", "Medium", "High"))

The greater than operator is not meaningful on the income_nominal factor defined above.

income_nominal > "Low"
## Warning in Ops.factor(income_nominal, "Low"): '>' not meaningful for factors
## [1] NA NA NA NA NA NA NA NA
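Equality tests, on the other hand, remain meaningful for unordered factors, so values can still be matched against one or more categories. A sketch, redefining income_nominal so the block runs on its own:

```r
income_nominal <- factor(
  c("High", "High", "Low", "Low", "Low",
      "Medium", "Low", "Medium"),
  levels = c("Low", "Medium", "High"))
income_nominal == "Low"                   # FALSE FALSE TRUE TRUE TRUE FALSE TRUE FALSE
sum(income_nominal == "Low")              # 4
income_nominal %in% c("Medium", "High")   # TRUE TRUE FALSE FALSE FALSE TRUE FALSE TRUE
```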

In statistics terminology, ordered factors are ordinal variables: their levels are ordered.

income_ordered <- ordered(
  c("High", "High", "Low", "Low", "Low",
      "Medium", "Low", "Medium"),
  levels = c("Low", "Medium", "High"))

income_ordered > "Low"
## [1]  TRUE  TRUE FALSE FALSE FALSE  TRUE FALSE  TRUE
sort(income_ordered)
## [1] Low    Low    Low    Low    Medium Medium High   High  
## Levels: Low < Medium < High
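Because the levels of an ordered factor are ranked, summary functions such as max, and filtering based on comparisons, also work. A sketch, redefining income_ordered so the block runs on its own:

```r
income_ordered <- ordered(
  c("High", "High", "Low", "Low", "Low",
      "Medium", "Low", "Medium"),
  levels = c("Low", "Medium", "High"))
max(income_ordered)                           # High
income_ordered[income_ordered >= "Medium"]    # High High Medium Medium
is.ordered(income_ordered)                    # TRUE
```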

by Stefano De Sabbata – text licensed under the CC BY-SA 4.0, contains public sector information licensed under the Open Government Licence v3.0, code licensed under the GNU GPL v3.0.