Appendix 1
Basic types
A value in R
is an instance of one of three basic types, each encoding a fundamentally different type of information: numeric
encoding numbers; logical
encoding truth values (also known as Boolean values); and character encoding text. Each type has its characteristics and related operations, as discussed below.
Numeric
The numeric type represents numbers (both integers and reals).
a_number <- 1.41
is.numeric(a_number)
## [1] TRUE
is.integer(a_number)
## [1] FALSE
is.double(a_number) # i.e., is real
## [1] TRUE
Base numeric operators.
Operator | Meaning | Example | Output |
---|---|---|---|
+ | Plus | 5+2 |
7 |
- | Minus | 5-2 |
3 |
* |
Product | 5*2 |
10 |
/ | Division | 5/2 |
2.5 |
%/% | Integer division | 5%/%2 |
2 |
%% | Module | 5%%2 |
1 |
^ | Power | 5^2 |
25 |
Some pre-defined functions in R
:
abs(-2) # Absolute value
## [1] 2
ceiling(3.475) # Upper round
## [1] 4
floor(3.475) # Lower round
## [1] 3
trunc(5.99) # Truncate
## [1] 5
log10(100) # Logarithm 10
## [1] 2
## [1] 2
Use simple brackets to specify the order of execution. If not specified the default order is: rise to power first, then multiplication and division, sum and subtraction last.
a_number <- 1
(a_number + 2) * 3
## [1] 9
a_number + (2 * 3)
## [1] 7
a_number + 2 * 3
## [1] 7
The object NaN
(Not a Number) is returned by R
when the result of an operation is not a number.
0 / 0
## [1] NaN
is.nan(0 / 0)
## [1] TRUE
That is not to be confused with the object NA
(Not Available), which is returned for missing data.
Logical
The logical type encodes two truth values: True and False.
logical_var <- TRUE
is.logical(logical_var)
## [1] TRUE
isTRUE(logical_var)
## [1] TRUE
as.logical(0) # TRUE if not zero
## [1] FALSE
R
provides a series of basic logic operators you can use to evaluate conditions. For instance, you can use the logic operator ==
to evaluate the condition 5==2
, which tests whether the value 5
is equal to the value 2
. Conditions can be tested on values as well as on variables.
5==2
## [1] FALSE
first_value <- 5
second_value <- 2
first_value == 5
## [1] TRUE
first_value == 2
## [1] FALSE
second_value == 5
## [1] FALSE
second_value == 2
## [1] TRUE
first_value == second_value
## [1] FALSE
Operator | Meaning | Example | Output |
---|---|---|---|
== | Equal | 5==2 |
FALSE |
!= | Not equal | 5!=2 |
TRUE |
> | Greater than | 5>2 |
TRUE |
< | Less than | 5<2 |
FALSE |
>= | Greater or equal | 5>=2 |
TRUE |
<= | Less or equal | 5<=2 |
FALSE |
! | Not | !TRUE |
FALSE |
& | And | TRUE & FALSE |
FALSE |
| | Or | TRUE | FALSE |
TRUE |
Character
The character type represents text objects, including single characters and character strings (that is text objects longer than one character, commonly referred to simply as strings in computer science).
a_string <- "Hello world!"
is.character(a_string)
## [1] TRUE
is.numeric(a_string)
## [1] FALSE
as.character(2) # type conversion (a.k.a. casting)
## [1] "2"
as.numeric("2")
## [1] 2
as.numeric("Ciao")
## Warning: NAs introduced by coercion
## [1] NA
Types and variables
A variable storing a value of a given type is said to have the same type. However, variables in R
don’t have an assigned type themselves. That means that a variable can be assigned a numeric value first and then changed to a character value.
a_variable <- 1.41
a_variable
## [1] 1.41
is.numeric(a_variable)
## [1] TRUE
a_variable <- "Hello world!"
a_variable
## [1] "Hello world!"
is.character(a_variable)
## [1] TRUE
is.numeric(a_variable)
## [1] FALSE
To be more precise, many programming languages require to declare a variable. That is, to state the type of a variable before it can be used. Variable declaration was particularly common in older programming languages such as C
and Java
. R
does not require to declare variables types.
More on vectors and factors
Vectors
The operator :
can be used to create integer vectors, starting from the number specified before the operator to the number specified after the operator.
# Create a vector containing integers between 2 and 4
two_to_four <- 2:4
two_to_four
## [1] 2 3 4
# Retrieve cities between the second and the fourth
east_midlands_cities <- c("Derby", "Leicester", "Lincoln", "Nottingham")
east_midlands_cities[two_to_four]
## [1] "Leicester" "Lincoln" "Nottingham"
# As the second element of two_to_four is 3...
two_to_four[2]
## [1] 3
# the following command will retrieve the third city
east_midlands_cities[two_to_four[2]]
## [1] "Lincoln"
# Create a vector with cities from the previous vector
selected_cities <- c(east_midlands_cities[1], east_midlands_cities[3:4])
The functions seq
and rep
can also be used to create vectors, as illustrated below.
seq(1, 10, by = 0.5)
## [1] 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.0 6.5 7.0 7.5 8.0
## [16] 8.5 9.0 9.5 10.0
seq(1, 10, length.out = 6)
## [1] 1.0 2.8 4.6 6.4 8.2 10.0
rep("Ciao", 4)
## [1] "Ciao" "Ciao" "Ciao" "Ciao"
The logical operators any
and all
can be used to test conditions on the vector. The former returns TRUE
if at least one element satisfies the statement and the second returns TRUE
if all elements satisfy the condition
any(east_midlands_cities == "Leicester")
## [1] TRUE
my_sequence <- seq(1, 10, length.out = 7)
my_sequence
## [1] 1.0 2.5 4.0 5.5 7.0 8.5 10.0
any(my_sequence > 5)
## [1] TRUE
all(my_sequence > 5)
## [1] FALSE
Factors
A factor is a data type similar to a vector. However, the values contained in a factor can only be selected from a set of levels.
houses_vector <- c("Bungalow", "Flat", "Flat",
"Detached", "Flat", "Terrace", "Terrace")
houses_vector
## [1] "Bungalow" "Flat" "Flat" "Detached" "Flat" "Terrace" "Terrace"
houses_factor <- factor(c("Bungalow", "Flat", "Flat",
"Detached", "Flat", "Terrace", "Terrace"))
houses_factor
## [1] Bungalow Flat Flat Detached Flat Terrace Terrace
## Levels: Bungalow Detached Flat Terrace
The function table can be used to obtain a tabulated count for each level.
houses_factor <- factor(c("Bungalow", "Flat", "Flat",
"Detached", "Flat", "Terrace", "Terrace"))
houses_factor
## [1] Bungalow Flat Flat Detached Flat Terrace Terrace
## Levels: Bungalow Detached Flat Terrace
table(houses_factor)
## houses_factor
## Bungalow Detached Flat Terrace
## 1 1 3 2
A specific set of levels can be specified when creating a factor by providing a levels argument.
houses_factor_spec <- factor(
c("People Carrier", "Flat", "Flat", "Hatchback",
"Flat", "Terrace", "Terrace"),
levels = c("Bungalow", "Flat", "Detached",
"Semi", "Terrace"))
table(houses_factor_spec)
## houses_factor_spec
## Bungalow Flat Detached Semi Terrace
## 0 3 0 0 2
In statistics terminology, (unordered) factors are categorical (i.e., binary or nominal) variables. Levels are not ordered.
income_nominal <- factor(
c("High", "High", "Low", "Low", "Low",
"Medium", "Low", "Medium"),
levels = c("Low", "Medium", "High"))
The greater than operator is not meaningful on the income_nominal
factor defined above.
income_nominal > "Low"
## Warning in Ops.factor(income_nominal, "Low"): '>' not meaningful for factors
## [1] NA NA NA NA NA NA NA NA
In statistics terminology, ordered factors are ordinal variables. Levels are ordered.
income_ordered <- ordered(
c("High", "High", "Low", "Low", "Low",
"Medium", "Low", "Medium"),
levels = c("Low", "Medium", "High"))
income_ordered > "Low"
## [1] TRUE TRUE FALSE FALSE FALSE TRUE FALSE TRUE
sort(income_ordered)
## [1] Low Low Low Low Medium Medium High High
## Levels: Low < Medium < High
by Stefano De Sabbata – text licensed under the CC BY-SA 4.0, contains public sector information licensed under the Open Government Licence v3.0, code licensed under the GNU GPL v3.0.