Data Structure

Overview

R has 6 basic data types.

  • character: "aquatic", "ecology" (no order)
  • factor: similar to character, but has levels (alphabetically ordered by default)
  • numeric: 20.0 , 15.5
  • integer: 3, 7
  • logical: TRUE , FALSE
  • complex: 1+2i (complex numbers with real and imaginary parts)

These elements form one of the following data structures.

  • vector: a series of elements. A single data type is allowed in a single vector
  • matrix: elements organized into rows and columns. A single data type is allowed in a single matrix
  • data frame: looks similar to a matrix, but allows different data types in different columns

Vector

Create Vector

Below are examples of atomic character vectors, numeric vectors, integer vectors, etc. There are many ways to create vector data. The following examples use c(), :, seq(), rep():

#ex.1a manually create a vector using c()
x <- c(1,3,4,8)
x
## [1] 1 3 4 8
#ex.1b character
x <- c("a", "b", "c")
x
## [1] "a" "b" "c"
#ex.1c logical
x <- c(TRUE, FALSE, FALSE)
x
## [1]  TRUE FALSE FALSE
#ex.2 sequence of numbers
x <- 1:5
x
## [1] 1 2 3 4 5
#ex.3a replicate same numbers or characters
x <- rep(2, 5) # replicate 2 five times
x
## [1] 2 2 2 2 2
#ex.3b replicate same numbers or characters
x <- rep("a", 5) # replicate "a" five times
x
## [1] "a" "a" "a" "a" "a"
#ex.4a use seq() function
x <- seq(1, 5, by = 1)
x
## [1] 1 2 3 4 5
#ex.4b use seq() function
x <- seq(1, 5, by = 0.1)
x
##  [1] 1.0 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2.0 2.1 2.2 2.3 2.4 2.5 2.6 2.7 2.8
## [20] 2.9 3.0 3.1 3.2 3.3 3.4 3.5 3.6 3.7 3.8 3.9 4.0 4.1 4.2 4.3 4.4 4.5 4.6 4.7
## [39] 4.8 4.9 5.0

Check Features

R provides many functions to examine features of vectors and other objects, for example:

  • class() - return high-level data structure of the object
  • typeof() - return low-level data structure of the object
  • attributes() - metadata of the object
  • length() - number of elements in the object
  • sum() - sum of object’s elements
  • mean() - mean of object’s elements

Numeric Vector

x <- c(1.2, 3.1, 4.0, 8.2)
x
## [1] 1.2 3.1 4.0 8.2
## [1] "numeric"
## [1] "double"
## [1] 4
sum(x)
## [1] 16.5
mean(x)
## [1] 4.125

Character Vector

y <- c("a", "b", "c")
class(y)
## [1] "character"
## [1] 3

Access

Element ID
Use brackets [] when accessing specific elements in an object. For example, if you want to access element #2 in the vector x, you may specify as x[2]:

x <- c(2,2,3,2,5)
x[2] # access element #2
## [1] 2
x[c(2,4)] # access elements #2 and 4
## [1] 2 2
x[2:4] # access elements #2-4
## [1] 2 3 2

Equation
R provides many ways to access elements that suffice specific conditions. You can use mathematical symbols to specify what you need, for example:

  • == equal
  • > larger than
  • >= equal & larger than
  • < smaller than
  • <= equal & smaller than
  • which() a function that returns element # that suffices the specified condition

The following examples return a logical vector indicating whether each element in x suffices the specified condition:

# creating a vector
x <- c(2,2,3,2,5)

# ex.1a equal
x == 2
## [1]  TRUE  TRUE FALSE  TRUE FALSE
# ex.1b larger than
x > 2 
## [1] FALSE FALSE  TRUE FALSE  TRUE

You can access elements that suffice the specified condition using brackets, for example:

# ex.2a equal
x[x == 2]
## [1] 2 2 2
# ex.2b larger than
x[x > 2]
## [1] 3 5

Using which(), you can see which elements (i.e., #) matches what you need:

# ex.3a equal
which(x == 2) # returns which elements are equal to 2
## [1] 1 2 4
# ex.3b larger than
which(x > 2)
## [1] 3 5

Matrix

Create Matrix

Matrix is a set of elements (single data type) that are organized into rows and columns:

#ex.1 cbind: combine objects by column
x <- cbind(c(1,2,3), c(4,5,6))
x
##      [,1] [,2]
## [1,]    1    4
## [2,]    2    5
## [3,]    3    6
#ex.2 rbind: combine objects by row
x <- rbind(c(1,2,3), c(4,5,6))
x
##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6
#ex.3 matrix: specify elements and the number of rows (nrow) and columns (ncol)
x <- matrix(1:9, nrow = 3, ncol = 3)
x
##      [,1] [,2] [,3]
## [1,]    1    4    7
## [2,]    2    5    8
## [3,]    3    6    9

Check Features

R provides many functions to examine features of matrix data, for example:

Integer Matrix

x <- matrix(1:9, nrow = 3, ncol = 3)
x
##      [,1] [,2] [,3]
## [1,]    1    4    7
## [2,]    2    5    8
## [3,]    3    6    9
## [1] "matrix" "array"
## [1] "integer"
dim(x)
## [1] 3 3

Character Matrix

y <- matrix(c("a","b", "c", "d", "e", "f"), nrow = 3, ncol = 2)
y
##      [,1] [,2]
## [1,] "a"  "d" 
## [2,] "b"  "e" 
## [3,] "c"  "f"
## [1] "matrix" "array"
## [1] "character"
dim(y)
## [1] 3 2

Access

When accessing matrix elements, you need to pick row(s) and/or column(s), for example:

x <- matrix(1:9, nrow = 3, ncol = 3)
x
##      [,1] [,2] [,3]
## [1,]    1    4    7
## [2,]    2    5    8
## [3,]    3    6    9
x[2,3] # access an element in row #2 and colum #3
## [1] 8
x[2,] # access elements in row #2
## [1] 2 5 8
x[c(2,3),] # access elements in rows #2 and 3
##      [,1] [,2] [,3]
## [1,]    2    5    8
## [2,]    3    6    9
x[,c(2,3)] # access elements in columns #2 and 3
##      [,1] [,2]
## [1,]    4    7
## [2,]    5    8
## [3,]    6    9

You can assess each element with mathematical expressions just like vectors:

x == 2 # equal
##       [,1]  [,2]  [,3]
## [1,] FALSE FALSE FALSE
## [2,]  TRUE FALSE FALSE
## [3,] FALSE FALSE FALSE
x > 2 # larger than
##       [,1] [,2] [,3]
## [1,] FALSE TRUE TRUE
## [2,] FALSE TRUE TRUE
## [3,]  TRUE TRUE TRUE

However, care must be taken when accessing elements, as it will be automatically converted to vector data:

x[x == 2] # equal
## [1] 2
x[x > 2] # larger than
## [1] 3 4 5 6 7 8 9

which() needs an additional argument to return both row and column #:

which(x == 2, arr.ind = TRUE)
##      row col
## [1,]   2   1
which(x > 2, arr.ind = TRUE)
##      row col
## [1,]   3   1
## [2,]   1   2
## [3,]   2   2
## [4,]   3   2
## [5,]   1   3
## [6,]   2   3
## [7,]   3   3

Data Frame

A data frame is a collection of elements organized into rows and columns, but it differs from a matrix in several ways.

  • It allows for the inclusion of multiple data types in different columns.
  • Each column in a data frame has a name associated with it.
  • You can access columns in a data frame by their respective names using the $ operator.

The data frame is the most commonly used data structure when manipulating ecological data. When loading a dataset from a spreadsheet (which we will discuss later), it is automatically recognized as a data frame. Let’s consider an example:

Creating a data frame
In the following example, the variables x and y are organized into a single data frame named df0. The variables are renamed as part of the process of creating the data frame.

# Create data frame
x <- c("Pristine", "Pristine", "Disturbed", "Disturbed", "Pristine") # Lake type
y <- c(1.2, 2.2, 10.9, 50.0, 3.0) # TSS: total suspended solids (mg/L)
df0 <- data.frame(LakeType = x, TSS = y) # x is named as "LakeType" while y is named as "TSS"
df0
##    LakeType  TSS
## 1  Pristine  1.2
## 2  Pristine  2.2
## 3 Disturbed 10.9
## 4 Disturbed 50.0
## 5  Pristine  3.0

Call column names

colnames(df0) # call column names
## [1] "LakeType" "TSS"

Access by columns

df0$LakeType # access LakeType
## [1] "Pristine"  "Pristine"  "Disturbed" "Disturbed" "Pristine"
df0$TSS # access TSS
## [1]  1.2  2.2 10.9 50.0  3.0

You can access elements like a matrix as well:

df0[,1] # access column #1
## [1] "Pristine"  "Pristine"  "Disturbed" "Disturbed" "Pristine"
df0[1,] # access row #1
##   LakeType TSS
## 1 Pristine 1.2
df0[c(2,4),] # access row #2 and 4
##    LakeType  TSS
## 2  Pristine  2.2
## 4 Disturbed 50.0

Exercise

Vector

  1. Create three numeric vectors with length 3, 6 and 20, respectively. Vectors must be created using at least two different functions in R.
  2. Create three character vectors with length 3, 6 and 20, respectively. Vectors must be created using at least two different functions in R.
  3. Copy the following script to your R script and perform the following analysis:
  • Identify element IDs of x that are greater than 2.0
  • Identify element values of x that are greater than 2.0
set.seed(1)
x <- rnorm(100)

Matrix

  1. Create a numeric matrix with 4 rows and 4 columns. Each column must contain identical elements.
  2. Create a numeric matrix with 4 rows and 4 columns. Each row must contain identical elements.
  3. Create a character matrix with 4 rows and 4 columns. Each column must contain identical elements.
  4. Create a character matrix with 4 rows and 4 columns. Each row must contain identical elements.
  5. Copy the following script to your R script and perform the following analysis:
  • Identify element IDs of x that are greater than 2.0 (specify row and column IDs)
  • Identify element values of x that are greater than 2.0 and calculate the mean.
set.seed(1)
x <- matrix(rnorm(100), nrow = 10, ncol = 10)

Data Frame

  1. Create a data frame of 3 variables with 10 elements (name variables as x, y and z. x must be character while y and z must be numeric.
  2. Check the data structure (higher-level) of x, y and z
  3. Copy the following script to your R script and perform the following analysis:
  • Calculate the means of temperature and abundance for states VA and NC separately.
set.seed(1)
x <- rnorm(100, mean = 10, sd = 3)
y <- rpois(100, lambda = 10)
z <- rep(c("VA", "NC"), 50)
df0 <- data.frame(temperature = x, abundance = y, state = z)