5 Vector-Raster Interactions

5.1 Learning objectives

  1. Understand the fundamentals of vector–raster interaction and analysis

  2. Learn how to extract and append raster-derived attributes to vector data

  3. Visualize the results of vector–raster analysis using effective mapping techniques

In this chapter, we will work with the following packages. Before starting any exercises, make sure to run the following code.

if (!require(pacman)) install.packages("pacman")

pacman::p_load(tidyverse,
               sf,
               terra,
               tidyterra,
               exactextractr)

5.2 Interations between vector and raster data

In ecological data analysis, it is common to relate ecological responses (e.g., species richness) to environmental variables. To do this effectively, we often need to extract values from raster layers at specific locations, such as survey sites (points) or areas (polygons). This step allows us to link ecological observations with spatial environmental data for further analysis. In this exercise, we will use fish survey sites and county polygons to extract environmental variables from raster data.

To get started, read the following datasets into R to perform the analysis:

## finsync survey site
sf_site <- readRDS("data/sf_finsync_nc.rds")

## county polygons
sf_nc_county <- readRDS("data/sf_nc_county.rds")

## precipitation raster
spr_prec_nc <- rast("data/spr_prec_nc.tif")

5.3 Point-wise extraction

Let’s perform point-wise extraction of precipitation values at survey site locations. To explore the spatial patterns and get a sense of the data, we will first visualize the raster precipitation data using tidyterra::geom_spatraster() and overlay the survey sites using ggplot2::geom_sf().

ggplot() +
  geom_spatraster(data = spr_prec_nc) +
  geom_sf(data = sf_site) +
  scale_fill_viridis_c() + # change color palette for raster
  theme_bw()

To extract raster values at the survey sites, we will use the terra::extract() function. This function retrieves the values of one or more raster layers at the locations specified by spatial vector data, such as points or polygons. In our case, it will return the precipitation value at each survey site location.

Since the output of terra::extract() with bind = TRUE is a class of SpatVector that includes geometry but is not yet an sf object, we convert it back to an sf object using sf::st_as_sf() (using %>%, these steps can be done in one step).

# # The following code works exactly as:
# spr_site_prec <- extract(x = spr_prec_nc,
#                          y = sf_site,
#                          bind = TRUE)
# 
# sf_site_prec <- st_as_sf(spr_site_prec)

(sf_site_prec <- extract(x = spr_prec_nc,
                         y = sf_site,
                         bind = TRUE) %>% 
   st_as_sf())
## Simple feature collection with 122 features and 2 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: -84.02452 ymin: 34.53497 xmax: -76.74611 ymax: 36.54083
## Geodetic CRS:  WGS 84
## First 10 features:
##                 site_id precipitation                   geometry
## 1  finsync_nrs_nc-10013        1349.4 POINT (-81.51025 36.11188)
## 2  finsync_nrs_nc-10014        1121.5 POINT (-80.35989 35.87616)
## 3  finsync_nrs_nc-10020        1297.3 POINT (-81.74472 35.64379)
## 4  finsync_nrs_nc-10023        1162.0  POINT (-82.77898 35.6822)
## 5  finsync_nrs_nc-10024        1302.0 POINT (-77.75384 35.38553)
## 6  finsync_nrs_nc-10027        1741.5 POINT (-83.69678 35.02467)
## 7  finsync_nrs_nc-10029        1224.3 POINT (-80.65668 35.16119)
## 8  finsync_nrs_nc-10034        1813.2 POINT (-82.04497 36.09917)
## 9  finsync_nrs_nc-10041        1141.3 POINT (-80.46558 35.04365)
## 10 finsync_nrs_nc-10049        1444.7 POINT (-77.96322 34.58249)

This allows for seamless spatial operations and visualization downstream. Since our point data now include precipitation values, we can represent the survey sites with colors that reflect the precipitation gradient.

ggplot() +
  geom_sf(data = sf_nc_county,       # Plot county boundaries as a grey background
          fill = "grey") + 
  geom_sf(data = sf_site_prec,       # Plot survey points colored by precipitation
          aes(color = precipitation)) +
  scale_color_viridis_c() +          # Apply a perceptually uniform color scale
  theme_bw()                        # Use a clean black-and-white theme

5.4 Zonal statistics

Zonal statistics in GIS refers to the process of summarizing raster values within the boundaries of defined zones, typically represented by polygons. For example, you might calculate the average precipitation within each watershed or administrative boundary. This is commonly done by overlaying polygon features onto a raster layer and computing summary statistics (e.g., mean, sum, min, max) for all raster cells that fall within each polygon.

5.4.1 Polygon-based analysis

Traditional approach

The exactextractr package provides a suite of efficient and precise tools for performing zonal statistics, particularly when working with raster and polygon data. In the following example, we will use exact_extract() to calculate the mean precipitation within each county polygon.

However, before performing zonal statistics, it is critical to ensure that both the raster and vector layers use a projected CRS – not a geographic one (like WGS 84 in degrees). This is because zonal statistics summarize values over areas, and accurate area-based calculations require linear units (e.g., meters), which are only meaningful in a projected CRS.

In this example, we first reproject the county polygon layer sf_nc_county into WGS 84 / UTM zone 17N (EPSG:32617), a projected CRS appropriate for central and eastern North Carolina. Then, we use terra::project() to reproject the raster layer spr_prec_nc to match the polygon layer’s CRS. The method = "bilinear" option is used for resampling continuous raster data (such as precipitation)3, providing smoother interpolation.

sf_nc_county_proj <- st_transform(sf_nc_county,
                                  crs = 32617)

spr_prec_nc_proj <- terra::project(x = spr_prec_nc, 
                                   y = crs(sf_nc_county_proj),
                                   method = "bilinear") 

With both layers now in a common projected CRS, we can proceed with zonal statistics, ensuring that area-weighted calculations are geometrically accurate.

In the code below, exact_extract() is used to calculate the mean precipitation within each county polygon (sf_nc_county_proj) from the reprojected raster spr_prec_nc_proj. The argument fun = "mean" specifies that we want to compute the simple mean of all raster cell values that overlap each polygon. The append_cols = TRUE option ensures that all original attributes from the sf_nc_county_proj object are retained in the output, which makes it easier to link the results back to the spatial features.

# NOTE: `progress = FALSE` turns off the progress bar for cleaner output
(df_prec_county <- exact_extract(x = spr_prec_nc_proj,
                                 y = sf_nc_county_proj,
                                 fun = "mean",
                                 append_cols = TRUE,
                                 progress = FALSE) %>% 
   as_tibble() %>% # convert to tibble
   rename(precipitation = mean)) # rename the output column)
## # A tibble: 100 × 2
##    county      precipitation
##    <chr>               <dbl>
##  1 ashe                1366.
##  2 alleghany           1369.
##  3 surry               1217.
##  4 currituck           1281.
##  5 northampton         1179.
##  6 hertford            1239.
##  7 camden              1299.
##  8 gates               1275.
##  9 warren              1140.
## 10 stokes              1218.
## # ℹ 90 more rows

Although the output from exact_extract() is a regular data frame (not an sf object), we can easily merge the computed statistics back into the original spatial data using left_join()4. This allows us to retain the spatial geometry of the county polygons while adding the precipitation values as new attributes.

In the example below, we assume that both the original sf object (sf_nc_county) and the extracted results (df_prec_county) share a common identifier column named county.

(sf_nc_county_prec <- sf_nc_county %>% 
   left_join(df_prec_county,
             by = "county"))
## Simple feature collection with 100 features and 2 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -84.32377 ymin: 33.88212 xmax: -75.45662 ymax: 36.58973
## Geodetic CRS:  WGS 84
## First 10 features:
##         county precipitation                       geometry
## 1         ashe      1365.949 MULTIPOLYGON (((-81.47258 3...
## 2    alleghany      1368.901 MULTIPOLYGON (((-81.23971 3...
## 3        surry      1216.966 MULTIPOLYGON (((-80.45614 3...
## 4    currituck      1281.308 MULTIPOLYGON (((-76.00863 3...
## 5  northampton      1178.707 MULTIPOLYGON (((-77.21736 3...
## 6     hertford      1239.493 MULTIPOLYGON (((-76.74474 3...
## 7       camden      1299.447 MULTIPOLYGON (((-76.00863 3...
## 8        gates      1275.414 MULTIPOLYGON (((-76.56218 3...
## 9       warren      1139.996 MULTIPOLYGON (((-78.30849 3...
## 10      stokes      1217.916 MULTIPOLYGON (((-80.02545 3...

The following plot uses ggplot2 to visualize the mean precipitation values calculated for each county. The geom_sf() layer draws the county boundaries from the sf_nc_county_prec object, and the aes(color = precipitation) argument maps the precipitation values to a color gradient.

ggplot() +
  geom_sf(data = sf_nc_county_prec,
          aes(fill = precipitation)) +
  scale_fill_viridis_c() +
  theme_bw()

Alternative approach

The above example involved multiple steps to reproject both the raster and polygon layers into a common projected CRS, which is necessary for accurate area-based calculations.

However, the exact_extract() function provides a convenient alternative that avoids the need for explicit reprojection. By specifying fun = "weighted_mean" and weights = "area", we instruct the function to calculate the mean precipitation for each polygon by weighting each raster cell’s contribution according to the fraction of its area that overlaps the polygon.

This internal weighting allows exact_extract() to produce geometrically accurate results even when the layers remain in a geographic CRS (e.g., WGS 84), which is especially useful when working with global datasets or when reprojection is impractical.

(df_prec_county_alt <- exact_extract(x = spr_prec_nc,
                                     y = sf_nc_county,
                                     fun = "weighted_mean",
                                     weights = "area",
                                     append_cols = TRUE,
                                     progress = FALSE) %>% 
   as_tibble() %>% # convert to tibble
   rename(precipitation = weighted_mean)) # rename the output column
## # A tibble: 100 × 2
##    county      precipitation
##    <chr>               <dbl>
##  1 ashe                1366.
##  2 alleghany           1369.
##  3 surry               1217.
##  4 currituck           1281.
##  5 northampton         1179.
##  6 hertford            1239.
##  7 camden              1299.
##  8 gates               1275.
##  9 warren              1140.
## 10 stokes              1218.
## # ℹ 90 more rows

The numbers produced by the area-weighted approach without reprojection may differ slightly from those obtained using fully projected layers.

In most cases, this difference is negligible, making the approximation both practical and efficient; especially when working across large spatial extents that span multiple UTM zones or even continents, where choosing a single appropriate projection becomes difficult. In such cases, the ability of exact_extract() to apply area-based weights directly in geographic CRS is a major advantage, as it simplifies preprocessing and avoids potential distortions from inappropriate projections.

That said, if your analysis requires high spatial accuracy, for example, when modeling fine-scale ecological processes or when exact area measurements are critical, it is still recommended to reproject both raster and vector data to an appropriate projected CRS. Ultimately, the decision depends on the scale, purpose, and precision required by your specific application.

5.4.2 Buffer-based analysis

When linking environmental conditions to ecological data collected at survey sites, point-wise extraction is a straightforward and intuitive approach. However, environmental influences often operate at broader spatial scales. In such cases, buffer-based extraction may be more appropriate.

Like other zonal statistics, buffer analysis assumes a planar coordinate system, so it is important to first transform the data to a projected CRS. Once projected, you can generate buffers around the survey sites using the sf::st_buffer() function.

sf_site_proj <- sf_site %>%
  st_transform(crs = 32617)

sf_site_buff_proj <- sf_site_proj %>%
  st_buffer(dist = 10000) # unit is meter in a projected CRS

The object sf_site_buff_proj is currently in a projected CRS (WGS 84 / UTM zone 17N; EPSG:32617), which is appropriate for spatial operations like buffering.

However, for visualization, a geodetic CRS such as WGS84 (EPSG:4326) is typically preferred. Transform the object back to a geodetic CRS to ensure correct alignment with standard basemaps and mapping applications.

sf_site_buff <- sf_site_buff_proj %>%
  st_transform(crs = 4326)

Visualization allows you to see how buffers are created around the survey sites, helping to confirm that the spatial scale and geometry are appropriate for your analysis.

ggplot() +
  geom_sf(data = sf_nc_county,
          fill = "grey") +
  geom_sf(data = sf_site_buff,
          fill = "salmon") +
  geom_sf(data = sf_site) +
  theme_bw()

Taking zonal statistics is the same—after buffering in projected space, you can use the exactextractr::exact_extract() function to perform area-weighted extraction of raster values within each buffer zone.

df_prec_buff <- exact_extract(x = spr_prec_nc_proj,
                              y = sf_site_buff_proj,
                              fun = "mean",
                              append_cols = TRUE,
                              progress = FALSE) %>%
  as_tibble() %>%
  rename(precipitation = mean)

Like we did in the above examples, we can tie these extracted values back to the original point data by joining the summary statistics to the corresponding survey sites.

(sf_site_prec_buff <- sf_site %>% 
   left_join(df_prec_buff,
             by = "site_id"))
## Simple feature collection with 122 features and 2 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: -84.02452 ymin: 34.53497 xmax: -76.74611 ymax: 36.54083
## Geodetic CRS:  WGS 84
## # A tibble: 122 × 3
##    site_id                          geometry precipitation
##    <chr>                         <POINT [°]>         <dbl>
##  1 finsync_nrs_nc-10013 (-81.51025 36.11188)         1419.
##  2 finsync_nrs_nc-10014 (-80.35989 35.87616)         1139.
##  3 finsync_nrs_nc-10020 (-81.74472 35.64379)         1313.
##  4 finsync_nrs_nc-10023  (-82.77898 35.6822)         1279.
##  5 finsync_nrs_nc-10024 (-77.75384 35.38553)         1296.
##  6 finsync_nrs_nc-10027 (-83.69678 35.02467)         1799.
##  7 finsync_nrs_nc-10029 (-80.65668 35.16119)         1203.
##  8 finsync_nrs_nc-10034 (-82.04497 36.09917)         1648.
##  9 finsync_nrs_nc-10041 (-80.46558 35.04365)         1160.
## 10 finsync_nrs_nc-10049 (-77.96322 34.58249)         1444.
## # ℹ 112 more rows

Visualize:

ggplot() +
  geom_sf(data = sf_nc_county,
          fill = "grey") + 
  geom_sf(data = sf_site_prec_buff,
          aes(color = precipitation)) +
  scale_color_viridis_c() +
  theme_bw()

Fantastic!