Accesses and downloads environmental data from the IMOS THREDDS server and appends variables to detection data based on the date of detection

extractEnv(
  df,
  X = "longitude",
  Y = "latitude",
  datetime = "detection_timestamp",
  env_var,
  folder_name = NULL,
  verbose = TRUE,
  cache_layers = TRUE,
  crop_layers = TRUE,
  full_timeperiod = FALSE,
  fill_gaps = FALSE,
  buffer = NULL,
  nrt = FALSE,
  output_format = ".grd",
  .parallel = TRUE,
  .ncores = NULL
)

Arguments

df

detection data as a data frame with, at a minimum, X, Y and date-time fields

X

name of column with X coordinate or longitude (EPSG 4326)

Y

name of column with Y coordinate or latitude (EPSG 4326)

datetime

name of column with date time stamp (Coordinated Universal Time; UTC)

env_var

environmental variable to extract; options include 'rs_sst', 'rs_sst_interpolated', 'rs_salinity', 'rs_chl', 'rs_turbidity', 'rs_npp', 'bathy', 'dist_to_land' and 'rs_current'

folder_name

name of folder within 'imos.cache' where downloaded rasters should be saved. The default, NULL, produces automatic folder names based on study extent

verbose

should function provide details of what operation is being conducted. Set to FALSE to keep it quiet

cache_layers

should the extracted environmental data be cached within the working directory? If FALSE, layers are stored in a temporary folder and discarded after environmental extraction

crop_layers

should the extracted environmental data be cropped to within the study site

full_timeperiod

should environmental variables be extracted for each day across the full monitoring period? This is time and memory consuming for long projects

fill_gaps

should the function use a spatial buffer to estimate environmental variables for detections where there is missing data. Default is FALSE to save computational time.

buffer

radius of buffer (in m) around each detection from which environmental variables should be extracted. The median value of pixels that fall within the buffer will be used if fill_gaps = TRUE. If NULL, a buffer will be chosen based on the resolution of the environmental layer. A numeric value (in m) can be used here to customise the buffer radius.

nrt

should Near Real-Time current data be used if Delayed-Mode current data are missing? Default is FALSE, in which case NAs are appended to current variables for years (currently, all years after 2020) when current data are missing. Note that Near Real-Time data are subject to less quality control than Delayed-Mode data.

output_format

File format for cached environmental layers. You can use gdal(drivers=TRUE) to see what drivers are available in your installation. The default format is '.grd'.

.parallel

should the function be run in parallel

.ncores

number of cores to use when .parallel = TRUE. If none is provided, detectCores is used to determine the number.
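
The fill_gaps and buffer arguments above describe taking the median of pixels that fall within a radius of a detection. The base R sketch below illustrates that idea on a toy set of pixel centres. It is not the function's internal code: the helpers haversine_m and buffer_median, the toy data and the use of haversine distance are all assumptions made for illustration.

```r
# Hypothetical helper: great-circle distance (m) from one point to many,
# using the haversine formula with a mean Earth radius of 6371 km.
haversine_m <- function(lon1, lat1, lon2, lat2, radius = 6371000) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius * asin(pmin(1, sqrt(a)))
}

# Hypothetical helper: median of non-missing pixel values within `buffer_m`
# metres of a detection; returns NA if no valid pixel falls in the buffer.
buffer_median <- function(det_lon, det_lat, pix_lon, pix_lat, pix_val,
                          buffer_m) {
  d <- haversine_m(det_lon, det_lat, pix_lon, pix_lat)
  inside <- d <= buffer_m & !is.na(pix_val)
  if (!any(inside)) return(NA_real_)
  median(pix_val[inside])
}

# Toy pixel centres (made-up values); the pixel under the detection is NA
pixels <- data.frame(
  lon = c(147.00, 147.05, 147.10, 148.00),
  lat = c(-19.00, -19.00, -19.00, -19.00),
  sst = c(NA, 24.1, 24.5, 26.0)
)

# Median of the valid pixels within 15 km of the detection
filled <- buffer_median(147.00, -19.00, pixels$lon, pixels$lat, pixels$sst,
                        buffer_m = 15000)

# A 1 km buffer contains no valid pixel, so the gap stays NA
unfilled <- buffer_median(147.00, -19.00, pixels$lon, pixels$lat, pixels$sst,
                          buffer_m = 1000)
```

A larger buffer fills more gaps but averages over a wider area, which is one reason to set buffer explicitly rather than rely on the resolution-based default.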

Value

a data frame with the environmental variable appended as an extra column based on the date of each detection

Details

The extractEnv function allows the user to access, download and append a range of environmental variables to each detection within a telemetry data set. We recommend that users first undertake a quality control step using the runQC function before further analysis. However, the functionality to append environmental data will work on any dataset that has, at a minimum, spatial coordinates (i.e., latitude and longitude in EPSG 4326) and a timestamp (in UTC) for each detection event. Quality-controlled environmental variables housed on the IMOS THREDDS server will be extracted for each coordinate at the corresponding timestamp, where available. A summary table of the full range of environmental variables currently available can be accessed using the imos_variables function.
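
The minimum input requirements described above (coordinate columns in EPSG 4326 and a UTC timestamp) can be checked before any download is attempted. The sketch below is one possible pre-flight check in base R. The helper check_detection_input is hypothetical and not part of the package; its default column names simply mirror the defaults in the usage section, and the requirement that timestamps are stored as POSIXct is an assumption of this sketch.

```r
# Hypothetical helper: confirm a data frame has the minimum fields needed
# for environmental extraction. Stops with an informative error otherwise.
check_detection_input <- function(df,
                                  X = "longitude",
                                  Y = "latitude",
                                  datetime = "detection_timestamp") {
  # All three columns must be present
  missing_cols <- setdiff(c(X, Y, datetime), names(df))
  if (length(missing_cols) > 0) {
    stop("Missing required column(s): ", paste(missing_cols, collapse = ", "))
  }

  # Coordinates should look like degrees of longitude and latitude
  if (any(abs(df[[X]]) > 180, na.rm = TRUE) ||
      any(abs(df[[Y]]) > 90, na.rm = TRUE)) {
    stop("Coordinates fall outside valid longitude/latitude ranges")
  }

  # Assumption of this sketch: timestamps are POSIXct with a UTC time zone
  tz <- attr(df[[datetime]], "tzone")
  if (!inherits(df[[datetime]], "POSIXct") ||
      is.null(tz) || !tz[1] %in% c("UTC", "GMT")) {
    stop("Column '", datetime, "' must be POSIXct in UTC")
  }

  invisible(TRUE)
}

# Toy detections (made-up values) that satisfy the requirements
dets <- data.frame(
  longitude = c(146.9, 147.1),
  latitude = c(-19.1, -18.9),
  detection_timestamp = as.POSIXct(
    c("2013-08-18 10:00:00", "2013-09-05 22:30:00"), tz = "UTC"
  )
)

ok <- check_detection_input(dets)
```

Running a check like this first avoids waiting on a download only to find that a column name was misspelled or a timestamp was left in local time.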

Examples

## Input example detection dataset that has been run through the quality
##   control workflow (see 'runQC' function)

library(tidyverse)
library(remora)  # provides extractEnv() and the TownsvilleReefQC example data
data("TownsvilleReefQC")

## simplify & subset data for example speed-up
qc_data <- 
  TownsvilleReefQC %>% 
  unnest(cols = c(QC)) %>% 
  ungroup() %>% 
  filter(Detection_QC %in% c(1,2)) %>%
  filter(filename == unique(filename)[1]) %>%
  slice(5:8)

## Extract daily interpolated sea surface temperature
## cache_layers set to FALSE for speed; fill_gaps set to TRUE so that
##   detections falling on 'NA' pixels are filled from a buffer
data_with_sst <- 
  extractEnv(df = qc_data,
              X = "receiver_deployment_longitude", 
              Y = "receiver_deployment_latitude", 
              datetime = "detection_datetime", 
              env_var = "rs_sst_interpolated",
              cache_layers = FALSE,
              crop_layers = TRUE,
              full_timeperiod = FALSE,
              fill_gaps = TRUE,
              folder_name = "test",
              .parallel = FALSE)
#> Extracting environmental data only on days detections were present; between 2013-08-18 and 2013-09-05 (3 days)
#> This may take a little while...
#> Accessing and downloading IMOS environmental variable: rs_sst_interpolated
#> Checking if files exist on IMOS server...
#> 
  |======================================================================| 100%
#> Extracting and appending environmental data
#> Filling gaps in environmental data by extracting median values from a 15km buffer around detections that fall on 'NA' values
#> Warning: [extract] transforming vector data to the CRS of the raster
#> Warning: [extract] transforming vector data to the CRS of the raster