Accesses and downloads environmental data from the Bluelink (CSIRO) server and appends variables to detection data based on the date of each detection

extractBlue(
  df,
  X,
  Y,
  datetime,
  env_var,
  extract_depth = 0,
  var_name = paste(env_var, extract_depth, sep = "_"),
  folder_name = "Bluelink",
  env_buffer = 1,
  cache_layers = FALSE,
  full_timeperiod = FALSE,
  station_name = NULL,
  export_step = FALSE,
  export_path = "Processed_data",
  .parallel = FALSE,
  .ncores = NULL,
  verbose = TRUE
)

Arguments

df

detection data in a data frame containing, at a minimum, an X, a Y and a date-time field

X

name of column with X coordinate or longitude (EPSG 4326)

Y

name of column with Y coordinate or latitude (EPSG 4326)

datetime

name of column with date time stamp (Coordinated Universal Time; UTC)

env_var

variable needed from Bluelink. Options include 'BRAN_temp', 'BRAN_salt', 'BRAN_ssh', 'BRAN_mld', 'BRAN_cur' and 'BRAN_wind'.

extract_depth

Bluelink data are 3D, so values can be obtained either at the water surface or at depth. Provide the depth of interest (between 0 and 4,509 m) as a numeric value and the function will automatically obtain the data at the nearest available layer. By default, data are extracted at the water surface.

var_name

name for the column containing the extracted environmental data. This can be useful if the user wants to extract the same environmental variable at different depths. If not specified, the name is constructed from the env_var and extract_depth arguments.
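As a minimal sketch of how extract_depth and var_name work together (the data frame and column names below are placeholders, not part of the package), the same variable can be extracted at two depths and kept in separately named columns:

## Sketch only: 'my_detections' and its column names are hypothetical
dat <- extractBlue(df = my_detections,
                   X = "longitude",
                   Y = "latitude",
                   datetime = "detection_datetime",
                   env_var = "BRAN_temp",
                   extract_depth = 0,
                   var_name = "temp_surface")
dat <- extractBlue(df = dat,
                   X = "longitude",
                   Y = "latitude",
                   datetime = "detection_datetime",
                   env_var = "BRAN_temp",
                   extract_depth = 100,   # nearest available BRAN layer is used
                   var_name = "temp_100m")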

folder_name

name of the folder within the working directory where the downloaded and processed netCDF files should be saved. Defaults to "Bluelink"; processed files are discarded after extraction unless cache_layers = TRUE.

env_buffer

distance (in decimal degrees) to expand the study area beyond the coordinates to extract environmental data. Default value is 1°.

cache_layers

should the downloaded and processed environmental data be cached within the working directory? If FALSE (default), the Bluelink data will be stored in a temporary folder and discarded after environmental extraction. Otherwise, it will be saved in the "cached" folder within folder_name.
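For example, a sketch (placeholder data frame and column names) that keeps the processed layers for later reuse:

## Keep the processed BRAN layers in the "cached" folder within folder_name
dat <- extractBlue(df = my_detections,
                   X = "longitude",
                   Y = "latitude",
                   datetime = "detection_datetime",
                   env_var = "BRAN_temp",
                   folder_name = "Bluelink",
                   cache_layers = TRUE)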

full_timeperiod

should environmental variables be extracted for each day across the full monitoring period? This option is time- and memory-consuming for long projects. If selected, the returned dataset will be standardized across days with and without detections for all stations (station_name column) at which animals were detected. For more details please see the package vignettes.

station_name

if full_timeperiod = TRUE, provide the name of the column that identifies the acoustic stations
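A sketch of a standardized daily extraction, assuming the detection data contain a station_name column (other names below are placeholders):

## Extract a value for every day of the monitoring period at each station
dat <- extractBlue(df = my_detections,
                   X = "longitude",
                   Y = "latitude",
                   datetime = "detection_datetime",
                   env_var = "BRAN_temp",
                   full_timeperiod = TRUE,
                   station_name = "station_name")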

export_step

should the processed dataset be exported to file? This is particularly useful for large datasets, to avoid losing progress if the internet connection fails. Rows with missing data are exported as NAs, and only these are completed if the function is rerun with the exported dataset as input (df)

export_path

path and file name used to export the dataset with appended environmental data
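A sketch of a resumable run for a large dataset (the data frame and column names are illustrative):

## First pass: rows that fail to download are kept as NAs and written to file
dat <- extractBlue(df = my_detections,
                   X = "longitude",
                   Y = "latitude",
                   datetime = "detection_datetime",
                   env_var = "BRAN_temp",
                   export_step = TRUE,
                   export_path = "Processed_data")

## If interrupted, rerun with the exported (partially filled) dataset as input;
## only rows still containing NAs are extracted again
dat <- extractBlue(df = dat,
                   X = "longitude",
                   Y = "latitude",
                   datetime = "detection_datetime",
                   env_var = "BRAN_temp",
                   export_step = TRUE,
                   export_path = "Processed_data")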

.parallel

should the function be run in parallel?

.ncores

number of cores to use when run in parallel. If none is provided, detectCores is used to determine the number of available cores.
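A sketch of a parallel run (the core count and input names are illustrative):

## Run the extraction across two cores
dat <- extractBlue(df = my_detections,
                   X = "longitude",
                   Y = "latitude",
                   datetime = "detection_datetime",
                   env_var = "BRAN_temp",
                   .parallel = TRUE,
                   .ncores = 2)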

verbose

should the function report details of the operations being conducted? Set to FALSE to keep it quiet

Value

a data frame with the environmental variable appended as an extra column, matched to the date of each detection

Details

The extractBlue function allows the user to download, process and append a range of 3D environmental variables (from the water surface down to 4,509 m depth) to each detection within a telemetry dataset. We advocate that users first undertake a quality-control step using the runQC function before further analysis; however, the functionality to append environmental data will work on any dataset that has, at a minimum, spatial coordinates (i.e., latitude and longitude in EPSG 4326) and a timestamp (in UTC) for each detection event. Quality-controlled environmental variables housed on the Bluelink (BRAN) CSIRO server are extracted for each coordinate at the specific timestamp, where available. A summary table of the full range of environmental variables currently available can be accessed using the imos_variables function.
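For instance, the available variables can be listed before choosing env_var (a sketch, assuming the package is attached):

## Summary table of environmental variables available for extraction
imos_variables()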

Examples

## Load an example detection dataset that has been run through the quality control
##   workflow (see 'runQC' function)

library(tidyverse)
#> ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
#>  dplyr     1.1.4      readr     2.1.5
#>  forcats   1.0.0      stringr   1.5.1
#>  ggplot2   3.5.1      tibble    3.2.1
#>  lubridate 1.9.3      tidyr     1.3.1
#>  purrr     1.0.2     
#> ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
#>  dplyr::filter() masks stats::filter()
#>  dplyr::lag()    masks stats::lag()
#>  Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
data("TownsvilleReefQC")

## simplify & subset data for speed
qc_data <- 
  TownsvilleReefQC %>% 
  unnest(cols = c(QC)) %>% 
  ungroup() %>% 
  filter(Detection_QC %in% c(1,2)) %>%
  filter(filename == unique(filename)[1]) %>%
  slice(1:20)

## Extract daily sea surface temperature (BRAN_temp at 0 m)
## cache_layers left as FALSE (default) for speed
data_with_temp <- 
   extractBlue(df = qc_data,
               X = "receiver_deployment_longitude", 
               Y = "receiver_deployment_latitude", 
               datetime = "detection_datetime", 
               env_var = "BRAN_temp",
               extract_depth = 0,
               verbose = TRUE)
#> Downloading BRAN_temp data: 2013-08 | 2013-09