R/extractBlue.R
extractBlue.Rd
Accesses and downloads environmental data from the Bluelink (CSIRO) server and appends variables to detection data based on date of detection
extractBlue(
df,
X,
Y,
datetime,
env_var,
extract_depth = 0,
var_name = paste(env_var, extract_depth, sep = "_"),
folder_name = "Bluelink",
env_buffer = 1,
cache_layers = FALSE,
full_timeperiod = FALSE,
station_name = NULL,
export_step = FALSE,
export_path = "Processed_data",
.parallel = FALSE,
.ncores = NULL,
verbose = TRUE
)
detection data as a data frame with, at a minimum, X, Y and date-time fields
name of column with X coordinate or longitude (EPSG 4326)
name of column with Y coordinate or latitude (EPSG 4326)
name of column with date time stamp (Coordinated Universal Time; UTC)
variable needed from Bluelink. Options are 'BRAN_temp', 'BRAN_salt', 'BRAN_ssh', 'BRAN_mld', 'BRAN_cur' and 'BRAN_wind'.
Bluelink data is 3D, so data can be obtained either at the water surface or at depth. Please provide the depth of interest (between 0 and 4,509 m) as numeric and the function will automatically obtain the data at the nearest available layer. By default the data will be extracted at the water surface.
name for the column containing the extracted environmental data. Can be useful if the user wants to download the same environmental variable at different depths. If not specified, it is built from the env_var and extract_depth arguments.
name of folder within the working directory where the downloaded and processed netCDF files should be saved. Defaults to "Bluelink". If NULL, folder names are generated automatically based on study extent and processed files are deleted after processing.
distance (in decimal degrees) to expand the study area beyond the coordinates to extract environmental data. Default value is 1°.
should the downloaded and processed environmental data be cached within the working directory? If FALSE (default), the Bluelink data will be stored in a temporary folder and discarded after environmental extraction. Otherwise, it will be saved in the "cached" folder within folder_name.
should environmental variables be extracted for each day across the full monitoring period? This option is time and memory consuming for long projects. If this option is selected, the returned dataset will be standardized for the days with and without detections across all stations (station_name column) where animals were detected. For more details please see the package vignettes.
if full_timeperiod = TRUE, please provide the column that identifies the name of the acoustic stations
should the processed dataset be exported to file? This is particularly useful for large datasets, to avoid function failure due to issues with the internet connection. Rows with missing data are exported as NAs, and only these are completed if the function is rerun with the exported dataset as input (df).
path and name used to export the dataset with appended environmental data
should the function be run in parallel?
number of cores to use when running in parallel. If none is provided, uses detectCores to determine the number.
should the function provide details of what operation is being conducted? Set to FALSE to keep it quiet.
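The env_buffer argument above expands the study area by a distance in decimal degrees. A minimal sketch of that idea follows, assuming the extent is simply the bounding box of the detection coordinates grown by env_buffer on every side. The helper buffered_extent and the coordinates are hypothetical and are not part of the package; the function's internal handling of the extent may differ.

```r
# Hypothetical helper (not part of the package): grow the bounding box of the
# detection coordinates by `env_buffer` decimal degrees on every side.
buffered_extent <- function(x, y, env_buffer = 1) {
  c(xmin = min(x, na.rm = TRUE) - env_buffer,
    xmax = max(x, na.rm = TRUE) + env_buffer,
    ymin = min(y, na.rm = TRUE) - env_buffer,
    ymax = max(y, na.rm = TRUE) + env_buffer)
}

# Illustrative receiver coordinates (EPSG 4326); made-up values
lon <- c(146.8, 147.2, 147.0)
lat <- c(-19.1, -18.7, -18.9)

ext <- buffered_extent(lon, lat, env_buffer = 1)
print(ext)
```

A larger buffer means more data to download, so the default of 1° is a reasonable starting point for small study areas.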
a dataframe with the environmental variable appended as an extra column based on date of each detection
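The name of the appended column follows the default shown in the usage, paste(env_var, extract_depth, sep = "_"). The short sketch below only reproduces that expression in base R to show which column names to expect when the same variable is extracted at several depths; the depth values are illustrative.

```r
# Reproduce the default `var_name` expression from the usage section
env_var <- "BRAN_temp"
depths  <- c(0, 50, 200)  # illustrative depths in metres

# One default column name per requested depth:
# "BRAN_temp_0", "BRAN_temp_50" and "BRAN_temp_200"
col_names <- paste(env_var, depths, sep = "_")
print(col_names)
```

Because each depth yields a distinct default name, repeated calls at different depths should not overwrite one another's columns unless var_name is set to the same value by hand.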
The extractBlue function allows the user to download, process and append a range of 3D environmental variables (from the water surface to 4,509 m depth) to each detection within a telemetry data set.
We recommend that users first undertake a quality control step using the runQC function before further analysis. However, the functionality to append environmental data will work on any dataset that has, at a minimum, spatial coordinates (i.e., latitude, longitude; in EPSG 4326) and a timestamp (in UTC) for each detection event. Quality controlled environmental variables housed on the Bluelink (BRAN) CSIRO server will be extracted for each specific coordinate at the specific timestamp where available. A summary table of the full range of environmental variables currently available can be accessed using the imos_variables function.
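The extract_depth argument is described as picking the nearest available layer to the requested depth. The sketch below illustrates nearest-layer matching in base R. The helper nearest_layer and the layer depths are hypothetical and chosen for illustration only; they are neither the actual Bluelink depth grid nor the package's internal code.

```r
# Hypothetical helper (not part of the package): return the available layer
# depth closest to the requested depth.
nearest_layer <- function(target_depth, layer_depths) {
  layer_depths[which.min(abs(layer_depths - target_depth))]
}

# Made-up layer depths in metres, for illustration only
layers <- c(2.5, 7.5, 12.5, 17.5, 50, 100, 500, 1000, 4509)

surface_layer <- nearest_layer(0, layers)   # shallowest layer: 2.5
mid_layer     <- nearest_layer(60, layers)  # closest to 60 m: 50
print(c(surface_layer, mid_layer))
```

Under this scheme a request for the surface (extract_depth = 0) returns the shallowest layer rather than a value at exactly 0 m.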
## Load an example detection dataset that has been run through the quality
## control workflow (see 'runQC' function)
library(tidyverse)
#> ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
#> ✔ dplyr 1.1.4 ✔ readr 2.1.5
#> ✔ forcats 1.0.0 ✔ stringr 1.5.1
#> ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
#> ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
#> ✔ purrr 1.0.2
#> ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
#> ✖ dplyr::filter() masks stats::filter()
#> ✖ dplyr::lag() masks stats::lag()
#> ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
data("TownsvilleReefQC")
## simplify & subset data for speed
qc_data <-
  TownsvilleReefQC %>%
  unnest(cols = c(QC)) %>%
  ungroup() %>%
  filter(Detection_QC %in% c(1, 2)) %>%
  filter(filename == unique(filename)[1]) %>%
  slice(1:20)
## Extract Bluelink (BRAN) water temperature at the surface (extract_depth = 0)
## cache_layers left at its default (FALSE) for speed
data_with_temp <-
  extractBlue(df = qc_data,
              X = "receiver_deployment_longitude",
              Y = "receiver_deployment_latitude",
              datetime = "detection_datetime",
              env_var = "BRAN_temp",
              extract_depth = 0,
              verbose = TRUE)
#> Downloading BRAN_temp data: 2013-08 | 2013-09
#> Extracting environmental data...