% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/cross-val.R
\name{sdmTMB_cv}
\alias{sdmTMB_cv}
\title{Cross validation with sdmTMB models}
\usage{
sdmTMB_cv(
  formula,
  data,
  mesh_args,
  mesh = NULL,
  time = NULL,
  k_folds = 8,
  fold_ids = NULL,
  lfo = FALSE,
  lfo_forecast = 1,
  lfo_validations = 5,
  parallel = TRUE,
  use_initial_fit = FALSE,
  future_globals = NULL,
  spde = deprecated(),
  ...
)
}
\arguments{
\item{formula}{Model formula.}

\item{data}{A data frame.}

\item{mesh_args}{Arguments for \code{\link[=make_mesh]{make_mesh()}}. If supplied, the mesh will be
reconstructed for each fold.}

\item{mesh}{Output from \code{\link[=make_mesh]{make_mesh()}}. If supplied, the mesh will be constant
across folds.}

\item{time}{The name of the time column. Leave as \code{NULL} if this is only
spatial data.}

\item{k_folds}{Number of folds.}

\item{fold_ids}{Optional vector of user-defined fold IDs. Can also be a
single string, e.g. \code{"fold_id"}, giving the name of a column in
\code{data}. Ignored if \code{lfo = TRUE}.}

\item{lfo}{Whether to implement leave-future-out (LFO) cross validation, where
earlier time steps are used to predict future folds. The \code{time} argument in
\code{\link[=sdmTMB]{sdmTMB()}} must be specified. See Details section below.}

\item{lfo_forecast}{If \code{lfo = TRUE}, number of time steps to forecast. Time
steps 1, ..., T are used to predict T + \code{lfo_forecast} and the last
forecasted time step is used for validation. See Details section below.}

\item{lfo_validations}{If \code{lfo = TRUE}, number of times to step through the
LFOCV process. Defaults to 5. See Details section below.}

\item{parallel}{If \code{TRUE} and a \code{\link[future:plan]{future::plan()}} has been set, the folds
will be fitted in parallel.}

\item{use_initial_fit}{Fit the first fold and use those parameter values
as starting values for subsequent folds? Can be faster with many folds.}

\item{future_globals}{A character vector naming global variables used within
arguments. Supply this if an error is returned that \pkg{future.apply} can't
find an object. This vector is appended to \code{TRUE} and passed to the argument
\code{future.globals} in \code{\link[future.apply:future_lapply]{future.apply::future_lapply()}}. Useful if global
objects are used to specify arguments like priors, families, etc.}

\item{spde}{\strong{Deprecated.} Use \code{mesh} instead.}

\item{...}{All other arguments required to run a \code{\link[=sdmTMB]{sdmTMB()}} model, with the
exception of \code{weights}, which are used to define the folds.}
}
\value{
A list:
\itemize{
\item \code{data}: Original data plus columns for fold ID, CV predicted value,
and CV log likelihood.
\item \code{models}: A list of models; one per fold.
\item \code{fold_loglik}: Sum of left-out log likelihoods per fold.
\item \code{sum_loglik}: Sum of \code{fold_loglik} across all left-out data.
\item \code{pdHess}: Logical vector: was the Hessian invertible for each fold?
\item \code{converged}: Logical: are all elements of \code{pdHess} \code{TRUE}?
\item \code{max_gradients}: Max gradient per fold.
}

Prior to \pkg{sdmTMB} version '0.3.0.9002', \code{elpd} was incorrectly returned as
the log average likelihood. That is another metric you could compare models
with, but it is not the ELPD. For maximum likelihood, \href{https://github.com/pbs-assess/sdmTMB/issues/235}{ELPD is equivalent in spirit to the sum of the log likelihoods}.
}
\description{
Facilitates cross validation with sdmTMB models. Returns the log likelihood
of left-out data, which is similar in spirit to the ELPD (expected log
pointwise predictive density). The function has an option for
leave-future-out cross validation. By default, the function creates folds
randomly but folds can be manually assigned via the \code{fold_ids} argument.
}
\details{
\strong{Parallel processing}

Parallel processing can be used by setting a \code{future::plan()}.

For example:

\if{html}{\out{<div class="sourceCode">}}\preformatted{library(future)
plan(multisession)
# now use sdmTMB_cv() ...
}\if{html}{\out{</div>}}
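
As a fuller sketch, the following sets a plan with two workers, passes a
globally defined family object by name through \code{future_globals}, and
resets the plan afterwards. The object names \code{fam} and \code{m_par} and
the choice of two workers are assumptions made for illustration, and
\code{future_globals} is likely only needed if \pkg{future.apply} reports that
it cannot find an object:

\if{html}{\out{<div class="sourceCode">}}\preformatted{library(sdmTMB)
library(future)
plan(multisession, workers = 2) # two workers chosen arbitrarily

# A global object that is referenced inside an argument (illustrative name):
fam <- tweedie(link = "log")

mesh <- make_mesh(pcod, c("X", "Y"), cutoff = 25)
m_par <- sdmTMB_cv(
  density ~ 0 + depth_scaled + depth_scaled2,
  data = pcod, mesh = mesh,
  family = fam, k_folds = 2,
  parallel = TRUE,
  future_globals = "fam" # name of the global object to export to workers
)

plan(sequential) # return to sequential processing
}\if{html}{\out{</div>}}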

\strong{Leave-future-out cross validation (LFOCV)}

An example of LFOCV with 9 time steps, \code{lfo_forecast = 1}, and
\code{lfo_validations = 2}:
\itemize{
\item Fit to data from time steps 1 to 7, then predict and validate step 8.
\item Fit to data from time steps 1 to 8, then predict and validate step 9.
}

An example of LFOCV with 9 time steps, \code{lfo_forecast = 2}, and
\code{lfo_validations = 3}:
\itemize{
\item Fit to data from time steps 1 to 5, then predict and validate step 7.
\item Fit to data from time steps 1 to 6, then predict and validate step 8.
\item Fit to data from time steps 1 to 7, then predict and validate step 9.
}

See example below.
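
The two schedules listed above can be reproduced with a small base R sketch.
The helper \code{lfo_schedule()} is hypothetical and is not part of
\pkg{sdmTMB}; it only mirrors the arithmetic of the examples above and makes no
claim about how folds are assigned internally:

\if{html}{\out{<div class="sourceCode">}}\preformatted{# Hypothetical helper (not an sdmTMB function): for each LFOCV iteration,
# list the last time step used for fitting and the time step validated.
lfo_schedule <- function(n_time, lfo_forecast = 1, lfo_validations = 5) {
  # The validated steps are the last `lfo_validations` time steps:
  validate <- seq(n_time - lfo_validations + 1, n_time)
  # Each fit stops `lfo_forecast` steps before the validated step:
  fit_to <- validate - lfo_forecast
  stopifnot(all(fit_to >= 1))
  data.frame(
    iteration = seq_len(lfo_validations),
    fit_from = 1,
    fit_to = fit_to,
    validate = validate
  )
}

# 9 time steps, lfo_forecast = 1, lfo_validations = 2:
lfo_schedule(9, lfo_forecast = 1, lfo_validations = 2)

# 9 time steps, lfo_forecast = 2, lfo_validations = 3:
lfo_schedule(9, lfo_forecast = 2, lfo_validations = 3)
}\if{html}{\out{</div>}}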
}
\examples{
mesh <- make_mesh(pcod, c("X", "Y"), cutoff = 25)

# Set parallel processing first if desired with the future package.
# See the Details section above.

m_cv <- sdmTMB_cv(
  density ~ 0 + depth_scaled + depth_scaled2,
  data = pcod, mesh = mesh,
  family = tweedie(link = "log"), k_folds = 2
)

m_cv$fold_loglik
m_cv$sum_loglik

head(m_cv$data)
m_cv$models[[1]]
m_cv$max_gradients

\donttest{
# Create mesh each fold:
m_cv2 <- sdmTMB_cv(
  density ~ 0 + depth_scaled + depth_scaled2,
  data = pcod, mesh_args = list(xy_cols = c("X", "Y"), cutoff = 20),
  family = tweedie(link = "log"), k_folds = 2
)

# Use fold_ids:
m_cv3 <- sdmTMB_cv(
  density ~ 0 + depth_scaled + depth_scaled2,
  data = pcod, mesh = mesh,
  family = tweedie(link = "log"),
  fold_ids = rep(seq(1, 3), length.out = nrow(pcod))
)
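
# A sketch of comparing two candidate models on identical folds, so that
# differences in the summed log likelihood reflect the models rather than
# the fold assignment. The object names (folds, m_lin, m_quad) and the two
# formulas are illustrative assumptions only.
folds <- rep(seq(1, 3), length.out = nrow(pcod))
m_lin <- sdmTMB_cv(
  density ~ depth_scaled,
  data = pcod, mesh = mesh,
  family = tweedie(link = "log"),
  fold_ids = folds
)
m_quad <- sdmTMB_cv(
  density ~ depth_scaled + depth_scaled2,
  data = pcod, mesh = mesh,
  family = tweedie(link = "log"),
  fold_ids = folds
)
# Larger (less negative) values suggest better out-of-sample prediction:
c(linear = m_lin$sum_loglik, quadratic = m_quad$sum_loglik)
# Check the Hessian for every fold before relying on the comparison:
c(linear = m_lin$converged, quadratic = m_quad$converged)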

# LFOCV:
m_lfocv <- sdmTMB_cv(
  present ~ s(year, k = 4),
  data = pcod,
  mesh = mesh,
  lfo = TRUE,
  lfo_forecast = 2,
  lfo_validations = 3,
  family = binomial(),
  spatiotemporal = "off",
  time = "year" # must be specified
)

# See how the LFOCV folds were assigned:
example_data <- m_lfocv$models[[1]]$data
table(example_data$cv_fold, example_data$year)
}
}
