Fit an IPM or run-reconstruction regression model
salmonIPM.Rd
Fits integrated population models (IPMs) or run-reconstruction regressions for anadromous Pacific salmonid population dynamics using a hierarchical Bayesian framework implemented in Stan. Various models are available, representing alternative life-history structures and independent or hierarchically pooled populations. Users can specify stage-specific covariate effects and hyper-priors using formula syntax.
Usage
salmonIPM(
stan_model = paste(model, life_cycle, ifelse(pool_pops, "pp", "np"), sep = "_"),
model = c("IPM", "RR"),
life_cycle = c("SS", "SSiter", "SMS", "SMaS", "LCRchum"),
ages = NULL,
pool_pops = nlevels(factor(fish_data$pop)) > 1,
SR_fun = c("BH", "B-H", "bh", "b-h", "Ricker", "ricker", "exp"),
RRS = "none",
par_models = NULL,
center = TRUE,
scale = TRUE,
age_F = NULL,
age_B = NULL,
age_S_obs = NULL,
age_S_eff = NULL,
conditionGRonMS = FALSE,
priors = NULL,
fish_data,
fecundity_data = NULL,
pars = "all",
include = TRUE,
log_lik = FALSE,
init = NULL,
chains = 4,
iter = 2000,
warmup = floor(iter/2),
thin = 1,
cores = parallel::detectCores(logical = FALSE),
control = NULL,
save_data = FALSE,
...
)
Arguments
- stan_model
Character string specifying the salmonIPM model to be fit. A more concise alternative to specifying
model
,life_cycle
, andpool_pops
and will override those arguments.- model
Either
"IPM"
or"RR"
, indicating whether to fit an IPM or run-reconstruction regression model.- life_cycle
Character string indicating which life-cycle model to fit. One of the following options (must be
"SS"
ifmodel == "RR"
):"SS"
Spawner-to-spawner (the default)"SSiter"
Spawner-to-spawner with iteroparity (an alias for"SS"
withstan_data("IPM_SS_[x]p", fish_data = fish_data)$iter
set to 1)"SMS"
Spawner-smolt-spawner with a fixed smolt age"SMaS"
Spawner-smolt-spawner with multiple smolt age classes (currently only available forpool_pops == FALSE
)"LCRchum"
Customized spawner-smolt-spawner model for Lower Columbia River chum (pool_pops == TRUE
)
- ages
For multi-stage models, a named list giving the ages in years of all fixed-age subadult life stages. This is not needed for
IPM_SMaS_np
because in that case smolt age structure is provided infish_data
.- pool_pops
Logical defaulting to
TRUE
if multiple populations are present infish_data
andFALSE
otherwise, indicating whether to model multiple populations hierarchically rather than as independent "fixed effects". It is possible to fit a model to multiple populations simultaneously even though they share no parameters; indeed this is more efficient than fitting them one at a time because calculations are vectorized and warmup is shared.- SR_fun
One of
"exp"
(density-independent discrete exponential),"BH"
(Beverton-Holt, the default), or"Ricker"
, indicating which spawner-recruit function to fit. Synonyms"DI"
,"B-H"
,"bh"
,"b-h"
and"ricker"
are accepted.- RRS
A character string or vector of strings naming parameters of the function specified by
SR_fun
that differ between wild- and hatchery-origin spawners, such that the relative reproductive success of hatchery spawners is not equal to 1. Ifpool_pops == TRUE
, these should be the names of the population-specific parameters, not their hyper-means. For example, iflife_cycle %in% c("SS","SSiter")
, the options are"none"
(the default),"alpha"
,"Rmax"
, orc("alpha","Rmax")
. CurrentlyRRS
is only implemented forpool_pops == FALSE
.- par_models
Optional list of two-sided formulas of the form
theta ~ x1 + ... + xK
, wheretheta
is a (hyper)parameter or state in the model specified bystan_model
that accepts covariates (see Details for available model-parameter combinations) andx1 + ... + xK
are terms involving variables infish_data
. Standard formula syntax such as:
and*
may be used; seestats::formula()
.- center
Logical indicating whether the terms in model matrices constructed from
fish_data
using the formulas inpar_models
should be centered. It is usually recommended to use the default (TRUE
) so the baseline parameter estimate applies when predictors are at their sample means, but in some cases such as factor predictorscenter = FALSE
may be appropriate. If combining categorical and numeric predictors, the latter can be centered and scaled prior to modeling.- scale
Logical indicating whether the model matrices constructed from
fish_data
using the formulas inpar_models
should be scaled to have column SDs of 1. Unit-scaling predictors is less critical than centering, but is advisable if variables have scales far from 1.- age_F
Logical or 0/1 vector of length
N_age
indicating whether each adult age is fully (non)selected by the fishery. The default is all selected. Iflife_cycle == "SSiter"
,N_age
refers to the total number of maiden and repeat age classes (counting the repeat plus group as 1).- age_B
Logical or 0/1 vector of length
N_age
indicating whether each adult age is fully (non)selected in broodstock collection. The default is all selected. Iflife_cycle == "SSiter"
,N_age
refers to the total number of maiden and repeat age classes (counting the repeat plus group as 1).- age_S_obs
If
stan_model == "IPM_SS_pp"
, a logical or 0/1 integer vector indicating, for each adult age, whether the observed total spawner data includes that age. The default is to treatS_obs
as including spawners of all ages. This option may be useful if certain age classes are not counted. Iflife_cycle == "SSiter"
,N_age
refers to the total number of maiden and repeat age classes (counting the repeat plus group as 1).- age_S_eff
If
stan_model == "IPM_SS_pp"
, a logical or 0/1 vector indicating, for each adult age, whether spawners of that age contribute to reproduction. This can be used, e.g., to exclude jacks from the effective breeding population. The default is to include spawners of all ages. Iflife_cycle == "SSiter"
,N_age
refers to the total number of maiden and repeat age classes (counting the repeat plus group as 1).- conditionGRonMS
If
life_cycle == "SMaS"
, logical indicating whether the Gilbert-Rich age frequenciesn_GRage_obs
infish_data
are conditioned on ocean age. IfFALSE
(the default) the counts are assumed to be sampled randomly from the population. IfTRUE
, it is assumed that the number of spawners of each ocean age (e.g., jacks vs 1-ocean) is arbitrary, but smolt (FW) age is randomly sampled within each ocean age; i.e., in asmolt age x ocean age
contingency table, the cell frequencies are conditioned on the column totals.- priors
Optional list of two-sided formulas of the form
theta ~ distribution(params)
, wheretheta
is a hyperparameter that can take a user-specified prior anddistribution()
is its canonical prior family. Seepriors
for details on the available parameters in each model and their corresponding families. Any hyperparameters not given explicit priors will use the default values of the priorparams
.- fish_data
Data frame where each row corresponds to a unique population
x
year, that includes the followingcolnames
in no particular order except where noted (namely age- and origin-composition data):pop
Required factor, numeric, or character population name or ID. Will be coerced to a factor, but it is recommended that this be a factor with concise, informative levels, e.g."Johnson Cr"
. This is especially true if there are multiple populations, in which caselevels(factor(pop))
can be used in interpreting and plotting the posterior draws.year
Required numeric or integer giving the calendar year corresponding to each observation. Note thatfish_data
is not indexed by brood year. For a brood table run reconstruction, seerun_recon()
.A
Required spawning habitat size (either stream length or area). Will often be time-invariant within a population, but need not be. Habitat size is used internally to convert population size scaling parameters (e.g.,Rmax
) from density to abundance, so ifA == 1
no rescaling is done and these parameters are in units of fish. This is fine ifpool_pops == FALSE
, but in hierarchical multi-population models it is advisable to provide habitat size so that the assumption of hierarchical exchangeability is more valid.M_obs
Iflife_cycle %in% c("SMS","SMaS","LCRchum")
, the observed number of wild-origin smolts (integer
ornumeric
). Missing / unknown observations are coded asNA
.tau_M_obs
Iflife_cycle == "LCRchum"
, known lognormal observation error SDs for smolt abundance. Missing values (NA
) will be imputed.downstream_trap
Iflife_cycle == "LCRchum"
, row indices corresponding to a downstream smolt trap in a different population whose catch additionally includes the smolts produced in one or more upstream populations, assuming no extra mortality en route. Each upstream population can have at most one downstream trap (in addition to its own, if any) but a trap can have multiple upstream populations. Ifdownstream_trap[i] == j
, thenM_downstream[j] = M[j] + M[i]
. Ifis.na(downstream_trap[i])
thenM[i]
is not double-counted.n_Mage[min_Mage]_obs...n_Mage[max_Mage]_obs
Iflife_cycle == "SMaS"
, multiple columns of observed smolt age sample frequencies (counts), where[min_Mage]
and[max_Mage]
are the numeral age in years of the youngest and oldest smolts, respectively. Note that age is measured in calendar years from the brood year (i.e., the Gilbert-Rich system).S_obs
Required observed total escapement of all wild and hatchery-origin spawners (integer
ornumeric
). Missing / unknown observations are coded asNA
.tau_S_obs
Iflife_cycle == "LCRchum"
, known lognormal observation error SDs for spawner abundance. Missing values (NA
) will be imputed.n_age[min_age]_obs ... n_age[max_age]_obs
Integer columns of observed spawner age sample frequencies (counts), where[min_age]
and[max_age]
are the numeral age in years (total, not ocean age) of the youngest and oldest spawners. Required unlesslife_cycle %in% c("SSiter","SMaS")
, in which case it is ignored. Note thatn_age_obs
and all other compositional data types must not containNA
. If the sample included no individuals of a given category or if no samples were collected, the observed frequency is 0.n_age[min_age]M_obs ... n_age[max_age]M_obs n_age[min_age + 1]K_obs ... n_age[max_age + 1]K_obs
, Iflife_cycle == "SSiter"
, integer columns of observed first-time (maiden) and repeat (former kelt) spawner age frequencies where[min_age]
and[max_age]
are the total age in years of the youngest and oldest maiden spawners, respectively. Contiguous maiden age columns denoted byM
are followed by an equal number of contiguous repeat age columns denoted byK
, where each repeat age is 1 year greater than the corresponding maiden age. The maximum repeat age class is a plus-group, i.e. it includes all repeat spawners agemax_age + 1
or older.n_MSage[min_MSage]_obs ... n_MSage[max_MSage]_obs
Iflife_cycle == "SMaS"
, integer columns of observed spawner ocean age sample frequencies, where[min_MSage]
and[max_MSage]
are the youngest and oldest ocean age in years, respectively. Nonzero ocean age frequencies are only required ifconditionGRonMS == TRUE
(the columns must be present in any case). IfconditionGRonMS == FALSE
, thenn_MSage_obs
represents independent samples, not simply the (implicit) ocean-age marginal totals ofn_GRage_obs
.n_GRage[min_age]_[min_Mage]_obs ... n_GRage[max_age]_[max_Mage]_obs
Iflife_cycle == "SMaS"
, integer columns of observed Gilbert-Rich age sample frequencies, varying fastest by smolt age (min_Mage:max_Mage
) and then by total age (min_age:max_age
). For example, a life history with subyearling or yearling smolts and ocean ages 2:3 would have column namesc("n_GRage_3_1_obs", "n_GRage_4_1_obs", "n_GRage_4_2_obs", "n_GRage_5_2_obs")
. All combinations of smolt age and (implicitly) ocean age must be represented, even if some were never observed.n_W_obs
Required integer observed sample frequencies of natural-origin spawners.n_H_obs
Required integer observed sample frequencies of hatchery-origin spawners.fit_p_HOS
Logical or 0/1 indicating for each rowi
infish_data
whether the model should estimatep_HOS[i] > 0
. Required ifmodel == "IPM"
unlesslife_cycle == "LCRchum"
.stan_data()
will give a warning if any rowi
meets either of two conditions:as.logical(fit_p_HOS[i]) == FALSE
butn_W_obs[i] + n_H_obs[i] > 0
, oras.logical(fit_p_HOS[i]) == TRUE
butn_W_obs[i] + n_H_obs[i] == 0
. The first means HOR were observed, so not accounting for them risks biasing the estimated parameters and states (aka "masking"). The second means the model is being asked to estimatep_HOS[i]
with no case-specific hatchery / wild origin-frequency data. Becausep_HOS[i]
is an a priori independent parameter (a "fixed effect"), this is a difficult task. There may be some shared information via the process model to indirectly inform it, but this is likely to lead to poor estimates and sampling problems.n_O0_obs n_O[which_O_pop[1]]_obs ... n_O[which_O_pop[N_O_pop]]_obs
Iflife_cycle = "LCRchum"
, multiple columns of observed origin sample frequencies. The first column, named "O" for origin and "0" for null / naught, refers to unknown natural origin, i.e. unmarked spawners presumed to be NOR. The nextN_O_pop
columns are numbered by the levels offactor(fish_data$pop)
corresponding to the set of known-origin populations. Typically these are hatcheries, but NOR may be identified by PIT tags, parentage-based tagging, or other means. TheLCRchum
model uses origin-composition observations to infer the dispersal rates of hatchery (or other known-origin) fish, son_W_obs
(the same asn_O0_obs
assuming all known origins are hatcheries) andn_H_obs
(equal tosum(n_O_obs[-1])
in that case) are not needed, although they can be included infish_data
for informational purposes or for post-processing draws. Likewisefit_p_HOS
is not needed and will be ignored.n_M_obs
Iflife_cycle == "LCRchum"
, integer observed frequencies of male spawners.n_F_obs
Iflife_cycle == "LCRchum"
, integer observed frequencies of female spawners.p_G_obs
Iflife_cycle == "LCRchum"
, observed proportion (assumed known without error) of female spawners that are "green", i.e. fully fecund.F_rate
Total harvest rate (proportion) of natural-origin fish. Required for all models, even if all values are 0.B_take_obs
Number of adults taken for hatchery broodstock. Required for all models, even if all values are 0.S_add_obs
Ifstan_model == "IPM_LCRchum_pp"
, number of adults translocated into population....
Additional variables to be used as covariates. These can vary spatially and/or temporally.
- fecundity_data
If
life_cycle == "LCRchum"
, data frame with the following columns, representing observations of fecundity with each row corresponding to a female:age_E
Female age in years.E_obs
Observed fecundity.
- pars
A character vector specifying (hyper)parameters, states, and/or quantities of interest ("parameters") to be saved. The default is to save all parameters. Parameters can be explicitly named or one or more shortcuts can be used to specify hierarchical levels of parameters; see
stan_pars()
for details. If parameters are explicitly named, the user should make sure they exist instan_model
, e.g. by callingstan_pars(stan_model)
.- include
Logical scalar defaulting to
TRUE
indicating whether to include or exclude the parameters given bypars
. IfFALSE
, only entire multidimensional parameters can be excluded, rather than particular elements of them.- log_lik
Logical scalar indicating whether the pointwise log-likelihood should be saved, e.g. for later use with
loo::loo()
.- init
A list of named lists of initial values to be passed to
rstan::sampling()
. The default (and recommended)NULL
randomly generates initial values for each chain given the data and model usingstan_init()
.- chains
Positive integer specifying the number of HMC chains; see
rstan::sampling()
.- iter
Positive integer specifying the number of iterations for each chain (including warmup); see
rstan::sampling()
.- warmup
Positive integer specifying the number of warmup iterations per chain. If step-size adaptation is enabled (the default), this also controls the number of iterations for which adaptation is run; hence these warmup samples should not be used for inference. The number of warmup iterations should not be larger than
iter
. Seerstan::sampling()
.- thin
Positive integer specifying the period for saving samples. The default is 1, which is usually the recommended value. See
rstan::sampling()
.- cores
Number of cores to use when running chains in parallel. Defaults to the number of physical cores available. See
rstan::sampling()
.- control
A named list of options to control sampler behavior. See
rstan::stan()
for details and available options. In contrast to rstan, the default value ofadapt_delta
in salmonIPM is increased to 0.95 as we have found this necessary to minimize divergences in most cases.- save_data
Logical scalar defaulting to
FALSE
indicating whether to save the data passed to Stan bystan_data()
in the salmonIPMfit object. Can be useful for posterior predictive checking and reproducibility, at the cost of increased object size.- ...
Additional arguments to pass to
rstan::sampling()
.
Value
An object of class salmonIPMfit
representing the fitted model. See
salmonIPMfit
for details.
Details
The following parameters and states can be modeled as functions of
covariates using the argument par_models
. The response distribution
families and link functions are automatically implemented so there is no
need to log()
- or qlogis()
-transform the left-hand side of the formula,
although such syntax will also work (because the LHS is not found in
fish_data
and is just syntactic sugar to determine the parameter name).
The design matrices passed to the Stan model cannot include an intercept,
but it is not necessary to manually remove it in the RHS; if present by
default, par_model_matrix()
will automatically remove it.
As with any regression model, the user must ensure the effects specified
are estimable given the design matrix. For example, the effect of a
spatially varying but time-invariant predictor would not be identifiable in
a np
model because populations are modeled independently.
Response (family) | ||||||||
Model | alpha | (lognormal) | psi | (logistic normal) | Rmax | (lognormal) | Mmax | (lognormal) |
R | (lognormal) | M | (lognormal) | s_MS | (logistic normal) | s_SS | (logistic normal) | IPM_SS_np |
☑ | ☐ | ☑ | ☐ | ☑ | ☐ | ☐ | ☐ | IPM_SSiter_np |
☑ | ☐ | ☑ | ☐ | ☑ | ☐ | ☐ | ☑ | IPM_SS_pp |
☑ | ☐ | ☑ | ☐ | ☑ | ☐ | ☐ | ☐ | IPM_SSiter_pp |
☑ | ☐ | ☑ | ☐ | ☑ | ☐ | ☐ | ☑ | IPM_SMS_np |
☑ | ☐ | ☐ | ☑ | ☐ | ☑ | ☑ | ☐ | IPM_SMS_pp |
☑ | ☐ | ☐ | ☑ | ☐ | ☑ | ☑ | ☐ | IPM_SMaS_np |
☑ | ☐ | ☐ | ☑ | ☐ | ☑ | ☑ | ☐ | IPM_LCRchum_pp |
☐ | ☑ | ☐ | ☑ | ☐ | ☑ | ☑ | ☐ |