Prepare input data for IPMs or run-reconstruction models
stan_data.Rd
This function is mostly used internally, but may occasionally be useful for diagnosing
problems (e.g., checking the numeric coding of populations and years or the
replacement values for NA
s) or for plotting.
Usage
stan_data(
stan_model = c("IPM_SS_np", "IPM_SSiter_np", "IPM_SS_pp", "IPM_SSiter_pp",
"IPM_SMS_np", "IPM_SMS_pp", "IPM_SMaS_np", "IPM_LCRchum_pp", "RR_SS_np", "RR_SS_pp"),
SR_fun = "BH",
RRS = "none",
ages = NULL,
par_models = NULL,
center = TRUE,
scale = TRUE,
priors = NULL,
fish_data,
age_F = NULL,
age_B = NULL,
age_S_obs = NULL,
age_S_eff = NULL,
conditionGRonMS = FALSE,
fecundity_data = NULL
)
Arguments
- stan_model
Character string specifying the salmonIPM model to be fit.
- SR_fun
One of
"exp"
(density-independent discrete exponential),"BH"
(Beverton-Holt, the default), or"Ricker"
, indicating which spawner-recruit function to fit. Synonyms"DI"
,"B-H"
,"bh"
,"b-h"
and"ricker"
are accepted.- RRS
A character string or vector of strings naming parameters of the function specified by
SR_fun
that differ between wild- and hatchery-origin spawners, such that the relative reproductive success of hatchery spawners is not equal to 1. Ifpool_pops == TRUE
, these should be the names of the population-specific parameters, not their hyper-means. For example, iflife_cycle %in% c("SS","SSiter")
, the options are"none"
(the default),"alpha"
,"Rmax"
, orc("alpha","Rmax")
. CurrentlyRRS
is only implemented forpool_pops == FALSE
.- ages
For multi-stage models, a named list giving the ages in years of all fixed-age subadult life stages. This is not needed for
IPM_SMaS_np
because in that case smolt age structure is provided infish_data
.- par_models
Optional list of two-sided formulas of the form
theta ~ x1 + ... + xK
, wheretheta
is a (hyper)parameter or state in the model specified bystan_model
that accepts covariates (see Details for available model-parameter combinations) andx1 + ... + xK
are terms involving variables infish_data
. Standard formula syntax such as:
and*
may be used; seestats::formula()
.- center
Logical indicating whether the terms in model matrices constructed from
fish_data
using the formulas inpar_models
should be centered. It is usually recommended to use the default (TRUE
) so the baseline parameter estimate applies when predictors are at their sample means, but in some cases such as factor predictorscenter = FALSE
may be appropriate. If combining categorical and numeric predictors, the latter can be centered and scaled prior to modeling.- scale
Logical indicating whether the model matrices constructed from
fish_data
using the formulas inpar_models
should be scaled to have column SDs of 1. Unit-scaling predictors is less critical than centering, but is advisable if variables have scales far from 1.- priors
Optional list of two-sided formulas of the form
theta ~ distribution(params)
, wheretheta
is a hyperparameter that can take a user-specified prior anddistribution()
is its canonical prior family. Seepriors
for details on the available parameters in each model and their corresponding families. Any hyperparameters not given explicit priors will use the default values of the priorparams
.- fish_data
Data frame where each row corresponds to a unique population
x
year, that includes the followingcolnames
in no particular order except where noted (namely age- and origin-composition data):pop
Required factor, numeric, or character population name or ID. Will be coerced to a factor, but it is recommended that this be a factor with concise, informative levels, e.g."Johnson Cr"
. This is especially true if there are multiple populations, in which caselevels(factor(pop))
can be used in interpreting and plotting the posterior draws.year
Required numeric or integer giving the calendar year corresponding to each observation. Note thatfish_data
is not indexed by brood year. For a brood table run reconstruction, seerun_recon()
.A
Required spawning habitat size (either stream length or area). Will often be time-invariant within a population, but need not be. Habitat size is used internally to convert population size scaling parameters (e.g.,Rmax
) from density to abundance, so ifA == 1
no rescaling is done and these parameters are in units of fish. This is fine ifpool_pops == FALSE
, but in hierarchical multi-population models it is advisable to provide habitat size so that the assumption of hierarchical exchangeability is more valid.M_obs
Iflife_cycle %in% c("SMS","SMaS","LCRchum")
, the observed number of wild-origin smolts (integer
ornumeric
). Missing / unknown observations are coded asNA
.tau_M_obs
Iflife_cycle == "LCRchum"
, known lognormal observation error SDs for smolt abundance. Missing values (NA
) will be imputed.downstream_trap
Iflife_cycle == "LCRchum"
, row indices corresponding to a downstream smolt trap in a different population whose catch additionally includes the smolts produced in one or more upstream populations, assuming no extra mortality en route. Each upstream population can have at most one downstream trap (in addition to its own, if any) but a trap can have multiple upstream populations. Ifdownstream_trap[i] == j
, thenM_downstream[j] = M[j] + M[i]
. Ifis.na(downstream_trap[i])
thenM[i]
is not double-counted.n_Mage[min_Mage]_obs...n_Mage[max_Mage]_obs
Iflife_cycle == "SMaS"
, multiple columns of observed smolt age sample frequencies (counts), where[min_Mage]
and[max_Mage]
are the numeral age in years of the youngest and oldest smolts, respectively. Note that age is measured in calendar years from the brood year (i.e., the Gilbert-Rich system).S_obs
Required observed total escapement of all wild and hatchery-origin spawners (integer
ornumeric
). Missing / unknown observations are coded asNA
.tau_S_obs
Iflife_cycle == "LCRchum"
, known lognormal observation error SDs for spawner abundance. Missing values (NA
) will be imputed.n_age[min_age]_obs ... n_age[max_age]_obs
Integer columns of observed spawner age sample frequencies (counts), where[min_age]
and[max_age]
are the numeral age in years (total, not ocean age) of the youngest and oldest spawners. Required unlesslife_cycle %in% c("SSiter","SMaS")
, in which case it is ignored. Note thatn_age_obs
and all other compositional data types must not containNA
. If the sample included no individuals of a given category or if no samples were collected, the observed frequency is 0.n_age[min_age]M_obs ... n_age[max_age]M_obs n_age[min_age + 1]K_obs ... n_age[max_age + 1]K_obs
, Iflife_cycle == "SSiter"
, integer columns of observed first-time (maiden) and repeat (former kelt) spawner age frequencies where[min_age]
and[max_age]
are the total age in years of the youngest and oldest maiden spawners, respectively. Contiguous maiden age columns denoted byM
are followed by an equal number of contiguous repeat age columns denoted byK
, where each repeat age is 1 year greater than the corresponding maiden age. The maximum repeat age class is a plus-group, i.e. it includes all repeat spawners agemax_age + 1
or older.n_MSage[min_MSage]_obs ... n_MSage[max_MSage]_obs
Iflife_cycle == "SMaS"
, integer columns of observed spawner ocean age sample frequencies, where[min_MSage]
and[max_MSage]
are the youngest and oldest ocean age in years, respectively. Nonzero ocean age frequencies are only required ifconditionGRonMS == TRUE
(the columns must be present in any case). IfconditionGRonMS == FALSE
, thenn_MSage_obs
represents independent samples, not simply the (implicit) ocean-age marginal totals ofn_GRage_obs
.n_GRage[min_age]_[min_Mage]_obs ... n_GRage[max_age]_[max_Mage]_obs
Iflife_cycle == "SMaS"
, integer columns of observed Gilbert-Rich age sample frequencies, varying fastest by smolt age (min_Mage:max_Mage
) and then by total age (min_age:max_age
). For example, a life history with subyearling or yearling smolts and ocean ages 2:3 would have column namesc("n_GRage_3_1_obs", "n_GRage_4_1_obs", "n_GRage_4_2_obs", "n_GRage_5_2_obs")
. All combinations of smolt age and (implicitly) ocean age must be represented, even if some were never observed.n_W_obs
Required integer observed sample frequencies of natural-origin spawners.n_H_obs
Required integer observed sample frequencies of hatchery-origin spawners.fit_p_HOS
Logical or 0/1 indicating for each rowi
infish_data
whether the model should estimatep_HOS[i] > 0
. Required ifmodel == "IPM"
unlesslife_cycle == "LCRchum"
.stan_data()
will give a warning if any rowi
meets either of two conditions:as.logical(fit_p_HOS[i]) == FALSE
butn_W_obs[i] + n_H_obs[i] > 0
, oras.logical(fit_p_HOS[i]) == TRUE
butn_W_obs[i] + n_H_obs[i] == 0
. The first means HOR were observed, so not accounting for them risks biasing the estimated parameters and states (aka "masking"). The second means the model is being asked to estimatep_HOS[i]
with no case-specific hatchery / wild origin-frequency data. Becausep_HOS[i]
is an a priori independent parameter (a "fixed effect"), this is a difficult task. There may be some shared information via the process model to indirectly inform it, but this is likely to lead to poor estimates and sampling problems.n_O0_obs n_O[which_O_pop[1]]_obs ... n_O[which_O_pop[N_O_pop]]_obs
Iflife_cycle = "LCRchum"
, multiple columns of observed origin sample frequencies. The first column, named "O" for origin and "0" for null / naught, refers to unknown natural origin, i.e. unmarked spawners presumed to be NOR. The nextN_O_pop
columns are numbered by the levels offactor(fish_data$pop)
corresponding to the set of known-origin populations. Typically these are hatcheries, but NOR may be identified by PIT tags, parentage-based tagging, or other means. TheLCRchum
model uses origin-composition observations to infer the dispersal rates of hatchery (or other known-origin) fish, son_W_obs
(the same asn_O0_obs
assuming all known origins are hatcheries) andn_H_obs
(equal tosum(n_O_obs[-1])
in that case) are not needed, although they can be included infish_data
for informational purposes or for post-processing draws. Likewisefit_p_HOS
is not needed and will be ignored.n_M_obs
Iflife_cycle == "LCRchum"
, integer observed frequencies of male spawners.n_F_obs
Iflife_cycle == "LCRchum"
, integer observed frequencies of female spawners.p_G_obs
Iflife_cycle == "LCRchum"
, observed proportion (assumed known without error) of female spawners that are "green", i.e. fully fecund.F_rate
Total harvest rate (proportion) of natural-origin fish. Required for all models, even if all values are 0.B_take_obs
Number of adults taken for hatchery broodstock. Required for all models, even if all values are 0.S_add_obs
Ifstan_model == "IPM_LCRchum_pp"
, number of adults translocated into population....
Additional variables to be used as covariates. These can vary spatially and/or temporally.
- age_F
Logical or 0/1 vector of length
N_age
indicating whether each adult age is fully (non)selected by the fishery. The default is all selected. Iflife_cycle == "SSiter"
,N_age
refers to the total number of maiden and repeat age classes (counting the repeat plus group as 1).- age_B
Logical or 0/1 vector of length
N_age
indicating whether each adult age is fully (non)selected in broodstock collection. The default is all selected. Iflife_cycle == "SSiter"
,N_age
refers to the total number of maiden and repeat age classes (counting the repeat plus group as 1).- age_S_obs
If
stan_model == "IPM_SS_pp"
, a logical or 0/1 integer vector indicating, for each adult age, whether the observed total spawner data includes that age. The default is to treatS_obs
as including spawners of all ages. This option may be useful if certain age classes are not counted. Iflife_cycle == "SSiter"
,N_age
refers to the total number of maiden and repeat age classes (counting the repeat plus group as 1).- age_S_eff
If
stan_model == "IPM_SS_pp"
, a logical or 0/1 vector indicating, for each adult age, whether spawners of that age contribute to reproduction. This can be used, e.g., to exclude jacks from the effective breeding population. The default is to include spawners of all ages. Iflife_cycle == "SSiter"
,N_age
refers to the total number of maiden and repeat age classes (counting the repeat plus group as 1).- conditionGRonMS
If
life_cycle == "SMaS"
, logical indicating whether the Gilbert-Rich age frequenciesn_GRage_obs
infish_data
are conditioned on ocean age. IfFALSE
(the default) the counts are assumed to be sampled randomly from the population. IfTRUE
, it is assumed that the number of spawners of each ocean age (e.g., jacks vs 1-ocean) is arbitrary, but smolt (FW) age is randomly sampled within each ocean age; i.e., in asmolt age x ocean age
contingency table, the cell frequencies are conditioned on the column totals.- fecundity_data
If
life_cycle == "LCRchum"
, data frame with the following columns, representing observations of fecundity with each row corresponding to a female:age_E
Female age in years.E_obs
Observed fecundity.
Value
A named list that is passed to rstan::sampling()
as the data
argument used when fitting salmonIPM models.
Examples
# Simulate data for a multi-population spawner-to-spawner model
set.seed(1234)
N_pop <- 10
N_year <- 20
N <- N_pop*N_year
pars <- list(mu_alpha = 2, sigma_alpha = 0.5, mu_Rmax = 5, sigma_Rmax = 0.5,
rho_alphaRmax = 0.3, rho_R = 0.7, sigma_year_R = 0.5, sigma_R = 0.3,
tau = 0.5, mu_p = c(0.05, 0.55, 0.4), sigma_pop_p = c(0.1, 0.2),
R_pop_p = diag(2), sigma_p = c(0.5, 0.5), R_p = diag(2), S_init_K = 0.7)
fd <- data.frame(pop = rep(1:N_pop, each = N_year), year = rep(1:N_year, N_pop),
A = 1, p_HOS = 0, F_rate = rbeta(N,7,3), B_rate = 0,
n_age_obs = 50, n_HW_obs = 0)
sim_out <- simIPM(pars = pars, fish_data = fd, N_age = 3, max_age = 5)
# Prepare simulated data for Stan
dat <- stan_data("IPM_SS_pp", fish_data = sim_out$sim_dat)