Prepare input data for IPMs or run-reconstruction models
stan_data.RdThis function is mostly used internally, but may occasionally be useful for diagnosing
problems (e.g., checking the numeric coding of populations and years or the
replacement values for NAs) or for plotting.
Usage
stan_data(
stan_model = c("IPM_SS_np", "IPM_SSiter_np", "IPM_SS_pp", "IPM_SSiter_pp",
"IPM_SMS_np", "IPM_SMS_pp", "IPM_SMaS_np", "IPM_LCRchum_pp", "RR_SS_np", "RR_SS_pp"),
SR_fun = "BH",
RRS = "none",
ages = NULL,
par_models = NULL,
center = TRUE,
scale = TRUE,
priors = NULL,
fish_data,
age_F = NULL,
age_B = NULL,
age_S_obs = NULL,
age_S_eff = NULL,
conditionGRonMS = FALSE,
fecundity_data = NULL
)Arguments
- stan_model
Character string specifying the salmonIPM model to be fit.
- SR_fun
One of
"exp"(density-independent discrete exponential),"BH"(Beverton-Holt, the default), or"Ricker", indicating which spawner-recruit function to fit. Synonyms"DI","B-H","bh","b-h"and"ricker"are accepted.- RRS
A character string or vector of strings naming parameters of the function specified by
SR_funthat differ between wild- and hatchery-origin spawners, such that the relative reproductive success of hatchery spawners is not equal to 1. Ifpool_pops == TRUE, these should be the names of the population-specific parameters, not their hyper-means. For example, iflife_cycle %in% c("SS","SSiter"), the options are"none"(the default),"alpha","Rmax", orc("alpha","Rmax"). CurrentlyRRSis only implemented forpool_pops == FALSE.- ages
For multi-stage models, a named list giving the ages in years of all fixed-age subadult life stages. This is not needed for
IPM_SMaS_npbecause in that case smolt age structure is provided infish_data.- par_models
Optional list of two-sided formulas of the form
theta ~ x1 + ... + xK, wherethetais a (hyper)parameter or state in the model specified bystan_modelthat accepts covariates (see Details for available model-parameter combinations) andx1 + ... + xKare terms involving variables infish_data. Standard formula syntax such as:and*may be used; seestats::formula().- center
Logical indicating whether the terms in model matrices constructed from
fish_datausing the formulas inpar_modelsshould be centered. It is usually recommended to use the default (TRUE) so the baseline parameter estimate applies when predictors are at their sample means, but in some cases such as factor predictorscenter = FALSEmay be appropriate. If combining categorical and numeric predictors, the latter can be centered and scaled prior to modeling.- scale
Logical indicating whether the model matrices constructed from
fish_datausing the formulas inpar_modelsshould be scaled to have column SDs of 1. Unit-scaling predictors is less critical than centering, but is advisable if variables have scales far from 1.- priors
Optional list of two-sided formulas of the form
theta ~ distribution(params), wherethetais a hyperparameter that can take a user-specified prior anddistribution()is its canonical prior family. Seepriorsfor details on the available parameters in each model and their corresponding families. Any hyperparameters not given explicit priors will use the default values of the priorparams.- fish_data
Data frame where each row corresponds to a unique population
xyear, that includes the followingcolnamesin no particular order except where noted (namely age- and origin-composition data):popRequired factor, numeric, or character population name or ID. Will be coerced to a factor, but it is recommended that this be a factor with concise, informative levels, e.g."Johnson Cr". This is especially true if there are multiple populations, in which caselevels(factor(pop))can be used in interpreting and plotting the posterior draws.yearRequired numeric or integer giving the calendar year corresponding to each observation. Note thatfish_datais not indexed by brood year. For a brood table run reconstruction, seerun_recon().ARequired spawning habitat size (either stream length or area). Will often be time-invariant within a population, but need not be. Habitat size is used internally to convert population size scaling parameters (e.g.,Rmax) from density to abundance, so ifA == 1no rescaling is done and these parameters are in units of fish. This is fine ifpool_pops == FALSE, but in hierarchical multi-population models it is advisable to provide habitat size so that the assumption of hierarchical exchangeability is more valid.M_obsIflife_cycle %in% c("SMS","SMaS","LCRchum"), the observed number of wild-origin smolts (integerornumeric). Missing / unknown observations are coded asNA.tau_M_obsIflife_cycle == "LCRchum", known lognormal observation error SDs for smolt abundance. Missing values (NA) will be imputed.downstream_trapIflife_cycle == "LCRchum", row indices corresponding to a downstream smolt trap in a different population whose catch additionally includes the smolts produced in one or more upstream populations, assuming no extra mortality en route. Each upstream population can have at most one downstream trap (in addition to its own, if any) but a trap can have multiple upstream populations. Ifdownstream_trap[i] == j, thenM_downstream[j] = M[j] + M[i]. Ifis.na(downstream_trap[i])thenM[i]is not double-counted.n_Mage[min_Mage]_obs...n_Mage[max_Mage]_obsIflife_cycle == "SMaS", multiple columns of observed smolt age sample frequencies (counts), where[min_Mage]and[max_Mage]are the numeral age in years of the youngest and oldest smolts, respectively. Note that age is measured in calendar years from the brood year (i.e., the Gilbert-Rich system).S_obsRequired observed total escapement of all wild and hatchery-origin spawners (integerornumeric). Missing / unknown observations are coded asNA.tau_S_obsIflife_cycle == "LCRchum", known lognormal observation error SDs for spawner abundance. Missing values (NA) will be imputed.n_age[min_age]_obs ... n_age[max_age]_obsInteger columns of observed spawner age sample frequencies (counts), where[min_age]and[max_age]are the numeral age in years (total, not ocean age) of the youngest and oldest spawners. Required unlesslife_cycle %in% c("SSiter","SMaS"), in which case it is ignored. Note thatn_age_obsand all other compositional data types must not containNA. If the sample included no individuals of a given category or if no samples were collected, the observed frequency is 0.n_age[min_age]M_obs ... n_age[max_age]M_obs n_age[min_age + 1]K_obs ... n_age[max_age + 1]K_obs, Iflife_cycle == "SSiter", integer columns of observed first-time (maiden) and repeat (former kelt) spawner age frequencies where[min_age]and[max_age]are the total age in years of the youngest and oldest maiden spawners, respectively. Contiguous maiden age columns denoted byMare followed by an equal number of contiguous repeat age columns denoted byK, where each repeat age is 1 year greater than the corresponding maiden age. The maximum repeat age class is a plus-group, i.e. it includes all repeat spawners agemax_age + 1or older.n_MSage[min_MSage]_obs ... n_MSage[max_MSage]_obsIflife_cycle == "SMaS", integer columns of observed spawner ocean age sample frequencies, where[min_MSage]and[max_MSage]are the youngest and oldest ocean age in years, respectively. Nonzero ocean age frequencies are only required ifconditionGRonMS == TRUE(the columns must be present in any case). IfconditionGRonMS == FALSE, thenn_MSage_obsrepresents independent samples, not simply the (implicit) ocean-age marginal totals ofn_GRage_obs.n_GRage[min_age]_[min_Mage]_obs ... n_GRage[max_age]_[max_Mage]_obsIflife_cycle == "SMaS", integer columns of observed Gilbert-Rich age sample frequencies, varying fastest by smolt age (min_Mage:max_Mage) and then by total age (min_age:max_age). For example, a life history with subyearling or yearling smolts and ocean ages 2:3 would have column namesc("n_GRage_3_1_obs", "n_GRage_4_1_obs", "n_GRage_4_2_obs", "n_GRage_5_2_obs"). All combinations of smolt age and (implicitly) ocean age must be represented, even if some were never observed.n_W_obsRequired integer observed sample frequencies of natural-origin spawners.n_H_obsRequired integer observed sample frequencies of hatchery-origin spawners.fit_p_HOSLogical or 0/1 indicating for each rowiinfish_datawhether the model should estimatep_HOS[i] > 0. Required ifmodel == "IPM"unlesslife_cycle == "LCRchum".stan_data()will give a warning if any rowimeets either of two conditions:as.logical(fit_p_HOS[i]) == FALSEbutn_W_obs[i] + n_H_obs[i] > 0, oras.logical(fit_p_HOS[i]) == TRUEbutn_W_obs[i] + n_H_obs[i] == 0. The first means HOR were observed, so not accounting for them risks biasing the estimated parameters and states (aka "masking"). The second means the model is being asked to estimatep_HOS[i]with no case-specific hatchery / wild origin-frequency data. Becausep_HOS[i]is an a priori independent parameter (a "fixed effect"), this is a difficult task. There may be some shared information via the process model to indirectly inform it, but this is likely to lead to poor estimates and sampling problems.n_O0_obs n_O[which_O_pop[1]]_obs ... n_O[which_O_pop[N_O_pop]]_obsIflife_cycle = "LCRchum", multiple columns of observed origin sample frequencies. The first column, named "O" for origin and "0" for null / naught, refers to unknown natural origin, i.e. unmarked spawners presumed to be NOR. The nextN_O_popcolumns are numbered by the levels offactor(fish_data$pop)corresponding to the set of known-origin populations. Typically these are hatcheries, but NOR may be identified by PIT tags, parentage-based tagging, or other means. TheLCRchummodel uses origin-composition observations to infer the dispersal rates of hatchery (or other known-origin) fish, son_W_obs(the same asn_O0_obsassuming all known origins are hatcheries) andn_H_obs(equal tosum(n_O_obs[-1])in that case) are not needed, although they can be included infish_datafor informational purposes or for post-processing draws. Likewisefit_p_HOSis not needed and will be ignored.n_M_obsIflife_cycle == "LCRchum", integer observed frequencies of male spawners.n_F_obsIflife_cycle == "LCRchum", integer observed frequencies of female spawners.p_G_obsIflife_cycle == "LCRchum", observed proportion (assumed known without error) of female spawners that are "green", i.e. fully fecund.F_rateTotal harvest rate (proportion) of natural-origin fish. Required for all models, even if all values are 0.B_take_obsNumber of adults taken for hatchery broodstock. Required for all models, even if all values are 0.S_add_obsIfstan_model == "IPM_LCRchum_pp", number of adults translocated into population....Additional variables to be used as covariates. These can vary spatially and/or temporally.
- age_F
Logical or 0/1 vector of length
N_ageindicating whether each adult age is fully (non)selected by the fishery. The default is all selected. Iflife_cycle == "SSiter",N_agerefers to the total number of maiden and repeat age classes (counting the repeat plus group as 1).- age_B
Logical or 0/1 vector of length
N_ageindicating whether each adult age is fully (non)selected in broodstock collection. The default is all selected. Iflife_cycle == "SSiter",N_agerefers to the total number of maiden and repeat age classes (counting the repeat plus group as 1).- age_S_obs
If
stan_model == "IPM_SS_pp", a logical or 0/1 integer vector indicating, for each adult age, whether the observed total spawner data includes that age. The default is to treatS_obsas including spawners of all ages. This option may be useful if certain age classes are not counted. Iflife_cycle == "SSiter",N_agerefers to the total number of maiden and repeat age classes (counting the repeat plus group as 1).- age_S_eff
If
stan_model == "IPM_SS_pp", a logical or 0/1 vector indicating, for each adult age, whether spawners of that age contribute to reproduction. This can be used, e.g., to exclude jacks from the effective breeding population. The default is to include spawners of all ages. Iflife_cycle == "SSiter",N_agerefers to the total number of maiden and repeat age classes (counting the repeat plus group as 1).- conditionGRonMS
If
life_cycle == "SMaS", logical indicating whether the Gilbert-Rich age frequenciesn_GRage_obsinfish_dataare conditioned on ocean age. IfFALSE(the default) the counts are assumed to be sampled randomly from the population. IfTRUE, it is assumed that the number of spawners of each ocean age (e.g., jacks vs 1-ocean) is arbitrary, but smolt (FW) age is randomly sampled within each ocean age; i.e., in asmolt age x ocean agecontingency table, the cell frequencies are conditioned on the column totals.- fecundity_data
If
life_cycle == "LCRchum", data frame with the following columns, representing observations of fecundity with each row corresponding to a female:age_EFemale age in years.E_obsObserved fecundity.
Value
A named list that is passed to rstan::sampling() as the data
argument used when fitting salmonIPM models.
Examples
# Simulate data for a multi-population spawner-to-spawner model
set.seed(1234)
N_pop <- 10
N_year <- 20
N <- N_pop*N_year
pars <- list(mu_alpha = 2, sigma_alpha = 0.5, mu_Rmax = 5, sigma_Rmax = 0.5,
rho_alphaRmax = 0.3, rho_R = 0.7, sigma_year_R = 0.5, sigma_R = 0.3,
tau = 0.5, mu_p = c(0.05, 0.55, 0.4), sigma_pop_p = c(0.1, 0.2),
R_pop_p = diag(2), sigma_p = c(0.5, 0.5), R_p = diag(2), S_init_K = 0.7)
fd <- data.frame(pop = rep(1:N_pop, each = N_year), year = rep(1:N_year, N_pop),
A = 1, p_HOS = 0, F_rate = rbeta(N,7,3), B_rate = 0,
n_age_obs = 50, n_HW_obs = 0)
sim_out <- simIPM(pars = pars, fish_data = fd, N_age = 3, max_age = 5)
# Prepare simulated data for Stan
dat <- stan_data("IPM_SS_pp", fish_data = sim_out$sim_dat)