Skip to contents

This function is mostly used internally, but may occasionally be useful for diagnosing problems (e.g., checking the numeric coding of populations and years or the replacement values for NAs) or for plotting.

Usage

stan_data(
  stan_model = c("IPM_SS_np", "IPM_SSiter_np", "IPM_SS_pp", "IPM_SSiter_pp",
    "IPM_SMS_np", "IPM_SMS_pp", "IPM_SMaS_np", "IPM_LCRchum_pp", "RR_SS_np", "RR_SS_pp"),
  SR_fun = "BH",
  RRS = "none",
  ages = NULL,
  par_models = NULL,
  center = TRUE,
  scale = TRUE,
  priors = NULL,
  fish_data,
  age_F = NULL,
  age_B = NULL,
  age_S_obs = NULL,
  age_S_eff = NULL,
  conditionGRonMS = FALSE,
  fecundity_data = NULL
)

Arguments

stan_model

Character string specifying the salmonIPM model to be fit.

SR_fun

One of "exp" (density-independent discrete exponential), "BH" (Beverton-Holt, the default), or "Ricker", indicating which spawner-recruit function to fit. Synonyms "DI", "B-H", "bh", "b-h" and "ricker" are accepted.

RRS

A character string or vector of strings naming parameters of the function specified by SR_fun that differ between wild- and hatchery-origin spawners, such that the relative reproductive success of hatchery spawners is not equal to 1. If pool_pops == TRUE, these should be the names of the population-specific parameters, not their hyper-means. For example, if life_cycle %in% c("SS","SSiter"), the options are "none" (the default), "alpha", "Rmax", or c("alpha","Rmax"). Currently RRS is only implemented for pool_pops == FALSE.

ages

For multi-stage models, a named list giving the ages in years of all fixed-age subadult life stages. This is not needed for IPM_SMaS_np because in that case smolt age structure is provided in fish_data.

par_models

Optional list of two-sided formulas of the form theta ~ x1 + ... + xK, where theta is a (hyper)parameter or state in the model specified by stan_model that accepts covariates (see Details for available model-parameter combinations) and x1 + ... + xK are terms involving variables in fish_data. Standard formula syntax such as : and * may be used; see stats::formula().

center

Logical indicating whether the terms in model matrices constructed from fish_data using the formulas in par_models should be centered. It is usually recommended to use the default (TRUE) so the baseline parameter estimate applies when predictors are at their sample means, but in some cases such as factor predictors center = FALSE may be appropriate. If combining categorical and numeric predictors, the latter can be centered and scaled prior to modeling.

scale

Logical indicating whether the model matrices constructed from fish_data using the formulas in par_models should be scaled to have column SDs of 1. Unit-scaling predictors is less critical than centering, but is advisable if variables have scales far from 1.

priors

Optional list of two-sided formulas of the form theta ~ distribution(params), where theta is a hyperparameter that can take a user-specified prior and distribution() is its canonical prior family. See priors for details on the available parameters in each model and their corresponding families. Any hyperparameters not given explicit priors will use the default values of the prior params.

fish_data

Data frame where each row corresponds to a unique population x year, that includes the following colnames in no particular order except where noted (namely age- and origin-composition data):

  • pop Required factor, numeric, or character population name or ID. Will be coerced to a factor, but it is recommended that this be a factor with concise, informative levels, e.g. "Johnson Cr". This is especially true if there are multiple populations, in which case levels(factor(pop)) can be used in interpreting and plotting the posterior draws.

  • year Required numeric or integer giving the calendar year corresponding to each observation. Note that fish_data is not indexed by brood year. For a brood table run reconstruction, see run_recon().

  • A Required spawning habitat size (either stream length or area). Will often be time-invariant within a population, but need not be. Habitat size is used internally to convert population size scaling parameters (e.g., Rmax) from density to abundance, so if A == 1 no rescaling is done and these parameters are in units of fish. This is fine if pool_pops == FALSE, but in hierarchical multi-population models it is advisable to provide habitat size so that the assumption of hierarchical exchangeability is more valid.

  • M_obs If life_cycle %in% c("SMS","SMaS","LCRchum"), the observed number of wild-origin smolts (integer or numeric). Missing / unknown observations are coded as NA.

  • tau_M_obs If life_cycle == "LCRchum", known lognormal observation error SDs for smolt abundance. Missing values (NA) will be imputed.

  • downstream_trap If life_cycle == "LCRchum", row indices corresponding to a downstream smolt trap in a different population whose catch additionally includes the smolts produced in one or more upstream populations, assuming no extra mortality en route. Each upstream population can have at most one downstream trap (in addition to its own, if any) but a trap can have multiple upstream populations. If downstream_trap[i] == j, then M_downstream[j] = M[j] + M[i]. If is.na(downstream_trap[i]) then M[i] is not double-counted.

  • n_Mage[min_Mage]_obs...n_Mage[max_Mage]_obs If life_cycle == "SMaS", multiple columns of observed smolt age sample frequencies (counts), where [min_Mage] and [max_Mage] are the numeral age in years of the youngest and oldest smolts, respectively. Note that age is measured in calendar years from the brood year (i.e., the Gilbert-Rich system).

  • S_obs Required observed total escapement of all wild and hatchery-origin spawners (integer or numeric). Missing / unknown observations are coded as NA.

  • tau_S_obs If life_cycle == "LCRchum", known lognormal observation error SDs for spawner abundance. Missing values (NA) will be imputed.

  • n_age[min_age]_obs ... n_age[max_age]_obs Integer columns of observed spawner age sample frequencies (counts), where [min_age] and [max_age] are the numeral age in years (total, not ocean age) of the youngest and oldest spawners. Required unless life_cycle %in% c("SSiter","SMaS"), in which case it is ignored. Note that n_age_obs and all other compositional data types must not contain NA. If the sample included no individuals of a given category or if no samples were collected, the observed frequency is 0.

  • n_age[min_age]M_obs ... n_age[max_age]M_obs n_age[min_age + 1]K_obs ... n_age[max_age + 1]K_obs, If life_cycle == "SSiter", integer columns of observed first-time (maiden) and repeat (former kelt) spawner age frequencies where [min_age] and [max_age] are the total age in years of the youngest and oldest maiden spawners, respectively. Contiguous maiden age columns denoted by M are followed by an equal number of contiguous repeat age columns denoted by K, where each repeat age is 1 year greater than the corresponding maiden age. The maximum repeat age class is a plus-group, i.e. it includes all repeat spawners age max_age + 1 or older.

  • n_MSage[min_MSage]_obs ... n_MSage[max_MSage]_obs If life_cycle == "SMaS", integer columns of observed spawner ocean age sample frequencies, where [min_MSage] and [max_MSage] are the youngest and oldest ocean age in years, respectively. Nonzero ocean age frequencies are only required if conditionGRonMS == TRUE (the columns must be present in any case). If conditionGRonMS == FALSE, then n_MSage_obs represents independent samples, not simply the (implicit) ocean-age marginal totals of n_GRage_obs.

  • n_GRage[min_age]_[min_Mage]_obs ... n_GRage[max_age]_[max_Mage]_obs If life_cycle == "SMaS", integer columns of observed Gilbert-Rich age sample frequencies, varying fastest by smolt age (min_Mage:max_Mage) and then by total age (min_age:max_age). For example, a life history with subyearling or yearling smolts and ocean ages 2:3 would have column names c("n_GRage_3_1_obs", "n_GRage_4_1_obs", "n_GRage_4_2_obs", "n_GRage_5_2_obs"). All combinations of smolt age and (implicitly) ocean age must be represented, even if some were never observed.

  • n_W_obs Required integer observed sample frequencies of natural-origin spawners.

  • n_H_obs Required integer observed sample frequencies of hatchery-origin spawners.

  • fit_p_HOS Logical or 0/1 indicating for each row i in fish_data whether the model should estimate p_HOS[i] > 0. Required if model == "IPM" unless life_cycle == "LCRchum". stan_data() will give a warning if any row i meets either of two conditions: as.logical(fit_p_HOS[i]) == FALSE but n_W_obs[i] + n_H_obs[i] > 0, or as.logical(fit_p_HOS[i]) == TRUE but n_W_obs[i] + n_H_obs[i] == 0. The first means HOR were observed, so not accounting for them risks biasing the estimated parameters and states (aka "masking"). The second means the model is being asked to estimate p_HOS[i] with no case-specific hatchery / wild origin-frequency data. Because p_HOS[i] is an a priori independent parameter (a "fixed effect"), this is a difficult task. There may be some shared information via the process model to indirectly inform it, but this is likely to lead to poor estimates and sampling problems.

  • n_O0_obs n_O[which_O_pop[1]]_obs ... n_O[which_O_pop[N_O_pop]]_obs If life_cycle = "LCRchum", multiple columns of observed origin sample frequencies. The first column, named "O" for origin and "0" for null / naught, refers to unknown natural origin, i.e. unmarked spawners presumed to be NOR. The next N_O_pop columns are numbered by the levels of factor(fish_data$pop) corresponding to the set of known-origin populations. Typically these are hatcheries, but NOR may be identified by PIT tags, parentage-based tagging, or other means. The LCRchum model uses origin-composition observations to infer the dispersal rates of hatchery (or other known-origin) fish, so n_W_obs (the same as n_O0_obs assuming all known origins are hatcheries) and n_H_obs (equal to sum(n_O_obs[-1]) in that case) are not needed, although they can be included in fish_data for informational purposes or for post-processing draws. Likewise fit_p_HOS is not needed and will be ignored.

  • n_M_obs If life_cycle == "LCRchum", integer observed frequencies of male spawners.

  • n_F_obs If life_cycle == "LCRchum", integer observed frequencies of female spawners.

  • p_G_obs If life_cycle == "LCRchum", observed proportion (assumed known without error) of female spawners that are "green", i.e. fully fecund.

  • F_rate Total harvest rate (proportion) of natural-origin fish. Required for all models, even if all values are 0.

  • B_take_obs Number of adults taken for hatchery broodstock. Required for all models, even if all values are 0.

  • S_add_obs If stan_model == "IPM_LCRchum_pp", number of adults translocated into population.

  • ... Additional variables to be used as covariates. These can vary spatially and/or temporally.

age_F

Logical or 0/1 vector of length N_age indicating whether each adult age is fully (non)selected by the fishery. The default is all selected. If life_cycle == "SSiter", N_age refers to the total number of maiden and repeat age classes (counting the repeat plus group as 1).

age_B

Logical or 0/1 vector of length N_age indicating whether each adult age is fully (non)selected in broodstock collection. The default is all selected. If life_cycle == "SSiter", N_age refers to the total number of maiden and repeat age classes (counting the repeat plus group as 1).

age_S_obs

If stan_model == "IPM_SS_pp", a logical or 0/1 integer vector indicating, for each adult age, whether the observed total spawner data includes that age. The default is to treat S_obs as including spawners of all ages. This option may be useful if certain age classes are not counted. If life_cycle == "SSiter", N_age refers to the total number of maiden and repeat age classes (counting the repeat plus group as 1).

age_S_eff

If stan_model == "IPM_SS_pp", a logical or 0/1 vector indicating, for each adult age, whether spawners of that age contribute to reproduction. This can be used, e.g., to exclude jacks from the effective breeding population. The default is to include spawners of all ages. If life_cycle == "SSiter", N_age refers to the total number of maiden and repeat age classes (counting the repeat plus group as 1).

conditionGRonMS

If life_cycle == "SMaS", logical indicating whether the Gilbert-Rich age frequencies n_GRage_obs in fish_data are conditioned on ocean age. If FALSE (the default) the counts are assumed to be sampled randomly from the population. If TRUE, it is assumed that the number of spawners of each ocean age (e.g., jacks vs 1-ocean) is arbitrary, but smolt (FW) age is randomly sampled within each ocean age; i.e., in a smolt age x ocean age contingency table, the cell frequencies are conditioned on the column totals.

fecundity_data

If life_cycle == "LCRchum", data frame with the following columns, representing observations of fecundity with each row corresponding to a female:

  • age_E Female age in years.

  • E_obs Observed fecundity.

Value

A named list that is passed to rstan::sampling() as the data argument used when fitting salmonIPM models.

Examples

# Simulate data for a multi-population spawner-to-spawner model
set.seed(1234)
N_pop <- 10
N_year <- 20
N <- N_pop*N_year

pars <- list(mu_alpha = 2, sigma_alpha = 0.5, mu_Rmax = 5, sigma_Rmax = 0.5, 
             rho_alphaRmax = 0.3, rho_R = 0.7, sigma_year_R = 0.5, sigma_R = 0.3, 
             tau = 0.5, mu_p = c(0.05, 0.55, 0.4), sigma_pop_p = c(0.1, 0.2), 
             R_pop_p = diag(2), sigma_p = c(0.5, 0.5), R_p = diag(2), S_init_K = 0.7)

fd <- data.frame(pop = rep(1:N_pop, each = N_year), year = rep(1:N_year, N_pop),
                 A = 1, p_HOS = 0, F_rate = rbeta(N,7,3), B_rate = 0,
                 n_age_obs = 50, n_HW_obs = 0)

sim_out <- simIPM(pars = pars, fish_data = fd, N_age = 3, max_age = 5)

# Prepare simulated data for Stan
dat <- stan_data("IPM_SS_pp", fish_data = sim_out$sim_dat)