Chapter 2 IPM Construction

2.1 Open population model with 2 age classes

2.1.1 Model description

Population dynamics are represented using a female-based age-structured open population model with a pre-breeding census in spring. At census, females are divided into two age classes: “yearlings” (1-year old birds hatched during the breeding season of the previous year) and “adults” (birds older than one year). The motivation underlying this distinction is that reproductive output often differs for these two age classes in passerine birds. The dynamics of the female segment of the population over the time-interval from census in year \(t\) to census in year \(t+1\) can be described with classic matrix notation (Caswell 2001) as:

\[N_{tot,t+1} = \begin{bmatrix} N_{Y,t+1} \\ N_{A,t+1} \end{bmatrix} = \begin{bmatrix} 0.5F_{Y,t}sJ_t & 0.5F_{A,t}sJ_t \\ sA_t & sA_t \end{bmatrix}\begin{bmatrix} N_{Y,t} \\ N_{A,t} \end{bmatrix} + \begin{bmatrix} Imm_{Y,t+1} \\ Imm_{A,t+1} \end{bmatrix}\]

\(N_{tot,t+1}\) represents the total number of yearling and adult females in the population upon arrival in the breeding areas in year t. The total female population size, \(N_{tot,t+1}\), is the sum of the numbers of yearling and adult females in the population in year \(t+1\) (\(N_{Y,t+1}\) and \(N_{A,t+1}\), respectively) and consists of local survivors and recruits from the previous breeding season, as well as immigrant yearling (\(Imm_{Y,t+1}\)) and adult (\(Imm_{A,t+1}\)) females.

\(F_{a,t}\) represents the expected number of fledglings produced by age class \(a\) females during the breeding season in year \(t\) and is the product of several vital rates. First, females in age class a may breed in a nestbox with probability \(pB_{a,t}\) upon arrival to breeding areas in year \(t\). Each breeding female may then lay a clutch containing a certain number of eggs (expected number = \(CS_{a,t}\)), and each egg within the clutch may hatch and survive to fledging. The probability of an egg hatching and surviving to fledging is divided into an age-independent probability of nest success (\(pNS_t\), probability of complete clutch failure = \(1-pNS_t\)) and a survival probability of every egg/chick to fledging provided that the nest has not failed entirely (\(sN_{a,t}\), with \(a\) = age of the mother). Consequently, the expected number of fledglings produced by age class \(a\) females in year t is defined as:

\[\begin{equation} F_{a,t}= pB_{a,t}\times CS_{a,t}\times pNS_{t}\times sN_{a,t} \end{equation}\]

Fledglings that survive to the next breeding season and remain within the population (apparent survival probability = \(sJ_t\)) contribute to next year’s yearling class (\(N_{Y,t+1}\)). Yearlings and adults that survive to the next breeding season and remain within the population (apparent survival probability = \(sA_t\)) become part of next year’s adult age class (\(N_{A,t+1}\)).

2.1.2 Code implementation including demographic stochasticity

Population process models within IPMs are typically implemented as stochastic models that account for randomness in the outcomes of demographic processes at the individual level (“demographic stochasticity”, Caswell 2001; Kéry and Schaub 2011). The model described here is no different, meaning that the numbers of breeders, fledglings, and survivors are treated as binomial and Poisson random variables.

Reproduction is modelled via two sets of random variables: a binomial random variable representing the number of breeders in age class \(a\) in year \(t\), \(B_{a,t}\) and a Poisson random variable representing the number of fledlings produced by breeders of age class \(a\) in year \(t\), \(Juv_{a,t}\) . The implementation in the BUGS language used in the SPI-IPM code (IPMSetup.R, lines 231-241) looks like:

for (t in 1:Tmax){
  for(a in 1:A){
    
    ## 1) Breeding decision
    B[a,t] ~ dbin(pB[a,t], N[a,t])
    
    ## 2) Offspring production
    Juv[a,t] ~ dpois(B[a,t]*CS[a,t]*pNS[t]*sN[a,t]*0.5)
  }
}

In the code, the age indeces \(a=1\) and \(a=A=2\) correspond to yearlings and adults, respectively.

Analogous to breeders, the numbers of local survivors – both fledglings surviving their first year and becoming yearlings, and yearlings and adults surviving to the next year – are implemented as binomial random variables (IPMSetup.R, lines 243-255):

for (t in 1:(Tmax-1)){
  
  ## 3) Annual survival of local birds
  # Juveniles -> Yearlings
  localN[1,t+1] ~ dbin(sJ[t], sum(Juv[1:A,t]))
  # Yearlings/Adults -> adults
  localN[2,t+1] ~ dbin(sA[t], sum(N[1:A,t]))
  
  ## 4) Immigration
  for(a in 1:A){
    N[a,t+1] <- localN[a,t+1] + Imm[a,t+1]
  }
}

Immigrant numbers are also treated as outcomes of stochastic processes, and these are detailed in 2.2.5 Immigrant count data likelihood.

2.2 Data likelihoods

IPMs obtain information on the population model’s parameters (population sizes and vital rates) from several different data sets. Information in each data set is channeled into model parameters via one or multiple data likelihoods.
The SPI-IPM contains five data modules consisting of a total of eight data likelihoods: nest count data (one likelihood), clutch size data (two likelihoods), fledgling count data (three likelihoods), mark-recapture data (one likelihood), and immigrant count data (one likelihood). The likelihoods contained in each data module are described in detail in the following sub-chapters. The underlying data sets are introduced in Chapter 1 of the manual.

2.2.1 Nest count data likelihood

The population model defines the true size of the female segment of the breeding population in any year \(t\) via the year- and age-specific number of breeding females:

\[\begin{equation} B_{a,t} \sim Binomial(N_{a,t}, pB_{a,t}) \end{equation}\]

The total size of the female breeding population (the sum of \(B_{a,t}\) over all age classes) in year \(t\) is expected to correspond closely with the observed number of first clutches laid in any year (\(NestCount_t\)). The breeding population count thus contains information about both breeding probability (\(pB_{a,t}\)) and population size (\(N_{a,t}\)) and its likelihood can be defined as:

\[\begin{equation} NestCount_t \sim Poisson((B_{Y,t} + B_{A,t}) \times NS\_Data_t) \end{equation}\]

Here, the year-specific variable \(NS\_Data_t\) is a correction factor introduced for dealing with years in which no nest survey data was collected (\(NestCount_t = 0\) due to lack of sampling). \(NS\_Data_t\) is set to 0 for years without data collection, and to 1 in all other years.

The likelihood is written almost identically in the code, the only difference being the use the sum() function over age classes \(1\) to \(A=2\) when specifying the Poisson distribution (this provides more flexibility for extending the model to more than two age classes):

for(t in 1:Tmax){
  NestCount[t] ~ dpois(sum(B[1:A,t])*NS_Data[t])
}

2.2.2 Clutch size data likelihoods

The counting of incubated eggs provides information about both individual-level clutch size (\(CS_{a,t}\)) and reproductive output at the population level. Consequently, two separate likelihoods can be specified for within the clutch size data module.

Each individual clutch size observation can be treated as the outcome of a Poisson process with an expected value of \(CS_{a,t}\) (where \(a\) is the age of the female that laid the clutch, and \(t\) the year in which the clutch was laid). In the code, the likelihood is formulated for each clutch size observation \(x\) (out of a total of CS_X observations) and includes nested indexing of the expectation using data on female age (CS_FAge) and year (CS_year):

for(x in 1:CS_X){
    ClutchSize[x] ~ dpois(CS[CS_FAge[x], CS_year[x]])
}

Since both year and female age need to be part of the provided data, the individual-level clutch size likelihood can only be used with complete observations, i.e. clutches for which both number of eggs and age of the mother are known (year should always be known).

For the population-level likelihood, on the other hand, data on clutch sizes can be included irrespective of whether the age of the mother is known. The total number of eggs counted in all nests laid in year \(t\) can be described as:

\[\begin{equation} EggNoTot_t \sim Poisson(sum(B_{Y:A,t} \times CS_{Y:A,t}) \times p_t^{EggNo}) \end{equation}\]

Since the product \(B_{a,t} \times CS_{a,t}\) corresponds to all eggs laid in all nests/nestboxes within the study site, another correction factor (\(p_t^{EggNo}\)) is needed to account for the fact that eggs may not have been counted in all surveyed nests with breeding activity in each year. \(p_t^{EggNo}\) (EggNoSP[t] in code) thus contains information of the year-specific proportionof surveyed nests for which the numbers of eggs were counted.

In code, the likelihood for the number of eggs at the population is written as:

for(t in 1:Tmax){

    # Expected "true" egg number (by mother age class)
    EggNo.ex[1:A,t] <- B[1:A,t]*CS[1:A,t]

    # Observed egg number (corrected by data availaility)
    EggNoTot[t] ~ dpois(sum(EggNo.ex[1:A,t])*EggNoSP[t])

}

2.2.3 Fledgling count data likelihoods

Analogous to observations of clutch size, counts of fledglings contain information on reproduction at both individual and population level. Distributions of number of fledglings produced from a clutch are often 0-inflated because incidents of harsh weather, predation, adandonment, etc. may result in complete loss of entire clutches. To account for this, we split data on fledgling numbers and formulated separate likelihoods for the survival of the clutch as a whole (probability of a nest not failing completely, \(pNS_t\)) and for each chick subsequently surviving to fledgling (probability \(sN_{a,t}\)).
Whether or not a clutch succeeded (i.e. at least one chick survived to fledging, anyFledged) was coded using 1 (success) and 0 (failure) and modelled as the outcome of a Bernoulli process with a year-dependent success probability \(pNS_t\):

for(x in 1:F_X){
  anyFledged[x] ~ dbern(pNS[F_year[x]])
}

For the subset of successful clutches, the number of fledglings produced (NoFledged) was modelled as a binomial random variable with each egg laid in the clutch (NoLaid) having a probability of \(sN_{a,t}\) (where \(a\) = age of the mother) to survive and fledge:

for(x in 1:NoF_X){
    NoFledged[x] ~ dbin(sN[NoF_FAge[x], NoF_year[x]], NoLaid[x])
}

At the population level, both processes (nest success and survival to fledging conditional on nest success) were combined into a single Poisson likelihood describing the total number of fledglings counted in the population in a given year \(t\) as the product of the number of breeding females, clutch size, nest success, and survival to fledging:

\[\begin{equation} FledgedTot_t \sim Poisson(sum(B_{Y:A,t}\times CS_{Y:A,t}\times pNS_t\times sN_{Y:A,t}) \times p_t^{Fledged}) \end{equation}\]

As in the clutch size data module (Chapter 2.2.2), a correction factor quantifying the proportion of nests for which data is availeble (\(p_t^{Fledged}\), FledgedSP[t] in code) is used to account for missing records of fledgling numbers.

In the code, calculation of the expected number of fledglings is based on the expected number of eggs as calculated in the clutch size data module (Chapter 2.2.2):

for(t in 1:Tmax){

    # Expected "true" fledgling number (by mother age class)
    Fledged.ex[1:A,t] <- EggNo.ex[1:A,t]*pNS[t]*sN[1:A,t]

    # Observed egg number (corrected by data availaility)
    FledgedTot[t] ~ dpois(sum(Fledged.ex[1:A,t])*FledgedSP[t])
}

2.2.4 Mark-recapture data likelihood

In the SPI-IPM, capture histories of marked birds are analysed using an age-specific Cormack-Jolly-Seber (CJS) model (Cormack 1964; Jolly 1965; Seber 1965), which allows estimation of parameters associated with both annual survival and the (re-)capturing process.
The apparent survival parameters \(sJ_t\) and \(sA_t\) describe the probability of surviving from one breeding season to the next and remaining in the study population for fledglings/juveniles and adult females, respectively. Since field-based sexing fledglings is not possible in the hatch year, first-year survival in the CJS likelihood was set to \(0.5sJ_t\), representing the probability that a fledgling is female and survives (recaptures of adult males were removed from the capture histories).
The time- and age-dependent recapture parameters \(p_{Age,t}^{Recap}\) describe the probability that an age class \(a\) individual is recaptured and identified during the breeding season of year \(t\) given that it survived the time interval \(t-1\) to \(t\). For many of the bird populations included in SPI-Birds, (re-)captures of birds are carried out at the nestboxes during the breeding season only. Consequently, two conditions need to be met for a (re-)capture besides the individual being alive: 1) the individual breeds in a nestbox (as non-breeders and birds breeding in natural cavities are not captured) and 2) the individual is actually captured and marked/identified while breeding in a nestbox. The recapture probability is therefore the product of the probability of breeding in a nestbox (\(pB_{Age,t}\)) and the probability of capture and identification given breeding in a nestbox (\(p_t^{CapB}\)): \[\begin{equation} p_{Age,t}^{Recap}=pB_{Age,t}\times p_t^{CapB} \end{equation}\] The parameters \(pB_{Age,t}\) and \(p_t^{CapB}\) are confounded, and auxiliary data on one of them is required to separately estimate them. Here, raw data on the annual proportions of surveyed nests for which the identity of the breeding female was recorded (as a result of ringing or recapture) is used to approximated \(p_t^{CapB}\), allowing the CJS model to estimate age- and time-specific breeding probabilities (\(pB_{Age,t}\)).
CJS models in Bayesian frameworks have traditionally been implemented either as latent states models with Bernouilli likelihoods or as multinomial likelihoods for data re-formatted as “m-arrays” (Gimenez et al. 2007; Kéry and Schaub 2011). The former typically results in long runtimes and high computational costs, while the improved efficiency of the latter is tied to an implementation that suffers from being neither particularly intuitive nor user-friendly. To avoid these pitfalls, the CJS model in the SPI-IPM is implemented using a much more efficient marginalized likelihood (that integrates over all latent states) that can be applied to unique (not individual) capture histories only (following Turek, de Valpine, and Paciorek 2016):

## Likelihood with custom distribution
for (i in 1:n.CH){

  y.sum[i, first.sum[i]:last.sum[i]] ~ dCJS_vv_sum(
      probSurvive = phi.CH[i, first.sum[i]:last.sum[i]],
      probCapture = p.CH[i, first.sum[i]:last.sum[i]],
      len = last.sum[i]-first.sum[i]+1,
      mult = CHs.count[i])
}

In practice, this involves the use of a custom distribution dCJS_vv_sum, and more information on the likelihood and its implementation using NIMBLE is provided in Chapter 4.1 (Efficient implementation using NIMBLE).

2.2.5 Immigrant count data likelihood

In many nestbox studies, chick ringing is an exhaustive effort and it is not unreasonable to assume that all chicks hatched in nestboxes within a study population are ringed before fledging. Similarly, assuming that females captured in nestboxes are breeding (or attempting to breed) is often not far from the truth. If those two assumptions are largely met, then the annual numbers of newly ringed adult females (\(ImmNoObs_t\)) can provide information about immigration or – more specifically – about the latent number of newly immigrated females breeding in nestboxes each year (\(ImmNoTot_t\)). Typically, age will be unknown for immigrant females (unless they were marked as chicks elsewhere) and hence there is no age index on \(ImmNoObs_t\) and \(ImmNoTot_t\). In SPI-IPM, we use a Poisson observation model to describe the relationship between observed and true numbers of breeding immigrant females: \[\begin{equation} ImmNoObs_t \sim Poisson(ImmNoTot_t \times p_t^{ImmDetect}) \end{equation}\] The expected value represents the true number of female breeding immigrants in year \(t\) (\(ImmNoTot_t\)) corrected by an annual probability of detecting (= ringing) such individuals (\(p_t^{ImmDetect}\)). In most nestbox studies, ringing and recapturing of breeding birds are conducted in tandem as part of the same protocol, and we can consequently approximate adult ringing probability using recapture probability (as used in the likelihood for capture histories): \(p_t^{ImmDetect} = p_t^{CapB}\)). In a few exceptional years in some datasets, however, adult birds may have been ringed, but no recaptures recorded (i.e. \(p_t^{CapB}=0\) but \(p_t^{ImmDetect}>0\)). These cases call for estimation of (or, in other words, accounting for the uncertainty in) the corresponding \(p_t^{ImmDetect}>0\). As long as such cases are relatively few, this can be accommodated by specifying a non-informative prior for the unknown value¹:

## Immigrant detection (= marking) probability
for(t in 1:Tmax){
  PropImmDetect[t] ~ dunif(0, 1)
}

The latent number of breeding immigrants, informed by data as outlined above, is estimated as pooled across age classes (\(ImmNoTot_t\)), but the since the population model in SPI-IPM is age structured \(ImmNoTot_t = ImmB_{Y,t} +ImmB_{A,t}\), where \(ImmB_{Y,t}\) and \(ImmB_{A,t}\) the numbers of breeding yearling and adult immigrants, respectively. These quantities are linked to the total numbers of immigrants in each age class (breeding and non-breeding), \(Imm_{a,t}\), that appear in the age-structured population model (Chapter 2.1 through age-specific breeding probabilities. By assuming equal breeding probabilities for locally recruited and newly immigrated females, we can use the breeding probabilities estimated via the mark-recapture likelihood (\(pB_{a,t}\)) to make the connection: \[\begin{equation} ImmB_{a,t} \sim Binomial(Imm_{a,t}, pB_{a,t}) \end{equation}\]

In the code, the sub-model for immigration contains both the likelihood for the immigrant count data pooled across ages and the relationship between breeding immigrant females and all immigrant females in both age classes:

## Likelihood for the number/age distribution of immigrant females
for (t in 2:Tmax){

  # Latent true number of breeding immigrants (count observation model)
  ImmNoObs[t] ~ dpois(sum(ImmB[1:A,t])*PropImmDetect[t])

  # Number of breeding immigrants per age class
  for(a in 1:A){
    ImmB[a,t] ~ dbin(pB[a,t], Imm[a,t])
  }
}

ImmNoObs[1] <- 0
ImmB[1:A,1] <- 0

Note that various immigrant numbers are set to 0 at \(t=1\) since there is no way of distinguishing between locally-born and newly immigrated birds in the first year of study.

2.3 Priors and constraints

In the model description above, all vital rate parameters, as well as initial population size and immigrant numbers, appear as fully time- and age-dependent parameters (i.e. indexed by \(t\) = year and \(a\) = age class). SPI-IPM is a hierarchical model, meaning it also describes how demographic quantities vary across time and age classes and requires priors for the parameters defining these underlying processes. The priors in the model’s standard implementation are all non- or only weakly informative, but they can easily be replaced with more informative priors if required (see e.g. Chapter 8.2).

Time- and age-dependence in all vital rates is expressed using generalized mixed-effect models. The exact model structure for each vital rate, including link function, is described in detail in Chapter 3. Irrespective of the vital rate, the basic model parameters fall into three categories: (age-specific) intercepts representing time-average vital rates (\(\mu\)), slope parameters for the effects of temporal covariates (\(\beta\)), and standard deviations describing the distribution of random year effects (\(\sigma\)). The intercept parameters \(\mu\) are defined on the natural scale to allow for more intuitive setting of priors. As such, non-informative Uniform(0,1) priors can be used for the \(\mu\) of all vital rates representing probabilities (\(\mu_a^{pB}\), \(\mu^{pNS}\), \(\mu_a^{sN}\), \(\mu^{sJ}\), and \(\mu^{sA}\)). For the age-specific average clutch size (\(\mu_a^{CS}\)), we use a Uniform(0,10) prior. This latter prior is not completely uninformative, as it sets an upper limit of 10 eggs for the average clutch size (this is reasonable for pied flycatchers for which SPI-IPM was initially developed, but may have to be adjusted for species that lay larger clutches). All slope parameters \(\beta\) are given Uniform(-5,5) priors, and all \(\sigma\) parameters (except for the one linked to immigration, \(\sigma_a^{Imm}\)) are given Uniform(0,10) priors.

Priors and constraints are also required for the initial population size, i.e. \(N_{Y,1}\) and \(N_{A,1}\) since the population process model only begins predicting \(N\) from the second year (\(t=2\)) onward. Immigrant numbers have their own set of priors (see below), and since \(N_{a,t} = localN_{a,t} + Imm_{a,t}\), priors have to be set for \(localN_{a,1}\) specifically. SPI-IPM uses what is essentially a discrete uniform prior for initial local population size:

## Initial population sizes

for(a in 1:A){
  localN[a,1] ~ dcat(DU.prior[1:N1.limit])
  N[a,1] <- localN[a,1] + Imm[a,1]
}

DU.prior[1:N1.limit] <- 1/N1.limit

Here, N1.limit is a user-specified maximum possible number of local females in any age class at the first time-step. The discrete uniform distribution is approximated using a categorical distribution in which each integer number between 1 and N1.limit has the same probability (1/N1.limit) of occurring. This approach is equivalent to using a regular Uniform(0,N1.limit) distribution and subsequently rounding it.

The final set of priors and constraints in the model are for immigrant numbers \(Imm_{Y,t}\) and \(Imm_{A,t}\). At the first time-step (\(t=1\)), immigrant numbers are set to 0 (since there is no way of distinguishing previously local and newly immigrated individuals in the first year of a study). Numbers at any subsequent time-step are assumed to follow a 0-truncated normal distribution with age-specific mean \(AvgImm_a\) (AvgImm[a]) and standard deviation \(\sigma_a^{Imm}\) (sigma.Imm[a])):

## Latent number of all immigrants (breeding & non-breeding) per age class
Imm[1:A,1] <- 0

for(t in 2:Tmax){
  for(a in 1:A){

    ImmCont[a,t] ~ T(dnorm(AvgImm[a], sd = sigma.Imm[a]), 0, Inf)
    Imm[a,t] <- round(ImmCont[a,t])
  }
}

The rounding function ensures that immigrant numbers are integers. Priors are then required for \(AvgImm_a\) and \(\sigma_a^{Imm}\), and these are given as uniform distributions with values ranging from 0 to (a multiple of) a user-specified maximum possible number of immigrant females in any age class:

for(a in 1:A){
  AvgImm[a] ~ dunif(0, AvgImm.limit)
  sigma.Imm[a] ~ dunif(0, AvgImm.limit*10)
}

References

Caswell, H. 2001. Matrix population models: construction, analysis, and interpretation. Sunderland, Mass.: Sinauer Associates.

Cormack, Richard M. 1964. “Estimates of Survival from the Sighting of Marked Animals.” Biometrika 51 (3/4): 429–38.

Gimenez, Olivier, Vivien Rossi, Rémi Choquet, Camille Dehais, Blaise Doris, Hubert Varella, Jean-Pierre Vila, and Roger Pradel. 2007. “State-Space Modelling of Data on Marked Individuals.” Ecological Modelling 206 (3-4): 431–38.

Jolly, George M. 1965. “Explicit Estimates from Capture-Recapture Data with Both Death and Immigration-Stochastic Model.” Biometrika 52 (1/2): 225–47.

Kéry, Marc, and Michael Schaub. 2011. Bayesian Population Analysis Using Winbugs: A Hierarchical Perspective. Academic Press.

Seber, George AF. 1965. “A Note on the Multiple-Recapture Census.” Biometrika 52 (1/2): 249–59.

Turek, Daniel, Perry de Valpine, and Christopher J Paciorek. 2016. “Efficient Markov Chain Monte Carlo Sampling for Hierarchical Hidden Markov Models.” Environmental and Ecological Statistics 23 (4): 549–64.

For estimating only a subset of values within a vector/array (= partially observed variables), data (here PropImmDetect[t]) is provided including NA for unknown values and initial values are provided for the corresponding indeces (with initial values matching indeces of known data points being set to NA). Handling partially observed variables is discussed some more under 3.2.3 Imputation of missing covariate values ↩︎