Statistica Sinica 29 (2019), 1127-1154
Abstract: Species distribution models usually attempt to explain the presence–absence or abundance of a species at a site in terms of the environmental features (so-called abiotic features) present at the site. Historically, such models have considered species individually. However, it is well established that species interact to influence presence–absence and abundance (so-called biotic factors). As a result, joint species distribution models for various types of responses, such as presence–absence, continuous, and ordinal data, have recently attracted significant interest. Such models incorporate the dependence between species' responses as a proxy for interaction. We address the accommodation of such modeling for a large number of species (e.g., on the order of 10²) across sites numbering on the order of 10² or 10³, when, in practice, only a few species are found at any observed site. To do so, we adopt a dimension-reduction approach. The novelty of our approach is that we add spatial dependence. That is, we consider a collection of sites over a relatively small spatial region, so we anticipate that the species distribution at a given site will be similar to that at a nearby site. Specifically, we handle dimension reduction using Dirichlet processes, which enable the clustering of species, and add spatial dependence across sites using Gaussian processes. We use simulated data and a plant community data set for the Cape Floristic Region (CFR) of South Africa to demonstrate our approach. The latter consists of presence–absence measurements for 639 tree species at 662 locations. These two examples demonstrate the improved predictive performance of our method under this specification.
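The generative structure summarized above (species clustered through a Dirichlet process, spatial dependence across sites through Gaussian processes, presence–absence as the response) can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' specification: it assumes a probit link, an exponential covariance with a hypothetical range parameter `phi`, a truncated stick-breaking approximation to the Dirichlet process, and standard normal atoms for the factor loadings. All function names and parameter values are hypothetical, and only the Python standard library is used so the example is self-contained.

```python
import math
import random


def exp_cov(coords, phi, jitter=1e-6):
    """Assumed exponential covariance C_ij = exp(-||s_i - s_j|| / phi), plus diagonal jitter."""
    n = len(coords)
    return [[math.exp(-math.dist(coords[i], coords[j]) / phi) + (jitter if i == j else 0.0)
             for j in range(n)] for i in range(n)]


def cholesky(A):
    """Lower-triangular L with A = L L^T (A must be symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L


def sample_gp(L, rng):
    """One zero-mean GP draw at the sites: w = L z with z ~ N(0, I)."""
    z = [rng.gauss(0.0, 1.0) for _ in range(len(L))]
    return [sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(len(L))]


def stick_breaking(alpha, T, rng):
    """Truncated stick-breaking weights; the last stick takes the remainder so weights sum to 1."""
    weights, remaining = [], 1.0
    for t in range(T):
        v = 1.0 if t == T - 1 else rng.betavariate(1.0, alpha)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    return weights


def simulate(n_sites=30, n_species=50, n_factors=2, alpha=1.0, truncation=10, phi=0.3, seed=0):
    """Hypothetical spatial probit factor model with DP-clustered species loadings."""
    rng = random.Random(seed)
    coords = [(rng.random(), rng.random()) for _ in range(n_sites)]
    L = cholesky(exp_cov(coords, phi))
    # n_factors independent spatial factors; w[k][i] is factor k at site i.
    w = [sample_gp(L, rng) for _ in range(n_factors)]
    # DP clustering: species pick an atom (a loading vector) with stick-breaking weights.
    weights = stick_breaking(alpha, truncation, rng)
    atoms = [[rng.gauss(0.0, 1.0) for _ in range(n_factors)] for _ in range(truncation)]
    labels = rng.choices(range(truncation), weights=weights, k=n_species)
    loadings = [atoms[c] for c in labels]
    # Negative intercepts (assumed) make presences sparse, as at most real sites.
    intercepts = [rng.gauss(-1.0, 0.5) for _ in range(n_species)]
    y = []
    for i in range(n_sites):
        row = []
        for j in range(n_species):
            mean = intercepts[j] + sum(loadings[j][k] * w[k][i] for k in range(n_factors))
            # Probit: latent mean + N(0, 1) noise, observed as presence if positive.
            row.append(1 if mean + rng.gauss(0.0, 1.0) > 0 else 0)
        y.append(row)
    return y, labels, loadings
```

Species that share a cluster label share the same loading vector, so their responses are driven by the same spatial factors in the same way. In this sketch that sharing is the dimension reduction: the number of distinct loading vectors is the number of occupied clusters (at most `truncation`) rather than the number of species, while the GP factors make responses at nearby sites similar.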
Key words and phrases: Dimension reduction; Gaussian processes; high-dimensional covariance matrix; spatial factor model; species dependence.