Sparse and Compositionally Robust Inference of Microbial Ecological Networks
Fig 1
Conditional independence vs correlation analysis for a toy dataset.
In an ecosystem, the abundance of any OTU is potentially dependent on the abundances of other OTUs in the ecological network. Here, we simulate abundances from a network in which OTU 3 directly influences (via some set of biological mechanisms) the abundances of OTUs 1, 2 and 4 (a). The inference goal is to recover the underlying network from the simulated data. b) Absolute abundances of these four OTUs were drawn from a negative-binomial distribution across 500 samples according to the true network (as described in the Methods section). c) Computing all pairwise Pearson correlations yields a symmetric matrix showing patterns of association (positive correlations are green and negative are red). We thresholded entries of the correlation matrix to generate relevance networks. d) A threshold of |ρ| ≥ 0.35 (represented by dashed and solid edges) results in a network in which OTU 3 is connected to all other OTUs, with an additional connection between OTU 2 and OTU 4. A more stringent threshold of |ρ| ≥ 0.5 results in a sparser relevance network (notably missing the edge between OTU 3 and OTU 1), represented in d by solid edges only. Importantly, no single threshold recovers the true underlying hub topology. e) The inverse of the sample covariance matrix is a symmetric matrix whose entries are approximately zero when the corresponding OTU pairs are conditionally independent. The network (f) inferred from the non-zero entries (colored in blue in e) identifies the correct hub network. Thus, it is possible to choose a threshold for the sample inverse covariance that faithfully recovers the true network. Such a threshold is not guaranteed to exist for correlation or covariance (the metrics used by SparCC and CCREPE). Intuitively, this is because simultaneous direct connections to a shared node can induce strong correlations between nodes that have no direct relationship (e.g. OTUs 2 and 4). Conversely, weak correlations can arise between directly connected nodes (e.g. OTUs 1 and 3).
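The contrast between panels c and e can be reproduced in a few lines. The sketch below is a minimal illustration under stated assumptions, not SPIEC-EASI itself: it draws Gaussian data instead of the negative-binomial counts used for the figure, takes a plain inverse of the sample covariance with no sparsity penalty, and uses hypothetical edge weights (a weak OTU 3-1 link of 0.2 and stronger OTU 3-2 and OTU 3-4 links of 0.5). With these weights the true correlation between the unlinked OTUs 2 and 4 (about 0.35) exceeds that of the linked OTUs 1 and 3 (about 0.28), so no correlation threshold separates edges from non-edges, while the partial correlations derived from the precision matrix are exactly zero away from the hub.

```python
import numpy as np

# Hypothetical hub model: OTU 3 (index 2) is the hub; indices 0, 1, 3 are OTUs 1, 2, 4.
HUB = 2
WEIGHTS = {0: 0.2, 1: 0.5, 3: 0.5}  # assumed partial correlation of each leaf with the hub


def hub_precision(p=4, hub=HUB, weights=WEIGHTS):
    """Precision (inverse covariance) matrix with unit diagonal and star topology."""
    theta = np.eye(p)
    for leaf, w in weights.items():
        # A negative off-diagonal precision entry gives a positive partial correlation.
        theta[hub, leaf] = theta[leaf, hub] = -w
    return theta


def cov_to_corr(sigma):
    """Rescale a covariance matrix to a correlation matrix."""
    d = np.sqrt(np.diag(sigma))
    return sigma / np.outer(d, d)


def partial_corr(theta):
    """Partial correlations from a precision matrix: -theta_ij / sqrt(theta_ii * theta_jj)."""
    d = np.sqrt(np.diag(theta))
    pc = -theta / np.outer(d, d)
    np.fill_diagonal(pc, 1.0)
    return pc


theta = hub_precision()
sigma = np.linalg.inv(theta)
true_corr = cov_to_corr(sigma)

# 500 Gaussian samples from the hub model (a stand-in for the figure's NB counts).
rng = np.random.default_rng(0)
X = rng.multivariate_normal(np.zeros(4), sigma, size=500)

sample_corr = np.corrcoef(X, rowvar=False)
sample_pcor = partial_corr(np.linalg.inv(np.cov(X, rowvar=False)))

np.set_printoptions(precision=2, suppress=True)
print("true correlation:\n", true_corr)
print("sample correlation:\n", sample_corr)
print("sample partial correlation:\n", sample_pcor)
```

Thresholding `sample_pcor` at a small value should usually recover the star, but with 500 samples the weak OTU 1-3 edge sits only a few standard errors above zero, so the cutoff (or a sparsity penalty) still has to be chosen with care.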
Although correlation is a useful measure of association in many contexts, it is a pairwise metric and therefore limited in a multivariate setting. In contrast, SPIEC-EASI's estimate of each entry in the inverse covariance matrix is conditioned on the states of all other available nodes. This feature helps SPIEC-EASI avoid detecting indirect network interactions.
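One way to see what "conditioned on all other nodes" means: the partial correlation obtained from the inverse covariance Θ, namely -Θ_ij / √(Θ_ii Θ_jj), is algebraically identical to the correlation between the residuals of variables i and j after each is regressed on every remaining variable. The sketch below checks this numerically on hypothetical Gaussian hub data (column 2 drives columns 0, 1 and 3). It uses an unregularized inverse for clarity, whereas SPIEC-EASI estimates a sparse inverse covariance; the helper names are illustrative only.

```python
import numpy as np


def pcor_from_precision(X, i, j):
    """Partial correlation of columns i and j from the inverse sample covariance."""
    theta = np.linalg.inv(np.cov(X, rowvar=False))
    return -theta[i, j] / np.sqrt(theta[i, i] * theta[j, j])


def pcor_from_residuals(X, i, j):
    """Correlate residuals of i and j after regressing each on all other columns."""
    Xc = X - X.mean(axis=0)  # centering plays the role of an intercept
    rest = [k for k in range(X.shape[1]) if k not in (i, j)]
    Z = Xc[:, rest]
    res_i = Xc[:, i] - Z @ np.linalg.lstsq(Z, Xc[:, i], rcond=None)[0]
    res_j = Xc[:, j] - Z @ np.linalg.lstsq(Z, Xc[:, j], rcond=None)[0]
    return np.corrcoef(res_i, res_j)[0, 1]


# Hypothetical hub data: column 2 (the hub) drives columns 0, 1 and 3.
rng = np.random.default_rng(1)
hub = rng.normal(size=500)
X = np.column_stack([
    0.3 * hub + rng.normal(size=500),
    0.8 * hub + rng.normal(size=500),
    hub,
    0.8 * hub + rng.normal(size=500),
])

print(np.corrcoef(X, rowvar=False)[1, 3])  # marginal: typically well above zero
print(pcor_from_precision(X, 1, 3))        # conditional on columns 0 and 2: near zero
print(pcor_from_residuals(X, 1, 3))        # same value via regression residuals
```

Columns 1 and 3 share only the hub, so their marginal correlation is driven entirely by it; once every other column is conditioned on, the association largely vanishes, which is the indirect interaction the inverse covariance is meant to screen out.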