Abstract
Graphical models or networks describe the statistical dependence among
multiple variables and are widely used in biology (e.g., gene regulatory
networks). Under appropriate assumptions, directed edges may represent causal
relationships. A key feature of a biological network is sparsity, that is,
how likely an edge is to be present, about which we often have some prior
knowledge. However,
most existing Bayesian methods use priors for the entire graph, making it
difficult to specify the level of sparsity. The few methods that place priors
on edges estimate the two directions independently, so the probabilities of
the two directions of an edge can sum to more than 1. Here, we present baycn
(BAYesian Causal Network), a
novel approximate Bayesian method that represents a graph in terms of three
edge states (the two directions and edge absence) and specifies priors on
these edge states. We design a pseudo-Bayesian sampling algorithm for efficient
inference. We apply baycn to two genomic problems: (i) distinguishing direct
and indirect target genes of genetic variants, using these variants as
instrumental variables, and (ii) inferring combinatorial binding of highly
correlated
transcription factors in Drosophila. In both applications and in extensive
simulations, our method is substantially more accurate than existing methods,
both for the whole graph and for individual edges.
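
The three-state edge representation described above can be illustrated with a minimal Python sketch. It is not the baycn implementation: the state coding (0, 1, 2), the prior values, the assumption that edges are independent a priori, and the function names log_prior and state_frequencies are all hypothetical choices made for illustration. The sketch shows why placing the prior on edge states makes sparsity easy to specify, and why the estimated probabilities of the two directions and of absence always sum to 1 for each edge.

```python
import math

# Hypothetical coding of the three edge states (an assumption of this sketch):
# 0 = direction A -> B, 1 = direction B -> A, 2 = edge absent.
# The prior values are illustrative, not taken from the paper; a large
# probability on state 2 encodes the belief that the network is sparse.
EDGE_STATE_PRIOR = (0.05, 0.05, 0.90)


def log_prior(edge_states, prior=EDGE_STATE_PRIOR):
    """Log prior of a graph, assuming edges are independent a priori."""
    return sum(math.log(prior[state]) for state in edge_states)


def state_frequencies(samples, edge_index):
    """Fraction of sampled graphs in which one edge takes each of the three states."""
    counts = [0, 0, 0]
    for edge_states in samples:
        counts[edge_states[edge_index]] += 1
    return [count / len(samples) for count in counts]


# Four made-up sampled graphs, each a tuple of states for three candidate edges.
samples = [(0, 2, 1), (0, 2, 2), (1, 2, 1), (0, 0, 1)]

# Because each sampled graph assigns exactly one state to every edge, the three
# estimated probabilities for an edge form a proper distribution.
freqs = state_frequencies(samples, edge_index=0)
empty_graph_log_prior = log_prior((2, 2, 2))
```

Under this representation, the two directions of an edge compete within a single distribution, so their probabilities cannot exceed 1 in total, unlike when each direction is estimated on its own.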