Abstract
Understanding the molecular mechanisms controlling gene expression and how genetic variations shape phenotypic outcomes is a fundamental pursuit in biology. However, learning regulatory networks directly from genomic data remains a significant challenge. Causal network inference is a powerful tool for elucidating regulatory relationships among molecular phenotypes, such as gene expression or DNA methylation. I first present results from studying how expression quantitative trait loci (eQTLs) regulate protein-coding genes and long non-coding RNAs in 48 different tissues and cell types from the Genotype-Tissue Expression Consortium. Next, I introduce Mendelian Randomization Genomic Network for trios (MRGNtrio), a novel causal inference framework that improves upon existing causal inference methods by: (i) inferring diverse regulatory relationships for a trio; (ii) accommodating the inclusion of many confounding variables; and (iii) eliminating the need for a large set of statistical dependence tests in inference. I demonstrate the effectiveness of MRGNtrio in learning causal biological networks through simulation and by analyzing eQTLs in human whole blood tissue. I then present a deep learning framework, Hierarchical Community Detection (DeepHCD), for community detection in gene regulatory networks. This approach addresses the challenges of learning large networks by partitioning genes into functionally relevant communities, which simplifies the network and enables existing causal methods to operate on smaller sets of identified gene modules. I validate the performance of DeepHCD in simulation against existing, commonly used approaches such as hierarchical clustering and demonstrate its ability to enhance detection of fine community structure.
Finally, I conclude with results from applying DeepHCD to group transcription factors associated with mouse hepatocyte differentiation, where I identify multiple transcription factor groups that act as regulatory units.