Title: Schizophrenia classification and functional genomic prioritization from genotype and bulk-tissue gene expression data using DeepGAMI
Legend: A. Balanced accuracies from 5-fold cross-validation and B. receiver operating characteristic (ROC) curves of the DeepGAMI dual-modality model (dark blue), DeepGAMI single-modality model (orange), Lasso (brown), LR (light blue), Random Forest (yellow), SVM (purple), multilayer perceptron (MLP, pink), Varmole (red), and MOGONET (green) for classifying schizophrenia vs. control individuals on the held-out test samples. C. ROC curves of various methods on cross-cohort SCZ prediction. D. Selected examples of prioritized transcription factors, SNPs, target genes (latent features), and functional links (GRNs, eQTLs) for schizophrenia. Purple: known schizophrenia genes. E. Function and pathway enrichments of prioritized schizophrenia SNPs.
Citation: Chandrashekar, P. B., Alatkar, S., Wang, J., Hoffman, G. E., He, C., Jin, T., Khullar, S., Bendl, J., Fullard, J. F., Roussos, P., & Wang, D. (2023). DeepGAMI: deep biologically guided auxiliary learning for multimodal integration and imputation to improve genotype-phenotype prediction. Genome Medicine, 15(1), 88. https://doi.org/10.1186/s13073-023-01248-6
Abstract: Background – Genotypes are strongly associated with disease phenotypes, particularly in brain disorders. However, the molecular and cellular mechanisms behind this association remain elusive. With emerging multimodal data for these mechanisms, machine learning methods can be applied for phenotype prediction at different scales, but due to the black-box nature of machine learning, integrating these modalities and interpreting biological mechanisms can be challenging. Additionally, the partial availability of these multimodal data presents a challenge in developing these predictive models. Method – To address these challenges, we developed DeepGAMI, an interpretable neural network model to improve genotype-phenotype prediction from multimodal data. DeepGAMI leverages functional genomic information, such as eQTLs and gene regulation, to guide neural network connections. Additionally, it includes an auxiliary learning layer for cross-modal imputation, allowing the imputation of latent features of missing modalities and thus the prediction of phenotypes from a single modality. Finally, DeepGAMI uses integrated gradients to prioritize multimodal features for various phenotypes. Results – We applied DeepGAMI to several multimodal datasets including genotype and bulk and cell-type gene expression data in brain diseases, and gene expression and electrophysiology data of mouse neuronal cells. Using cross-validation and independent validation, DeepGAMI outperformed existing methods for classifying disease types, and cellular and clinical phenotypes, even using single modalities (e.g., AUC score of 0.79 for schizophrenia and 0.73 for cognitive impairment in Alzheimer’s disease). Conclusion – We demonstrated that DeepGAMI improves phenotype prediction and prioritizes phenotypic features and networks in multiple multimodal datasets in complex brains and brain diseases. Also, it prioritized disease-associated variants, genes, and regulatory networks linked to different phenotypes, providing novel insights into the interpretation of gene regulatory mechanisms. DeepGAMI is open-source and available for general use.
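Illustration: The abstract mentions that features are prioritized with integrated gradients. As a rough illustration of that general attribution technique only, the sketch below applies it to a toy logistic classifier written in NumPy. It is not DeepGAMI's code: the model, the weights, the all-zero baseline, and every function name here are assumptions made for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def model(x, w, b):
    """Toy differentiable classifier (hypothetical stand-in, not DeepGAMI)."""
    return sigmoid(np.dot(x, w) + b)

def model_grad(x, w, b):
    """Analytic gradient of the toy model's output with respect to the input x."""
    p = model(x, w, b)
    return p * (1.0 - p) * w

def integrated_gradients(x, baseline, w, b, steps=200):
    """Approximate integrated gradients along the straight path baseline -> x.

    Uses a midpoint Riemann sum over `steps` points on the path.
    """
    alphas = (np.arange(steps) + 0.5) / steps
    grads = np.array(
        [model_grad(baseline + a * (x - baseline), w, b) for a in alphas]
    )
    return (x - baseline) * grads.mean(axis=0)

# Synthetic inputs: 5 made-up features for one sample.
rng = np.random.default_rng(0)
w = rng.normal(size=5)
b = 0.1
x = rng.normal(size=5)
baseline = np.zeros(5)  # assumed reference input

attributions = integrated_gradients(x, baseline, w, b)

# Rank features by absolute attribution, largest first.
ranking = np.argsort(-np.abs(attributions))

# Completeness check: attributions should sum to the change in model output.
output_change = model(x, w, b) - model(baseline, w, b)
print(attributions.sum(), output_change)
```

The attributions sum approximately to the difference between the model output at the input and at the baseline, which is a convenient sanity check on any implementation of the method.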
Keywords: Alzheimer’s disease; Auxiliary learning; Cell-type gene regulatory networks; Cross-modality imputation; Deep learning; Genotype–phenotype prediction; Prioritizing disease risk variants; Schizophrenia.
Investigator: Daifeng Wang, PhD
About the Lab: The Wang lab develops interpretable machine learning approaches and bioinformatics tools for understanding the functional genomic, molecular, and cellular mechanisms that link genotype to phenotype.