Title: Multiview learning for understanding functional multiomics
Legend: (A) Factorization-based single-view learning methods. They typically factorize a data matrix X from a single view (e.g., a gene expression matrix of samples by genes) into the product of a coefficient matrix G and a second matrix, the dictionary (or pattern) matrix. Because matrix factorization has an intrinsic clustering property, the dictionary matrix can represent the clustering structure of the view (i.e., soft clustering assignments or indicators). For example, in the figure the dictionary matrix reveals 3 different gene clusters, a, b, and c. (B) Factorization-based multiview learning methods. They factorize the matrices from different omics, e.g., gene expression X(1) (green matrix), protein expression X(2) (blue matrix), and chromatin accessibility X(3) (orange matrix), into products of view-specific coefficient matrices G(k) (k = 1, 2, 3) and a common dictionary matrix. This common representation can reveal cross-talk patterns among genes, proteins (more precisely, TFs), and regulatory elements (i.e., enhancers); for example, a TF binds to a region to regulate a gene's expression. (C) Alignment-based multiview learning methods. The 3 input omic matrices are projected via functions f(k) (k = 1, 2, 3) onto spaces in which their internal relationships are revealed. The resulting representations of the different omics are coordinated pairwise via the term Ωco. For example, the figure shows the pairwise alignments between X(1) and X(2) and between X(2) and X(3) to reveal cross-talk patterns between TFs and enhancers, and between enhancers and gene expressions. (The alignment between X(1) and X(3) is omitted to keep the figure concise.) TF, transcription factor.
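The factorization in panel (B) can be illustrated with a minimal sketch, which is not the method of any specific tool reviewed in the paper. It assumes nonnegative data, a squared reconstruction error, standard multiplicative updates, and views arranged so that they share their column dimension. The names `joint_nmf`, `reconstruction_loss`, `Gs`, and `H` are hypothetical, and the toy data are synthetic.

```python
import numpy as np


def joint_nmf(views, rank, n_iter=200, eps=1e-9, seed=0):
    """Factorize each view X(k) ~ G(k) @ H with one shared dictionary H.

    Assumption: every view is a nonnegative array of shape (n_k, m),
    i.e., all views share the same m columns.
    """
    rng = np.random.default_rng(seed)
    m = views[0].shape[1]
    if any(X.shape[1] != m for X in views):
        raise ValueError("all views must share the same number of columns")

    # Random nonnegative initialization of G(k) and the common H.
    Gs = [rng.random((X.shape[0], rank)) for X in views]
    H = rng.random((rank, m))

    for _ in range(n_iter):
        # Update each view-specific coefficient matrix G(k).
        for k, X in enumerate(views):
            Gs[k] *= (X @ H.T) / (Gs[k] @ (H @ H.T) + eps)
        # Update the common dictionary H using all views together.
        numer = sum(G.T @ X for G, X in zip(Gs, views))
        denom = sum(G.T @ G for G in Gs) @ H + eps
        H *= numer / denom
    return Gs, H


def reconstruction_loss(views, Gs, H):
    """Sum over views of the squared Frobenius error ||X(k) - G(k) H||^2."""
    return sum(np.linalg.norm(X - G @ H) ** 2 for X, G in zip(views, Gs))


# Synthetic toy data: three nonnegative views built from one hidden dictionary.
rng = np.random.default_rng(1)
H_true = rng.random((3, 40))
views = [rng.random((n, 3)) @ H_true for n in (30, 20, 25)]

Gs, H = joint_nmf(views, rank=3)

# Soft cluster memberships are the columns of H; a hard assignment
# takes the strongest component for each shared column.
clusters = H.argmax(axis=0)
```

Because H is shared, this objective is equivalent to ordinary nonnegative matrix factorization of the vertically stacked views. Real multiview methods usually add view weights or regularizers, which this sketch omits.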
Citation: Nguyen ND, Wang D. (2020) Multiview learning for understanding functional multiomics. PLoS Comput Biol. 2020 Apr 2;16(4):e1007677. doi: 10.1371/journal.pcbi.1007677.
Abstract: The molecular mechanisms and functions in complex biological systems currently remain elusive. Recent high-throughput techniques, such as next-generation sequencing, have generated a wide variety of multiomics datasets that enable the identification of biological functions and mechanisms via multiple facets. However, integrating these large-scale multiomics data and discovering functional insights remain challenging tasks. To address these challenges, machine learning has been broadly applied to analyze multiomics. This review introduces multiview learning, an emerging machine learning field, and envisions its potentially powerful applications to multiomics. In particular, multiview learning is more effective than previous integrative methods for learning data's heterogeneity and revealing cross-talk patterns. Although it has been applied to various contexts, such as computer vision and speech recognition, multiview learning has not yet been widely applied to biological data, specifically multiomics data. Therefore, this paper first reviews recent multiview learning methods and unifies them in a framework called multiview empirical risk minimization (MV-ERM). We further discuss the potential applications of each method to multiomics, including genomics, transcriptomics, and epigenomics, with the aim of discovering functional and mechanistic interpretations across omics. Second, we explore possible applications to different biological systems, including human diseases (e.g., brain disorders and cancers), plants, and single-cell analysis, and discuss both the benefits and caveats of using multiview learning to discover the molecular mechanisms and functions of these systems.
About the Lab: The Wang lab focuses on developing interpretable machine learning approaches and bioinformatics tools for understanding functional genomics and the molecular and cellular mechanisms that link genotype to phenotype.