A new computational pipeline connects disease and discovery at the cellular level

Could Alzheimer’s disease and schizophrenia be biologically connected? A new computational model, scGRNom (single-cell Gene Regulatory Network prediction from multi-omics), developed by researchers at the University of Wisconsin–Madison, harnesses the power of multi-omics and machine learning. The model provides insight, at the cellular and molecular level, into genetic links and shared risks between certain neurodegenerative diseases and neuropsychiatric disorders. The findings were recently published in the journal Genome Medicine.

Daifeng Wang, PhD

Currently, detection of many brain diseases, such as Alzheimer’s disease, a neurodegenerative disease that impacts memory and cognition, or schizophrenia, a neuropsychiatric disorder, is based on behavioral observation or brain imaging data. “No one has really been able to understand what’s going on at the cellular and molecular level, how the genes work together and how they work for a particular cell type,” says Daifeng Wang, PhD, the study’s senior author, an assistant professor of biostatistics and informatics and a Waisman investigator. “We wanted to examine multiple diseases at the cellular level to see if there was any overlap in genomic data. Are there points of intersection in gene regulation that cause an over- or under-expression of a gene that could lead to disease or affect a disease’s symptoms, progression or severity? Could we predict any of these factors with this data?”

Enter scGRNom, a computational pipeline that combines multi-omic data (cellular and molecular data about an organism’s DNA, genes and gene functions) with machine learning (a core part of artificial intelligence that uses computers to analyze large amounts of data and make predictions quickly and efficiently) to advance our understanding of brain diseases.

A road map for millions of cells working together.

Cells with similar functions can be grouped into cell types, such as neurons, microglia and oligodendrocytes. Cell types are important as intermediate phenotypes for disease. “No one has tried to understand how the genes work together for the particular cell types or if they work abnormally for a particular cell type that can lead to disease,” says Wang. “In order to understand how the genes work together, we have to integrate multiple aspects or multi-omics. This is what this new computational model can do.”

scGRNom integrates and analyzes this multi-omic data to identify commonalities that could lead to predictors and indicators of disease progression, severity, and phenotype. A better understanding of these diseases and disorders may ultimately lead to interventions and treatments. The multi-omic data covers several layers, including genomics, epigenomics and transcriptomics. For example, genome-wide association studies (GWAS) scan the genomes of many different people to identify genetic risk markers associated with the presence of a disease. Recent single-cell sequencing techniques provide epigenomic and transcriptomic data at cellular resolution in the human brain. Population data, such as the ROSMAP cohort, provide bulk-tissue transcriptomic data that allow disease phenotypes to be predicted from genes. ROSMAP includes data from individuals enrolled in the Religious Orders Study (ROS) and the Memory and Aging Project (MAP) by the Alzheimer’s Disease Center at Rush University.

Wang compares scGRNom to a traffic app that helps identify and navigate traffic jams and road problems. “There are programs that analyze the traffic situation and tell you where there is heavy traffic and where there might be an accident or stopped traffic. Based on data and analysis the traffic app can pinpoint the location of the accident and highlight some of the impacts of the accident on traffic flow,” he says. “The new scGRNom computational model operates very similarly for genomic data related to a disease. By applying the computational model to the multi-omic data, it helps identify the genes and genetic variants that contribute to the problem for particular cell types and then measure that gene activity and use this to predict disease likelihood.”

The connections between Alzheimer’s and schizophrenia.

Many diseases, like Alzheimer’s and schizophrenia, have genetic origins related to the over- or under-expression of a gene. Gene activities, such as expression, are governed by a gene regulatory network (GRN) that coordinates the correct molecular functions at the genome scale. Disrupted cooperation among the genes in this network can give rise to the abnormal gene expression found in disease. Recent GWAS have identified a number of genetic risk variants associated with multiple brain diseases.
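
As a rough illustration of the idea, a gene regulatory network can be pictured as a list of links from a regulator, through a regulatory region of DNA, to a target gene, and a risk variant can be traced to the genes it may affect by looking up the region it falls in. The sketch below is not the scGRNom code; the edge layout, the function names and every identifier (TF1, enh_1, GENE_A, var_1 and so on) are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class Edge(NamedTuple):
    regulator: str  # regulating factor (hypothetical name)
    element: str    # regulatory DNA region (hypothetical ID)
    target: str     # target gene (hypothetical name)


# Toy network for a single cell type; all identifiers are made up.
edges = [
    Edge("TF1", "enh_1", "GENE_A"),
    Edge("TF2", "enh_1", "GENE_A"),
    Edge("TF1", "enh_2", "GENE_B"),
    Edge("TF3", "prom_1", "GENE_C"),
]


def targets_by_element(edges):
    """Index the network: regulatory region -> set of target genes."""
    index = defaultdict(set)
    for edge in edges:
        index[edge.element].add(edge.target)
    return index


def genes_hit_by_variants(variant_to_element, edges):
    """Collect target genes whose regulatory regions contain a risk variant."""
    index = targets_by_element(edges)
    hit = set()
    for element in variant_to_element.values():
        hit |= index.get(element, set())
    return hit


# Hypothetical risk variants and the regions they fall in.
# enh_9 is not part of this toy network, so var_3 maps to no gene.
variants = {"var_1": "enh_1", "var_2": "prom_1", "var_3": "enh_9"}
disease_genes = genes_hit_by_variants(variants, edges)
print(sorted(disease_genes))  # ['GENE_A', 'GENE_C']
```

Building one such network per cell type would yield one list of candidate disease genes per cell type, which is the kind of output the study compares across diseases.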

In the study, researchers used the scGRNom computational pipeline to focus on the gene regulatory network and target genes in Alzheimer’s and schizophrenia to find possible connections. “We used these two examples because studies indicate that there are shared risk factors between the two diseases,” says Wang. “For example, general psychosis has been found in up to 60% of individuals with Alzheimer’s as well as other effects mirroring symptoms found in schizophrenia.  Both diseases are significantly associated with genetic variants and have complex underlying gene regulatory mechanisms.”

By using the scGRNom model to examine GWAS data, the researchers were able to make linkages between Alzheimer’s and schizophrenia and found cross-disease genomic functions at the cellular level. “GWAS is the association between a genetic variant and disease that indicates a DNA mutation,” says Wang. “The multi-omic data analysis with scGRNom tells you how the variants affect the genes and how the affected genes work abnormally for particular cell types and how that leads to disease. In the case of Alzheimer’s and schizophrenia, scGRNom takes the multi-omic data for each disease and compares them, looking for overlaps and intersections.”
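
The "overlaps and intersections" step can be pictured as a set comparison carried out one cell type at a time. The following sketch assumes each disease has already been reduced to a list of genes per cell type. The gene names are placeholders, and the Jaccard score is simply one common way to quantify overlap, not necessarily the measure used in the study.

```python
# Hypothetical per-cell-type disease gene sets; names are placeholders.
ad_genes = {
    "microglia": {"GENE_A", "GENE_B", "GENE_C"},
    "neuron": {"GENE_D", "GENE_E"},
}
scz_genes = {
    "microglia": {"GENE_B", "GENE_C", "GENE_F"},
    "neuron": {"GENE_E", "GENE_G"},
    "oligodendrocyte": {"GENE_H"},
}


def shared_genes(a, b):
    """Genes present in both diseases, for cell types found in both inputs."""
    return {cell_type: a[cell_type] & b[cell_type] for cell_type in a.keys() & b.keys()}


def jaccard(x, y):
    """Overlap size divided by union size (0.0 when both sets are empty)."""
    union = x | y
    return len(x & y) / len(union) if union else 0.0


shared = shared_genes(ad_genes, scz_genes)
for cell_type in sorted(shared):
    score = jaccard(ad_genes[cell_type], scz_genes[cell_type])
    print(cell_type, sorted(shared[cell_type]), round(score, 2))
```

Cell types that appear for only one disease (here, the oligodendrocyte entry) are left out of the comparison rather than reported as having zero overlap.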

Fig. 1: Prediction accuracy of AD clinical phenotypes from disease genes. The AD population data for prediction was from the ROSMAP cohort. AD clinical phenotypes include Braak (stages that measure the severity of neurofibrillary tangle (NFT) pathology); Cerad (scores that measure neuritic plaques); Cogdx (cognitive status at the time of death); and Dcfdx (the diagnosis of cognitive status). The bar height represents the average accuracy of cross-validation (K = 5) from the prediction using logistic regression (“Methods” section). Red: scGRNom’s cell-type disease genes shared by AD and SCZ (SCZ-AD genes). Green: AD genes from GWAS. Blue: randomly selected genes (same number as SCZ-AD genes).

The cell-type disease genes shared by Alzheimer’s and schizophrenia were found to improve prediction of disease progression and cognitive impairment in Alzheimer’s disease (see Fig. 1).
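
The figure caption describes the evaluation as logistic regression scored by five-fold cross-validation, with a random gene set as a baseline. The sketch below shows that kind of comparison on synthetic data only. It is a simplified stand-in, not the authors' analysis: the outcome is binary rather than a multi-stage clinical score, the "expression" values are simulated, and the function names, learning rate and number of epochs are arbitrary choices.

```python
import math
import random


def sigmoid(z):
    # Clamp the input so math.exp cannot overflow on extreme values.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.1, epochs=200):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    """Predict class 1 when the linear score is non-negative."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else 0


def cross_val_accuracy(X, y, k=5, seed=0):
    """Average held-out accuracy over k folds."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        w, b = fit_logistic([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(w, b, X[i]) == y[i] for i in fold)
        scores.append(correct / len(fold))
    return sum(scores) / k


# Synthetic data: 100 individuals, binary phenotype, 3 genes per gene set.
rng = random.Random(42)
labels = [i % 2 for i in range(100)]
# "Informative" genes: simulated expression shifts with the phenotype.
informative = [[(1.0 if yi else -1.0) + rng.gauss(0, 0.5) for _ in range(3)]
               for yi in labels]
# "Random" genes: simulated expression unrelated to the phenotype.
random_genes = [[rng.gauss(0, 1.0) for _ in range(3)] for _ in labels]

acc_informative = cross_val_accuracy(informative, labels)
acc_random = cross_val_accuracy(random_genes, labels)
print(f"informative genes: {acc_informative:.2f}, random genes: {acc_random:.2f}")
```

Matching the size of the random gene set to the size of the gene set under test, as the caption notes, keeps the comparison fair, because a model given more genes has more chances to fit noise.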

scGRNom is a general-purpose tool that is open-sourced and available at https://github.com/daifengwanglab/scGRNom. Wang hopes to do future studies using the multi-omic computational model to analyze and compare other cell types and brain disorders such as neurodevelopmental disorders and intellectual disabilities like autism and Down syndrome.

This study was funded by grants from the National Institutes of Health, R01AG067025, R21CA237955, and U01MH116492 to D.W., U54HD090256 to the Waisman Center, and start-up funding for Daifeng Wang from the Office of the Vice Chancellor for Research and Graduate Education at the University of Wisconsin–Madison.

Ting Jin, Peter Rehani and Mufang Ying, students in the Wang lab, jointly led the analyses as co-first authors. Jiawei Huang, Shuang Liu, and Panagiotis Roussos also contributed to the work.