By Emily Leclerc | Waisman Science Writer
At a Glance
- COSIME is a new machine learning algorithm developed at UW-Madison that helps researchers integrate and interpret multiple types of biological data to better understand the molecular mechanisms behind developmental disabilities and neurodegenerative diseases.
- Complex conditions like Alzheimer’s disease, autism, and other neurodevelopmental disabilities need analytical tools that are capable of handling complicated and often interrelated data. Machine learning algorithms present powerful and important opportunities to help researchers study these conditions.
- Unlike traditional tools, COSIME can analyze two datasets simultaneously, predict disease outcomes, and interpret which cellular components are most influential – offering a clearer picture of how different biological clues interact to shape complex conditions.
A new machine learning algorithm developed at UW-Madison is helping researchers untangle a complex puzzle: the molecular mechanisms behind developmental disabilities and neurodegenerative diseases. COSIME, created by the lab of Waisman investigator Daifeng Wang, PhD, associate professor of biostatistics and medical informatics, and computer sciences, offers a powerful new way to integrate and interpret multiple types of biological data, which can give scientists a clearer view of how different cellular clues come together to shape disease outcomes.

Much like playing a game of Clue, where players piece together different types of information to solve a mystery, researchers studying developmental disabilities and neurodegenerative diseases must sift through a wide array of biological clues, each offering only a fragment of the full story. But unlike the tidy logic of a board game, the data in real-world science is vast, varied, and far more complex.
Think of all the different types of clues a real detective might use – fingerprints, security footage, eyewitness accounts, and DNA evidence. Each one reveals part of the story, but not the whole picture. Similarly, when studying developmental disabilities and neurodegenerative diseases, studying one type of data at a time, such as gene expression or metabolic information, limits investigators to a narrow view. There’s only so much it can tell you. To understand these conditions more completely, researchers need to be able to look at more than one type of data at a time to uncover how different cellular mechanisms interact and influence one another.
“To understand these conditions comprehensively, we need to uncover their underlying mechanisms and identify biomarkers, especially those that can aid in early-stage detection,” says Jerome Choi, PhD, former member of the Wang lab, senior data scientist at pharmaceutical company AbbVie Inc., and first author on the paper outlining the new machine learning tool. “Many of these conditions are intricate and not centered on one specific factor and only being able to analyze one dataset at a time often lacks the ability to capture the interplay between different cellular components.”
The challenge, however, is that all these datasets come in different forms and scales – just like how visual footage, verbal testimony, and molecular evidence differ in a detective’s investigation. Gene expression data looks very different from cell data or metabolic information. And of course, the datasets are typically large, complex, and nonlinear.
Traditional machine learning algorithms and analysis tools struggle, or are simply unable, to work with multiple large, complicated datasets to tease out patterns, relationships, and their influence on one another. So, Wang and his lab team built a machine learning algorithm that can.
Their new tool, COSIME, short for Cooperative Multi-view Integration with Scalable and Interpretable Model Explainer, can work with two different datasets at a time to predict condition or disease outcomes and then interpret which features and interactions are driving those predictions. The paper outlining how COSIME works and how effective it is was recently published in the journal Nature Machine Intelligence, a premier journal for machine learning and AI advancements.
“Our new machine learning approach is us trying to understand how different features or cellular mechanisms from different cell types work together to contribute to phenotypes or disease outcomes,” Wang says. “These conditions are complicated, right? They can be attributed to different types of molecular or cellular mechanisms so this work aims to understand how the different mechanisms are working together.”

Essentially, COSIME is capable of taking two different types of clues and predicting what happened and which pieces of evidence were important in making that prediction. COSIME’s unique two-part structure and ability to interpret the prediction results make it a particularly powerful analytical tool for researchers. “Not many machine learning tools are suitable for interpreting results,” Choi says. “Predicting outcomes is one thing, but we need to know what those predictions are saying.”
COSIME’s two parts can be used together or separately. The first component, which calculates predictions, is itself a type of machine learning model. While other machine learning models are able to predict outcomes, COSIME is notably nimble and able to handle the dense and intricate biological datasets generated from studying developmental disabilities and neurodegenerative diseases. It uses a unique blend of machine learning tools, some of which were built specifically for COSIME by the Wang lab.
If a researcher already has a trained model, they can enter it into COSIME’s second component – which interprets the first part’s predictions – to get pairwise interactions and feature importance information. COSIME is the first machine learning model capable of determining pairwise interaction values. It analyzes how two pieces of information, either inside one dataset or across both datasets, are impacting the model’s predictions. This information can capture subtle patterns or influences that may otherwise be missed.
“Pairwise interactions mean two features may interact to affect outcomes. For example, gene A from cell type X and gene B from cell type Y,” Wang says. “There could be scenarios where genes A and B are not important on their own but their interactions play key roles.” It would be like finding two different fingerprints that alone don’t mean much but together could indicate who is involved in the detective’s mystery.
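The scenario Wang describes can be made concrete with a toy calculation. The sketch below is only an illustration of the general idea of a pairwise interaction, not COSIME's actual method or code: the outcome function and all helper names are hypothetical, and the "model" is simply a rule that returns a high score only when both genes are active. It compares what each gene contributes alone with what the pair contributes beyond the sum of those solo effects.

```python
# Toy illustration only; this is NOT COSIME's method or API.
# toy_outcome and every helper name below are hypothetical.

def toy_outcome(gene_a: int, gene_b: int) -> float:
    """Hypothetical outcome score: 1.0 only when both genes are active."""
    return float(gene_a and gene_b)


def effect_of_a_alone(model) -> float:
    """Change in outcome from switching on gene A while gene B stays off."""
    return model(1, 0) - model(0, 0)


def effect_of_b_alone(model) -> float:
    """Change in outcome from switching on gene B while gene A stays off."""
    return model(0, 1) - model(0, 0)


def pairwise_interaction(model) -> float:
    """What the joint effect adds beyond the sum of the two solo effects."""
    return model(1, 1) - model(1, 0) - model(0, 1) + model(0, 0)


print("Gene A alone:", effect_of_a_alone(toy_outcome))
print("Gene B alone:", effect_of_b_alone(toy_outcome))
print("A-B interaction:", pairwise_interaction(toy_outcome))
```

In this toy case each gene's solo effect is 0.0 while the interaction term is 1.0, which mirrors the two fingerprints that mean little alone but are telling together.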

Feature importance is the team’s term for the information COSIME generates on what parts of the dataset are important and influential to the disease outcomes. This looks beyond just pairwise interactions. In the detective’s world it would equate to COSIME figuring out which sets of fingerprints were important, their relationship to the collected DNA evidence, and how that illustrates what happened.
In benchmark tests, COSIME outperformed other algorithms and demonstrated high accuracy and efficiency on both simulated and real-world datasets. Its unique construction and specially built components allow it to more deftly interrogate data than previous algorithms, giving researchers a powerful tool for studying developmental disabilities, neurodegenerative diseases, and more.
So far, COSIME has only been applied to Alzheimer’s disease datasets and is only capable of handling two different datasets at a time. However, the algorithm is not disease specific and was intentionally designed for general use. Wang, Choi, and the team built COSIME to be broadly usable and adaptable enough to analyze data from across conditions and diseases. They hope to expand its capabilities to handle three datasets simultaneously, but that could prove to be a very complex undertaking.
Just as any good detective needs access to as much information as possible to solve a mystery, the more researchers can reveal about the molecular picture behind developmental disabilities and neurodegenerative diseases, the more effective therapies and treatments they can design. COSIME has the potential to greatly improve researchers’ capacity to study and understand these diverse and complex conditions.
J.J.C. was supported by National Institutes of Health (NIH) grants under funding codes R21NS128761, RF1MH128695, R01AG067025, and P50HD105353, all awarded to D.W.; the National Science Foundation CAREER Award 2144475, also to D.W.; the Wisconsin Distinguished Graduate Fellowship (WDGF) from the Waisman Center; and an NIH grant under funding code RF1AG054047 awarded to C.D.E. N.C.K. and T.G. were supported by the same NIH and NSF grants awarded to D.W. (R21NS128761, RF1MH128695, R01AG067025, P50HD105353, and NSF CAREER Award 2144475). T.L. was supported by NIH grant RF1AG054047 awarded to C.D.E.