Meet ARTEMIS: The new AI helping researchers reconstruct the hidden stories of cell development

By Emily Leclerc | Waisman Science Writer

At a Glance:

  1. The lab of Waisman investigator Daifeng Wang has built a new machine learning algorithm named ARTEMIS that can predict missing information in between data collection time points (snapshots).
  2. ARTEMIS takes existing data sets and predicts what the data would look like between the time points at which they were collected.
  3. ARTEMIS can help researchers look at the dynamic and continuous process of cell development, making it easier to find key genes that are driving and influencing development.
  4. ARTEMIS has the potential to make it easier to study the impact of intellectual and developmental disabilities on cell and brain development.

Building a complete developmental trajectory of cells is close to impossible with today’s data collection methods, but a new tool developed at the Waisman Center is taking scientists a step closer to achieving it. Machine learning, a form of artificial intelligence, can help fill in the gaps. The lab of Waisman investigator Daifeng Wang, PhD, associate professor of biostatistics and medical informatics, built a new machine learning algorithm that can infer continuous changes in cell populations over time, giving researchers deeper insight into cellular processes. The algorithm may enable a more nuanced understanding of cellular development, possibly revealing how conditions like intellectual and developmental disabilities affect cells and brain development at the molecular level.

Daifeng Wang

Cells are constantly in a state of flux, and with current data collection methods it is essentially impossible to capture a continuous measurement of cellular change over time. “Cells are not static. Their change is dynamic and continuous. But the measures researchers take are static. They’re like a snapshot of that one moment in time,” says Wang. “It’s impossible to track something like gene expression in real time across so many cells and so many genes.”

Researchers can capture a series of snapshots that give an overall idea of how cells are changing, but all of the information between data collection points is lost. This is where machine learning can play a pivotal role.

Wang and Sayali Anil Alatkar, a graduate student in Wang’s lab and first author on the paper, designed a new machine learning algorithm named ARTEMIS (trAjectory infeRence wiTh unbalancEd dynaMic optImal tranSport) that takes existing data sets and predicts the missing information between the data collection time points. The paper describing the algorithm was accepted to the proceedings of the International Conference on Intelligent Systems for Molecular Biology (ISMB) and published in the journal Bioinformatics (Oxford University Press). ISMB is the flagship annual meeting of the International Society for Computational Biology. The conference is highly competitive, and Wang and Alatkar’s paper is one of very few from UW-Madison to have been accepted over many years.

“Say you originally have gene expression data for 1000 cells at three time points from a particular developmental process,” Wang says. “Using ARTEMIS we can predict or recover the gene expression across all of the time points in between the three in the data set. It outputs continuous dynamic data across the entire process.”

Sayali Anil Alatkar

ARTEMIS combines two machine learning methods to generate its output. The first is a variational autoencoder, or VAE. VAEs were designed to reconstruct data. “These autoencoders essentially take huge data sets and create a small, compressed representation that is more easily used for data analysis,” Alatkar says. The second is a Schrödinger bridge, a type of algorithm that can predict continuous trajectories from the data it is fed. The two work together in three steps. First, ARTEMIS’s VAE creates a compressed version of a large data set. Next, the Schrödinger bridge predicts a continuous trajectory from the compressed version. Finally, the VAE decompresses the result to reconstruct the data. This leaves the user with a predicted continuous view of whichever process is being studied.
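For readers who think in code, the compress, predict, and decompress steps described above can be sketched in a few lines of Python. This is not the ARTEMIS code and makes no claim about how the lab implemented it. It is a toy illustration under loud assumptions: principal component analysis stands in for the VAE, straight-line interpolation stands in for the Schrödinger bridge, the data are random numbers rather than real gene expression, and every function name is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data: 1000 cells x 50 genes at three snapshots.
# ASSUMPTION: the same toy cells appear in every snapshot. Real single-cell
# measurements do not track individual cells over time, which is the gap
# that optimal transport methods such as Schrödinger bridges address.
n_cells, n_genes, n_latent = 1000, 50, 2
times = [0.0, 1.0, 2.0]
base = rng.normal(size=(n_cells, n_genes))
drift = rng.normal(size=n_genes)
snapshots = [
    base + t * drift + 0.1 * rng.normal(size=(n_cells, n_genes)) for t in times
]

# Step 1: compress. PCA is a simple linear stand-in for the VAE encoder.
stacked = np.vstack(snapshots)
mean = stacked.mean(axis=0)
_, _, vt = np.linalg.svd(stacked - mean, full_matrices=False)
components = vt[:n_latent]  # shape: (n_latent, n_genes)


def encode(x):
    """Project expression data into the small latent space (hypothetical helper)."""
    return (x - mean) @ components.T


def decode(z):
    """Map latent points back to gene expression space (hypothetical helper)."""
    return z @ components + mean


def predict_latent(latents, times, t):
    """Step 2: estimate latent positions at time t.

    Straight-line interpolation between the two neighboring snapshots is a
    crude stand-in for the trajectory a Schrödinger bridge would learn.
    """
    for i in range(len(times) - 1):
        t0, t1 = times[i], times[i + 1]
        if t0 <= t <= t1:
            w = (t - t0) / (t1 - t0)
            return (1 - w) * latents[i] + w * latents[i + 1]
    raise ValueError("t must lie between the first and last snapshot")


latents = [encode(x) for x in snapshots]

# Step 3: decompress the prediction for a time point that was never measured.
predicted = decode(predict_latent(latents, times, 0.5))
print(predicted.shape)  # (1000, 50)
```

The sketch only shows the shape of the pipeline. Because the stand-ins are linear, it cannot capture the curved, branching paths that real developing cells follow, which is the reason for using a learned model instead.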

“You can think of ARTEMIS as similar to those things that predict what you will look like when you get older. It connects many images of someone’s face, makes simpler representations of them, and then uses those versions to predict what you will look like in the future,” Wang says. “Our version does that with gene expression rather than pictures.”

ARTEMIS was trained and tested on several real-world data sets, including pancreatic cancer cell differentiation and the embryonic development of zebrafish, and it outperformed current methods for predicting data at missing time points. The data sets ARTEMIS trained on were all gene expression data, but the model can handle a variety of data types. “The model is general,” Wang says. “It can easily be applied to any developmental data such as brain development. We can even apply it to typical brain development versus atypical to see how they compare and differ.”

Sayali Anil Alatkar presenting ARTEMIS at the International Conference on Intelligent Systems for Molecular Biology

Wang and Alatkar also see ARTEMIS’s potential for handling a variety of cell characteristics. Gene expression is just one of them. They want to explore how accurately ARTEMIS can predict protein expression, accessible chromatin regions, cell population changes, and more. “We want to explore multimodal trajectories too. Where we can see how two different characteristics are evolving at the same time,” Alatkar says.

ARTEMIS’s capabilities give researchers new opportunities to better understand cell development across a variety of scenarios. It could help researchers look at changes in gene expression during the development of cells with Down syndrome, or reveal what continuous brain development looks like at the molecular level for cells with known genetic causes of autism. It has the potential to be a real asset in the study of intellectual and developmental disabilities, a field that is generating a variety of time-course data sets.