New researcher uses machine learning to decode genomic information

Recent advances in genome sciences — the study of an organism’s complete set of DNA — present a golden opportunity to identify the genetic causes and underlying mechanisms of intellectual and developmental disabilities. These discoveries will also improve diagnosis and potential treatments for these conditions. To advance science in this innovative field, the Waisman Center successfully competed in a campus-wide initiative and won an opportunity to hire three new tenure-track faculty members for a Functional Genetics/Genomics of Neurodevelopmental and Neurodegenerative Diseases Cluster. The interdisciplinary team will include a biostatistician, a neuroscientist, and a geneticist. The cluster will serve as a nucleus to integrate research, training, and clinical services in this up-and-coming research area at the Waisman Center.

Daifeng Wang, PhD

Daifeng Wang, PhD, a bioinformatician, was the first hire and fills the biostatistician role in the cluster. He began working at the Waisman Center in July 2019. His research focuses on using machine learning to analyze large-scale genomic data to better understand gene regulation and function in the brain. Machine learning is a form of artificial intelligence (AI) in which computer programs learn patterns from data, allowing large datasets to be analyzed quickly and efficiently. “It’s really difficult to process so much complex data manually, so we have to use computer programs to analyze the data and generate predictions that could relate to a disease,” Wang says. He hopes to apply these machine learning tools to a broad range of Waisman research focused on intellectual and developmental disabilities and neurodegenerative diseases. “The ultimate goal is to achieve some kind of precision medicine,” he says. ‘Precision medicine’ is a term for personalized treatment designed for a very targeted group of patients, or even for one specific patient.

To achieve this, Wang employs deep neural networks, a newer machine learning method that recognizes patterns in large amounts of data. Wang describes what he’s doing as “decoding a black box,” referring to an electronic device that wires a series of inputs (in this case, DNA mutations) to outputs (diseases). “Everything connects to everything in a layered structure in the black box,” he says. “We try to make the black box biologically interpretable by using prior biological knowledge to reveal the ‘genetic’ wires within the layered structure.”
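The layered “wiring” idea can be illustrated with a minimal Python sketch. This is not Wang’s actual method or software; it is a toy example under stated assumptions. The function name `masked_layer`, the weights, and the variant-to-gene links are all hypothetical, and the weights are fixed by hand rather than learned from data. The sketch shows only how prior knowledge, written as a 0/1 mask, can cut the connections that are not biologically supported, so that each remaining connection can be read as a ‘genetic’ wire.

```python
import math

def masked_layer(inputs, weights, mask):
    """One network layer: each output unit sums only the inputs the mask allows."""
    outputs = []
    for j in range(len(weights[0])):
        total = 0.0
        for i, x in enumerate(inputs):
            # mask[i][j] == 1 keeps the wire from input i to unit j; 0 cuts it.
            total += x * weights[i][j] * mask[i][j]
        outputs.append(math.tanh(total))
    return outputs

# Hypothetical toy data: 3 DNA variants (1.0 = present), 2 genes, 1 disease score.
variants = [1.0, 0.0, 1.0]

# Made-up weights; a real model would learn these from genomic data.
variant_to_gene_w = [[0.8, 0.5], [0.3, 0.9], [0.4, 0.7]]
# Assumed prior knowledge: variants 0 and 1 act on gene 0, variant 2 acts on gene 1.
variant_to_gene_mask = [[1, 0], [1, 0], [0, 1]]

gene_to_disease_w = [[1.2], [-0.6]]
gene_to_disease_mask = [[1], [1]]  # both genes are allowed to influence the score

# Layer 1: variants -> gene activity. Layer 2: gene activity -> disease score.
gene_activity = masked_layer(variants, variant_to_gene_w, variant_to_gene_mask)
disease_score = masked_layer(gene_activity, gene_to_disease_w, gene_to_disease_mask)

print("gene activity:", gene_activity)
print("disease score:", disease_score)
```

Because the mask removes every connection without prior support, gene 0 here responds only to variants 0 and 1, and gene 1 only to variant 2. That restriction is what makes the intermediate layer readable in biological terms.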

The focus of Wang’s current research is how genomic variants, like DNA mutations, affect gene expression and regulation. “Gene regulation is a very important mechanism,” he says. “Genes don’t act alone. To ensure proper biological function, they need to interact with each other.”

Decoding Genomic Information to Better Understand Molecular Mechanisms and Improve Disease Diagnosis

Wang uses available genomic data to test whether certain traits or a combination of traits will lead to a specific disease. If enough of the same traits produce the same diseases, then he can make meaningful connections to pathology. He says there is currently so much genomic data available — which could be used to predict neurodegenerative, developmental, and psychiatric disorders — that integrating all of this data can be a challenge. Wang’s innovative use of machine learning makes that integration possible.

Wang has a PhD in electrical and computer engineering from the University of Texas at Austin and did his postdoctoral training at Yale University. He was drawn to the cluster hire position because of the Waisman Center’s emphasis on interdisciplinary translational research. He looks forward to collaborating with others on the genomics of intellectual and developmental disabilities (IDD).