A causal theory for studying the cause-and-effect relationships of genes

By studying changes in gene expression, researchers learn how cells function at a molecular level, which could help them understand the development of certain diseases.

But a human has about 20,000 genes that can affect each other in complex ways, so even knowing which groups of genes to target is an enormously complicated problem. Genes also work together in modules that regulate each other.

MIT researchers have now developed theoretical foundations for methods that could identify the best way to aggregate genes into related groups, so that scientists can efficiently learn the underlying cause-and-effect relationships between many genes.

Importantly, the new method does this using only observational data. This means researchers don’t need to perform costly, and sometimes infeasible, interventional experiments to obtain the data needed to infer the underlying causal relationships.

In the long run, this technique could help scientists identify potential gene targets for inducing certain behaviors more accurately and efficiently, potentially enabling them to develop precise treatments for patients.

“In genomics, it is very important to understand the mechanism underlying cell states. But cells have a multiscale structure, so the level of summarization is very important, too. If you figure out the right way to aggregate the observed data, the information you learn about the system should be more interpretable and useful,” says graduate student Jiaqi Zhang, an Eric and Wendy Schmidt Center Fellow and co-lead author of a paper on this technique.

Zhang is joined on the paper by co-lead author Ryan Welch, currently a master’s student in engineering; and senior author Caroline Uhler, a professor in the Department of Electrical Engineering and Computer Science (EECS) and the Institute for Data, Systems, and Society (IDSS) who is also director of the Eric and Wendy Schmidt Center at the Broad Institute of MIT and Harvard, and a researcher at MIT’s Laboratory for Information and Decision Systems (LIDS). The research will be presented at the Conference on Neural Information Processing Systems.

Learning from observational data

The problem the researchers set out to tackle involves learning programs of genes. These programs describe which genes function together to regulate other genes in a biological process, such as cell development or differentiation.

Since scientists can’t efficiently study how all 20,000 genes interact, they use a technique called causal disentanglement to learn how to combine related groups of genes into a representation that allows them to efficiently explore cause-and-effect relationships.

In previous work, the researchers demonstrated how this could be done effectively in the presence of interventional data, which are data obtained by perturbing variables in the network.
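To make the distinction concrete, here is a minimal Python sketch of a hypothetical two-gene model in which gene x1 regulates gene x2. The model, the tanh mechanism, and the noise level are assumptions for illustration, not taken from the study. Perturbing the upstream gene shifts the downstream one, while perturbing the downstream gene leaves the upstream one untouched; that asymmetry is the kind of signal interventional experiments provide and purely observational data lack.

```python
import numpy as np

def sample(n, rng, do_x1=None, do_x2=None):
    """Sample a hypothetical two-gene model x1 -> x2 (an assumption, not the study's).

    Passing do_x1 or do_x2 replaces that variable's mechanism with a constant,
    mimicking a perfect intervention such as a knockdown or overexpression.
    """
    x1 = rng.normal(0.0, 1.0, n) if do_x1 is None else np.full(n, float(do_x1))
    x2 = np.tanh(2.0 * x1) + 0.3 * rng.normal(0.0, 1.0, n)
    if do_x2 is not None:
        x2 = np.full(n, float(do_x2))
    return x1, x2

rng = np.random.default_rng(0)
obs_x1, obs_x2 = sample(10_000, rng)            # observational data
_, int_x2 = sample(10_000, rng, do_x1=2.0)      # intervene on the cause
int_x1, _ = sample(10_000, rng, do_x2=2.0)      # intervene on the effect

print(f"mean x2: observational {obs_x2.mean():.2f}, under do(x1=2) {int_x2.mean():.2f}")
print(f"mean x1: observational {obs_x1.mean():.2f}, under do(x2=2) {int_x1.mean():.2f}")
```

Only the intervention on x1 moves the other variable, which reveals the direction of regulation.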

But it is often expensive to conduct interventional experiments, and there are some scenarios where such experiments are either unethical or the technology is not good enough for the intervention to succeed.

With only observational data, researchers can’t compare genes before and after an intervention to learn how groups of genes function together.

“Most research in causal disentanglement assumes access to interventions, so it was unclear how much information you can disentangle with just observational data,” Zhang says.

The MIT researchers developed a more general approach that uses a machine-learning algorithm to effectively identify and aggregate groups of observed variables, e.g., genes, using only observational data.

They can use this technique to identify causal modules and reconstruct an accurate underlying representation of the cause-and-effect mechanism. “While this research was motivated by the problem of elucidating cellular programs, we first had to develop novel causal theory to understand what could and could not be learned from observational data. With this theory in hand, in future work we can apply our understanding to genetic data and identify gene modules as well as their regulatory relationships,” Uhler says.

A layerwise representation

Using statistical techniques, the researchers can estimate the variance of the Jacobian of each variable’s score, where the score is the gradient of the log-probability density of the data. Causal variables that don’t affect any subsequent variables should have a variance of zero.
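The zero-variance idea can be illustrated on a much simpler problem. The sketch below assumes three fully observed variables in a hypothetical chain x1 → x2 → x3 with sine mechanisms and unit-variance Gaussian noise, and it uses the exact log-density instead of estimating the score from data. The diagonal of the score’s Jacobian (the second derivative of the log-density along each variable) is approximated with finite differences. This is the leaf test used by score-based causal discovery methods such as SCORE, not the researchers’ algorithm, which works with latent variables.

```python
import numpy as np

def sample_chain(n, rng):
    """Hypothetical chain x1 -> x2 -> x3 with sine mechanisms and N(0, 1) noise."""
    x1 = rng.normal(size=n)
    x2 = np.sin(x1) + rng.normal(size=n)
    x3 = np.sin(x2) + rng.normal(size=n)
    return np.column_stack([x1, x2, x3])

def log_density(x):
    """Exact log-density of the chain, up to an additive constant."""
    x1, x2, x3 = x[:, 0], x[:, 1], x[:, 2]
    return -0.5 * (x1**2 + (x2 - np.sin(x1))**2 + (x3 - np.sin(x2))**2)

def score_jacobian_diag(x, h=1e-3):
    """Diagonal of the score's Jacobian, d^2 log p / dx_j^2, at every sample,
    estimated with central finite differences (step size h is an assumption)."""
    base = log_density(x)
    diag = np.empty_like(x)
    for j in range(x.shape[1]):
        step = np.zeros(x.shape[1])
        step[j] = h
        diag[:, j] = (log_density(x + step) - 2 * base + log_density(x - step)) / h**2
    return diag

rng = np.random.default_rng(0)
x = sample_chain(5_000, rng)
variances = score_jacobian_diag(x).var(axis=0)
print(variances)  # only x3, which has no children, is numerically zero
```

For the leaf x3 the diagonal entry is the constant -1 (the curvature of its own Gaussian noise term), while for x1 and x2 it also depends on how they feed into their children, so it varies from sample to sample.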

The researchers reconstruct the representation in a layer-by-layer structure, starting by removing the variables in the bottom layer that have a variance of zero. Then they work backward, layer-by-layer, removing the variables with zero variance to determine which variables, or groups of genes, are connected.
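Repeating that test gives a layer-by-layer peeling procedure. The following sketch keeps the same simplifying assumptions (fully observed variables, a known log-density, a hypothetical four-variable graph, and an arbitrary zero-variance threshold `tol`). It removes every zero-variance variable at once as one layer, then recomputes the variances on what remains. Because the removed variables have no children among the remaining ones, they integrate out of the density and simply drop out of the sum.

```python
import numpy as np

# Hypothetical DAG (node -> parents), listed in topological order.
# Each node is the sum of sin(parent) terms plus N(0, 1) noise.
PARENTS = {0: [], 1: [0], 2: [0], 3: [1]}

def sample(n, rng):
    x = np.zeros((n, len(PARENTS)))
    for j, parents in PARENTS.items():
        x[:, j] = sum(np.sin(x[:, p]) for p in parents) + rng.normal(size=n)
    return x

def log_density(z, nodes):
    """Log-density of the variables in `nodes`, up to a constant. Valid when
    `nodes` contains all parents of its members, so removed sinks integrate out."""
    total = np.zeros(len(z))
    for j in nodes:
        mean = sum(np.sin(z[:, p]) for p in PARENTS[j])
        total -= 0.5 * (z[:, j] - mean) ** 2
    return total

def jacobian_diag_variance(x, nodes, h=1e-3):
    """Variance over samples of d^2 log p / dx_j^2 for each j in `nodes`."""
    base = log_density(x, nodes)
    variances = {}
    for j in nodes:
        step = np.zeros(x.shape[1])
        step[j] = h
        second = (log_density(x + step, nodes) - 2 * base
                  + log_density(x - step, nodes)) / h**2
        variances[j] = second.var()
    return variances

def peel_layers(x, tol=1e-4):
    """Repeatedly strip all zero-variance (childless) variables as one layer."""
    remaining = set(PARENTS)
    layers = []
    while remaining:
        variances = jacobian_diag_variance(x, remaining)
        layer = sorted(j for j, v in variances.items() if v < tol)
        if not layer:
            raise ValueError("no zero-variance variable; assumptions may not hold")
        layers.append(layer)
        remaining -= set(layer)
    return layers

rng = np.random.default_rng(0)
layers = peel_layers(sample(2_000, rng))
print(layers)  # bottom layer first
```

For this hypothetical graph, variables 2 and 3 form the bottom layer, followed by 1 and then 0. With real data the score would have to be estimated from samples, so exact zeros become small values and the threshold would need to be chosen with care.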

“Identifying the variances that are zero quickly becomes a combinatorial objective that is pretty hard to solve, so deriving an efficient algorithm that could solve it was a major challenge,” Zhang says.

In the end, their method outputs an abstracted representation of the observed data with layers of interconnected variables that accurately summarizes the underlying cause-and-effect structure.

Each variable represents an aggregated group of genes that function together, and the relationship between two variables represents how one group of genes regulates another. Their method effectively captures all the information used in determining each layer of variables.

After proving that their technique was theoretically sound, the researchers conducted simulations to show that the algorithm can efficiently disentangle meaningful causal representations using only observational data.

In the future, the researchers want to apply this technique in real-world genetics applications. They also want to explore how their method could provide additional insights in situations where some interventional data are available, or help scientists understand how to design effective genetic interventions. Ultimately, this method could help researchers more efficiently determine which genes function together in the same program, which could in turn help identify drugs that target those genes to treat certain diseases.

This research is funded, in part, by the MIT-IBM Watson AI Lab and the U.S. Office of Naval Research.