By studying changes in gene expression, researchers learn how cells function at a molecular level, which could help them understand the development of certain diseases. But a human has about 20,000 genes that can affect each other in complex ways, so even knowing which groups of genes to target is an enormously complicated problem. Also, genes work together in modules that regulate each other.

MIT researchers have now developed theoretical foundations for methods that could identify the best way to aggregate genes into related groups, so that scientists can efficiently learn the underlying cause-and-effect relationships between many genes. Importantly, the new method does so using only observational data. This means researchers don't need to perform costly, and sometimes infeasible, interventional experiments to obtain the data needed to infer the underlying causal relationships.

In the long run, this technique could help scientists identify potential gene targets to induce certain behavior more accurately and efficiently, potentially enabling them to develop precise treatments for patients.

"In genomics, it is very important to understand the mechanism underlying cell states. But cells have a multiscale structure, so the level of summarization is very important, too. If you figure out the right way to aggregate the observed data, the information you learn about the system should be more interpretable and useful," says graduate student Jiaqi Zhang, an Eric and Wendy Schmidt.