PASADENA, Calif.—Biologists in recent years have identified every individual gene in the genomes of several organisms. While this has been quite an accomplishment in itself, the further goal of figuring out how these genes interact is truly daunting.
The difficulty is that genes can pair up in a gigantic number of ways. In an organism with 20,000 genes, for example, there are about 200 million possible pairwise combinations, each a potential interaction.
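The 200 million figure is simply the number of unordered pairs that can be drawn from n genes, n(n - 1)/2. A quick Python check of the arithmetic (a gene is not paired with itself, and each pair is counted once regardless of order):

```python
from math import comb

def pairwise_combinations(num_genes: int) -> int:
    """Number of unordered gene pairs, n * (n - 1) / 2."""
    return comb(num_genes, 2)

print(pairwise_combinations(20_000))  # 199990000, i.e. roughly 200 million
```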
Researchers can perform experiments to see what happens when two genes interact, but 200 million is an enormous number of experiments, says Weiwei Zhong, a postdoctoral scholar at the California Institute of Technology. "The question is whether we can prioritize which experiments we should do in order to save a lot of time."
To address this issue, Zhong and her supervising professor, Paul Sternberg, have developed a database-mining method for predicting genetic interactions. In the current issue of the journal Science, they report a procedure that computationally integrates several sources of data from multiple organisms to study the tiny nematode worm C. elegans, an animal commonly used in biological experiments.
This is possible because different organisms have a large number of genes in common; humans and nematodes, for example, are similar in 40 percent of their genes. As a result, a genetic-interaction network that draws on data from several species provides a faster and better way of determining how certain genes interact. Such a network also shows whether anyone has already done an experiment to determine the interaction of two particular genes in any of those species.
"This process works like a matchmaking service for the genes," says Zhong. "It provides you with candidate matches that most likely will be interacting genes, based upon a number of specified features."
The benefit, she adds, is that biologists do not need to do a huge number of random experiments to verify whether two genes interact. Instead of running 20,000 experiments to test a single gene against every other gene in a 20,000-gene genome, an experimenter might get by with 10 to 50.
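The time saving comes from ranking candidate partners by their predicted likelihood of interacting and testing only the top of the list. A minimal Python sketch, assuming each candidate already has a predicted score; the function name, gene names and scores below are hypothetical, chosen only for illustration:

```python
import heapq

def prioritize(candidate_scores, k):
    """Return the k candidates with the highest predicted scores, best first.

    candidate_scores: dict mapping a candidate partner gene to its score.
    """
    top = heapq.nlargest(k, candidate_scores.items(), key=lambda item: item[1])
    return [gene for gene, _score in top]

# Hypothetical candidate partners and scores for one gene of interest.
scores = {"g1": 0.10, "g2": 0.92, "g3": 0.35, "g4": 0.81, "g5": 0.05}
print(prioritize(scores, 2))  # ['g2', 'g4']
```

With 20,000 candidates, choosing k between 10 and 50 would correspond to the experiment budget Zhong describes.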
"The beneft is that you can be through in a month instead of years," says Sternberg. "Also, you can do experiments that are careful and detailed, which may take a day, and still be finished in a month."
To build the computational system, the researchers constructed a "training set" of nematode gene pairs. The "positives" were 4,775 pairs of nematode genes already known to interact.
By "training" the system on these examples, Zhong and Sternberg arrived at a way to rapidly predict whether two genes would interact.
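One standard way to turn a labeled training set into a predictor that combines several independent lines of evidence is a naive Bayes classifier. The Python sketch below uses that technique purely as an illustrative assumption; it is not necessarily the model used in the study. The feature names are hypothetical, the training data are made up, and because only the positives are described above, the "negative" pairs here are an assumption too. Each feature's contribution to the score is reported separately, which is one way a prediction can be inspected rather than treated as a black box.

```python
import math

def train_naive_bayes(examples, labels, features, alpha=1.0):
    """Fit a Bernoulli naive Bayes model on binary gene-pair features.

    examples: list of dicts mapping feature name -> bool for one gene pair.
    labels: list of bools (True = known interaction, False = assumed non-interaction).
    Returns the prior log-odds and, per feature, a (present, absent) pair of
    Laplace-smoothed log-likelihood ratios.
    """
    pos = [ex for ex, y in zip(examples, labels) if y]
    neg = [ex for ex, y in zip(examples, labels) if not y]
    prior_log_odds = math.log(len(pos) / len(neg))
    weights = {}
    for f in features:
        p_pos = (sum(ex[f] for ex in pos) + alpha) / (len(pos) + 2 * alpha)
        p_neg = (sum(ex[f] for ex in neg) + alpha) / (len(neg) + 2 * alpha)
        weights[f] = (math.log(p_pos / p_neg), math.log((1 - p_pos) / (1 - p_neg)))
    return prior_log_odds, weights

def score_pair(model, pair_features):
    """Return the posterior log-odds of interaction and each feature's contribution."""
    prior_log_odds, weights = model
    contributions = {
        f: (present if pair_features[f] else absent)
        for f, (present, absent) in weights.items()
    }
    return prior_log_odds + sum(contributions.values()), contributions

# Made-up toy training set with two hypothetical features.
features = ["orthologs_interact", "coexpressed"]
examples = [
    {"orthologs_interact": True, "coexpressed": True},    # positive
    {"orthologs_interact": True, "coexpressed": False},   # positive
    {"orthologs_interact": False, "coexpressed": True},   # positive
    {"orthologs_interact": False, "coexpressed": False},  # negative
    {"orthologs_interact": False, "coexpressed": True},   # negative
    {"orthologs_interact": False, "coexpressed": False},  # negative
    {"orthologs_interact": True, "coexpressed": False},   # negative
]
labels = [True, True, True, False, False, False, False]

model = train_naive_bayes(examples, labels, features)
log_odds, contributions = score_pair(model, {"orthologs_interact": True, "coexpressed": True})
print(round(log_odds, 3))  # 0.888 (positive log-odds: predicted to interact)
print({f: round(c, 3) for f, c in contributions.items()})  # {'orthologs_interact': 0.588, 'coexpressed': 0.588}
```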
According to Sternberg, who is the Morgan Professor of Biology at Caltech, the results show that the data-mining procedure works. They also demonstrate that the federal money spent on sequencing genomes, along with the comparatively modest sums spent on improving biological data processing, has been well spent.
"This is one of a suite of tools and methods people are coming up with to get more bang for the buck," he says.
In particular, Sternberg and Zhong cite the ongoing WormBase project, now in its sixth year as a National Institutes of Health-funded database for understanding how nematode genes interact. WormBase received $12 million in new funding in 2003, and the project is already leading to new database tools ultimately aimed at advancing knowledge of how genes interrelate.
The new study by Zhong and Sternberg is not directly a product of WormBase, but nevertheless mines data from that and other sources. In fact, the study compiles data from several model organisms to reconstruct a gene-interaction network for the nematode.
Zhong says that the system is not yet perfect, because "false negatives" can still arise if the information is simply not in the database, or if the computer fails to recognize two genes as orthologs (i.e., essentially the same gene in different species). "But it will get better," she adds.
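Carrying interaction data from one species to another depends on matching orthologs, and a pair whose genes lack a recognized ortholog simply drops out, which is one source of the false negatives Zhong describes. A minimal Python sketch of that mapping step, assuming a hypothetical ortholog table and made-up gene names; it illustrates the general idea, not the study's actual pipeline:

```python
from itertools import product

def transfer_interactions(known_pairs, orthologs):
    """Map interacting gene pairs from a source species onto a target species.

    known_pairs: iterable of (gene_a, gene_b) tuples observed in the source species.
    orthologs: dict mapping each source gene to a list of target-species orthologs.
    Returns a set of unordered target-species pairs, stored as sorted tuples.
    A pair is lost whenever either gene has no entry in the ortholog table.
    """
    transferred = set()
    for a, b in known_pairs:
        for ta, tb in product(orthologs.get(a, []), orthologs.get(b, [])):
            if ta != tb:  # skip self-pairs created by shared orthologs
                transferred.add(tuple(sorted((ta, tb))))
    return transferred

# Hypothetical source-species interactions and ortholog table.
source_pairs = [("yA", "yB"), ("yB", "yC"), ("yC", "yD")]
ortholog_map = {"yA": ["wA"], "yB": ["wB"], "yC": ["wC1", "wC2"]}  # yD has no ortholog

print(sorted(transfer_interactions(source_pairs, ortholog_map)))
# [('wA', 'wB'), ('wB', 'wC1'), ('wB', 'wC2')]; the yC-yD interaction is lost
```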
"Choosing how to combine these data is the big deal, not the computational ability of the hardware," says Sternberg. "You can also see how the computer made the call of whether two genes should interact. So it's not a black box, but all transparent; and to biologists, that's really valuable. And finally, it's in the public domain."
More broadly, Sternberg says, the system offers a window into how the biology of the future is emerging. Zhong, for example, has a doctorate in biology and a master's degree in computer science, and she spends about as much time working on computer databases as she does in the lab with the organisms themselves.
"This is the new generation of biologists," Sternberg says.
The study is titled "Genome-wide Prediction of C. elegans Genetic Interactions," and is published in the March 10 issue of Science.