Once they had found similar genetic riffs in worms, humans, flies, bacteria, and other organisms, the researchers could look at what was known about the function of these clearly important genes and score them accordingly, with a high “knownness” score reflecting solid understanding.
Because so much genetic information is already available on hundreds of genomes and recorded in a standardized way, it was possible to automate this scoring process. “We then asked how many of those [conserved genes] have a score of less than one, where essentially nothing is known about them,” says Freeman. “To our surprise, two decades after the first human genome, it is still an extraordinary number.”
In all, 1,723 of 19,664 human genes, or roughly 9 percent, currently have a knownness score of 1 or less.
At the other end of the scale, the top 10 genes identified by the team’s rummage through genetic databases corresponded with “all the most famous genes, which is reassuring,” says Sean Munro of the Laboratory of Molecular Biology in Cambridge, a study coauthor. “We recognized every single one of them, and there are already thousands of papers about each of them.”
To investigate the substantial number of genes that remained unknown, the team conducted one more study, using the organism that is best understood at the genetic level: Drosophila melanogaster. These fruit flies have been the subject of research for more than a century because they are easy and inexpensive to breed, have a short life cycle, produce lots of young, and can be genetically modified in numerous ways.
The team used gene editing to dial down the use of around 300 low-scoring genes found in both humans and fruit flies. “We found that one-quarter of these unknown genes were lethal—when knocked out, they caused the flies to die, and yet nobody had ever known anything about them,” says Freeman. “Another 25 percent of them caused changes in the flies—phenotypes—that we could detect in many ways.” These genes were linked with fertility, development, locomotion, protein quality control, and resilience to stress. “That so many fundamental genes are not understood was eye-opening,” Freeman says. It’s possible that variation in these genes could have very big impacts on human health.
All of this “unknomics” information is held in a database, which the team is making available for other researchers to use to discover new biology. The next step may be to hand the data on these mystery genes, and the mystery proteins they create, over to AI.
DeepMind’s AlphaFold, for example, can provide important insights into what mystery proteins do, notably by revealing how they interact with other proteins, says Alex Bateman of the European Bioinformatics Institute, based near Cambridge, UK. So can cryo-EM, which is a way of producing images of large, complex molecules, he says. And a University College London team has shown a systematic way to use machine learning to figure out what proteins do in yeast.
The Unknome is unusual in that it’s a biology database that will shrink as we understand it better. The paper shows that over the past decade “we have moved from 40 percent to 20 percent of the human proteome having a certain level of unknownness,” says Bateman. However, at current progress rates, working out the function of all human protein-coding genes could take more than half a century, Freeman estimates.