Essential Gene Clusters

I just came across a very interesting article that talks about essential gene clusters and some speculation from a lab out of the Baylor College of Medicine.

Scientists say they found a cluster of essential genes on mouse chromosome 11 that is conserved in other organisms, including human, possum, cow, dog and chimp.

“When we saw that there were all these essential genes in this region, we wondered if the reason that the chromosome remained together (and is not easily broken apart or recombined with other parts of this or other chromosomes) is that it had all these densely packed essential genes. The reason this part of the chromosome has remained intact is that it has densely packed essential genes. If the chromosome broke anywhere, the organism would not develop,” said Dr. Monica Justice, an associate professor of molecular and human genetics at Baylor.

View full scientific publication at PLoS Genetics.
View full review article at

Genetic Goldmine Found by Global Ocean Sampling Expedition

J. Craig Venter has accomplished yet another feat in his quest to sequence everything under the sun. Venter is best known for leading Celera in its challenge to beat the National Institutes of Health (NIH) in a race to sequence the human genome. Since then he has led numerous sequencing projects, including the genetic analysis of New York City’s air [or the Nature publication], the search for the minimal genome at his company Synthetic Genomics, and most recently the Sorcerer II Global Ocean Sampling Expedition.

Results from the Sorcerer II Expedition's two-year circumnavigation, covering the leg from Halifax, Nova Scotia to the Eastern Tropical Pacific, have finally been released. The announcement from the J. Craig Venter Institute (JCVI) detailed several papers published in PLoS Biology. Highlights of the publications include:

Rusch et al. describe the results of metagenomic analysis of 37 samples taken aboard Sorcerer II during its voyage between Halifax, Nova Scotia and French Polynesia in 2003 to 2004, combined with seven samples collected during the pilot study in the Sargasso Sea. To capture the DNA, scientists aboard the Sorcerer II collected water every 200 nautical miles and then passed it through progressively smaller filters to collect bacteria and then viruses. The DNA extracted for these publications was from the filter that collects mostly bacteria.

The group analyzed a massive dataset consisting of 7.7 million DNA sequences totaling 6.3 billion base pairs. As in the Sargasso Sea pilot study, they continued to find a great degree of diversity both within and across the sampling sites. Researchers identified 60 highly abundant ribotypes (roughly equivalent to species); however, the inter-species variation and the variation among organisms within the same environment suggest that while the microbes might be similar at an rRNA level, they can differ greatly at a biochemical and genomic level.
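To put those two totals in perspective, here is a quick back-of-the-envelope calculation of the average sequence length they imply. It uses only the rounded figures quoted above; the function name and constants are my own, so treat it as a rough sketch rather than anything taken from the paper.

```python
# Rounded totals quoted in the post for the GOS dataset.
TOTAL_SEQUENCES = 7_700_000
TOTAL_BASE_PAIRS = 6_300_000_000


def mean_sequence_length(total_bp: int, n_sequences: int) -> float:
    """Return the average number of base pairs per sequence."""
    if n_sequences <= 0:
        raise ValueError("n_sequences must be positive")
    return total_bp / n_sequences


avg = mean_sequence_length(TOTAL_BASE_PAIRS, TOTAL_SEQUENCES)
print(f"Average sequence length: about {avg:.0f} bp")
```

Because both inputs are rounded, the result is only a ballpark figure: each sequence averages somewhere around 800 base pairs.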

Yooseph et al. report on the 6.12 million new proteins uncovered from 7.7 million GOS sequences by using a novel sequence clustering approach. This nearly doubles the number of known proteins. The researchers found that the GOS dataset covered almost all of the known prokaryote (bacterial and archaeal) protein families and that there were 1,700 totally unique large protein families in the GOS dataset, not matching any known families. A surprising number of the new protein families discovered are in viruses. Researchers were also able to match 6,000 previously unmatched sequences in current protein databases to proteins found in the GOS dataset.

Previously, it was thought that different families of kinases were responsible for cell regulation in prokaryotes (bacteria) versus eukaryotes (animals and other non-bacteria): eukaryotic protein kinases (ePKs) were considered most common in eukaryotes, and histidine kinases in bacteria. However, in their PLoS Biology publication, Kannan et al. show, thanks to the scope and diversity of the GOS data, that ePK-like kinases (ELKs) are indeed very prevalent in bacteria, in fact more so than histidine kinases. This finding is even shedding some light on human kinases.

The research team has shown that the ePK is just one family in a diverse superfamily of enzymes that all share a common protein kinase-like (PKL) fold (shape). Using sensitive profile methods, the researchers discovered more than 45,000 kinase sequences in the GOS data and other public data sources and grouped these into 20 diverse families, of which ePKs were just one. The GOS data doubles the size of most PKL families and triples the number of known ePK-like kinases (ELKs). Many of these families exhibited eukaryote-like structure and function, and thus the researchers conclude that several of these protein families existed before the divergence of the three domains of life.

For more information, please see the press release at the J. Craig Venter Institute.

The data recovered from this mission is likely to yield a number of findings, and will be the focus of much scientific research for years to come. Kudos to you and your team, Dr. Venter, and it was nice seeing you in Toronto last fall!