Harvesting Bioenergy from Switchgrass


Recently, scientists have made progress in finding the key genetic elements that control lignin production in switchgrass by monitoring mRNA transcripts. This discovery brings switchgrass one step closer to being used as a source of bioethanol. See full story at Scientists Turn Genetic Keys To Unlock Bioenergy In Switchgrass.

RNA Pseudoknots and the Universcale


I am in a hurry today, so this is going to be brief.

I was checking out the latest over at DIGG and came across a cool Flash applet made by Nikon Japan called Universcale. It takes you from picometers to billions of light years in scale, showing objects from neutrons to the edge of the universe (although I expected a little more quality here).

Also, I’m reporting on a new discovery in the realm of viral research. Researchers at the Niels Bohr Institute used optical tweezers to grab the ends of an RNA molecule produced by a bird flu virus. They found that the viral-encoded RNA has pseudoknots that cause the host cell’s ribosome to shift its reading frame and make a protein that is wrong for the human cell, but right for the virus! Check out more at the Science Daily article.

Scientists Rejoice as Google Solves Data Management Crisis


Okay … so maybe “rejoice” is going a little too far, but scientists in the astronomy world are quite happy with Google’s innovative solution for managing the massive amounts of data they receive from space imaging, whether it’s infrared, gamma-ray, X-ray, etc.

The process has been dubbed “FedExNet” by scientists who have already adopted the new service. So what is this new service? I have highlighted some of the main points from the originating Wired article below:

  • Google acts as both a repository and courier for large data sets
  • Google ships a PC and a drive array to teams of scientists at various research institutions, who then connect their local servers to the array via an eSATA connection. Once the data transfer is complete, the drives get sent straight back to Mountain View, where the data is copied to Google’s servers for archival purposes. The idea is that if other scientists around the world need access to such a large quantity of data, Google simply reverses the process.
  • Chris DiBona, the open-source program manager at Google, says “We make a copy of [the data], and then we can use the hard drives for something else. They’ll get banged around a little bit too much (to store the data directly on the drives). They’re not intended to be a long-term storage medium — they’re like envelopes to us.”
  • With a set of Google drives, Gorelick (who came up with the FedExNet moniker) can copy his team’s data in about 24 hours or less, something that can make a big difference when the time comes to collaborate with other research groups.

    See full article at Wired: Google’s Next-Gen of Sneakernet
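
To get a feel for why mailing drives can beat a network link, here is a rough back-of-the-envelope sketch. Only the roughly 24-hour copy time comes from the points above; the data set size, link speed and shipping time are numbers I made up for illustration, not figures from the Wired article.

```python
def transfer_hours_network(data_tb, link_mbps):
    """Hours needed to send data_tb terabytes over a link of link_mbps megabits/s."""
    bits = data_tb * 1e12 * 8               # terabytes -> bits (decimal units)
    seconds = bits / (link_mbps * 1e6)      # megabits/s -> bits/s
    return seconds / 3600


def transfer_hours_shipping(copy_hours, shipping_hours):
    """Hours to copy onto the drives, ship them, and copy off at the other end."""
    return copy_hours + shipping_hours + copy_hours


# Hypothetical scenario: 100 TB data set, 100 Mbit/s link, 48 hours in transit.
network = transfer_hours_network(data_tb=100, link_mbps=100)
shipped = transfer_hours_shipping(copy_hours=24, shipping_hours=48)

print(f"network: {network:.0f} h, shipped drives: {shipped} h")
```

Under these assumed numbers the network transfer takes months while the drives arrive in days, which is the whole point of a sneakernet. With a much faster link or a much smaller data set the comparison can flip.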

Think of all the separate databases out there that manage genetic information. There are many independently operated bioinformatics databases, and if they could all be centralized and indexed in a way that only Google can do, think of the potential implications for the scientific community working to advance our knowledge of DNA, RNA and protein interactions. This might be an essential step towards the completion of the proteome and transcriptome …

Genetic Goldmine Found by Global Ocean Sampling Expedition


J. Craig Venter has accomplished yet another feat in his quest to sequence everything under the sun. Venter is best known for leading Celera in its race against the National Institutes of Health (NIH) to sequence the human genome. Since then he has led numerous sequencing projects, including the genetic analysis of New York City’s air [or the Nature publication], the search for the minimal genome at his company Synthetic Genomics, and most recently the Sorcerer II Global Ocean Sampling Expedition.

Results from the oceanic voyage from Halifax, Nova Scotia to the Eastern Tropical Pacific, part of the two-year circumnavigation by the Sorcerer II Expedition, have finally been released. The announcement from the J. Craig Venter Institute (JCVI) detailed several papers published in PLoS Biology. Highlights of the publications include:

Rusch et al. describe the results of metagenomic analysis of 37 samples taken aboard Sorcerer II during its voyage between Halifax, Nova Scotia and French Polynesia in 2003 to 2004, combined with seven samples collected during the pilot study in the Sargasso Sea. To capture the DNA, scientists onboard the Sorcerer II collected water every 200 nautical miles and then filtered it through progressively smaller filters to collect bacteria and then viruses. The DNA extracted for these publications was from the filter that collects mostly bacteria.

The group analyzed a massive dataset consisting of 7.7 million DNA sequences totaling 6.3 billion base pairs. Following on from the Sargasso Sea pilot study, they continued to find a great degree of diversity both within and across the sampling sites. Researchers identified 60 highly abundant ribotypes (roughly equivalent to species). However, the inter-species variation and the variation of organisms within the same environment suggest that while the microbes might be similar at an rRNA level, they can differ greatly at a biochemical and genomic level.
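
As a quick sanity check on the scale, the two totals above imply an average read length. This is just my own arithmetic on the figures quoted, not a number from the press release.

```python
# Totals quoted for the GOS dataset.
total_sequences = 7_700_000        # 7.7 million DNA sequences
total_base_pairs = 6_300_000_000   # 6.3 billion base pairs

# Mean length of one sequence, in base pairs.
average_read_length = total_base_pairs / total_sequences

print(f"average read length: about {average_read_length:.0f} bp")
```

That works out to roughly 800 base pairs per sequence.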

Yooseph et al. report on the 6.12 million new proteins uncovered from 7.7 million GOS sequences by using a novel sequence clustering approach. This nearly doubles the number of known proteins. The researchers found that the GOS dataset covered almost all of the known prokaryote (bacterial and archaeal) protein families and that there were 1,700 totally unique large protein families in the GOS dataset, not matching any known families. A surprising number of the new protein families discovered are in viruses. Researchers were also able to match 6,000 previously unmatched sequences in current protein databases to proteins found in the GOS dataset.
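
For readers wondering what "sequence clustering" means in practice, here is a toy sketch of the general idea: group sequences that look alike and treat each group as a candidate family. To be clear, this is not the method Yooseph et al. used. The greedy strategy, the k-mer similarity measure, the threshold and the example sequences are all my own assumptions for illustration.

```python
def kmers(seq, k=3):
    """Set of all overlapping substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets (1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def greedy_cluster(seqs, threshold=0.5, k=3):
    """Assign each sequence to the first cluster whose representative
    (its first member) is similar enough; otherwise start a new cluster."""
    clusters = []  # list of (representative k-mer set, member list)
    for seq in seqs:
        ks = kmers(seq, k)
        for rep, members in clusters:
            if jaccard(ks, rep) >= threshold:
                members.append(seq)
                break
        else:
            clusters.append((ks, [seq]))
    return [members for _, members in clusters]


# Made-up peptide strings: two near-identical pairs.
toy = ["MKTAYIAKQR", "MKTAYIAKQK", "GSHMLEDPVA", "GSHMLEDPVG"]
clusters = greedy_cluster(toy)
print(clusters)
```

Real protein clustering works with alignment-based similarity and millions of sequences, so treat this only as a picture of the concept.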

Previously, it was thought that different families of kinases were responsible for cell regulation in prokaryotes (bacteria) versus eukaryotes (animals and other non-bacteria): eukaryote protein kinases (ePKs) were most common in eukaryotes, histidine kinases in bacteria. However, in their PLoS Biology publication, Kannan et al. show, using the scope and diversity of the GOS data, that ePK-like kinases (ELKs) are in fact very prevalent in bacteria, more so than histidine kinases. This finding is even shedding some light on human kinases.

The research team has shown that the ePK is just one family in a diverse superfamily of enzymes that all share a common protein kinase-like (PKL) fold (shape). Using sensitive profile methods, the researchers discovered more than 45,000 kinase sequences from the GOS and other public data sources and grouped these into 20 diverse families, of which ePKs were just one. The GOS data doubles the size of most PKL families and triples the number of known ePK-like kinases (ELK). Many of these families exhibited eukaryote-like structure and function of their proteins and thus the researchers conclude that several of these protein families existed before the divergence of the three domains of life.

For more information, please see the press release at the J. Craig Venter Institute.

The data recovered from this mission is likely to yield a number of findings, and will be the focus of much scientific research for years to come. Kudos to you and your team, Dr. Venter, and it was nice seeing you in Toronto last fall!