DNA Seen Through the Eyes of a Coder

As someone with a background in computer programming and an undergraduate degree in molecular genetics, I find it interesting to see the comparisons drawn at multiple levels, and at a quick glance they look fairly accurate to me. There are other interesting topics that could be covered, such as methylation patterns and supercoiled DNA (from a genetics point of view), but hopefully the author will keep updating his page. Check it out:
DNA seen through the eyes of a coder.

Bioengineering Gene Expression

A recent ScienceDaily article, Bioengineers Devise ‘Dimmer Switch’ To Regulate Gene Expression In Mammal Cells, discusses a new technology that combines a targeted DNA repressor protein with a custom-designed RNAi strand. The repressor is meant to block most transcription of the target gene; since some transcripts may still escape repression, the RNAi is there to hunt them down and destroy them.

The chemical isopropyl-β-D-thiogalactopyranoside (IPTG) acts as a “dimmer” by blocking the repressor protein. By adjusting the amount of this chemical, along with the levels of repressor and RNAi, the researchers can regulate a gene’s expression. Cool.
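To make the “dimmer” idea concrete, here is a toy steady-state model. It is only a sketch built on the description above, not the researchers’ published circuit: the Hill-type response of the repressor to IPTG and every parameter value (`k_mM`, `n`, `max_repression`, `rnai_knockdown`) are made-up assumptions. The repressor cuts transcription, the RNAi removes a fixed fraction of whatever transcripts leak through, and IPTG releases the repressor.

```python
# Toy steady-state model of a repressor + RNAi "dimmer switch".
# ALL parameter values are hypothetical, chosen only for illustration.


def free_repressor_fraction(iptg_mM: float, k_mM: float = 0.1, n: float = 2.0) -> float:
    """Fraction of repressor still active (not inactivated by IPTG).

    Assumes a Hill-type response; k_mM is the IPTG dose that frees half the repressor.
    """
    return 1.0 / (1.0 + (iptg_mM / k_mM) ** n)


def relative_expression(iptg_mM: float,
                        max_repression: float = 0.95,
                        rnai_knockdown: float = 0.9) -> float:
    """Relative mRNA level, where 1.0 would be the unregulated gene.

    Transcription drops in proportion to the active repressor; the RNAi then
    destroys a fixed fraction (rnai_knockdown) of the transcripts that remain.
    """
    transcription = 1.0 - max_repression * free_repressor_fraction(iptg_mM)
    return transcription * (1.0 - rnai_knockdown)


if __name__ == "__main__":
    for dose in (0.0, 0.05, 0.1, 0.5, 5.0):
        print(f"IPTG {dose:>4} mM -> relative expression {relative_expression(dose):.4f}")
```

In this toy model, turning up IPTG raises expression, while `rnai_knockdown` caps how high the “fully on” level can go, so the two act as separate knobs on the dimmer.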

Metagenomics – Emerging Field

Metagenomics is defined as the study of genomes recovered from environmental samples, as opposed to from clonal cultures (Wikipedia). According to the National Research Council, the new capabilities that metagenomics brings to genomics will revolutionize our understanding of the microbial world.

The Research Council report was requested by several federal agencies interested in the potential of metagenomics and how best to encourage its success. In particular, the committee was asked to recommend promising directions for future studies. It concluded that the most efficient way to boost the field of metagenomics overall would be to establish a Global Metagenomics Initiative that includes a few large-scale, internationally coordinated projects and numerous medium- and small-size studies.

Metagenomic studies begin by extracting DNA from all the microbes living in a particular environmental sample, which may contain thousands or even millions of organisms.

Please see the article at ScienceDaily for more information.

Chicken McNuggets

Okay, this may not have to do with biotech, but since it involves mutations, mutagens and tumorigenic additives, I figured it “could fit”. In any case, I loved McD’s Chicken McNuggets (notice the past tense there…) until I read this article, which draws on the book The Omnivore’s Dilemma by Michael Pollan.

Apparently McDonald’s Chicken McNuggets are composed of 38 ingredients and are 56% corn! The book goes into each ingredient in detail, but I want to concentrate on the genetics aspect. Here are the chemical additives the article flags as harmful to your DNA:

dimethylpolysiloxane: suspected carcinogen, established mutagen, tumorigen, and reproductive effector. It is also flammable.

tertiary butylhydroquinone (TBHQ): an antioxidant derived from petroleum. TBHQ is often described as a form of butane (lighter fluid), although strictly speaking it is a hydroquinone carrying a tert-butyl group. The FDA limits TBHQ to no more than 0.02% of the oil in each nugget. [Suggestion: choose hot dogs over Chicken McNuggets in a spur-of-the-moment eating contest.]
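A percentage cap is easier to picture as an actual amount. The short calculation below converts the 0.02% limit (200 parts per million) into milligrams for a given mass of oil; the 3 g of oil per nugget used in the example is a made-up figure for illustration, not a measured value.

```python
# FDA cap from the text: TBHQ at most 0.02% of the oil, i.e. 200 parts per million.
TBHQ_LIMIT_PPM = 200


def max_tbhq_mg(oil_grams: float) -> float:
    """Maximum TBHQ (in mg) allowed for a given mass of oil (in g)."""
    oil_mg = oil_grams * 1000
    return oil_mg * TBHQ_LIMIT_PPM / 1_000_000


# Hypothetical example: a nugget containing 3 g of oil (illustrative figure only).
print(max_tbhq_mg(3))  # 0.6 (mg)
```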

So, what will you be eating the next time you go to McDonald’s?


Scientists Rejoice as Google Solves Data Management Crisis

Okay … so maybe “rejoice” is going a little too far, but scientists in the astronomy world are quite happy with Google’s innovative solution for managing the massive amounts of data they receive from imaging done in space, whether it’s infrared, gamma-ray, X-ray, etc.

The process has been dubbed “FedExNet” by scientists who have already adopted the new service. So what is this new service? I have highlighted some of the main points from the original Wired article below:

  • Google acts as both a repository and courier for large data sets
  • Google ships a PC and a drive array to teams of scientists at various research institutions, which then connect their local servers to the array via an eSATA connection. Once the data transfer is complete, the drives are sent straight back to Mountain View, where the data is copied to Google’s servers for archival purposes. The idea is that if other scientists around the world need access to such a large quantity of data, Google would simply reverse the process.
  • Chris DiBona, the open-source program manager at Google, says “We make a copy of [the data], and then we can use the hard drives for something else. They’ll get banged around a little bit too much (to store the data directly on the drives). They’re not intended to be a long-term storage medium — they’re like envelopes to us.”
  • With a set of Google drives, Gorelick (who came up with the FedExNet moniker) can copy his team’s data in about 24 hours or less, something that can make a big difference when the time comes to collaborate with other research groups.

    See full article at Wired: Google’s Next-Gen of Sneakernet
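Why ship drives instead of using the network? A quick back-of-the-envelope comparison helps. The numbers below are hypothetical (a 3 TB drive set, 24 hours to copy on each end plus 48 hours in transit, and a 10 Mbit/s network link) and are not taken from the Wired article; they only show the shape of the trade-off.

```python
def effective_mbps(data_terabytes: float, hours: float) -> float:
    """Average throughput (megabits/s) of moving the data in the given total time."""
    bits = data_terabytes * 1e12 * 8  # decimal terabytes -> bits
    return bits / (hours * 3600) / 1e6


def transfer_days(data_terabytes: float, link_mbps: float) -> float:
    """Days needed to send the same data over a network link running flat out."""
    bits = data_terabytes * 1e12 * 8
    return bits / (link_mbps * 1e6) / 86400


# Hypothetical scenario: 24 h copy in + 48 h shipping + 24 h copy out.
print(f"Shipped drives: {effective_mbps(3, 24 + 48 + 24):.1f} Mbit/s effective")
print(f"10 Mbit/s link: {transfer_days(3, 10):.1f} days")
```

Under these assumptions the box of drives behaves like a link of roughly 69 Mbit/s, while the 10 Mbit/s connection would need about four weeks, which is the whole appeal of a “sneakernet”.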

Think of all the separate databases out there that manage genetic information. There are many independently operated bioinformatics databases, and if they could all be centralized and indexed the way only Google seems able to, think of the implications for the scientific community working to advance our knowledge of DNA, RNA and protein interactions. This might be an essential step toward completing the proteome and transcriptome…
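At its simplest, “centralized and indexed” means one lookup that returns everything every source knows about a gene. The sketch below builds such an index from a few records; the database names (`seq_db`, `protein_db`, `expression_db`) and their contents are hypothetical, made up for illustration.

```python
from collections import defaultdict

# Hypothetical records: (source database, gene symbol, what that source holds).
# The database names and descriptions are made up for illustration.
RECORDS = [
    ("seq_db", "BRCA1", "genomic sequence"),
    ("protein_db", "BRCA1", "protein structure"),
    ("expression_db", "brca1", "tissue expression profile"),
    ("protein_db", "TP53", "protein structure"),
]


def build_index(records):
    """Map each normalized gene symbol to every (source, description) pair."""
    index = defaultdict(list)
    for source, gene, description in records:
        index[gene.upper()].append((source, description))  # normalize identifiers
    return dict(index)


index = build_index(RECORDS)
for source, description in index["BRCA1"]:
    print(f"{source}: {description}")
```

Note the `upper()` call: the hard part of merging independent databases is less the indexing itself than agreeing on identifiers, since one source writing `brca1` and another `BRCA1` would otherwise split the results.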

Ethics in Personalized Medicine

Today, I want to highlight a great article I found on the ethical issues in personalized medicine, which centers on pharmacogenetic information (your genotype at a number of specific genes). First, if you want to get up to speed on pharmacogenomics, check out the US government-run Human Genome Project Information site, which has a quick Q&A on the topic!

Reagan Kelly recently published an article online that discusses some of the ethical issues of personalized medicine; here are some excerpts:

“Protecting patient privacy is one of the most important things that must be done before ordinary people will be willing to take advantage of individualized medical care, and just about everyone agrees that patients have a right to keep details about their health private from most people (even if not from, say, their insurance company or in some cases state or local governments). But how far does that right extend? Does it cover a person’s genetic makeup? That is something that undeniably influences health, and a fair amount of information about what diseases a person has or is at risk for can be extracted from genotype and gene expression information like what would be collected for personalized medicine services. How do you keep that information private and what uses are OK? … Additionally, what about the privacy of other family members? Families share genetic information, and by knowing something about their risk, a person also learns about their relatives’ risks.”

“One of the issues of privacy is also directly related to patient autonomy – the right of a patient to choose what happens to them. The question of what uses of a patient’s data are permissible is not exclusively a question of privacy but also one of autonomy. Is it OK to require a person to allow their data to be used for risk profiling or diagnosis as a condition of performing the service for them?”

“Cost, just like with the policy issues last time, is a significant ethical issue as well. Something like 46 million people are without health insurance today, and many more have insurance plans that cover only the most basic things. How can we provide access to personalized medicine to everyone? Is access for everyone a reasonable goal? Is it an attainable one?”

Please see the full article for more details.
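The point about families sharing genetic information can be put in numbers with Wright’s coefficient of relationship, r = Σ (1/2)^L, summed over every path of L links connecting two people through a common ancestor (assuming those ancestors are not themselves inbred). It gives the expected fraction of DNA shared by descent, not a disease risk, but it shows how much a test on one person reveals about relatives. The sketch below is a textbook calculation, not something from Kelly’s article.

```python
from fractions import Fraction


def relatedness(path_lengths):
    """Coefficient of relationship: sum of (1/2)**L over all connecting paths.

    Each entry in path_lengths is the number of parent-child links in one path
    through a common ancestor. Assumes common ancestors are not inbred.
    """
    return sum(Fraction(1, 2) ** length for length in path_lengths)


RELATIVES = {
    "parent/child": [1],               # one direct link
    "full siblings": [2, 2],           # via mother and via father
    "half siblings": [2],              # via the one shared parent
    "grandparent/grandchild": [2],
    "aunt or uncle/niece or nephew": [3, 3],
    "first cousins": [4, 4],           # via each shared grandparent
}

for name, paths in RELATIVES.items():
    print(f"{name:32s} r = {relatedness(paths)}")
```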

Genetic Goldmine Found by Global Ocean Sampling Expedition

J. Craig Venter has accomplished yet another feat in his quest to sequence everything under the sun. Venter is best known for leading Celera in its race against the National Institutes of Health (NIH) to sequence the human genome. Since then he has led numerous sequencing projects, including the genetic analysis of New York City’s air [or the Nature publication], the search for a minimal genome at his company Synthetic Genomics, and most recently the Sorcerer II Global Ocean Sampling Expedition.

Results from the leg of the Sorcerer II Expedition’s two-year circumnavigation that traveled from Halifax, Nova Scotia to the Eastern Tropical Pacific have finally been released. The announcement from the J. Craig Venter Institute (JCVI) detailed several papers published in PLoS Biology. Highlights of the publications include:

Rusch et al. describe the results of metagenomic analysis of 37 samples taken aboard Sorcerer II during its voyage between Halifax, Nova Scotia and French Polynesia in 2003 to 2004, combined with seven samples collected during the pilot study in the Sargasso Sea. To capture the DNA, scientists aboard the Sorcerer II collected water every 200 nautical miles and filtered it through progressively smaller filters to capture first bacteria and then viruses. The DNA extracted for these publications came from the filter that collects mostly bacteria.

The group analyzed a massive dataset of 7.7 million DNA sequences totaling 6.3 billion base pairs. As in the Sargasso Sea pilot study, they found a great degree of diversity both within and across sampling sites. Researchers identified 60 highly abundant ribotypes (roughly equivalent to species); however, the variation between species and among organisms in the same environment suggests that while the microbes might be similar at the rRNA level, they can differ greatly at the biochemical and genomic level.
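The two headline figures also tell you how long the individual sequences are on average. Dividing the total base pairs by the number of sequences, using only the numbers quoted above:

```python
total_bp = 6.3e9      # base pairs in the dataset (from the text)
n_sequences = 7.7e6   # number of DNA sequences (from the text)

mean_read_bp = total_bp / n_sequences
print(f"about {mean_read_bp:.0f} bp per sequence on average")
```

That works out to roughly 818 base pairs per sequence, which is a useful sense of scale when reading the protein and kinase results below.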

Yooseph et al. report on the 6.12 million new proteins uncovered from the 7.7 million GOS sequences using a novel sequence clustering approach. This nearly doubles the number of known proteins. The researchers found that the GOS dataset covered almost all of the known prokaryotic (bacterial and archaeal) protein families, and that it contained 1,700 large protein families that were entirely novel, matching no known family. A surprising number of the newly discovered protein families are found in viruses. Researchers were also able to match 6,000 previously unmatched sequences in current protein databases to proteins found in the GOS dataset.
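To give a feel for what “clustering sequences into families” means, here is a deliberately simple sketch. It is not the method Yooseph et al. used; it swaps in k-mer Jaccard similarity and single-linkage clustering (connected components via union-find) as a stand-in. The threshold, the k-mer size and the four short protein fragments are all hypothetical.

```python
from itertools import combinations


def kmers(seq: str, k: int = 3) -> set[str]:
    """All overlapping substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(seqs: dict[str, str], threshold: float = 0.5, k: int = 3) -> list[set[str]]:
    """Link sequences whose k-mer Jaccard similarity >= threshold; return components."""
    parent = {name: name for name in seqs}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    profiles = {name: kmers(s, k) for name, s in seqs.items()}
    for a, b in combinations(seqs, 2):
        if jaccard(profiles[a], profiles[b]) >= threshold:
            parent[find(a)] = find(b)  # merge the two families

    groups: dict[str, set[str]] = {}
    for name in seqs:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())


# Hypothetical protein fragments: two pairs of near-identical sequences.
toy = {"p1": "MKTAYIAKQR", "p2": "MKTAYIAKQL", "p3": "GSHMLEDPVA", "p4": "GSHMLEDPVG"}
print(cluster(toy))
```

Real pipelines use far more sensitive alignment-based similarity, but the overall shape (score pairs, link similar ones, read off the groups) is the same idea.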

Previously, it was thought that different families of protein kinases (enzymes that regulate cell processes by phosphorylating other proteins) were responsible for this kind of cell regulation in prokaryotes (bacteria) versus eukaryotes (animals, plants, fungi and other organisms whose cells have a nucleus). Eukaryotic protein kinases (ePKs) were thought to be most common in eukaryotes, and histidine kinases in bacteria. However, in their PLoS Biology publication, Kannan et al. use the scope and diversity of the GOS data to show that ePK-like kinases (ELKs) are in fact very prevalent in bacteria, even more so than histidine kinases. This finding is even shedding some light on human kinases.

The research team has shown that the ePKs are just one family in a diverse superfamily of enzymes that all share a common protein kinase-like (PKL) fold (shape). Using sensitive profile methods, the researchers identified more than 45,000 kinase sequences from the GOS and other public data sources and grouped these into 20 diverse families, of which ePKs were just one. The GOS data doubles the size of most PKL families and triples the number of known ePK-like kinases (ELKs). Many of these families show eukaryote-like structural and functional features, so the researchers conclude that several of them existed before the divergence of the three domains of life.
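“Profile methods” build a statistical profile from aligned members of a family and use it to score new sequences, which catches distant relatives that a simple pairwise comparison can miss. The post does not say which tools the researchers used; the sketch below swaps in the simplest version of the idea, a log-odds position-specific scoring matrix (PSSM) with pseudocounts and a uniform background. The four aligned glycine-rich fragments are made up for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(aligned: list[str], pseudocount: float = 1.0) -> list[dict[str, float]]:
    """Log-odds PSSM from equal-length aligned sequences (uniform 1/20 background)."""
    background = 1 / len(AMINO_ACIDS)
    total = len(aligned) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for pos in range(len(aligned[0])):
        counts = Counter(seq[pos] for seq in aligned)
        # Pseudocounts keep unseen residues from scoring minus infinity.
        pssm.append({aa: math.log2(((counts[aa] + pseudocount) / total) / background)
                     for aa in AMINO_ACIDS})
    return pssm


def score(pssm: list[dict[str, float]], query: str) -> float:
    """Sum of per-position log-odds scores for an ungapped query of the same length."""
    return sum(column[aa] for column, aa in zip(pssm, query))


# Hypothetical aligned family members (illustrative only).
family = ["GSGAFG", "GEGTFG", "GQGSYG", "GKGSFG"]
pssm = build_pssm(family)
print(f"GAGSFG: {score(pssm, 'GAGSFG'):.2f}")  # resembles the family
print(f"PLWNHR: {score(pssm, 'PLWNHR'):.2f}")  # unrelated
```

A positive score means the query looks more like the family than like random sequence; the profile HMMs commonly used for this kind of work extend the same idea with insertions and deletions.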

For more information, please see the press release at the J. Craig Venter Institute.

The data recovered from this mission is likely to yield many more findings and will be the focus of much scientific research for years to come. Kudos to you and your team, Dr. Venter, and it was nice seeing you in Toronto last fall!