According to a stereotypical view, when working in a museum of natural history, one might expect to be confined to dark, cold and damp rooms filled with up-to-the-ceiling columns of books, dusty old bones and jarred specimens, bent over microscopes, cutting slices to be mounted in resin. And there are still plenty of curators who, at least somewhat, match this view. Their knowledge, skills and scientific contributions are without a doubt crucial for our current and growing understanding of organismal biodiversity.
There are other aspects to biological diversity, and one is variation that occurs at the molecular level. For some years now, large numbers of researchers have been looking at specific chunks of DNA found in all organisms on Earth. Until very recently, the technology of DNA sequencing was relatively expensive and few laboratories could afford to sequence beyond small genetic regions (also known as “genes”). As a consequence, a whole body of work has developed with the goal of obtaining short molecular data or “barcodes” for all organisms on Earth. This was soon followed by a “metagenetic” approach called metabarcoding, the study of whole communities using genetic barcodes amplified from a mixed sample. Metabarcoding has many great advantages: it provides a standardised and replicable approach to measuring molecular diversity; it is based on ubiquitous markers that allow comparison across nearly all forms of life; it is very useful in identifying organisms that display a very limited amount of morphological variability; and it can identify a lineage irrespective of the life stage (larva or adult) represented in a sample.
Thanks to recent advances in DNA sequencing technologies, partial or full genomes have become accessible to even low-budget labs, initiating the era of genomics. The museum is slowly but steadily integrating genomics into its already wide array of research, with a brand new LAB that has invested in some of the most recent DNA sequencing technologies. But genomics is not only about accumulating gigabytes of DNA sequence data. In fact, the most difficult task faced by genomic studies is trying to make sense of this "big data", organising it into coherent patterns (“genome assembly” in the jargon) and deciphering its meaning (“annotation”). Contrary to the common adage “more is better”, in genomics, as in biology in general, the accumulation of data for its own sake, while of great interest to the technophile engineer-minded person, has little scientific value. In other words, there is a need for a guiding logic behind data gathering. The recent boom in funding for genomics-related work has produced vast troves of data that literally pollute the scientific literature and databanks. These appear to be the product of short-sighted studies from highly competitive labs that could be more interested in publication ratings than in promoting scientific discovery. Many studies only utilise and share a small portion of the data they generate, which can result in high levels of redundancy, as other groups with similar interests sometimes reproduce the same data. That is too much waste for my taste!
A promising new trend in biology consists of looking at the genomic-level data in a community as a whole (whether an ecosystem, a symbiotic or host-parasite system, etc.). This new discipline is known as metagenomics, and is a scale-up from the previously mentioned metabarcoding. While many researchers involved in the monitoring of biodiversity are considering replacing barcoding with genomic-level data, the task is still too expensive and not currently feasible. Therefore, some new approaches have been proposed where some parts of the genomes are specifically targeted for the community of interest. One such approach is to focus on the cellular organelles that harbour small genomes, such as the mitochondrion and the chloroplast. Doing so also provides a template for perfecting analytic tools to handle metagenomic data that can be extended to full genomes when the latter become available.
The genomic era is still in its infancy. Each year we are presented with cheaper and better sequencing technologies, while the bioinformatics tools needed to handle the data produced advance at a significantly slower pace. The main danger I see looming ahead for the genomics community is that, because of the very low price of sequencing compared to data analysis, more and more projects will focus their efforts on producing large quantities of data and simply fishing out some information of interest. We need more strategic thinking to come up with smart, innovative ways of producing the data needed to make the analytic phases of biodiversity studies easier and more efficient.
by Ehsan Kayal, NMNH Postdoctoral Buck Fellow