On this Darwin Day, it is apropos to note that large sets of observations can lead to transformative ideas. If you have ever read Darwin’s On the Origin of Species, which is an enjoyable experience, you understand.
More than 150 years later, sets of observations have become larger than ever and “big data” has become a big buzzword. You’re probably most familiar with big data from the private sector, such as businesses tracking consumers online, or from the perspective of government and security (think: body-worn cameras and WikiLeaks). And maybe you’ve read about how big data is revolutionizing healthcare, for example, by supporting electronic medical records and leading to personalized medical interventions based on your DNA.
But what about big data and the department of Invertebrate Zoology (IZ)? You might be surprised to learn that even here in IZ (as well as the Smithsonian more broadly), big data has a big role to play! While “big data” specifically refers to the vast quantities of data that our digital society produces every day, and the associated storage requirements, there are several more important dimensions to big data in the biological sciences. For example, a variety of technological advances in recent years have enabled scientists to collect more data and more types of data faster than ever before, and to ever-greater resolution and accuracy. On the other end, to extract useful information from such vast quantities of data, researchers must have advanced means of data manipulation and analysis.
So what does this look like in practice?
IZ’s Sant Chair for Marine Science Nancy Knowlton and postdoctoral fellow Matthieu Leray are both involved in an international collaboration with researchers from Jordan to study the biodiversity in the Gulf of Aqaba. The researchers deploy autonomous reef monitoring systems (ARMS) in the gulf, and then later retrieve the units once marine organisms have colonized their surfaces. Using a process called DNA metabarcoding, the team is classifying the microscopic organisms, many of which are entirely new species. ARMS are also being used in an NSF-funded and SI co-led PIRE project in Indonesia as a platform to train the next generation of biodiversity scientists to address the impact of anthropogenic stressors in marine communities and to understand the assembly of communities in increasingly diverse settings. Lots of organisms means lots of DNA, which means lots of data and, of course, lots of scientific discoveries!
With more than 126 million items in its collection, the National Museum of Natural History boasts the largest and most comprehensive natural history specimen repository in the world. Many of these are historical, and the museum, including IZ, is currently in the process of creating electronic records for all of its specimens, even those dating back to the 1838-42 U.S. Exploring Expedition, the Smithsonian’s first voyage. And every time an IZ researcher heads out on a cruise, new specimens are returned to the museum. In addition to supporting incredibly important biodiversity and conservation research, NMNH’s efforts to develop its electronic and digital offerings are also important for you: open access for the public promotes learning equity, with the potential to bring IZ’s collections directly to schools and individuals in the most remote locations in the United States, and even around the world.
Many members of IZ are also actively involved in other biodiversity projects that entail managing large quantities of data. For example, museum specialist Chad Walter helps maintain the “World of Copepods” section of the World Register of Marine Species. And research zoologist Chris Meyer has been integral to the development of the Biocube project, which he has promoted in both the research and citizen science arenas. Each instance entails the deployment of a one-cubic-foot frame in a natural environment over the course of a normal day: by identifying the organisms contained within the cube over that time period, the observer creates standardized, comparable records of all species and in the process gains a newfound perspective about biodiversity. As the project has grown, National Geographic photographers, amateur scientists, teachers and students have taken part. More studies, more deployment locations, more observations: that’s a lot of data!
At “the other end” of the data life cycle, researchers in IZ rely on many types of data and electronic information for their work, and for the valuable insights they develop using this information. For example, postdoctoral scholar Jamie Baldwin-Fergus combines water quality measurements with computer modeling to reproduce the underwater light environment of hyperiid amphipods recovered from the ocean’s floor. The models help Jamie to better correlate adaptations in eye morphology with the visual abilities of these little-studied creatures.
Clearly, big data is a big deal here in IZ, and plays a central role not only in the research conducted on the collections but also in making the collections and scientific discoveries accessible to you. But most importantly, technological advances on the front end (better sensors, faster and cheaper DNA sequencing) and the back end (computer modeling software, data management and storage) are what have really shaped the ability of IZ researchers and staff to collect, harness, and utilize so much valuable information about invertebrate species, their communities, and our world.
As a final example, consider one of the most ambitious data-driven projects in modern biology: an international team of researchers (including IZ’s Chris Meyer) is building a virtual representation of an entire island! Based on a combination of experimental work, theory, and a mountain of existing data, the Moorea Island Digital Ecosystem Project (MooreaIDEA) will create what is essentially a publicly available virtual island. This virtual lab will allow people to generate and test hypotheses about how an ecosystem reacts to various human activities. Big data to support big ideas, indeed.
by Liz Boatman [edited by Allen G. Collins]