From Plant Press, Vol. 25, No. 3, July 2022.
By Sylvia Orli and Julia Beros.
The breadth of collections held at the U.S. National Herbarium (USNH) documents the incredible biodiversity and botanical heritage of our planet and serves as the backbone of our work. In service to the public and as the foundation for research, these collections, as they exist now and as they grow, are at the core of our mission and purpose as an institution. Making these assets available, in physical and digital form, is a fundamental part of fulfilling our responsibility. With nearly 5 million specimens, the USNH is one of the world’s largest and most diverse botanical collections, and digitizing it fulfills our goal of making specimen data and images discoverable and accessible to all.
In 2015, the Department of Botany and the Smithsonian Digitization Program Office (DPO) embarked on a joint project to digitize all pressed and packeted plant specimens, which comprise the great majority of our collections. Collaborating with the contracted digitization company Picturae, based in the Netherlands, we employed a conveyor belt system and a high-throughput approach to realize this ambition. This approach to digitization provides high-resolution images and transcriptions of label data for each specimen. These assets are ported to our data catalog and subsequently aggregated with global collections data from other herbaria. By these means, the digitization of the USNH has a huge impact on botanical research throughout the world, collecting and codifying data that can unexpectedly inform the work of researchers near and far.
On May 13, 2022, the digitization conveyor scanned its last botanical specimen, ending a 7-year run that digitized more than 3.8 million sheets, transcribed more than 2.8 million specimen labels, and added more than 80,000 new taxa to our collection catalog. We can now look in our herbarium cases and find a barcode on every sheet within every cubby, and conversely we can search by barcode or name (or any other line of data query) and look into our herbarium cases from our computer screens. This fantastic achievement can be credited to the hard work of staff and contractors throughout the Smithsonian, a collaboration that spans units and job titles.
The foundational steps in the conveyor process began in the specimen cases, where collections were curated to their proper taxa and each folder was barcoded with a unique ID number. Three to four cases of material were prepped and brought to the conveyor per day to keep it running at full speed. The conveyor is a long rolling belt: a technician places specimens at one end, and as the specimens travel to the other end they are imaged, while another technician simultaneously reviews the photos to ensure the systems are operating correctly. Bulky objects and fruits were cataloged individually by hand. Because the project went through the herbarium sheet by sheet, the team also had the opportunity to repair damaged specimens.
Implementing the conveyor belt at the National Museum of Natural History (NMNH) required several updates to our infrastructure and file transfer protocols. High-throughput internet connections were installed that could transfer 3,000–4,000 high-resolution images to an image server each day. These images were also ported to our Digital Asset Management System (DAMS) using a copy utility designed specifically for mass digitization projects. Images from the DAMS were then transferred to our data catalog. On the data side, a workflow using several scripts and macros was created to import 30,000–40,000 data records into the data catalog at one time. Transfers of data occurred daily, weekly, and monthly; small batches amassed into larger batches that eventually made their way to the data catalog at a rate of 60,000 records per month.
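The way small batches amass into larger imports can be pictured with a short sketch. This is only an illustration, not the scripts and macros the team actually used: the function name `amass_batches`, the record strings, and the figure of 3,000 records per working day are all assumptions made for the example, while the 30,000-record import size is taken from the lower bound quoted above.

```python
from typing import Iterable, Iterator, List


def amass_batches(daily_batches: Iterable[List[str]],
                  import_size: int = 30_000) -> Iterator[List[str]]:
    """Accumulate small batches and yield one import batch whenever
    at least `import_size` records are pending (hypothetical helper)."""
    pending: List[str] = []
    for batch in daily_batches:
        pending.extend(batch)
        if len(pending) >= import_size:
            yield pending
            pending = []
    if pending:
        # Whatever is left over goes out as a final, smaller batch.
        yield pending


# Assumed workload: 20 working days at 3,000 records per day.
daily = [[f"rec-{day}-{i}" for i in range(3_000)] for day in range(20)]
imports = list(amass_batches(daily))
print([len(b) for b in imports])
```

Under these assumed figures, a month of daily batches collapses into two imports of 30,000 records each, which is consistent with the monthly rate of 60,000 records described above.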
At every stage of digitization there were moments of discovery and inquiry: unique photographs and illustration plates, personal letters attached to specimens, and intimate notes and drawings made by collectors. None were richer than those found in the depths of the collection data, which was reviewed day in and day out in shared-access sheets. Those working on the project developed a keen eye for repeat collectors, their handwriting and note-taking habits, and even their relationships to places, subjects, and other researchers. Each record told a small story: an unlikely collector on crew with Captain Cook, a moss plucked on a historic first voyage into uncharted waters, collections from Theodore Roosevelt’s many hunting and collecting expeditions throughout east and central Africa, Agnes Chase’s prolific grass collections, the varied expeditions of J.N. Rose, the amalgamated collections of other botanical institutions, and so on. One could spend days immersed in collections from only Mexico or only Sweden and develop an aptitude for the geography and historic names of these places.
The remarkable change in workflow is best appreciated by looking at the past. Prior to the installation of the conveyor belt, our only available approach to large-scale digitization involved crowdsourcing. Botany uploaded 98 projects to the crowdsourcing venues in a two-year period (2014-2015) and 50,000 records were transcribed using this process, requiring countless hours of imaging, creating new projects and downloading the data. In its first two years of operation (2016-2017), the conveyor belt project yielded over 1.1 million records and images. Notably, cost and time savings with the conveyor have been significant, relative to traditional (one person/lightbox/camera) means. Although expensive upfront, the conveyor provided a 45% cost reduction and 80% processing time reduction over traditional means of digitization.
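The scale of that change can be worked out from the figures quoted above. The short calculation below is a rough comparison only: it treats both two-year periods as directly comparable and uses 1.1 million as a lower bound, since the article says "over 1.1 million".

```python
# Figures quoted in the article.
crowdsourced_records = 50_000    # crowdsourcing, 2014-2015
conveyor_records = 1_100_000     # conveyor, 2016-2017 (lower bound)
years = 2

crowd_per_year = crowdsourced_records / years
conveyor_per_year = conveyor_records / years
speedup = conveyor_per_year / crowd_per_year

print(f"Crowdsourcing: {crowd_per_year:,.0f} records/year")
print(f"Conveyor:      {conveyor_per_year:,.0f} records/year")
print(f"Ratio:         at least {speedup:.0f}x")
```

On these numbers the conveyor produced at least 22 times as many records per year as crowdsourcing did.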
With this process, by 2020 we had completed the dicotyledons in our collections and most of the monocotyledons, excluding the Poaceae. The pandemic put a hard stop to the imaging component of the project in March 2020, but within a few months it was allowed to restart. With two contractors welcomed back into the dark and nearly empty halls of the Botany Department, the project began imaging again and kept the data rolling, both making up for lost time and inspiring hope in our collective work. Between then and March 2022, the team finished the Poaceae and most of the lichens and bryophytes, collections that required special attention to their organization because they were still in the midst of curation projects. Given the skeletal team available to prepare, move, and scan the collection, coupled with new scheduling challenges and COVID-19 protocols, this timeline is quite astonishing. The last few weeks of the conveyor run were dedicated to retrieving the hidden specimens, tucked away in offices and back rooms or loaned to other institutions, ensuring that every US botanical specimen was indeed digitized.
With the herbarium finished, our goal is to keep it 100% digitized by imaging and cataloging every new specimen that arrives. The DPO has helped Botany develop a plan to maintain its “fully digitized” status for the foreseeable future. The benefits and accomplishments of this project are already resonating with other research units, and the museum is now implementing this digitizing technology to image the dragonfly collections of the Department of Entomology. The vision of the Botany Department is to make the herbarium collections accessible and visible to everyone who has an internet connection and to support research and general botanical curiosity. Moreover, there is an immense feeling of gratitude among all who touched the project. Not only does it fulfill part of our mission as a research institution, but it also fulfills our shared belief in the importance of openly accessible data, both in the preservation of our natural history and in support of the growth and pursuit of knowledge and understanding.
Ride the digitization conveyor belt like a Smithsonian herbarium sheet
If you were a Smithsonian herbarium specimen collected in 1838, this would be the ride of a lifetime. A high-speed point-of-view video has been posted on Smithsonian’s social media channels that shows a botanical specimen sheet being digitized. The sheet is removed from a herbarium cabinet, rides on a cart down a hallway, is placed on the conveyor belt by gloved hands, is photographed from above, and then returns to the herbarium cabinet. Using this conveyor belt system, the U.S. National Herbarium has digitized more than 3.8 million herbarium sheets. The video is available for viewing at https://twitter.com/smithsonian/status/1535257893888458753 and https://www.facebook.com/Smithsonian/videos/590438809005159.
If you were a Smithsonian herbarium specimen collected in 1838, this would be the ride of a lifetime. In our largest digitization project, we used a conveyor belt system to digitize more than 3.8 million herbarium sheets, now available to all through #SmithsonianOpenAccess. pic.twitter.com/kjPfkCoYdY
— Smithsonian (@smithsonian) June 10, 2022