Research Article
Corresponding author: Julia D. Sigwart (julia.sigwart@senckenberg.de)
Academic editor: Franco Andreone
© 2025 Julia D. Sigwart, Matthias Schleuning, Angelika Brandt, Markus Pfenninger, Hanieh Saeedi, Thomas Borsch, Eva Häffner, Robert Lücking, Anton Güntsch, Helmuth Trischler, Till Töpfer, Karsten Wesche, Collectomics Consortium.
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
Sigwart JD, Schleuning M, Brandt A, Pfenninger M, Saeedi H, Borsch T, Häffner E, Lücking R, Güntsch A, Trischler H, Töpfer T, Wesche K, Consortium C (2025) Collectomics – towards a new framework to integrate museum collections to address global challenges. Natural History Collections and Museomics 2: 1-20. https://doi.org/10.3897/nhcm.2.148855
Digitisation is a priority in many natural history collections, and publicly available datasets are expanding rapidly. The potential value of collections nevertheless remains largely untapped even in modern research, because the vast scope of collections dwarfs current efforts at data mobilisation. Collections are continually expanding, and there are an estimated 3 billion undigitised specimen records worldwide. In this review, we use a simple model to illustrate that current efforts at global digitisation will not succeed until the late 21st century at the earliest, unless new technologies are harnessed and funding bodies and society make new commitments. As we advance toward specimen digitisation, an equally important consideration is that the majority of these digital records represent only a fraction of the information potentially available from the collection objects. The term “collectomics” was coined in discussions within the Senckenberg institution as a phrase for digital frameworks that embrace all current and future data and knowledge derived from specimens. This expands on the concept of museomics, which was originally defined to focus on molecular data generated from museum specimens. Rooted in the concept of the extended specimen, collectomics encompasses metadata, images, traits, DNA, and further data extracted in the future with yet unknown applications, all of which are connected to environmental data and other historical contextual information. Thus, a view of digitisation under the collectomics concept is not limited to natural history collections but directly integrates evolutionary, ecosystem and social sciences, including the human contributions of collectors, donors, and researchers in the past and future.
A “collectomics” view envisions seamless integration of multidimensional specimen-based data, with interoperability among historical, artistic, ethnographic, and natural history collections, to generate knowledge that is needed to tackle global challenges.
Collectomics, digitisation, FAIR, museum collections
Research collections represent key scientific infrastructure for biology and they constitute some of the oldest continuously maintained and expanded scientific resources on Earth. Climate change and other human-driven transformations to our planet are impacting global biodiversity at an unprecedented speed and extent, with cascading impacts on our livelihoods (
Scientific advice on safeguarding biological and cultural diversity should rely on all available information, to understand past and current developments of our natural and cultural heritage and to predict future responses of Earth’s systems. To assess the impact of ever-accelerating global change, the new situation must be measured against some prior condition. Scientific museum collections with their manifold objects and associated data represent tangible evidence and an irreplaceable historical record of our changing planet. Natural history museums collectively house billions of biological records spanning more than three centuries (
To realise the potential value of these collections for addressing issues of global change, there is increasing awareness that both the objects and their associated metadata must be made available through digitally accessible, integrated research platforms. Digital access to the authoritative datasets connected to museums empowers local researchers to study local biodiversity trends (
Current technological developments provide an unprecedented opportunity to unleash the full potential of collections by fully integrating the myriad data dimensions from collection objects. The term “collectomics” originated in digitisation discussions and internal documents in the Senckenberg institution, and has been used in the literature in the last few years (
Unleashing the potential of collections to address global challenges also depends on increasing accessibility of multidimensional collections data, via their integration across disciplines, and ensuring that FAIR criteria (Findability, Accessibility, Interoperability, and Reusability) for past, present and future collection objects and data are met (
Here we briefly review two important factors in planning for a more integrated future for natural history collections, with a broad view of their power for confronting global problems. The first is the current pace of digitisation of collection metadata, and the need for significant acceleration. The second is the definition of collectomics: a holistic vision with specimen objects at the heart of powerful interdisciplinary data approaches.
Natural history collections have already started to become integrated into globally coherent research platforms, and the “Global Museum” (
There are important fundamental differences between a specimen collection, which is an analogue storage system, and a database. A physical, spatial arrangement of knowledge is inherently human-centric: specimens are arranged in a way that makes sense to people (by systematics or stratigraphy), and although these relationships are scientific, they rely on human interpretation. Relationships in a database are formalised and rely on predefined rules. (This is a constant source of low-level friction in the experience of collections digitisation, for example where species identifications are ambiguous or uncertain.) A collection occupies physical space, which requires material investment and limits the feasibility of reorganisation. Databases can be dissected, cross-referenced, and reassembled dynamically. Collections depend on a user physically browsing to retrieve information; however, as with a physical bookshelf in a library, the information retrieved usually exceeds the boundary of the original question, through observations of the specimen itself or serendipitous information from nearby comparative material. Databases allow structured queries across categories. For example, extracting all records from one geographical location is extremely difficult when the physical collection is organised taxonomically, and close to impossible when the collection is not digitised.
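The contrast between taxonomic shelving and a structured query can be illustrated with a small sketch. It is only an illustration under stated assumptions: the table name `specimens`, its three columns, and all catalogue numbers, taxa, and localities are hypothetical placeholders and do not represent any real collection database schema.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory table of placeholder specimen records (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE specimens ("
        "catalogue_no TEXT PRIMARY KEY, taxon TEXT, locality TEXT)"
    )
    # Rows are listed in taxonomic order, as they might sit on a shelf.
    conn.executemany(
        "INSERT INTO specimens VALUES (?, ?, ?)",
        [
            ("A-001", "Taxon A", "Locality X"),
            ("A-002", "Taxon A", "Locality Y"),
            ("B-001", "Taxon B", "Locality X"),
        ],
    )
    return conn


def records_from_locality(conn, locality):
    """Return (catalogue_no, taxon) for every record from one locality, across all taxa."""
    cursor = conn.execute(
        "SELECT catalogue_no, taxon FROM specimens "
        "WHERE locality = ? ORDER BY taxon, catalogue_no",
        (locality,),
    )
    return cursor.fetchall()


conn = build_demo_db()
for catalogue_no, taxon in records_from_locality(conn, "Locality X"):
    print(catalogue_no, taxon)
```

The single `WHERE locality = ?` clause cuts across the taxonomic arrangement. In a physical collection the same request would mean visiting every cabinet.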
Collections-based research has in part been stymied because data are stored in different formats and databases (
Top: for more than 200 years, relevant object information was most often recorded in the form of hand-written labels and inventories. Bottom: Natural history museums directly intersect with social sciences, although the connections often go unrecognised. Top left: Jan-Peter Kasper/Universität Jena, Top right: Sigrid Hof / Senckenberg Research Institute and Museum Frankfurt, Bottom left: image of Dr Fritz Haas (seated) and unnamed companions (men and women), in the act of collecting a new species Unio valentinus, Bottom right: natural history objects also appear in the context of art objects, photo: Emőke Dénes.
Digitisation of collections is chasing a moving target, because data standards and applications are evolving, but most importantly because collections are continuously expanding. Any consideration of digitisation must also include future provisioning: full digitisation is not a final solution but a step in improving data infrastructure, which must also include commitments to collections maintenance and support of curators and technicians with specialist expertise (
A review of major herbaria estimated that only 21% of preserved collections were available via GBIF (
To consider how the rate of global digital record capture compares with the ongoing growth of collections, we constructed a simple model to illustrate the intersection between the trajectories of these two trends (Fig.
To forecast digital record growth, we extracted historical data from GBIF for the number of “preserved specimen” records in the database each quarter from 2008 to the start of 2025. These counts cover the largest possible set of records, including nonstandard records and records without validation or geographical data, rather than the preferred scenario of high-quality records only. We fitted a polynomial regression to these datapoints using R (
Although global coverage of biological specimen data is good (inset, specimen-based records from GBIF 2023), it is only a fraction of the data that we need to understand global change. Main figure: Simplified scheme of projected global collections growth. The black line represents a linear increase in the number of collection objects over time, from the late 1700s to 2020 and projected into the future. Dashed lines are model projections based on specimen occurrence records from GBIF, combining all available data including incomplete records (GBIF 2023). In 2020 there were ~200 million digitised records, compared to ~3 billion records without digital footprints. The current pace of progress thus requires more than 100 years to achieve full digitisation (blue line). In the best-case model prediction, complete digitisation could be achieved around the year 2071 at the earliest; if there is no acceleration (red line) and digitisation continues to increase linearly, the long-term pace of collection growth exceeds the pace of digitisation and the global digitisation gap will widen.
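The intersection of the two trajectories described in this caption can be sketched numerically. The published analysis was done in R; what follows is a minimal Python sketch under stated assumptions, not the authors' code. The function name `crossover_year`, the quadratic default, the start year of 1780 (standing in for "the late 1700s"), the projection horizon, and the synthetic input series are all illustrative assumptions. Only the ~200 million digitised and ~3 billion undigitised records in 2020 are taken from the text.

```python
import numpy as np


def crossover_year(years, digitised, total_2020, start_year=1780,
                   degree=2, horizon=2300):
    """First whole year in which the fitted digitisation curve reaches the
    linearly growing collection total, or None if it never does by `horizon`.

    Assumes collections grew linearly from zero at `start_year` to
    `total_2020` objects in 2020, and keep growing at that rate.
    """
    # Centre the time axis so the polynomial fit stays well conditioned.
    t = np.asarray(years, dtype=float) - 2000.0
    coeffs = np.polyfit(t, np.asarray(digitised, dtype=float), degree)
    growth_per_year = total_2020 / (2020 - start_year)

    for year in range(int(max(years)) + 1, horizon + 1):
        projected_digital = np.polyval(coeffs, year - 2000.0)
        projected_total = growth_per_year * (year - start_year)
        if projected_digital >= projected_total:
            return year
    return None


if __name__ == "__main__":
    # SYNTHETIC quarterly series for illustration only; these are NOT GBIF counts.
    years = np.arange(2008, 2025, 0.25)
    digitised = 5.0e7 + 6.0e5 * (years - 2008) ** 2
    total_2020 = 2.0e8 + 3.0e9  # digitised plus undigitised, as given in the text

    print("accelerating fit:", crossover_year(years, digitised, total_2020))
    print("linear fit:", crossover_year(years, digitised, total_2020, degree=1))
```

With `degree=1` the function returns `None` whenever the fitted digitisation slope is lower than the assumed collection growth rate. This is the "no acceleration" scenario, in which the gap keeps widening.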
The current pace of progress could require 100 years to achieve full digitisation (Fig.
Even if all collections were eventually databased, their digital records, as they appear in collections databases or global aggregators, such as GBIF, are not equivalent to specimens. Occurrence records from metadata are one very important and relevant dimension of collections data, but as noted above, there are other dimensions, attributes, and data types associated with every physical specimen. In order to capture these additional features of the extended specimen, we must first dramatically scale up mass data retrieval and provisioning to keep up with ongoing growth of collections and to capture the backlog. Representative digital coverage of these objects at global scale is therefore imperative.
Revolutionary advances in artificial intelligence (AI) promise to accelerate the current pace of digitisation substantially. While application of AI in automated cataloguing and metadata creation is already widespread, there are limitations and human expertise is still a core requirement (
Once digitised, data must be linked to other massive species-based information hubs (
Collectomics describes the multiple ways that voucher specimens can be harnessed for additional datasets, as well as the many ways these data interact synergistically with other research fields including science and humanities in order to integrate information across scales and generate new knowledge. Collections form the core data for biological and geological sciences, and digital information derived from research on collection objects feeds into interconnected research platforms that are readily accessible to science and society across the globe. What is currently missing is ensuring that these data projects connect directly back to specimen identifiers. Collectomics implies that large scale data platforms interact with other big-data approaches (e.g., remote sensing, citizen science) to form interconnected data networks. These are expected to inspire new developments such as in analyses of biodiversity trends across scales or process-based models of ecosystem functions, but also interdisciplinary research including sciences and humanities.
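One way to picture the missing link back to specimen identifiers is as a simple data structure. The sketch below is hypothetical and does not represent an existing standard or platform: the class names `ExtendedSpecimen` and `DerivedDataset`, their fields, and the function `link_datasets` are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DerivedDataset:
    """A dataset generated from a specimen (hypothetical structure)."""
    kind: str                          # e.g. "image", "sequence", "trait"
    location: str                      # placeholder for an accession or URI
    specimen_id: Optional[str] = None  # link back to the physical voucher


@dataclass
class ExtendedSpecimen:
    """A voucher specimen together with the datasets derived from it."""
    specimen_id: str
    taxon: str
    collector: str
    datasets: List[DerivedDataset] = field(default_factory=list)


def link_datasets(specimens, datasets):
    """Attach each dataset to its specimen; return the datasets that cannot
    be linked because their identifier is missing or unknown."""
    by_id = {s.specimen_id: s for s in specimens}
    unlinked = []
    for dataset in datasets:
        target = by_id.get(dataset.specimen_id)
        if target is None:
            unlinked.append(dataset)
        else:
            target.datasets.append(dataset)
    return unlinked
```

In this sketch, anything returned by `link_datasets` is data that exists but has lost its connection to a voucher, which is the gap the paragraph above describes.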
The point that natural history museum collection objects are useful to myriad subjects has been raised repeatedly (
Is it more useful to provide publicly accessible, large-scale, but incomplete and partially inaccurate data (the actual status quo of any museum), or to provide a small fraction of available data while ensuring high quality (expert-validated records)? This conflict is not unique to natural history collections but applies to “big data” approaches more generally. The benefits of public accessibility are clear: hidden data become available to a larger pool of potential experts for further validation. But this does not sit entirely comfortably with the cultural values of museums as sources of expertise and trust.
Observation efforts including large-scale citizen-science platforms have provided an explosion of biological records data; however, records that are not connected to voucher specimens have lower geographic, temporal, and species coverage (
The preservation of physical voucher specimens or objects underpins replicability of all collections-based research (
Mobilising data on species identity and occurrence is often the focus of studies considering the utility of natural history collections (
The potential of the “extended specimen” (
The integration of specimen- and object-based data across collections, object categories and research disciplines is not trivial (
Accelerating mass digitisation has a vast potential to make collections data available where they are actually needed. In the context of Anthropocene change, resources and contextual data are most needed in tropical countries of the Global South (
Specimens and objects in research collections were collected by historical actors who frequently intersect multiple disciplines. Contributions from a certain person are often distributed across multiple holding institutions. Collections credited to a certain person also often connect to uncredited local knowledge holders – the local collector who passed on knowledge or materials that ultimately end up in museums. All of these intersecting agents are an important legacy in tracing and confronting the colonial history of cultural and scientific museum collections in the Global North.
Sensitivity to provenance issues is potentially better developed in art and cultural collections (
The vision of collectomics includes all types of specimen data, linking an object to descriptive metadata, images, sequences, and also its history. That history includes the historical and cultural context of the people who originally collected it, and of later researchers who brought new approaches to include that specimen in another analysis, all connected through geographical, biological, and biographical information. Making collections fully accessible, fully integrated, and visible to the broadest group of global users is crucial to protecting collections into the future. A fully digitised collections object is not just a species name, an image and some coordinates; it is a rich, complex, and ongoing history. Our perspective focusses primarily on natural history collections, but this is only one facet; other users will be driven more by other facets among historical, ethnographic, and artistic endeavours, which together create a rich unified tapestry.
We are very grateful to this new journal, Natural History Collections and Museomics, for providing a platform for the global efforts in collectomics. This paper benefited from input of three reviewers and discussions with many colleagues who are too numerous to thank here but all have our gratitude, and we especially acknowledge input from Steffen Pauls, Dieter Uhl, and Georg Zizka (Senckenberg Research Institute and Museum Frankfurt), Klaus Klass and Ulf Linnemann (Senckenberg Natural History Collections Dresden), and Johannes-Geert Hagmann (Deutsches Museum).
The authors have declared that no competing interests exist.
No ethical statement was reported.
No funding was reported.
Conceptualization: JDS, AB, MP, MS, KW, CC. Data curation: JDS. Formal analysis: JDS. Visualization: JDS. Writing – original draft: JDS. Writing – review and editing: TB, HT, AG, RL, MS, EH, KW, TT, MP, AB, HS, JDS.
Julia D. Sigwart https://orcid.org/0000-0002-3005-6246
Angelika Brandt https://orcid.org/0000-0002-5807-1632
Markus Pfenninger https://orcid.org/0000-0002-1547-7245
Hanieh Saeedi https://orcid.org/0000-0002-4845-0241
Eva Häffner https://orcid.org/0000-0001-6448-5826
Helmuth Trischler https://orcid.org/0000-0001-6923-2465
Karsten Wesche https://orcid.org/0000-0002-0088-6492
All of the data that support the findings of this study are available in the main text or Supplementary Information.
Additional contributors in the Collectomics Consortium, Germany
Data type: docx
R commands and data used to generate Fig.
Data type: R