In 2024, we organised a workshop on the Darwin Core standard for the BiodivMon projects. Presentations explored challenges such as data fragmentation and inconsistent formats, and presented solutions including Darwin Core, ontologies, and open repositories. Key themes included future data model developments, AI-driven annotation, and community building. Case studies highlighted practical implementations, with LifeWatch ERIC showcasing its training and support infrastructure. The workshop emphasised collaboration to enhance data accessibility, integration, and knowledge sharing in biodiversity research.
Rewatch the workshop!
Summary
- Senem Önen Tarantini and Martina Pulieri (University of Salento) presented data and metadata harmonisation as crucial processes for achieving the FAIR principles. They explained how standardising data structures and the information describing the data makes fragmented biodiversity information from diverse sources more accessible and usable. They also highlighted the importance of semantic interoperability, which is hard to achieve because definitions differ between sources and common semantic artefacts are lacking.
- Naouel Karam (InfAI) explained how semantic technologies can address the challenges posed by natural language ambiguity in biodiversity data. By utilising ontologies (formal classifications of concepts) and standardised vocabularies, researchers can achieve semantic search (finding data based on meaning), consistent metadata descriptions, and data integration across diverse sources. Existing tools like BiodivPortal, ABCD, schema.org, and ETD standards aid in this process. Future work aims to improve ontology creation, versioning, and user feedback, and to integrate large language models for enhanced data annotation. Importantly, semantic technologies work alongside existing standards like Darwin Core (DwC), with optimal choices depending on the specific data type and use case.
- Ilaria Rosati (CNR IRET – LifeWatch Italy) presented LifeWatch Italy’s approach to open and FAIR research, aligning with UNESCO’s open science recommendations. She outlined LifeWatch Italy’s comprehensive support system, which includes using the Argos tool for Data Management Plans, a data portal with EML metadata and taxonomic validation, Data Labs for collaborative code development and data generation, and a metadata catalogue integrated with FAIRness assessment tools. This system aims to address common barriers to data sharing by providing tools, services, and support that facilitate the entire research lifecycle, from planning to reuse.
- Guillaume Body and Sophie Pamerlon (OFB) focused on the Darwin Core (DwC) standard. DwC provides a consistent format for describing occurrences, taxa, and related data, promoting FAIR data principles and facilitating data exchange and integration. While not a data management standard itself, DwC helps structure data for sharing through core classes and extensions, packaged in Darwin Core Archives containing data and metadata. An example of wildlife monitoring using electrofishing demonstrated how DwC can organise data into events, occurrences, and measurements. The talk also covered ongoing development of the standard to handle more complex data structures such as nested occurrences.
- Rui Figueira (GBIF Node Manager for Portugal) shared Portugal’s experience with Darwin Core, highlighting both challenges and solutions for sharing biodiversity data. While DwC excels at occurrence data, Portugal’s focus on sampling events necessitated using Event Core alongside Occurrence Core. Nested data structures also posed a hurdle, but Portugal mitigates these issues through training, authority files, data validation, and emphasising Taxon Core for checklists. Looking forward, Portugal actively participates in developing a new GBIF data model that tackles nested data and supports diverse data sources. Their emphasis on training, standardised practices, and embracing new models demonstrates Portugal’s commitment to improving data quality and accessibility for global biodiversity research.
- Andrea Tarallo (CNR IRET – LifeWatch Italy) focused on data standardisation in biodiversity research, addressing challenges like data fragmentation (sampling events vs. occurrences), nested data structures, limited searchability of DwC extensions, and integrating individual-level data. Solutions discussed included training and infrastructure support, standardised practices, data validation, and alternative approaches such as Portugal’s emphasis on Event Core and LifeWatch Italy’s use of annotated flat files for semantic search. Future developments focus on a new GBIF data model to handle complex data and on the crucial role of persistent identifiers for FAIR compliance. The session concluded by highlighting the importance of training, community building, and potential future collaborations, and also touched upon schema.org integration and reconciling individual and species-level data.
- Cosimo Vallo (training officer at LifeWatch ERIC) presented the organisation’s training architecture, designed for skill development, knowledge sharing, and community building. This includes a FAIR training catalogue of diverse resources, a training platform hosting LifeWatch-developed materials (with potential for project-specific sections), a community platform for user collaboration, and a help desk with a growing knowledge base. Open licensing is prioritised, with LifeWatch materials generally using CC-BY. This comprehensive system provides centralised support for the biodiversity and ecosystem science community.
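The organisation of survey data into events, occurrences, and measurements described in the Darwin Core talk can be illustrated with a minimal sketch. It is not taken from the workshop material: all record values are hypothetical, the helper names `to_csv` and `check_links` are invented for this example, and linking measurements to an `occurrenceID` is assumed here to follow the pattern of the Extended MeasurementOrFact extension. The column names are Darwin Core terms; `parentEventID` shows how a nested sampling event (a single pass within a site visit) can point to its parent.

```python
import csv
import io

# Hypothetical electrofishing records, for illustration only.
# A site visit ("site1") contains one nested sampling pass ("site1-pass1").
events = [
    {"eventID": "site1", "parentEventID": "",
     "eventDate": "2024-05-14", "samplingProtocol": "electrofishing"},
    {"eventID": "site1-pass1", "parentEventID": "site1",
     "eventDate": "2024-05-14", "samplingProtocol": "electrofishing"},
]

# Each occurrence records a taxon observed during one event.
occurrences = [
    {"occurrenceID": "occ1", "eventID": "site1-pass1",
     "scientificName": "Salmo trutta", "basisOfRecord": "HumanObservation"},
    {"occurrenceID": "occ2", "eventID": "site1-pass1",
     "scientificName": "Anguilla anguilla", "basisOfRecord": "HumanObservation"},
]

# Measurements attach to an event and, optionally, to one occurrence.
measurements = [
    {"measurementID": "m1", "eventID": "site1-pass1", "occurrenceID": "occ1",
     "measurementType": "total length", "measurementValue": "212",
     "measurementUnit": "mm"},
    {"measurementID": "m2", "eventID": "site1-pass1", "occurrenceID": "",
     "measurementType": "water temperature", "measurementValue": "11.5",
     "measurementUnit": "degrees Celsius"},
]


def to_csv(rows):
    """Serialise a list of dicts as CSV text, one table per file in an archive."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()),
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def check_links(events, occurrences, measurements):
    """Return a list of messages describing identifiers that point nowhere."""
    event_ids = {e["eventID"] for e in events}
    occurrence_ids = {o["occurrenceID"] for o in occurrences}
    problems = []
    for e in events:
        if e["parentEventID"] and e["parentEventID"] not in event_ids:
            problems.append(f"event {e['eventID']}: unknown parentEventID")
    for o in occurrences:
        if o["eventID"] not in event_ids:
            problems.append(f"occurrence {o['occurrenceID']}: unknown eventID")
    for m in measurements:
        if m["eventID"] not in event_ids:
            problems.append(f"measurement {m['measurementID']}: unknown eventID")
        if m["occurrenceID"] and m["occurrenceID"] not in occurrence_ids:
            problems.append(f"measurement {m['measurementID']}: unknown occurrenceID")
    return problems


if __name__ == "__main__":
    print(check_links(events, occurrences, measurements))
    print(to_csv(events))
```

Keeping the three tables separate and joining them through identifiers mirrors the core-plus-extensions layout of a Darwin Core Archive. A real archive would additionally need a descriptor file and dataset metadata, which this sketch leaves out.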