Periodic Reporting for period 1 - BiCIKL (Biodiversity Community Integrated Knowledge Library)

Reporting period: 2021-05-01 to 2022-10-31

At present, seamless exchange of biodiversity data is limited by technical and organisational barriers, and efforts to overcome them are in their infancy. Inefficient exchange of data among Research Infrastructures (RIs) impairs research progress, increases costs and limits the possibilities for innovation. At the same time, the demand for biodiversity research is growing in response to urgent societal challenges, such as mass species extinction and the loss of vital ecosystem services, and to needs such as the discovery and cure of new diseases and the development of bio-based materials, technologies and energy.

BiCIKL aims to catalyse a culture change in the way biodiversity data are identified, linked, integrated and re-used across the research lifecycle. By doing so, BiCIKL helps to increase the transparency, trustworthiness and efficiency of the entire research ecosystem. BiCIKL pursues its mission by enabling 15 key biodiversity infrastructures to exchange data among themselves and with external users. To make this possible, the RIs will have to adopt and fully implement the FAIR principles, so that their data become Findable, Accessible, Interoperable and Reusable.

A special focus of BiCIKL is the extraction and liberation of data from millions of pages of published literature, making them accessible and reusable. Thanks to EU research funding, BiCIKL will deliver global-level access to data and tools along the entire biodiversity research cycle: collecting specimens > extracting molecular sequences > identifying species > analysing and publishing results > constructing the biodiversity knowledge graph > re-using data for new scientific discoveries and other societal needs.
During its first 18 months, BiCIKL made significant progress towards its four key objectives:

Find: Ensure seamless discoverability of data through globally unique identifiers from each participating infrastructure and across data domains. BiCIKL developed and adopted standards for persistent identifiers (PIDs) for various biodiversity data classes. Examples are:
• A pan-European system and workflow for assigning digital object identifiers (DOIs) to digital specimens in natural history collections, in collaboration with global stakeholders.
• Recommendations and best practices for use of PIDs in the biodiversity literature.
• Recommendations for citations of taxonomic names.
• Recommendations to infrastructures to use data brokers to link PIDs where competing systems exist.
• Improved mechanisms to discover taxon names, specimens, genetic sequences and literature information at each participating RI.

Access: Provide, facilitate, support and scale up open access to FAIR interlinked data, from literature, natural history collections, sequence archives and taxonomic nomenclature in both human-readable and machine-actionable formats. BiCIKL enhances transnational and virtual access to data via:
• Newly developed access tools and workflows.
• Open Project Calls to the research community.
• Mechanisms and criteria for project submission, evaluation and implementation.
• Annual reviews of the data portfolio and access procedures of the participating RIs.

Interoperate: Harmonise the existing standards, metadata, policies and technologies for provision and ingestion of FAIR data. This work is carried out through joint research and technical development and community engagement, and has resulted in:
• Recommendations for interoperability and compatible data standards among RIs.
• Best practice manual for findability, re-use and accessibility of RIs.
• Efficient bi-directional and multi-directional linking mechanisms between voucher specimens, sequences, taxon names, and literature.

Reuse: Optimise the reusability and reproducibility of complex datasets, assembled from different biodiversity-related domains, for the generation of new knowledge. Progress has been made through:
• BiCIKL hackathons organised for testing data linking mechanisms.
• Globally unique, automated workflows to extract, expose and interlink information in the legacy and prospectively published literature.
• Tools for semantic enhancement of the published content which will facilitate data liberation and re-use.
• A workflow for converting full-text biodiversity articles into Linked Open Data (LOD) and constructing the biodiversity knowledge graph.
• Active engagement of the community in human-in-the-loop methods of data curation through workbench tools and clearing house mechanisms.

The workflows and tools created under the joint research activities of BiCIKL are tested in real time through the open call projects performed by research groups throughout the world, thus supporting another key objective of BiCIKL and its funding programme: building a new community of users who will be able to address societal challenges through data-driven, next-generation research.
BiCIKL already connects data from different, previously fragmented domains, including data liberated from the huge biodiversity literature, into a big FAIR data pool seamlessly available to researchers, public authorities and business to foster innovation in science, nature conservation and the digital economy. The expected results of BiCIKL towards the project’s objectives are: (1) A new vibrant community of users equipped with novel research tools for search and access to data interlinked across domains; (2) Interlinked corpora of knowledge used by research groups through newly developed bi- and multi-directional data linking; (3) Automated text and data mining and publishing workflows for extraction, conversion, semantic enhancement and re-use of highly valuable data now imprisoned in the literature. The added value of the new community over the sum of the existing services will be the Biodiversity Knowledge Hub (BKH), a single knowledge broker for interlinked, both human- and machine-readable FAIR data, connecting specimens, genomics, taxonomy and publications.

Beyond research, BiCIKL will add significant value to society through the combined use of data from different domains to provide evidence, for example on biological invasions or on the historical dynamics of biodiversity and ecosystems, thereby supporting modelling and informed policy decisions in pursuit of the key goals of the 2022 United Nations Biodiversity Conference (COP15) in Montreal: (1) protect and restore 30% of the world’s land and seas by 2030, and (2) reduce the extinction rate tenfold for all species by 2050. The BiCIKL results will be a direct contribution to achieving these life-saving goals!

The actual cost of personnel is significantly above the agreed-upon monthly rate to be charged to the BiCIKL project. GBIF has agreed to this indirect "co-funding" because of the strong overlap between the purpose of BiCIKL and the GBIF Work Programme. GBIF is aware that it will only be financially compensated up to the agreed-upon budget amount and does not expect any further remuneration. If, at the end of the project, there are funds left over, we would be grateful for any consideration, should it be possible, but we stress again that this is not our expectation. We suggested simply reporting our monthly cost in accordance with the agreed-upon monthly budget rate, but were informed that our true cost must be reported, which is what we have done. We will continue to supply the agreed-upon PM for the BiCIKL project, despite maximum funding running out well before the end of the project.
