PUDEL: Paving the Way for Pawsome Data Models and Vocabularies in the Academic Community
https://zenodo.org/records/10698404
Making knowledge accessible in a systematic and standardized manner, so that it can be widely used and reused, is one of the key challenges in the Digital Humanities (DH). Data modeling (Flanders / Jannidis 2015), the process of adequately capturing knowledge, has become a vital part of this task. Yet it is typically not trivial for researchers to document and publish their models or vocabularies in line with principles such as FAIR (Wilkinson et al. 2016), so that other researchers can find and reuse them.
The "PUDEL" project (Publikationsdienst für wissenschaftliche Datenmodelle und Vokabulare, i.e. publication service for scholarly data models and vocabularies, based at the Saxon Academy of Sciences and Humanities Leipzig) is developing a publication service that caters to the specific requirements of data models and vocabularies. The project recognizes the critical role of these resources in facilitating data interoperability, enabling data reuse, and promoting collaborative research across disciplines.
Therefore, the core objective of PUDEL is to establish a platform that enables researchers to publish and document their data models and vocabularies in various formats (XML schemas, RDF-based ontologies, SKOS vocabularies etc.) in a standardized and easily accessible manner, taking into account the discipline-specific requirements of such models and their respective metadata. Existing services either support only the publication of vocabularies in SKOS format, like the Vocabs Service of ACDH-CH (Austrian Academy of Sciences 2023), or require the data model to already be published online so that it can be referenced, as is the case with VoCoREG (Fraunhofer Society 2023). PUDEL aims to be more inclusive in both access and scope, avoiding the trap of being a mere ‘data silo’.
The platform offers tools and workflows for documentation, validation, and versioning on top of a Git-based file storage backend. This systematic approach is intended to support the long-term sustainability and usability of the published resources, allowing researchers to share and disseminate their data models and vocabularies with confidence.
PUDEL is built around an intuitive web service that acts as an entry point, together with several ‘middleware’ services that validate data models and create representations based on established best practices (Garijo / Poveda-Villalón 2020; Semantic Web Deployment Working Group 2008).
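To illustrate the kind of check a validation middleware could run, the following is a minimal sketch only, not PUDEL's actual implementation. It assumes a SKOS vocabulary submitted as RDF/XML in which concepts are serialized as `skos:Concept` elements, and it reports concepts without a `skos:prefLabel`. SKOS itself does not require a preferred label, so this is a common quality check rather than a conformance test. The function name and the sample vocabulary are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace URIs of SKOS and RDF (these are the official ones).
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def concepts_without_pref_label(rdf_xml: str) -> list[str]:
    """Return the URIs of skos:Concept elements that lack a skos:prefLabel.

    Hypothetical helper: it only inspects concepts written as typed
    skos:Concept elements, not rdf:Description nodes with an rdf:type.
    """
    # fromstring raises ET.ParseError if the input is not well-formed XML.
    root = ET.fromstring(rdf_xml)
    missing = []
    for concept in root.iter(f"{{{SKOS}}}Concept"):
        if concept.find(f"{{{SKOS}}}prefLabel") is None:
            missing.append(concept.get(f"{{{RDF}}}about", "(no URI)"))
    return missing


# Invented sample vocabulary with one labelled and one unlabelled concept.
SAMPLE = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <skos:Concept rdf:about="https://example.org/vocab/poodle">
    <skos:prefLabel xml:lang="en">Poodle</skos:prefLabel>
  </skos:Concept>
  <skos:Concept rdf:about="https://example.org/vocab/unlabelled"/>
</rdf:RDF>
"""

if __name__ == "__main__":
    for uri in concepts_without_pref_label(SAMPLE):
        print(f"Concept without preferred label: {uri}")
```

A production service would more likely rely on a full RDF parser and dedicated validators; the standard-library approach here merely keeps the sketch self-contained.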
An initial project phase of PUDEL was funded by the Saxon State Ministry of Science and Arts within the framework of SaxFDM, the Saxon initiative for research data management. During this stage, two main aspects were addressed: building a working prototype and surveying related services and tools.
The first task was tackled by developing a functional prototype that provides basic functionality and workflows (to be presented at the DHd). The second involved compiling an overview of projects that offer similar services, in order to identify the features PUDEL needs to offer to cover typical use cases. In addition, tools and services that could be used in implementing the service were investigated.
A second project phase is currently in preparation. It will extend the prototype and address the following tasks:
(1) One of the key features of PUDEL will be the exploration and discoverability of data models. The platform will incorporate search and retrieval mechanisms that allow researchers to efficiently explore and access the vocabularies, ontologies and other models relevant to their research interests or project requirements. Additionally, PUDEL will support automatic publication on the open repository Zenodo (European Organization for Nuclear Research, OpenAIRE 2013).
(2) The service is developed as free and open-source software (FOSS), making it possible for institutions to run their own instance of PUDEL and to publish data models within their own namespaces. As a set of software tools, it should also be simple to maintain and update. Here, PUDEL follows the approach of minimal computing (Sayers 2016) whenever possible, using as few resources as possible and avoiding a complex system architecture. Furthermore, OpenAPI (OpenAPI Initiative 2022) is used to create standardized documentation of the service APIs.
(3) In addition to its technical infrastructure, PUDEL recognizes the importance of community building and knowledge dissemination. The project will organize a series of training events, conference talks, and coffee lectures to raise awareness of the significance of sharing data models in research, including formats aimed at more advanced users such as developers. By advancing data publication and interoperability, PUDEL contributes to the acceleration of scientific progress, promotes open science principles, and supports the academic community in its research endeavors.
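As an illustration of the OpenAPI-based documentation mentioned under task (2), the following sketch builds a minimal OpenAPI 3.1 description as a Python dictionary and serializes it to JSON. The endpoint `/models/{model_id}`, its parameter, and the API title are hypothetical and chosen only for illustration; they do not describe PUDEL's real API.

```python
import json


def build_openapi_description() -> dict:
    """Return a minimal OpenAPI 3.1 description for one hypothetical endpoint."""
    return {
        "openapi": "3.1.0",
        "info": {
            # Title and version are placeholders, not PUDEL's actual values.
            "title": "Example data model publication API (hypothetical)",
            "version": "0.1.0",
        },
        "paths": {
            # Hypothetical route for looking up one published data model.
            "/models/{model_id}": {
                "get": {
                    "summary": "Retrieve the metadata of one published data model",
                    "parameters": [
                        {
                            "name": "model_id",
                            "in": "path",
                            # Path parameters must be marked as required.
                            "required": True,
                            "schema": {"type": "string"},
                        }
                    ],
                    "responses": {
                        "200": {"description": "Metadata of the requested data model"},
                        "404": {"description": "No data model with this identifier"},
                    },
                }
            }
        },
    }


if __name__ == "__main__":
    # Print the description as JSON, the form documentation tools usually consume.
    print(json.dumps(build_openapi_description(), indent=2))
```

Keeping the description in a single machine-readable document means that human-readable documentation and client code can both be generated from it, which fits the aim of keeping the service simple to maintain.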
Bibliography
- Austrian Academy of Sciences, Austrian Centre for Digital Humanities and Cultural Heritage (ACDH-CH) (2023): "Vocabs services", <https://vocabs.dariah.eu/> [17.07.2023].
- European Organization for Nuclear Research, OpenAIRE (2013): "Zenodo", in: CERN Publ., doi: 10.25495/7GXK-RD71, <https://www.zenodo.org/> [17.07.2023].
- Flanders, Julia / Jannidis, Fotis (2015): "Knowledge Organization and Data Modeling in the Humanities", <https://nbn-resolving.org/html/urn:nbn:de:bvb:20-opus-111270> [17.07.2023].
- Fraunhofer Society (2023): "VoCol Service on VoCoREG", <https://www.vocoreg.com/> [17.07.2023].
- Garijo, Daniel / Poveda-Villalón, María (2020): "Best Practices for Implementing FAIR Vocabularies and Ontologies on the Web", <https://doi.org/10.48550/arXiv.2003.13084> [17.07.2023].
- OpenAPI Initiative (2022): "OpenAPI Specification v3.1.0", in: The Linux Foundation (Publ.), <https://spec.openapis.org/oas/latest.html> [17.07.2023].
- Sayers, Jentery (2016): "Minimal Definitions", in: Minimal Computing Working Group of GO::DH (Publ.), <https://go-dh.github.io/mincomp/thoughts/2016/10/02/minimal-definitions/> [17.07.2023].
- Semantic Web Deployment Working Group (2008): "Best Practice Recipes for Publishing RDF Vocabularies" (Work in Progress), <https://www.w3.org/TR/swbp-vocab-pub/> [17.07.2023].
- Wilkinson, Mark D. et al. (2016): "The FAIR Guiding Principles for scientific data management and stewardship", in: Sci Data 3: 160018, <https://doi.org/10.1038/sdata.2016.18> [17.07.2023].