Running a Data Integration Lab in the Context of the EHRI Project: Challenges, Lessons Learnt and Future Directions
- URL: http://arxiv.org/abs/2505.02455v1
- Date: Mon, 05 May 2025 08:39:18 GMT
- Title: Running a Data Integration Lab in the Context of the EHRI Project: Challenges, Lessons Learnt and Future Directions
- Authors: Herminio García-González, Mike Bryant, Suzanne Swartz, Fabio Rovigo, Veerle Vanden Daelen,
- Abstract summary: The EHRI project set out to build a trans-national network of archives, researchers, and digital practitioners to mitigate the dispersal and fragmentation of Holocaust-related archival sources. One of its main outcomes was the creation of the EHRI Portal, a "virtual observatory" that gathers in one centralised platform descriptions of Holocaust-related archival sources from around the world. Building the Portal required a strong data identification and integration effort, culminating in the project's third phase with the creation of the EHRI-3 data integration lab.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Historical study of the Holocaust is commonly hampered by the dispersed and fragmented nature of important archival sources relating to this event. The EHRI project set out to mitigate this problem by building a trans-national network of archives, researchers, and digital practitioners, and one of its main outcomes was the creation of the EHRI Portal, a "virtual observatory" that gathers in one centralised platform descriptions of Holocaust-related archival sources from around the world. In order to build the Portal a strong data identification and integration effort was required, culminating in the project's third phase with the creation of the EHRI-3 data integration lab. The focus of the lab was to lower the bar to participation in the EHRI Portal by providing support to institutions in conforming their archival metadata with that required for integration, ultimately opening the process up to smaller institutions (and even so-called "micro-archives") without the necessary resources to undertake this process themselves. In this paper we present our experiences from running the data integration lab and discuss some of the challenges (both of a technical and social nature), how we tried to overcome them, and the overall lessons learnt. We envisage this work as an archetype upon which other practitioners seeking to pursue similar data integration activities can build their own efforts.
Related papers
- Lessons from a Big-Bang Integration: Challenges in Edge Computing and Machine Learning [52.86213078016168]
The project faced critical setbacks due to a big-bang integration approach. The study identifies technical and organisational barriers, including poor communication. It also considers psychological factors such as a bias toward fully developed components over mockups.
arXiv Detail & Related papers (2025-07-23T07:16:45Z)
- The Human Labour of Data Work: Capturing Cultural Diversity through World Wide Dishes [3.770155074442168]
We present an example of participatory dataset creation, where community members both guide the design of the research process and contribute to the crowdsourced dataset. We show that our approach can result in curated, high-quality data that supports decentralised contributions from communities. We surface three dimensions of labour performed by participatory mediators that are crucial for participatory dataset construction.
arXiv Detail & Related papers (2025-02-09T17:09:46Z)
- Enhancing Data Integrity through Provenance Tracking in Semantic Web Frameworks [1.3597551064547502]
SURROUND Australia Pty Ltd demonstrates innovative applications of the PROV Data Model (PROV-DM) and its Semantic Web variant, PROV-O. The paper highlights the company's architecture for capturing comprehensive provenance data, enabling robust validation, traceability, and knowledge inference.
arXiv Detail & Related papers (2025-01-12T16:13:27Z)
- O1 Replication Journey: A Strategic Progress Report -- Part 1 [52.062216849476776]
This paper introduces a pioneering approach to artificial intelligence research, embodied in our O1 Replication Journey.
Our methodology addresses critical challenges in modern AI research, including the insularity of prolonged team-based projects.
We propose the journey learning paradigm, which encourages models to learn not just shortcuts, but the complete exploration process.
arXiv Detail & Related papers (2024-10-08T15:13:01Z)
- A Survey on Integrated Sensing, Communication, and Computation [57.6762830152638]
The forthcoming generation of wireless technology, 6G, aims to usher in an era of ubiquitous intelligent services. The performance of these modules is interdependent, creating a resource competition for time, energy, and bandwidth. Existing techniques like integrated communication and computation (ICC), integrated sensing and computation (ISC), and integrated sensing and communication (ISAC) have made partial strides in addressing this challenge.
arXiv Detail & Related papers (2024-08-15T11:01:35Z)
- On the Element-Wise Representation and Reasoning in Zero-Shot Image Recognition: A Systematic Survey [82.49623756124357]
Zero-shot image recognition (ZSIR) aims to recognize and reason in unseen domains by learning generalized knowledge from limited data. This paper thoroughly investigates recent advances in element-wise ZSIR and provides a basis for its future development.
arXiv Detail & Related papers (2024-08-09T05:49:21Z)
- Research information in the light of artificial intelligence: quality and data ecologies [0.0]
This paper presents multi- and interdisciplinary approaches for finding the appropriate AI technologies for research information.
Professional research information management (RIM) is becoming increasingly important as an expressly data-driven tool for researchers.
arXiv Detail & Related papers (2024-05-06T16:07:56Z)
- Towards a RAG-based Summarization Agent for the Electron-Ion Collider [0.5504260452953508]
A Retrieval Augmented Generation (RAG)-based Summarization AI for EIC (RAGS4EIC) is under development.
This AI-Agent not only condenses information but also effectively references relevant responses, offering substantial advantages for collaborators.
Our project involves a two-step approach: first, querying a comprehensive vector database containing all pertinent experiment information; second, utilizing a Large Language Model (LLM) to generate concise summaries enriched with citations based on user queries and retrieved data.
arXiv Detail & Related papers (2024-03-23T05:32:46Z)
- Towards Formalizing HRI Data Collection Processes [4.090390588417062]
We contribute a clearly defined process to collect data with three steps for machine learning modeling purposes.
Specifically, we discuss our data collection goal and how we worked to encourage well-covered and abundant participant responses.
arXiv Detail & Related papers (2022-03-16T04:59:18Z)
- Wizard of Search Engine: Access to Information Through Conversations with Search Engines [58.53420685514819]
We make efforts to facilitate research on CIS from three aspects.
We formulate a pipeline for CIS with six sub-tasks: intent detection (ID), keyphrase extraction (KE), action prediction (AP), query selection (QS), passage selection (PS), and response generation (RG).
We release a benchmark dataset, called wizard of search engine (WISE), which allows for comprehensive and in-depth research on all aspects of CIS.
arXiv Detail & Related papers (2021-05-18T06:35:36Z)
- Artificial Intelligence for IT Operations (AIOPS) Workshop White Paper [50.25428141435537]
Artificial Intelligence for IT Operations (AIOps) is an emerging interdisciplinary field at the intersection of machine learning, big data, streaming analytics, and the management of IT operations.
The main aim of the AIOps workshop is to bring together researchers from both academia and industry to present their experiences, results, and work in progress in this field.
arXiv Detail & Related papers (2021-01-15T10:43:10Z)
- Batch Exploration with Examples for Scalable Robotic Reinforcement Learning [63.552788688544254]
Batch Exploration with Examples (BEE) explores relevant regions of the state space, guided by a modest number of human-provided images of important states.
BEE is able to tackle challenging vision-based manipulation tasks both in simulation and on a real Franka robot.
arXiv Detail & Related papers (2020-10-22T17:49:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.