Related papers: Enhancing Data Integrity through Provenance Tracking in Semantic Web Frameworks

Enhancing Data Integrity through Provenance Tracking in Semantic Web Frameworks

URL: http://arxiv.org/abs/2501.09029v1
Date: Sun, 12 Jan 2025 16:13:27 GMT
Title: Enhancing Data Integrity through Provenance Tracking in Semantic Web Frameworks
Authors: Nilesh Jain,
Abstract summary: SURROUND Australia Pty Ltd demonstrates innovative applica-tions of the PROV Data Model (PROV-DM) and its Semantic Web variant, PROV-O.<n>The paper highlights the company's architecture for capturing comprehensive provenance data, en-abling robust validation, traceability, and knowledge inference.
Score: 1.3597551064547502
License: http://creativecommons.org/publicdomain/zero/1.0/
Abstract: This paper explores the integration of provenance tracking systems within the context of Semantic Web technologies to enhance data integrity in diverse operational environments. SURROUND Australia Pty Ltd demonstrates innovative applica-tions of the PROV Data Model (PROV-DM) and its Semantic Web variant, PROV-O, to systematically record and manage provenance information across multiple data processing domains. By employing RDF and Knowledge Graphs, SURROUND ad-dresses the critical challenges of shared entity identification and provenance granularity. The paper highlights the company's architecture for capturing comprehensive provenance data, en-abling robust validation, traceability, and knowledge inference. Through the examination of two projects, we illustrate how provenance mechanisms not only improve data reliability but also facilitate seamless integration across heterogeneous systems. Our findings underscore the importance of sophisticated provenance solutions in maintaining data integrity, serving as a reference for industry peers and academics engaged in provenance research and implementation.

Related papers

Divide-Then-Align: Honest Alignment based on the Knowledge Boundary of RAG [51.120170062795566]
We propose Divide-Then-Align (DTA) to endow RAG systems with the ability to respond with "I don't know" when the query is out of the knowledge boundary.<n>DTA balances accuracy with appropriate abstention, enhancing the reliability and trustworthiness of retrieval-augmented systems.
arXiv Detail & Related papers (2025-05-27T08:21:21Z)
COSINT-Agent: A Knowledge-Driven Multimodal Agent for Chinese Open Source Intelligence [22.216759050092385]
Open Source Intelligence (OSINT) requires the integration and reasoning of diverse multimodal data. We introduce COSINT-Agent, a knowledge-driven multimodal agent tailored to address the challenges of OSINT in the Chinese domain. Central to COSINT-Agent is the innovative EES-Match framework, which bridges COSINT-MLLM and EES-KG.
arXiv Detail & Related papers (2025-03-05T06:16:15Z)
World-Consistent Data Generation for Vision-and-Language Navigation [52.08816337783936]
Vision-and-Language Navigation (VLN) is a challenging task that requires an agent to navigate through photorealistic environments following natural-language instructions.<n>One main obstacle existing in VLN is data scarcity, leading to poor generalization performance over unseen environments.<n>We propose the world-consistent data generation (WCGEN), an efficacious data-augmentation framework satisfying both diversity and world-consistency.
arXiv Detail & Related papers (2024-12-09T11:40:54Z)
Deploying Large Language Models With Retrieval Augmented Generation [0.21485350418225244]
Retrieval Augmented Generation has emerged as a key approach for integrating knowledge from data sources outside of the large language model's training set. We present insights from the development and field-testing of a pilot project that integrates LLMs with RAG for information retrieval.
arXiv Detail & Related papers (2024-11-07T22:11:51Z)
Multi-Source Knowledge Pruning for Retrieval-Augmented Generation: A Benchmark and Empirical Study [46.55831783809377]
Retrieval-augmented generation (RAG) is recognized as an effective approach for mitigating the hallucination of large language models (LLMs) through the integration of external knowledge.<n>In this paper, we standardize a benchmark dataset that combines structured and unstructured knowledge across diverse and complementary domains.<n>We also develop a plug-and-play RAG framework, PruningRAG, whose main characteristic is to employ multi-granularity pruning strategies.
arXiv Detail & Related papers (2024-09-03T03:31:37Z)
WeKnow-RAG: An Adaptive Approach for Retrieval-Augmented Generation Integrating Web Search and Knowledge Graphs [10.380692079063467]
We propose WeKnow-RAG, which integrates Web search and Knowledge Graphs into a "Retrieval-Augmented Generation (RAG)" system. First, the accuracy and reliability of LLM responses are improved by combining the structured representation of Knowledge Graphs with the flexibility of dense vector retrieval. Our approach effectively balances the efficiency and accuracy of information retrieval, thus improving the overall retrieval process.
arXiv Detail & Related papers (2024-08-14T15:19:16Z)
DiscoveryBench: Towards Data-Driven Discovery with Large Language Models [50.36636396660163]
We present DiscoveryBench, the first comprehensive benchmark that formalizes the multi-step process of data-driven discovery. Our benchmark contains 264 tasks collected across 6 diverse domains, such as sociology and engineering. Our benchmark, thus, illustrates the challenges in autonomous data-driven discovery and serves as a valuable resource for the community to make progress.
arXiv Detail & Related papers (2024-07-01T18:58:22Z)
Self-consistent Deep Geometric Learning for Heterogeneous Multi-source Spatial Point Data Prediction [10.646376827353551]
Multi-source spatial point data prediction is crucial in fields like environmental monitoring and natural resource management. Existing models in this area often fall short due to their domain-specific nature and lack a strategy for integrating information from various sources. We introduce an innovative multi-source spatial point data prediction framework that adeptly aligns information from varied sources without relying on ground truth labels.
arXiv Detail & Related papers (2024-06-30T16:13:13Z)
Synthetic-To-Real Video Person Re-ID [57.937189569211505]
Person re-identification (Re-ID) is an important task and has significant applications for public security and information forensics. We investigate a novel and challenging setting of Re-ID, i.e., cross-domain video-based person Re-ID. We utilize synthetic video datasets as the source domain for training and real-world videos for testing.
arXiv Detail & Related papers (2024-02-03T10:19:21Z)
Integration of Domain Expert-Centric Ontology Design into the CRISP-DM for Cyber-Physical Production Systems [45.05372822216111]
Methods from Machine Learning (ML) and Data Mining (DM) have proven to be promising in extracting complex and hidden patterns from the data collected. However, such data-driven projects, usually performed with the Cross-Industry Standard Process for Data Mining (CRISPDM), often fail due to the disproportionate amount of time needed for understanding and preparing the data. This contribution intends present an integrated approach so that data scientists are able to more quickly and reliably gain insights into the CPPS challenges.
arXiv Detail & Related papers (2023-07-21T15:04:00Z)
Unsupervised Domain Adaptive Learning via Synthetic Data for Person Re-identification [101.1886788396803]
Person re-identification (re-ID) has gained more and more attention due to its widespread applications in video surveillance. Unfortunately, the mainstream deep learning methods still need a large quantity of labeled data to train models. In this paper, we develop a data collector to automatically generate synthetic re-ID samples in a computer game, and construct a data labeler to simultaneously annotate them.
arXiv Detail & Related papers (2021-09-12T15:51:41Z)
Entity-Switched Datasets: An Approach to Auditing the In-Domain Robustness of Named Entity Recognition Models [49.878051587667244]
We propose a method for auditing the in-domain robustness of systems, focusing specifically on differences in performance due to the national origin of entities. We create entity-switched datasets, in which named entities in the original texts are replaced by plausible named entities of the same type but of different national origin. We find that state-of-the-art systems' performance vary widely even in-domain: In the same context, entities from certain origins are more reliably recognized than entities from elsewhere.
arXiv Detail & Related papers (2020-04-08T17:11:31Z)

This list is automatically generated from the titles and abstracts of the papers in this site.