Enhancing Data Integrity through Provenance Tracking in Semantic Web Frameworks
- URL: http://arxiv.org/abs/2501.09029v1
- Date: Sun, 12 Jan 2025 16:13:27 GMT
- Title: Enhancing Data Integrity through Provenance Tracking in Semantic Web Frameworks
- Authors: Nilesh Jain,
- Abstract summary: SURROUND Australia Pty Ltd demonstrates innovative applica-tions of the PROV Data Model (PROV-DM) and its Semantic Web variant, PROV-O.
The paper highlights the company's architecture for capturing comprehensive provenance data, en-abling robust validation, traceability, and knowledge inference.
- Score: 1.3597551064547502
- License:
- Abstract: This paper explores the integration of provenance tracking systems within the context of Semantic Web technologies to enhance data integrity in diverse operational environments. SURROUND Australia Pty Ltd demonstrates innovative applica-tions of the PROV Data Model (PROV-DM) and its Semantic Web variant, PROV-O, to systematically record and manage provenance information across multiple data processing domains. By employing RDF and Knowledge Graphs, SURROUND ad-dresses the critical challenges of shared entity identification and provenance granularity. The paper highlights the company's architecture for capturing comprehensive provenance data, en-abling robust validation, traceability, and knowledge inference. Through the examination of two projects, we illustrate how provenance mechanisms not only improve data reliability but also facilitate seamless integration across heterogeneous systems. Our findings underscore the importance of sophisticated provenance solutions in maintaining data integrity, serving as a reference for industry peers and academics engaged in provenance research and implementation.
Related papers
- World-Consistent Data Generation for Vision-and-Language Navigation [52.08816337783936]
Vision-and-Language Navigation (VLN) is a challenging task that requires an agent to navigate through photorealistic environments following natural-language instructions.
One main obstacle existing in VLN is data scarcity, leading to poor generalization performance over unseen environments.
We propose the world-consistent data generation (WCGEN), an efficacious data-augmentation framework satisfying both diversity and world-consistency.
arXiv Detail & Related papers (2024-12-09T11:40:54Z) - Deploying Large Language Models With Retrieval Augmented Generation [0.21485350418225244]
Retrieval Augmented Generation has emerged as a key approach for integrating knowledge from data sources outside of the large language model's training set.
We present insights from the development and field-testing of a pilot project that integrates LLMs with RAG for information retrieval.
arXiv Detail & Related papers (2024-11-07T22:11:51Z) - Multi-Source Knowledge Pruning for Retrieval-Augmented Generation: A Benchmark and Empirical Study [46.55831783809377]
Retrieval-augmented generation (RAG) is increasingly recognized as an effective approach to mitigating the hallucination of large language models (LLMs)
We develop PruningRAG, a plug-and-play RAG framework that uses multi-granularity pruning strategies to more effectively incorporate relevant context and mitigate the negative impact of misleading information.
arXiv Detail & Related papers (2024-09-03T03:31:37Z) - WeKnow-RAG: An Adaptive Approach for Retrieval-Augmented Generation Integrating Web Search and Knowledge Graphs [10.380692079063467]
We propose WeKnow-RAG, which integrates Web search and Knowledge Graphs into a "Retrieval-Augmented Generation (RAG)" system.
First, the accuracy and reliability of LLM responses are improved by combining the structured representation of Knowledge Graphs with the flexibility of dense vector retrieval.
Our approach effectively balances the efficiency and accuracy of information retrieval, thus improving the overall retrieval process.
arXiv Detail & Related papers (2024-08-14T15:19:16Z) - DiscoveryBench: Towards Data-Driven Discovery with Large Language Models [50.36636396660163]
We present DiscoveryBench, the first comprehensive benchmark that formalizes the multi-step process of data-driven discovery.
Our benchmark contains 264 tasks collected across 6 diverse domains, such as sociology and engineering.
Our benchmark, thus, illustrates the challenges in autonomous data-driven discovery and serves as a valuable resource for the community to make progress.
arXiv Detail & Related papers (2024-07-01T18:58:22Z) - Self-consistent Deep Geometric Learning for Heterogeneous Multi-source Spatial Point Data Prediction [10.646376827353551]
Multi-source spatial point data prediction is crucial in fields like environmental monitoring and natural resource management.
Existing models in this area often fall short due to their domain-specific nature and lack a strategy for integrating information from various sources.
We introduce an innovative multi-source spatial point data prediction framework that adeptly aligns information from varied sources without relying on ground truth labels.
arXiv Detail & Related papers (2024-06-30T16:13:13Z) - Curating Grounded Synthetic Data with Global Perspectives for Equitable AI [0.5120567378386615]
We introduce a novel approach to creating synthetic datasets, grounded in real-world diversity and enriched through strategic diversification.
We synthesize data using a comprehensive collection of news articles spanning 12 languages and originating from 125 countries, to ensure a breadth of linguistic and cultural representations.
Preliminary results demonstrate substantial improvements in performance on traditional NER benchmarks, by up to 7.3%.
arXiv Detail & Related papers (2024-06-10T17:59:11Z) - Synthetic-To-Real Video Person Re-ID [57.937189569211505]
Person re-identification (Re-ID) is an important task and has significant applications for public security and information forensics.
We investigate a novel and challenging setting of Re-ID, i.e., cross-domain video-based person Re-ID.
We utilize synthetic video datasets as the source domain for training and real-world videos for testing.
arXiv Detail & Related papers (2024-02-03T10:19:21Z) - Unsupervised Domain Adaptive Learning via Synthetic Data for Person
Re-identification [101.1886788396803]
Person re-identification (re-ID) has gained more and more attention due to its widespread applications in video surveillance.
Unfortunately, the mainstream deep learning methods still need a large quantity of labeled data to train models.
In this paper, we develop a data collector to automatically generate synthetic re-ID samples in a computer game, and construct a data labeler to simultaneously annotate them.
arXiv Detail & Related papers (2021-09-12T15:51:41Z) - Entity-Switched Datasets: An Approach to Auditing the In-Domain
Robustness of Named Entity Recognition Models [49.878051587667244]
We propose a method for auditing the in-domain robustness of systems, focusing specifically on differences in performance due to the national origin of entities.
We create entity-switched datasets, in which named entities in the original texts are replaced by plausible named entities of the same type but of different national origin.
We find that state-of-the-art systems' performance vary widely even in-domain: In the same context, entities from certain origins are more reliably recognized than entities from elsewhere.
arXiv Detail & Related papers (2020-04-08T17:11:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.