Trust and Reputation in Data Sharing: A Survey
- URL: http://arxiv.org/abs/2508.14028v1
- Date: Tue, 19 Aug 2025 17:39:31 GMT
- Title: Trust and Reputation in Data Sharing: A Survey
- Authors: Wenbo Wu, George Konstantinidis,
- Abstract summary: Trust between data providers and data consumers is widely considered one of the most important factors for enabling data sharing initiatives.<n>Concerns about data sensitivity, privacy breaches, and misuse contribute to reluctance in sharing data across various domains.<n>There has been a rise in technological and algorithmic solutions to measure, capture and manage trust, trustworthiness, and reputation.
- Score: 0.276240219662896
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Data sharing is the fuel of the galloping artificial intelligence economy, providing diverse datasets for training robust models. Trust between data providers and data consumers is widely considered one of the most important factors for enabling data sharing initiatives. Concerns about data sensitivity, privacy breaches, and misuse contribute to reluctance in sharing data across various domains. In recent years, there has been a rise in technological and algorithmic solutions to measure, capture and manage trust, trustworthiness, and reputation in what we collectively refer to as Trust and Reputation Management Systems (TRMSs). Such approaches have been developed and applied to different domains of computer science, such as autonomous vehicles, or IoT networks, but there have not been dedicated approaches to data sharing and its unique characteristics. In this survey, we examine TRMSs from a data-sharing perspective, analyzing how they assess the trustworthiness of both data and entities across different environments. We develop novel taxonomies for system designs, trust evaluation framework, and evaluation metrics for both data and entity, and we systematically analyze the applicability of existing TRMSs in data sharing. Finally, we identify open challenges and propose future research directions to enhance the explainability, comprehensiveness, and accuracy of TRMSs in large-scale data-sharing ecosystems.
Related papers
- OpenDataArena: A Fair and Open Arena for Benchmarking Post-Training Dataset Value [74.80873109856563]
OpenDataArena (ODA) is a holistic and open platform designed to benchmark the intrinsic value of post-training data.<n>ODA establishes a comprehensive ecosystem comprising four key pillars: (i) a unified training-evaluation pipeline that ensures fair, open comparisons across diverse models; (ii) a multi-dimensional scoring framework that profiles data quality along tens of distinct axes; and (iii) an interactive data lineage explorer to visualize dataset genealogy and dissect component sources.
arXiv Detail & Related papers (2025-12-16T03:33:24Z) - What's the next frontier for Data-centric AI? Data Savvy Agents [71.76058707995398]
We argue that data-savvy capabilities should be a top priority in the design of agentic systems.<n>We propose four key capabilities to realize this vision: Proactive data acquisition, Sophisticated data processing, Interactive test data synthesis, and Continual adaptation.
arXiv Detail & Related papers (2025-11-02T17:09:29Z) - Collaborative Perception Datasets for Autonomous Driving: A Review [9.498615656347264]
Collaborative perception has attracted growing interest from academia and industry due to its potential to enhance perception accuracy, safety, and robustness in autonomous driving.<n>Numerous collaborative perception datasets have emerged, varying in cooperation paradigms, sensor configurations, data sources, and application scenarios.<n>As the first comprehensive review focused on collaborative perception datasets, this work reviews and compares existing resources from a multi-dimensional perspective.
arXiv Detail & Related papers (2025-04-17T06:49:21Z) - Self-consistent Deep Geometric Learning for Heterogeneous Multi-source Spatial Point Data Prediction [10.646376827353551]
Multi-source spatial point data prediction is crucial in fields like environmental monitoring and natural resource management.
Existing models in this area often fall short due to their domain-specific nature and lack a strategy for integrating information from various sources.
We introduce an innovative multi-source spatial point data prediction framework that adeptly aligns information from varied sources without relying on ground truth labels.
arXiv Detail & Related papers (2024-06-30T16:13:13Z) - Data Acquisition: A New Frontier in Data-centric AI [65.90972015426274]
We first present an investigation of current data marketplaces, revealing lack of platforms offering detailed information about datasets.
We then introduce the DAM challenge, a benchmark to model the interaction between the data providers and acquirers.
Our evaluation of the submitted strategies underlines the need for effective data acquisition strategies in Machine Learning.
arXiv Detail & Related papers (2023-11-22T22:15:17Z) - On Responsible Machine Learning Datasets with Fairness, Privacy, and Regulatory Norms [56.119374302685934]
There have been severe concerns over the trustworthiness of AI technologies.
Machine and deep learning algorithms depend heavily on the data used during their development.
We propose a framework to evaluate the datasets through a responsible rubric.
arXiv Detail & Related papers (2023-10-24T14:01:53Z) - Collect, Measure, Repeat: Reliability Factors for Responsible AI Data
Collection [8.12993269922936]
We argue that data collection for AI should be performed in a responsible manner.
We propose a Responsible AI (RAI) methodology designed to guide the data collection with a set of metrics.
arXiv Detail & Related papers (2023-08-22T18:01:27Z) - DBFed: Debiasing Federated Learning Framework based on
Domain-Independent [15.639705798326213]
We propose a debiasing federated learning framework based on domain-independent, which mitigates model bias by explicitly encoding sensitive attributes during client-side training.
This paper conducts experiments on three real datasets and uses five evaluation metrics of accuracy and fairness to quantify the effect of the model.
arXiv Detail & Related papers (2023-07-10T14:39:57Z) - Auditing and Generating Synthetic Data with Controllable Trust Trade-offs [54.262044436203965]
We introduce a holistic auditing framework that comprehensively evaluates synthetic datasets and AI models.
It focuses on preventing bias and discrimination, ensures fidelity to the source data, assesses utility, robustness, and privacy preservation.
We demonstrate the framework's effectiveness by auditing various generative models across diverse use cases.
arXiv Detail & Related papers (2023-04-21T09:03:18Z) - Data-SUITE: Data-centric identification of in-distribution incongruous
examples [81.21462458089142]
Data-SUITE is a data-centric framework to identify incongruous regions of in-distribution (ID) data.
We empirically validate Data-SUITE's performance and coverage guarantees.
arXiv Detail & Related papers (2022-02-17T18:58:31Z) - A big data intelligence marketplace and secure analytics experimentation
platform for the aviation industry [0.0]
This paper introduces the ICARUS big data-enabled platform that offers a novel aviation data and intelligence marketplace.
It holistically handles the complete big data lifecycle from the data collection, data curation and data exploration to the data integration and data analysis.
arXiv Detail & Related papers (2021-11-18T18:51:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.