D2M: A Decentralized, Privacy-Preserving, Incentive-Compatible Data Marketplace for Collaborative Learning
- URL: http://arxiv.org/abs/2512.10372v1
- Date: Thu, 11 Dec 2025 07:38:05 GMT
- Title: D2M: A Decentralized, Privacy-Preserving, Incentive-Compatible Data Marketplace for Collaborative Learning
- Authors: Yash Srivastava, Shalin Jain, Sneha Awathare, Nitin Awathare
- Abstract summary: We present D2M, a decentralized data marketplace that unifies federated learning, blockchain arbitration, and economic incentives into a single framework for privacy-preserving data sharing. D2M achieves up to 99% accuracy on MNIST and 90% on Fashion-MNIST, with less than 3% degradation up to 30% Byzantine nodes, and 56% accuracy on CIFAR-10 despite its complexity.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The rising demand for collaborative machine learning and data analytics calls for secure and decentralized data sharing frameworks that balance privacy, trust, and incentives. Existing approaches, including federated learning (FL) and blockchain-based data markets, fall short: FL often depends on trusted aggregators and lacks Byzantine robustness, while blockchain frameworks struggle with computation-intensive training and incentive integration. We present D2M, a decentralized data marketplace that unifies federated learning, blockchain arbitration, and economic incentives into a single framework for privacy-preserving data sharing. D2M enables data buyers to submit bid-based requests via blockchain smart contracts, which manage auctions, escrow, and dispute resolution. Computationally intensive training is delegated to CONE (Compute Network for Execution), an off-chain distributed execution layer. To safeguard against adversarial behavior, D2M integrates a modified YODA protocol with exponentially growing execution sets for resilient consensus, and introduces Corrected OSMD to mitigate malicious or low-quality contributions from sellers. All protocols are incentive-compatible, and our game-theoretic analysis establishes honesty as the dominant strategy. We implement D2M on Ethereum and evaluate it over benchmark datasets (MNIST, Fashion-MNIST, and CIFAR-10) under varying adversarial settings. D2M achieves up to 99% accuracy on MNIST and 90% on Fashion-MNIST, with less than 3% degradation up to 30% Byzantine nodes, and 56% accuracy on CIFAR-10 despite its complexity. Our results show that D2M ensures privacy, maintains robustness under adversarial conditions, and scales efficiently with the number of participants, making it a practical foundation for real-world decentralized data sharing.
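The abstract mentions "exponentially growing execution sets" but gives no protocol details. The sketch below only illustrates the general idea and is not the modified YODA protocol itself. It assumes that a random subset of off-chain nodes recomputes a task, that a result is accepted once one digest reaches a fixed agreement threshold, and that the set doubles in size otherwise. The function name `run_rounds`, the doubling factor, the 0.75 threshold, and the sampling rule are all hypothetical choices made for illustration.

```python
import random
from collections import Counter

def run_rounds(nodes, compute, initial_size=4, threshold=0.75, seed=0):
    """Sample execution sets of doubling size until one digest has enough agreement.

    nodes:     list of node identifiers (hypothetical).
    compute:   function mapping a node to the digest it reports.
    threshold: assumed fraction of the execution set that must agree.
    Returns (accepted_digest_or_None, size_of_last_execution_set).
    """
    rng = random.Random(seed)
    size = initial_size
    while True:
        size = min(size, len(nodes))
        execution_set = rng.sample(nodes, size)
        digests = Counter(compute(node) for node in execution_set)
        digest, votes = digests.most_common(1)[0]
        if votes / size >= threshold:
            return digest, size      # enough agreement: accept this digest
        if size == len(nodes):
            return None, size        # every node was asked and none agreed
        size *= 2                    # assumed growth rule: double the set

if __name__ == "__main__":
    nodes = list(range(64))
    # All nodes honest: the first, smallest execution set already agrees.
    print(run_rounds(nodes, lambda node: "ok"))
```

Under these assumptions, honest majorities settle in the first small round, and the cost of larger sets is paid only when the reported digests disagree.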
Related papers
- Information-Theoretic Decentralized Secure Aggregation with Collusion Resilience [95.33295072401832]
We study the problem of decentralized secure aggregation (DSA) from an information-theoretic perspective. We characterize the optimal rate region, which specifies the minimum achievable communication and secret key rates for DSA. Our results establish the fundamental performance limits of DSA, providing insights for the design of provably secure and communication-efficient protocols.
arXiv Detail & Related papers (2025-08-01T12:51:37Z) - BotDetect: A Decentralized Federated Learning Framework for Detecting Financial Bots on the EVM Blockchains [3.4636217357968904]
This paper presents a decentralized federated learning (DFL) approach for detecting financial bots within Ethereum Virtual Machine (EVM)-based blockchains. The proposed framework leverages federated learning, orchestrated through smart contracts, to detect malicious bot behavior. Experimental results show that our DFL framework achieves high detection accuracy while maintaining scalability and robustness.
arXiv Detail & Related papers (2025-01-21T13:15:43Z) - Proof-of-Data: A Consensus Protocol for Collaborative Intelligence [4.362312381717716]
We propose a blockchain-based Byzantine fault-tolerant federated learning framework based on a novel Proof-of-Data (PoD) consensus protocol. PoD is able to enjoy the benefit of learning efficiency and system liveliness from societal-scale PoW-style learning. To mitigate false reward claims by data forgery from Byzantine attacks, a privacy-aware data verification and contribution-based reward allocation mechanism is designed to complete the framework.
arXiv Detail & Related papers (2025-01-06T12:27:59Z) - OmniLytics+: A Secure, Efficient, and Affordable Blockchain Data Market for Machine Learning through Off-Chain Processing [10.055818984984]
We propose OmniLytics+, the first decentralized data market built upon blockchain and smart contract technologies.
The storage and processing overheads are securely offloaded from blockchain verifiers.
Experiments demonstrate the effectiveness of OmniLytics+ in training large ML models in the presence of malicious data owners.
arXiv Detail & Related papers (2024-04-17T14:41:14Z) - Enhancing Trust and Privacy in Distributed Networks: A Comprehensive Survey on Blockchain-based Federated Learning [51.13534069758711]
Decentralized approaches like blockchain offer a compelling solution by implementing a consensus mechanism among multiple entities.
Federated Learning (FL) enables participants to collaboratively train models while safeguarding data privacy.
This paper investigates the synergy between blockchain's security features and FL's privacy-preserving model training capabilities.
arXiv Detail & Related papers (2024-03-28T07:08:26Z) - Analyzing Reward Dynamics and Decentralization in Ethereum 2.0: An Advanced Data Engineering Workflow and Comprehensive Datasets for Proof-of-Stake Incentives [5.18461573800406]
The smart contract blockchain platform Ethereum 2.0 guarantees precise execution of applications without third-party intervention.
Our study collects consensus reward data from the Beacon chain and conducts a comprehensive analysis of reward distribution and evolution.
To evaluate the degree of decentralization in PoS, we apply several inequality indices, including the Shannon entropy, the Gini Index, the Nakamoto Coefficient, and the Herfindahl-Hirschman Index (HHI).
arXiv Detail & Related papers (2024-02-17T02:40:00Z) - Secure Distributed Training at Scale [65.7538150168154]
Training in the presence of unreliable peers requires specialized distributed training algorithms with Byzantine tolerance.
We propose a novel protocol for secure (Byzantine-tolerant) decentralized training that emphasizes communication efficiency.
arXiv Detail & Related papers (2021-06-21T17:00:42Z) - Blockchain Assisted Decentralized Federated Learning (BLADE-FL): Performance Analysis and Resource Allocation [119.19061102064497]
We propose a decentralized FL framework by integrating blockchain into FL, namely, blockchain assisted decentralized federated learning (BLADE-FL).
In a round of the proposed BLADE-FL, each client broadcasts its trained model to other clients, competes to generate a block based on the received models, and then aggregates the models from the generated block before its local training of the next round.
We explore the impact of lazy clients on the learning performance of BLADE-FL, and characterize the relationship among the optimal K, the learning parameters, and the proportion of lazy clients.
arXiv Detail & Related papers (2021-01-18T07:19:08Z) - Blockchain Assisted Decentralized Federated Learning (BLADE-FL) with Lazy Clients [124.48732110742623]
We propose a novel framework by integrating blockchain into Federated Learning (FL).
BLADE-FL performs well in terms of privacy preservation, tamper resistance, and effective cooperation of learning.
It gives rise to a new problem of training deficiency, caused by lazy clients who plagiarize others' trained models and add artificial noises to conceal their cheating behaviors.
arXiv Detail & Related papers (2020-12-02T12:18:27Z) - BlockFLow: An Accountable and Privacy-Preserving Solution for Federated Learning [2.0625936401496237]
BlockFLow is an accountable federated learning system that is fully decentralized and privacy-preserving.
Its primary goal is to reward agents proportional to the quality of their contribution while protecting the privacy of the underlying datasets and being resilient to malicious adversaries.
arXiv Detail & Related papers (2020-07-08T02:24:26Z) - Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning [55.20040781688844]
QMIX is a novel value-based method that can train decentralised policies in a centralised end-to-end fashion.
We propose the StarCraft Multi-Agent Challenge (SMAC) as a new benchmark for deep multi-agent reinforcement learning.
arXiv Detail & Related papers (2020-03-19T16:51:51Z)
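The Ethereum 2.0 reward-dynamics entry above names four inequality indices: Shannon entropy, the Gini index, the Nakamoto coefficient, and the HHI. As a minimal sketch of their standard textbook definitions (not that paper's code), the functions below take a list of shares that sum to 1. The function names and the 0.5 control threshold for the Nakamoto coefficient are assumptions; other thresholds, such as 1/3, are also used in practice.

```python
import math

def shannon_entropy(shares):
    """Entropy in bits; higher means stake or reward is spread more evenly."""
    return -sum(p * math.log2(p) for p in shares if p > 0)

def gini(shares):
    """Gini index: 0 for perfect equality, approaching 1 for full concentration."""
    xs = sorted(shares)
    n, total = len(xs), sum(xs)
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return 2 * weighted / (n * total) - (n + 1) / n

def hhi(shares):
    """Herfindahl-Hirschman Index: sum of squared shares."""
    return sum(p * p for p in shares)

def nakamoto_coefficient(shares, threshold=0.5):
    """Smallest number of entities whose combined share exceeds the threshold."""
    cumulative = 0.0
    for count, p in enumerate(sorted(shares, reverse=True), start=1):
        cumulative += p
        if cumulative > threshold:
            return count
    return len(shares)

if __name__ == "__main__":
    equal = [0.25, 0.25, 0.25, 0.25]
    print(shannon_entropy(equal), gini(equal), hhi(equal), nakamoto_coefficient(equal))
```

For four equal shares, three entities are needed to exceed half of the total, whereas a single entity holding 70% already does so alone.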
This list is automatically generated from the titles and abstracts of the papers on this site.