OmniLytics: A Blockchain-based Secure Data Market for Decentralized
Machine Learning
- URL: http://arxiv.org/abs/2107.05252v1
- Date: Mon, 12 Jul 2021 08:28:15 GMT
- Title: OmniLytics: A Blockchain-based Secure Data Market for Decentralized
Machine Learning
- Authors: Jiacheng Liang, Wensi Jiang and Songze Li
- Abstract summary: We propose OmniLytics, a secure data trading marketplace for machine learning applications.
Data owners can contribute their private data to collectively train an ML model requested by some model owners, and get compensated for their data contributions.
OmniLytics enables such model training while simultaneously providing 1) model security against curious data owners; 2) data security against the curious model owner and other data owners; 3) resilience to malicious data owners who provide faulty results to poison model training; and 4) resilience to a malicious model owner who intends to evade payment.
- Score: 3.9256804549871553
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose OmniLytics, a blockchain-based secure data trading marketplace for
machine learning applications. Utilizing OmniLytics, many distributed data
owners can contribute their private data to collectively train an ML model
requested by some model owners, and get compensated for their data contributions.
OmniLytics enables such model training while simultaneously providing 1) model
security against curious data owners; 2) data security against the curious
model owner and other data owners; 3) resilience to malicious data owners who
provide faulty results to poison model training; and 4) resilience to a
malicious model owner who intends to evade payment. OmniLytics is implemented as a smart contract
on the Ethereum blockchain to guarantee the atomicity of payment. In
OmniLytics, a model owner publishes an encrypted initial model on the contract,
over which the participating data owners compute gradients using their private
data, and securely aggregate the gradients through the contract. Finally, the
contract reimburses the data owners, and the model owner decrypts the
aggregated model update. We implement a working prototype of OmniLytics on
Ethereum, and perform extensive experiments to measure its gas cost and
execution time under various parameter combinations, demonstrating its high
computation and cost efficiency and strong practicality.
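To make the round described above concrete, the following Python sketch traces the data flow of a single OmniLytics round under simplified assumptions: zero-sum additive masks stand in for the paper's model encryption and secure gradient aggregation, and the Ethereum contract's registration, payment, and atomicity logic is abstracted away. All names and numerical choices are illustrative and not taken from the paper.

```python
# Minimal numerical sketch of one OmniLytics round, assuming zero-sum additive
# masking as a stand-in for the paper's encryption and secure-aggregation
# scheme. The on-chain payment and atomicity logic is omitted.
import numpy as np

rng = np.random.default_rng(0)
DIM, NUM_OWNERS = 8, 3

# Model owner publishes the (here: unencrypted, for simplicity) initial model.
model = rng.normal(size=DIM)

def local_gradient(w, X, y):
    """Toy least-squares gradient computed on a data owner's private data."""
    return X.T @ (X @ w - y) / len(y)

# Zero-sum masks shared among the data owners: each owner submits only a
# masked gradient, and the masks cancel when the submissions are summed.
masks = [rng.normal(size=DIM) for _ in range(NUM_OWNERS - 1)]
masks.append(-np.sum(masks, axis=0))

submissions = []
for i in range(NUM_OWNERS):
    X = rng.normal(size=(16, DIM))                   # private local data (toy)
    y = X @ np.ones(DIM) + 0.1 * rng.normal(size=16)
    submissions.append(local_gradient(model, X, y) + masks[i])

# Contract-side aggregation: only masked gradients are ever visible on-chain.
aggregate = np.sum(submissions, axis=0)              # individual masks cancel out

# Model owner applies the aggregated update (and would reimburse owners on-chain).
model -= 0.1 * aggregate / NUM_OWNERS
```

Because the masks sum to zero, the aggregator (the contract, in the real system) only ever sees masked gradients whose sum equals the true aggregate, mirroring the data-security guarantee against the curious model owner and other data owners.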
Related papers
- Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models [146.18107944503436]
Molmo is a new family of VLMs that are state-of-the-art in their class of openness.
Our key innovation is a novel, highly detailed image caption dataset collected entirely from human annotators.
We will be releasing all of our model weights, captioning and fine-tuning data, and source code in the near future.
arXiv Detail & Related papers (2024-09-25T17:59:51Z) - OmniLytics+: A Secure, Efficient, and Affordable Blockchain Data Market for Machine Learning through Off-Chain Processing [10.055818984984]
We propose OmniLytics+, the first decentralized data market built upon blockchain and smart contract technologies.
The storage and processing overheads are securely offloaded from blockchain verifiers.
Experiments demonstrate the effectiveness of OmniLytics+ in training large ML models in the presence of malicious data owners.
arXiv Detail & Related papers (2024-04-17T14:41:14Z) - ML2SC: Deploying Machine Learning Models as Smart Contracts on the Blockchain [1.433758865948252]
We introduce Machine Learning to Contract (ML2SC), a PyTorch-to-Solidity translator that converts multi-layer perceptron (MLP) models written in PyTorch into Solidity smart contract versions.
After deploying the generated smart contract, we can train our models off-chain using PyTorch and then transfer the acquired weights and biases to the smart contract using a function call (a sketch of this hand-off follows the list below).
arXiv Detail & Related papers (2024-03-28T23:55:10Z) - Beyond Labeling Oracles: What does it mean to steal ML models? [52.63413852460003]
Model extraction attacks are designed to steal trained models with only query access.
We investigate factors influencing the success of model extraction attacks.
Our findings urge the community to redefine the adversarial goals of ME attacks.
arXiv Detail & Related papers (2023-10-03T11:10:21Z) - PEOPL: Characterizing Privately Encoded Open Datasets with Public Labels [59.66777287810985]
We introduce information-theoretic scores for privacy and utility, which quantify the average performance of an unfaithful user.
We then theoretically characterize primitives in building families of encoding schemes that motivate the use of random deep neural networks.
arXiv Detail & Related papers (2023-03-31T18:03:53Z) - Proof-of-Contribution-Based Design for Collaborative Machine Learning on
Blockchain [23.641069086247573]
Our goal is to design a data marketplace for such decentralized collaborative/federated learning applications.
In our design, we utilize a distributed storage infrastructure and an aggregator aside from the project owner and the trainers.
We execute the proposed data market through a blockchain smart contract.
arXiv Detail & Related papers (2023-02-27T18:43:11Z) - Membership Inference Attacks against Synthetic Data through Overfitting
Detection [84.02632160692995]
We argue for a realistic MIA setting that assumes the attacker has some knowledge of the underlying data distribution.
We propose DOMIAS, a density-based MIA model that aims to infer membership by targeting local overfitting of the generative model.
arXiv Detail & Related papers (2023-02-24T11:27:39Z) - Citadel: Protecting Data Privacy and Model Confidentiality for
Collaborative Learning with SGX [5.148111464782033]
This paper presents Citadel, a scalable collaborative ML system that protects the privacy of both data owner and model owner in untrusted infrastructures.
Citadel performs distributed training across multiple training enclaves running on behalf of data owners and an aggregator enclave on behalf of the model owner.
Compared with the existing SGX-protected training systems, Citadel enables better scalability and stronger privacy guarantees for collaborative ML.
arXiv Detail & Related papers (2021-05-04T04:17:29Z) - Blockchain Assisted Decentralized Federated Learning (BLADE-FL) with
Lazy Clients [124.48732110742623]
We propose a novel framework by integrating blockchain into Federated Learning (FL).
BLADE-FL has good performance in terms of privacy preservation, tamper resistance, and effective cooperation of learning.
However, it also gives rise to a new problem of training deficiency, caused by lazy clients who plagiarize others' trained models and add artificial noise to conceal their cheating behavior.
arXiv Detail & Related papers (2020-12-02T12:18:27Z) - Knowledge-Enriched Distributional Model Inversion Attacks [49.43828150561947]
Model inversion (MI) attacks are aimed at reconstructing training data from model parameters.
We present a novel inversion-specific GAN that can better distill knowledge useful for performing attacks on private models from public data.
Our experiments show that the combination of these techniques can significantly boost the success rate of the state-of-the-art MI attacks by 150%.
arXiv Detail & Related papers (2020-10-08T16:20:48Z) - Analysis of Models for Decentralized and Collaborative AI on Blockchain [0.0]
We evaluate the use of several models and configurations in order to propose best practices when using the Self-Assessment incentive mechanism.
We compare several factors for each dataset when models are hosted in smart contracts on a public blockchain.
arXiv Detail & Related papers (2020-09-14T21:38:55Z)
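The ML2SC entry above trains the model off-chain with PyTorch and then transfers the learned weights and biases to the deployed contract through a function call. The sketch below shows what such a hand-off could look like with web3.py against a local development node; the contract address, ABI, and the setLayerWeights(...) function are hypothetical placeholders rather than the interface actually generated by ML2SC, and floats are scaled to fixed-point integers because Solidity has no native floating point.

```python
# Hedged sketch of an off-chain-to-on-chain parameter hand-off in the spirit
# of the ML2SC workflow. The contract address, ABI, and setLayerWeights(...)
# function are hypothetical placeholders, not ML2SC's published interface.
import torch.nn as nn
from web3 import Web3

SCALE = 10**18  # fixed-point scale, since Solidity has no native floats

CONTRACT_ADDRESS = "0x0000000000000000000000000000000000000000"  # placeholder
CONTRACT_ABI = []  # placeholder; the real ABI comes from compiling the contract

def to_fixed_point(tensor) -> list[int]:
    """Flatten a float tensor into scaled integers that the contract can store."""
    return [int(v * SCALE) for v in tensor.flatten().tolist()]

def upload_mlp(w3: Web3, contract, model: nn.Module, sender: str) -> None:
    """Send each linear layer's weights and biases, one transaction per layer."""
    linear_layers = [m for m in model.modules() if isinstance(m, nn.Linear)]
    for idx, layer in enumerate(linear_layers):
        tx_hash = contract.functions.setLayerWeights(
            idx,
            to_fixed_point(layer.weight.data),
            to_fixed_point(layer.bias.data),
        ).transact({"from": sender})
        w3.eth.wait_for_transaction_receipt(tx_hash)

if __name__ == "__main__":
    mlp = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))  # trained off-chain
    w3 = Web3(Web3.HTTPProvider("http://127.0.0.1:8545"))             # local dev node
    contract = w3.eth.contract(address=CONTRACT_ADDRESS, abi=CONTRACT_ABI)
    upload_mlp(w3, contract, mlp, sender=w3.eth.accounts[0])
```

Sending one transaction per layer is a simple way to keep individual calls small; larger models would likely need to chunk each layer's parameters across several calls to stay within gas limits.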