Data Poisoning Attacks and Defenses to Crowdsourcing Systems
- URL: http://arxiv.org/abs/2102.09171v1
- Date: Thu, 18 Feb 2021 06:03:48 GMT
- Title: Data Poisoning Attacks and Defenses to Crowdsourcing Systems
- Authors: Minghong Fang, Minghao Sun, Qi Li, Neil Zhenqiang Gong, Jin Tian, Jia
Liu
- Abstract summary: We show that crowdsourcing is vulnerable to data poisoning attacks.
malicious clients provide carefully crafted data to corrupt the aggregated data.
We propose two defenses to reduce the impact of malicious clients.
- Score: 26.147716118854614
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A key challenge of big data analytics is how to collect a large volume of
(labeled) data. Crowdsourcing aims to address this challenge via aggregating
and estimating high-quality data (e.g., sentiment label for text) from
pervasive clients/users. Existing studies on crowdsourcing focus on designing
new methods to improve the aggregated data quality from unreliable/noisy
clients. However, the security aspects of such crowdsourcing systems remain
under-explored to date. We aim to bridge this gap in this work. Specifically,
we show that crowdsourcing is vulnerable to data poisoning attacks, in which
malicious clients provide carefully crafted data to corrupt the aggregated
data. We formulate our proposed data poisoning attacks as an optimization
problem that maximizes the error of the aggregated data. Our evaluation results
on one synthetic and two real-world benchmark datasets demonstrate that the
proposed attacks can substantially increase the estimation errors of the
aggregated data. We also propose two defenses to reduce the impact of malicious
clients. Our empirical results show that the proposed defenses can
substantially reduce the estimation errors of the data poisoning attacks.
Related papers
- Client-side Gradient Inversion Against Federated Learning from Poisoning [59.74484221875662]
Federated Learning (FL) enables distributed participants to train a global model without sharing data directly to a central server.
Recent studies have revealed that FL is vulnerable to gradient inversion attack (GIA), which aims to reconstruct the original training samples.
We propose Client-side poisoning Gradient Inversion (CGI), which is a novel attack method that can be launched from clients.
arXiv Detail & Related papers (2023-09-14T03:48:27Z) - Avoid Adversarial Adaption in Federated Learning by Multi-Metric
Investigations [55.2480439325792]
Federated Learning (FL) facilitates decentralized machine learning model training, preserving data privacy, lowering communication costs, and boosting model performance through diversified data sources.
FL faces vulnerabilities such as poisoning attacks, undermining model integrity with both untargeted performance degradation and targeted backdoor attacks.
We define a new notion of strong adaptive adversaries, capable of adapting to multiple objectives simultaneously.
MESAS is the first defense robust against strong adaptive adversaries, effective in real-world data scenarios, with an average overhead of just 24.37 seconds.
arXiv Detail & Related papers (2023-06-06T11:44:42Z) - Towards Generalizable Data Protection With Transferable Unlearnable
Examples [50.628011208660645]
We present a novel, generalizable data protection method by generating transferable unlearnable examples.
To the best of our knowledge, this is the first solution that examines data privacy from the perspective of data distribution.
arXiv Detail & Related papers (2023-05-18T04:17:01Z) - Refiner: Data Refining against Gradient Leakage Attacks in Federated
Learning [28.76786159247595]
gradient leakage attacks exploit clients' uploaded gradients to reconstruct their sensitive data.
In this paper, we explore a novel defensive paradigm that departs from conventional gradient perturbation approaches.
We design Refiner that jointly optimize two metrics for privacy protection and performance maintenance.
arXiv Detail & Related papers (2022-12-05T05:36:15Z) - Try to Avoid Attacks: A Federated Data Sanitization Defense for
Healthcare IoMT Systems [4.024567343465081]
The distribution of IoMT has the risk of protection from data poisoning attacks.
Poisoned data can be fabricated by falsifying medical data.
This paper introduces a Federated Data Sanitization Defense, a novel approach to protect the system from data poisoning attacks.
arXiv Detail & Related papers (2022-11-03T05:21:39Z) - Concealing Sensitive Samples against Gradient Leakage in Federated
Learning [41.43099791763444]
Federated Learning (FL) is a distributed learning paradigm that enhances users privacy by eliminating the need for clients to share raw, private data with the server.
Recent studies expose the vulnerability of FL to model inversion attacks, where adversaries reconstruct users private data via eavesdropping on the shared gradient information.
We present a simple, yet effective defense strategy that obfuscates the gradients of the sensitive data with concealed samples.
arXiv Detail & Related papers (2022-09-13T04:19:35Z) - Robust Trajectory Prediction against Adversarial Attacks [84.10405251683713]
Trajectory prediction using deep neural networks (DNNs) is an essential component of autonomous driving systems.
These methods are vulnerable to adversarial attacks, leading to serious consequences such as collisions.
In this work, we identify two key ingredients to defend trajectory prediction models against adversarial attacks.
arXiv Detail & Related papers (2022-07-29T22:35:05Z) - Autoregressive Perturbations for Data Poisoning [54.205200221427994]
Data scraping from social media has led to growing concerns regarding unauthorized use of data.
Data poisoning attacks have been proposed as a bulwark against scraping.
We introduce autoregressive (AR) poisoning, a method that can generate poisoned data without access to the broader dataset.
arXiv Detail & Related papers (2022-06-08T06:24:51Z) - Gradient-based Data Subversion Attack Against Binary Classifiers [9.414651358362391]
In this work, we focus on label contamination attack in which an attacker poisons the labels of data to compromise the functionality of the system.
We exploit the gradients of a differentiable convex loss function with respect to the predicted label as a warm-start and formulate different strategies to find a set of data instances to contaminate.
Our experiments show that the proposed approach outperforms the baselines and is computationally efficient.
arXiv Detail & Related papers (2021-05-31T09:04:32Z) - Curse or Redemption? How Data Heterogeneity Affects the Robustness of
Federated Learning [51.15273664903583]
Data heterogeneity has been identified as one of the key features in federated learning but often overlooked in the lens of robustness to adversarial attacks.
This paper focuses on characterizing and understanding its impact on backdooring attacks in federated learning through comprehensive experiments using synthetic and the LEAF benchmarks.
arXiv Detail & Related papers (2021-02-01T06:06:21Z) - Auto-weighted Robust Federated Learning with Corrupted Data Sources [7.475348174281237]
Federated learning provides a communication-efficient and privacy-preserving training process.
Standard federated learning techniques that naively minimize an average loss function are vulnerable to data corruptions.
We propose Auto-weighted Robust Federated Learning (arfl) to provide robustness against corrupted data sources.
arXiv Detail & Related papers (2021-01-14T21:54:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.