Universal Black-Box Reward Poisoning Attack against Offline Reinforcement Learning
- URL: http://arxiv.org/abs/2402.09695v2
- Date: Wed, 23 Oct 2024 19:31:22 GMT
- Title: Universal Black-Box Reward Poisoning Attack against Offline Reinforcement Learning
- Authors: Yinglun Xu, Rohan Gumaste, Gagandeep Singh
- Abstract summary: We study the problem of universal black-box reward poisoning attacks against general offline reinforcement learning with deep neural networks.
We propose the first universal black-box reward poisoning attack in the general offline RL setting.
- Score: 4.629358641630161
- Abstract: We study the problem of universal black-box reward poisoning attacks against general offline reinforcement learning with deep neural networks. We consider a black-box threat model in which the attacker is entirely oblivious to the learning algorithm, and whose budget is limited by constraining the amount of corruption at each data point and the total perturbation. We require the attack to be universally efficient against any efficient learning algorithm the agent might use. We propose an attack strategy called the `policy contrast attack.' The idea is to find low- and high-performing policies covered by the dataset and make them appear to be high- and low-performing to the agent, respectively. To the best of our knowledge, this is the first universal black-box reward poisoning attack in the general offline RL setting. We provide theoretical insights on the attack design and empirically show that our attack is efficient against current state-of-the-art offline RL algorithms on different learning datasets.
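As a rough illustration of the policy contrast idea, the sketch below perturbs rewards in an offline dataset so that transitions matching a high-performing policy look worse and transitions matching a low-performing policy look better, subject to a per-step corruption bound and a total perturbation budget. This is a minimal sketch, not the authors' released code: the names (`policy_contrast_poison`, `eps_step`, `total_budget`) and the exact-action-match rule are hypothetical simplifications of the paper's construction.

```python
import numpy as np

def policy_contrast_poison(dataset, high_policy, low_policy,
                           eps_step=0.5, total_budget=1000.0):
    """Illustrative poisoning pass over an offline RL dataset.

    dataset: iterable of (state, action, reward) transitions.
    high_policy / low_policy: callables mapping state -> action, standing in
    for high- and low-performing policies covered by the dataset
    (hypothetical interface; the paper's construction is more involved).
    eps_step: per-transition corruption bound.
    total_budget: bound on the total perturbation across the dataset.
    """
    poisoned, spent = [], 0.0
    for state, action, reward in dataset:
        delta = 0.0
        if np.array_equal(action, high_policy(state)):
            delta = -eps_step   # make high-performing behavior look bad
        elif np.array_equal(action, low_policy(state)):
            delta = +eps_step   # make low-performing behavior look good
        if spent + abs(delta) > total_budget:
            delta = 0.0         # stay within the total corruption budget
        spent += abs(delta)
        poisoned.append((state, action, reward + delta))
    return poisoned
```

The two budget parameters mirror the two constraints in the threat model: a bound on the corruption at each data point and a bound on the total perturbation.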
Related papers
- Adversarial Attacks on Online Learning to Rank with Stochastic Click Models [34.725468803108754]
We present the first study of adversarial attacks on online learning to rank.
The goal of the adversary is to misguide the online learning-to-rank algorithm into placing the target item at the top of the ranking list a number of times linear in the time horizon $T$, at a sublinear attack cost.
arXiv Detail & Related papers (2023-05-30T17:05:49Z)
- Adversarial Attacks on Online Learning to Rank with Click Feedback [18.614785011987756]
Online learning to rank (OLTR) is a sequential decision-making problem where a learning agent selects an ordered list of items and receives feedback through user clicks.
This paper studies attack strategies against multiple variants of OLTR.
We propose a general attack strategy against any algorithm under the general click model.
arXiv Detail & Related papers (2023-05-26T16:28:26Z)
- Black-Box Targeted Reward Poisoning Attack Against Online Deep Reinforcement Learning [2.3526458707956643]
We propose the first black-box targeted attack against online deep reinforcement learning through reward poisoning during training time.
Our attack is applicable to general environments with unknown dynamics learned by unknown algorithms.
arXiv Detail & Related papers (2023-05-18T03:37:29Z)
- Adversarial Attacks on Adversarial Bandits [10.891819703383408]
We show that the attacker is able to mislead any no-regret adversarial bandit algorithm into selecting a suboptimal target arm.
This result implies a critical security concern in real-world bandit-based systems.
arXiv Detail & Related papers (2023-01-30T00:51:39Z)
- Efficient Reward Poisoning Attacks on Online Deep Reinforcement Learning [6.414910263179327]
We study reward poisoning attacks on online deep reinforcement learning (DRL).
We demonstrate the intrinsic vulnerability of state-of-the-art DRL algorithms by designing a general, black-box reward poisoning framework called adversarial MDP attacks; a generic sketch of this online threat model appears after this list.
Our results show that our attacks efficiently poison agents learning in several popular classical control and MuJoCo environments.
arXiv Detail & Related papers (2022-05-30T04:07:19Z)
- Investigating Top-$k$ White-Box and Transferable Black-box Attack [75.13902066331356]
We show that a stronger attack actually transfers better for the general top-$k$ attack success rate (ASR), as indicated by the interest class rank (ICR) after the attack.
We propose a new normalized cross-entropy (CE) loss that guides the logit to be updated in the direction of implicitly maximizing its rank distance from the ground-truth class.
arXiv Detail & Related papers (2022-03-30T15:02:27Z)
- Projective Ranking-based GNN Evasion Attacks [52.85890533994233]
Graph neural networks (GNNs) offer promising learning methods for graph-related tasks.
However, GNNs are at risk of adversarial attacks.
arXiv Detail & Related papers (2022-02-25T21:52:09Z)
- Fixed Points in Cyber Space: Rethinking Optimal Evasion Attacks in the Age of AI-NIDS [70.60975663021952]
We study black-box adversarial attacks on network classifiers.
We argue that attacker-defender fixed points are themselves general-sum games with complex phase transitions.
We show that a continual learning approach is required to study attacker-defender dynamics.
arXiv Detail & Related papers (2021-11-23T23:42:16Z)
- Online Adversarial Attacks [57.448101834579624]
We formalize the online adversarial attack problem, emphasizing two key elements found in real-world use-cases.
We first rigorously analyze a deterministic variant of the online threat model.
We then propose a simple yet practical algorithm yielding a provably better competitive ratio for $k=2$ than the current best single-threshold algorithm.
arXiv Detail & Related papers (2021-03-02T20:36:04Z)
- Robust Deep Reinforcement Learning through Adversarial Loss [74.20501663956604]
Recent studies have shown that deep reinforcement learning agents are vulnerable to small adversarial perturbations on the agent's inputs.
We propose RADIAL-RL, a principled framework to train reinforcement learning agents with improved robustness against adversarial attacks.
arXiv Detail & Related papers (2020-08-05T07:49:42Z)
- RayS: A Ray Searching Method for Hard-label Adversarial Attack [99.72117609513589]
We present the Ray Searching attack (RayS), which greatly improves hard-label attack effectiveness as well as efficiency.
The RayS attack can also be used as a sanity check for possible "falsely robust" models.
arXiv Detail & Related papers (2020-06-23T07:01:50Z)
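For contrast with the offline setting of the main paper, here is a minimal sketch of the online reward-poisoning threat model studied in the related works above: the attacker sits between the environment and the learner and corrupts each reward before it is observed. This is a generic illustration, not any paper's specific framework; the `agent` and `attacker` interfaces are hypothetical, and the environment is assumed to follow the Gymnasium-style step/reset API.

```python
def poisoned_training_loop(env, agent, attacker, num_steps=10_000):
    """Generic online reward-poisoning loop (illustration only).

    env: Gymnasium-style environment exposing reset() and step().
    agent / attacker: hypothetical objects exposing act/update and
    perturb methods, respectively.
    """
    obs, _ = env.reset()
    for _ in range(num_steps):
        action = agent.act(obs)
        next_obs, reward, terminated, truncated, _ = env.step(action)
        # The attacker corrupts the reward before the learner observes it.
        reward = attacker.perturb(obs, action, reward)
        agent.update(obs, action, reward, next_obs)
        if terminated or truncated:
            obs, _ = env.reset()
        else:
            obs = next_obs
```

The offline attack sketched earlier corrupts a fixed dataset once, before learning starts; this online variant instead intervenes at every training step, which is why the two settings call for different attack strategies.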