Related papers: On the Perturbed States for Transformed Input-robust Reinforcement Learning

On the Perturbed States for Transformed Input-robust Reinforcement Learning

URL: http://arxiv.org/abs/2408.00023v2
Date: Fri, 2 Aug 2024 06:05:19 GMT
Title: On the Perturbed States for Transformed Input-robust Reinforcement Learning
Authors: Tung M. Luu, Haeyong Kang, Tri Ton, Thanh Nguyen, Chang D. Yoo,
Abstract summary: Reinforcement Learning (RL) agents exhibit vulnerability to adversarial perturbations in input observations during deployment. We introduce two principles for applying transformation-based defenses in learning robust RL agents. Experiments on multiple MuJoCo environments demonstrate that input transformation-based defenses defend against several adversaries.
Score: 24.11603621594292
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Reinforcement Learning (RL) agents demonstrating proficiency in a training environment exhibit vulnerability to adversarial perturbations in input observations during deployment. This underscores the importance of building a robust agent before its real-world deployment. To alleviate the challenging point, prior works focus on developing robust training-based procedures, encompassing efforts to fortify the deep neural network component's robustness or subject the agent to adversarial training against potent attacks. In this work, we propose a novel method referred to as Transformed Input-robust RL (TIRL), which explores another avenue to mitigate the impact of adversaries by employing input transformation-based defenses. Specifically, we introduce two principles for applying transformation-based defenses in learning robust RL agents: (1) autoencoder-styled denoising to reconstruct the original state and (2) bounded transformations (bit-depth reduction and vector quantization (VQ)) to achieve close transformed inputs. The transformations are applied to the state before feeding it into the policy network. Extensive experiments on multiple MuJoCo environments demonstrate that input transformation-based defenses, i.e., VQ, defend against several adversaries in the state observations. The official code is available at https://github.com/tunglm2203/tirl

Related papers

REFINE: Inversion-Free Backdoor Defense via Model Reprogramming [60.554146386198376]
Backdoor attacks on deep neural networks (DNNs) have emerged as a significant security threat. We propose REFINE, an inversion-free backdoor defense method based on model reprogramming.
arXiv Detail & Related papers (2025-02-22T07:29:12Z)
Robust and Transferable Backdoor Attacks Against Deep Image Compression With Selective Frequency Prior [118.92747171905727]
This paper introduces a novel frequency-based trigger injection model for launching backdoor attacks with multiple triggers on learned image compression models. We design attack objectives tailored to diverse scenarios, including: 1) degrading compression quality in terms of bit-rate and reconstruction accuracy; 2) targeting task-driven measures like face recognition and semantic segmentation. Experiments show that our trigger injection models, combined with minor modifications to encoder parameters, successfully inject multiple backdoors and their triggers into a single compression model.
arXiv Detail & Related papers (2024-12-02T15:58:40Z)
Protecting Feed-Forward Networks from Adversarial Attacks Using Predictive Coding [0.20718016474717196]
An adversarial example is a modified input image designed to cause a Machine Learning (ML) model to make a mistake. This study presents a practical and effective solution -- using predictive coding networks (PCnets) as an auxiliary step for adversarial defence.
arXiv Detail & Related papers (2024-10-31T21:38:05Z)
Mitigating Adversarial Perturbations for Deep Reinforcement Learning via Vector Quantization [18.56608399174564]
Well-performing reinforcement learning (RL) agents often lack resilience against adversarial perturbations during deployment. This highlights the importance of building a robust agent before deploying it in the real world. In this work, we study an input transformation-based defense for RL.
arXiv Detail & Related papers (2024-10-04T12:41:54Z)
Purify Unlearnable Examples via Rate-Constrained Variational Autoencoders [101.42201747763178]
Unlearnable examples (UEs) seek to maximize testing error by making subtle modifications to training examples that are correctly labeled. Our work provides a novel disentanglement mechanism to build an efficient pre-training purification method.
arXiv Detail & Related papers (2024-05-02T16:49:25Z)
GenFighter: A Generative and Evolutive Textual Attack Removal [6.044610337297754]
Adrial attacks pose significant challenges to deep neural networks (DNNs) such as Transformer models in natural language processing (NLP) This paper introduces a novel defense strategy, called GenFighter, which enhances adversarial robustness by learning and reasoning on the training classification distribution. We show that GenFighter outperforms state-of-the-art defenses in accuracy under attack and attack success rate metrics.
arXiv Detail & Related papers (2024-04-17T16:32:13Z)
Belief-Enriched Pessimistic Q-Learning against Adversarial State Perturbations [5.076419064097735]
Recent work shows that a well-trained RL agent can be easily manipulated by strategically perturbing its state observations at the test stage. Existing solutions either introduce a regularization term to improve the smoothness of the trained policy against perturbations or alternatively train the agent's policy and the attacker's policy. We propose a new robust RL algorithm for deriving a pessimistic policy to safeguard against an agent's uncertainty about true states.
arXiv Detail & Related papers (2024-03-06T20:52:49Z)
Pre-trained Trojan Attacks for Visual Recognition [106.13792185398863]
Pre-trained vision models (PVMs) have become a dominant component due to their exceptional performance when fine-tuned for downstream tasks. We propose the Pre-trained Trojan attack, which embeds backdoors into a PVM, enabling attacks across various downstream vision tasks. We highlight the challenges posed by cross-task activation and shortcut connections in successful backdoor attacks.
arXiv Detail & Related papers (2023-12-23T05:51:40Z)
Position Prediction as an Effective Pretraining Strategy [20.925906203643883]
We propose a novel, but surprisingly simple alternative to content reconstruction-- that of predicting locations from content, without providing positional information for it. Our approach brings improvements over strong supervised training baselines and is comparable to modern unsupervised/self-supervised pretraining methods.
arXiv Detail & Related papers (2022-07-15T17:10:48Z)
Distributed Adversarial Training to Robustify Deep Neural Networks at Scale [100.19539096465101]
Current deep neural networks (DNNs) are vulnerable to adversarial attacks, where adversarial perturbations to the inputs can change or manipulate classification. To defend against such attacks, an effective approach, known as adversarial training (AT), has been shown to mitigate robust training. We propose a large-batch adversarial training framework implemented over multiple machines.
arXiv Detail & Related papers (2022-06-13T15:39:43Z)
Policy Smoothing for Provably Robust Reinforcement Learning [109.90239627115336]
We study the provable robustness of reinforcement learning against norm-bounded adversarial perturbations of the inputs. We generate certificates that guarantee that the total reward obtained by the smoothed policy will not fall below a certain threshold under a norm-bounded adversarial of perturbation the input.
arXiv Detail & Related papers (2021-06-21T21:42:08Z)
Robust Deep Reinforcement Learning through Adversarial Loss [74.20501663956604]
Recent studies have shown that deep reinforcement learning agents are vulnerable to small adversarial perturbations on the agent's inputs. We propose RADIAL-RL, a principled framework to train reinforcement learning agents with improved robustness against adversarial attacks.
arXiv Detail & Related papers (2020-08-05T07:49:42Z)
A Self-supervised Approach for Adversarial Robustness [105.88250594033053]
Adversarial examples can cause catastrophic mistakes in Deep Neural Network (DNNs) based vision systems. This paper proposes a self-supervised adversarial training mechanism in the input space. It provides significant robustness against the textbfunseen adversarial attacks.
arXiv Detail & Related papers (2020-06-08T20:42:39Z)

This list is automatically generated from the titles and abstracts of the papers in this site.