Related papers: Advantage-based Temporal Attack in Reinforcement Learning

Advantage-based Temporal Attack in Reinforcement Learning

URL: http://arxiv.org/abs/2602.19582v1
Date: Mon, 23 Feb 2026 08:08:23 GMT
Title: Advantage-based Temporal Attack in Reinforcement Learning
Authors: Shenghong He,
Abstract summary: AAT captures dependencies between historical information from different time periods and the current state.<n>AAT introduces a weighted advantage mechanism, which quantifies the effectiveness of a perturbation in a given state.<n>Experiments demonstrate that AAT matches or surpasses mainstream adversarial attack baselines on Atari, DeepMind Control Suite and Google football tasks.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Extensive research demonstrates that Deep Reinforcement Learning (DRL) models are susceptible to adversarially constructed inputs (i.e., adversarial examples), which can mislead the agent to take suboptimal or unsafe actions. Recent methods improve attack effectiveness by leveraging future rewards to guide adversarial perturbation generation over sequential time steps (i.e., reward-based attacks). However, these methods are unable to capture dependencies between different time steps in the perturbation generation process, resulting in a weak temporal correlation between the current perturbation and previous perturbations.In this paper, we propose a novel method called Advantage-based Adversarial Transformer (AAT), which can generate adversarial examples with stronger temporal correlations (i.e., time-correlated adversarial examples) to improve the attack performance. AAT employs a multi-scale causal self-attention (MSCSA) mechanism to dynamically capture dependencies between historical information from different time periods and the current state, thus enhancing the correlation between the current perturbation and the previous perturbation. Moreover, AAT introduces a weighted advantage mechanism, which quantifies the effectiveness of a perturbation in a given state and guides the generation process toward high-performance adversarial examples by sampling high-advantage regions. Extensive experiments demonstrate that the performance of AAT matches or surpasses mainstream adversarial attack baselines on Atari, DeepMind Control Suite and Google football tasks.

Related papers

Temporally Unified Adversarial Perturbations for Time Series Forecasting [1.9116784879310027]
We introduce Temporally Unified Adversarial Perturbations (TUAPs) to ensure identical perturbations for each timestamp across all overlapping samples.<n>We also propose a novel Timestamp-wise Gradient Accumulation Method (TGAM) that provides a modular and efficient approach to effectively generate TUAPs.<n>Our proposed method significantly outperforms baselines in both white-box and black-box transfer attack scenarios.
arXiv Detail & Related papers (2026-02-12T13:37:45Z)
TFLAG:Towards Practical APT Detection via Deviation-Aware Learning on Temporal Provenance Graph [7.676383564795898]
Advanced Persistent Threat (APT) have grown increasingly complex and concealed.<n>Recent studies have incorporated graph learning techniques to extract detailed information from provenance graphs.<n>We introduce TFLAG, an advanced anomaly detection framework.
arXiv Detail & Related papers (2025-01-13T01:08:06Z)
Transferable Adversarial Attacks on SAM and Its Downstream Models [87.23908485521439]
This paper explores the feasibility of adversarial attacking various downstream models fine-tuned from the segment anything model (SAM)<n>To enhance the effectiveness of the adversarial attack towards models fine-tuned on unknown datasets, we propose a universal meta-initialization (UMI) algorithm.
arXiv Detail & Related papers (2024-10-26T15:04:04Z)
Bidirectional Decoding: Improving Action Chunking via Guided Test-Time Sampling [51.38330727868982]
We show how action chunking impacts the divergence between a learner and a demonstrator.<n>We propose Bidirectional Decoding (BID), a test-time inference algorithm that bridges action chunking with closed-loop adaptation.<n>Our method boosts the performance of two state-of-the-art generative policies across seven simulation benchmarks and two real-world tasks.
arXiv Detail & Related papers (2024-08-30T15:39:34Z)
Eliminating Catastrophic Overfitting Via Abnormal Adversarial Examples Regularization [50.43319961935526]
Single-step adversarial training (SSAT) has demonstrated the potential to achieve both efficiency and robustness. SSAT suffers from catastrophic overfitting (CO), a phenomenon that leads to a severely distorted classifier. In this work, we observe that some adversarial examples generated on the SSAT-trained network exhibit anomalous behaviour.
arXiv Detail & Related papers (2024-04-11T22:43:44Z)
ACE : Off-Policy Actor-Critic with Causality-Aware Entropy Regularization [52.5587113539404]
We introduce a causality-aware entropy term that effectively identifies and prioritizes actions with high potential impacts for efficient exploration. Our proposed algorithm, ACE: Off-policy Actor-critic with Causality-aware Entropy regularization, demonstrates a substantial performance advantage across 29 diverse continuous control tasks.
arXiv Detail & Related papers (2024-02-22T13:22:06Z)
Improving Adversarial Transferability by Stable Diffusion [36.97548018603747]
adversarial examples introduce imperceptible perturbations to benign samples, deceiving predictions. Deep neural networks (DNNs) are susceptible to adversarial examples, which introduce imperceptible perturbations to benign samples, deceiving predictions. We introduce a novel attack method called Stable Diffusion Attack Method (SDAM), which incorporates samples generated by Stable Diffusion to augment input images.
arXiv Detail & Related papers (2023-11-18T09:10:07Z)
SWAP: Exploiting Second-Ranked Logits for Adversarial Attacks on Time Series [11.356275885051442]
Time series classification (TSC) has emerged as a critical task in various domains. Deep neural models have shown superior performance in TSC tasks. TSC models are vulnerable to adversarial attacks. We propose SWAP, a novel attacking method for TSC models.
arXiv Detail & Related papers (2023-09-06T06:17:35Z)
Improving Adversarial Transferability via Intermediate-level Perturbation Decay [79.07074710460012]
We develop a novel intermediate-level method that crafts adversarial examples within a single stage of optimization. Experimental results show that it outperforms state-of-the-arts by large margins in attacking various victim models.
arXiv Detail & Related papers (2023-04-26T09:49:55Z)
Fuzziness-tuned: Improving the Transferability of Adversarial Examples [18.880398046794138]
adversarial examples have been widely used to enhance the robustness of the training models on deep neural networks. The attack success rate of the transfer-based attacks on the surrogate model is much higher than that on victim model under the low attack strength. A fuzziness-tuned method is proposed to ensure the generated adversarial examples can effectively skip out of the fuzzy domain.
arXiv Detail & Related papers (2023-03-17T16:00:18Z)
Policy Smoothing for Provably Robust Reinforcement Learning [109.90239627115336]
We study the provable robustness of reinforcement learning against norm-bounded adversarial perturbations of the inputs. We generate certificates that guarantee that the total reward obtained by the smoothed policy will not fall below a certain threshold under a norm-bounded adversarial of perturbation the input.
arXiv Detail & Related papers (2021-06-21T21:42:08Z)
Robust Tracking against Adversarial Attacks [69.59717023941126]
We first attempt to generate adversarial examples on top of video sequences to improve the tracking robustness against adversarial attacks. We apply the proposed adversarial attack and defense approaches to state-of-the-art deep tracking algorithms.
arXiv Detail & Related papers (2020-07-20T08:05:55Z)

This list is automatically generated from the titles and abstracts of the papers in this site.