Exploring Adversarial Robustness of Deep State Space Models
- URL: http://arxiv.org/abs/2406.05532v2
- Date: Wed, 09 Oct 2024 02:28:56 GMT
- Title: Exploring Adversarial Robustness of Deep State Space Models
- Authors: Biqing Qi, Yang Luo, Junqi Gao, Pengfei Li, Kai Tian, Zhiyuan Ma, Bowen Zhou
- Abstract summary: Adversarial Training (AT) is a mainstream approach to enhancing Adversarial Robustness (AR).
We show that pure SSM structures struggle to benefit from AT, whereas incorporating Attention yields a markedly better trade-off between robustness and generalization.
We propose a simple and effective Adaptive Scaling (AdS) mechanism that brings AT performance close to Attention-integrated SSMs without introducing the issue of Robust Overfitting (RO).
- Score: 26.650751659034782
- Abstract: Deep State Space Models (SSMs) have proven effective in numerous task scenarios but face significant security challenges due to Adversarial Perturbations (APs) in real-world deployments. Adversarial Training (AT) is a mainstream approach to enhancing Adversarial Robustness (AR) and has been validated on various traditional DNN architectures. However, its effectiveness in improving the AR of SSMs remains unclear. While many enhancements in SSM components, such as integrating Attention mechanisms and expanding to data-dependent SSM parameterizations, have brought significant gains in Standard Training (ST) settings, their potential benefits in AT remain unexplored. To investigate this, we evaluate existing structural variants of SSMs with AT to assess their AR performance. We observe that pure SSM structures struggle to benefit from AT, whereas incorporating Attention yields a markedly better trade-off between robustness and generalization for SSMs in AT compared to other components. Nonetheless, the integration of Attention also leads to Robust Overfitting (RO) issues. To understand these phenomena, we empirically and theoretically analyze the output error of SSMs under AP. We find that fixed-parameterized SSMs have output error bounds strictly related to their parameters, limiting their AT benefits, while input-dependent SSMs may face the problem of error explosion. Furthermore, we show that the Attention component effectively scales the output error of SSMs during training, enabling them to benefit more from AT, but at the cost of introducing RO due to its high model complexity. Inspired by this, we propose a simple and effective Adaptive Scaling (AdS) mechanism that brings AT performance close to Attention-integrated SSMs without introducing the issue of RO. Our code is available at https://github.com/Biqing-Qi/Exploring-Adversarial-Robustness-of-Deep-State-Space-Models.git.
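For reference, the sketch below shows a generic PGD-based adversarial training loop of the kind the abstract evaluates on SSM variants. It is an illustrative PyTorch sketch, not the authors' released implementation (see the GitHub link above); the model, attack hyperparameters, and training loop are placeholder assumptions.

```python
# Minimal, illustrative sketch of adversarial training (AT) with a PGD attack.
# NOT the authors' code; hyperparameters and the model/loader are placeholders.
import torch
import torch.nn.functional as F


def pgd_attack(model, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    """Craft L_inf-bounded adversarial examples via projected gradient descent."""
    x_adv = x + torch.empty_like(x).uniform_(-eps, eps)   # random start in the eps-ball
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv + alpha * grad.sign()                # ascend the loss
        x_adv = x + (x_adv - x).clamp(-eps, eps)           # project back into the eps-ball
    return x_adv.detach()


def at_epoch(model, loader, optimizer, device="cuda"):
    """One epoch of standard AT: train on worst-case perturbed inputs."""
    model.train()
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        x_adv = pgd_attack(model, x, y)
        optimizer.zero_grad()
        F.cross_entropy(model(x_adv), y).backward()
        optimizer.step()
```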
Related papers
- Understanding and Mitigating Bottlenecks of State Space Models through the Lens of Recency and Over-smoothing [56.66469232740998]
We show that Structured State Space Models (SSMs) are inherently limited by strong recency bias.
This bias impairs the models' ability to recall distant information and introduces robustness issues.
We propose to polarize two channels of the state transition matrices in SSMs, setting them to zero and one, respectively, simultaneously addressing recency bias and over-smoothing.
arXiv Detail & Related papers (2024-12-31T22:06:39Z)
- Provable Benefits of Complex Parameterizations for Structured State Space Models [51.90574950170374]
Structured state space models (SSMs) are linear dynamical systems adhering to a specified structure.
In contrast to typical neural network modules, whose parameterizations are real, SSMs often use complex parameterizations.
This paper takes a step towards explaining the benefits of complex parameterizations for SSMs by establishing formal gaps between real and complex diagonal SSMs.
arXiv Detail & Related papers (2024-10-17T22:35:50Z)
- Towards Evaluating the Robustness of Visual State Space Models [63.14954591606638]
Vision State Space Models (VSSMs) have demonstrated remarkable performance in visual perception tasks.
However, their robustness under natural and adversarial perturbations remains a critical concern.
We present a comprehensive evaluation of VSSMs' robustness under various perturbation scenarios.
arXiv Detail & Related papers (2024-06-13T17:59:44Z)
- The Expressive Capacity of State Space Models: A Formal Language Perspective [0.8948475969696075]
Recurrent models based on linear state space models (SSMs) have shown promising performance in language modeling (LM), competitive with transformers.
We present a comprehensive theoretical study of the capacity of such SSMs as it compares to that of transformers and traditional RNNs.
arXiv Detail & Related papers (2024-05-27T17:46:57Z)
- HOPE for a Robust Parameterization of Long-memory State Space Models [51.66430224089725]
State-space models (SSMs) that utilize linear, time-invariant (LTI) systems are known for their effectiveness in learning long sequences.
We develop a new parameterization scheme, called HOPE, for LTI systems that utilizes Markov parameters within Hankel operators.
Our new parameterization endows the SSM with non-decaying memory within a fixed time window, which is empirically corroborated by a sequential CIFAR-10 task with padded noise.
arXiv Detail & Related papers (2024-05-22T20:20:14Z)
- Spatial Attention-based Distribution Integration Network for Human Pose Estimation [0.8052382324386398]
We present the Spatial Attention-based Distribution Integration Network (SADI-NET) to improve the accuracy of localization.
Our network consists of three efficient modules: the receptive fortified module (RFM), the spatial fusion module (SFM), and the distribution learning module (DLM).
Our model obtained a remarkable 92.10% accuracy on the MPII test dataset, demonstrating significant improvements over existing models and establishing state-of-the-art performance.
arXiv Detail & Related papers (2023-11-09T12:43:01Z)
- Understanding Self-attention Mechanism via Dynamical System Perspective [58.024376086269015]
Self-attention mechanism (SAM) is widely used in various fields of artificial intelligence.
We show that the intrinsic stiffness phenomenon (SP) found in high-precision solutions of ordinary differential equations (ODEs) also widely exists in high-performance neural networks (NNs).
We show that the SAM is also a stiffness-aware step size adaptor that can enhance the model's representational ability to measure intrinsic SP.
arXiv Detail & Related papers (2023-08-19T08:17:41Z)
- ASR: Attention-alike Structural Re-parameterization [53.019657810468026]
We propose a simple-yet-effective attention-alike structural re-parameterization (ASR) that allows us to achieve SRP for a given network while enjoying the effectiveness of the attention mechanism.
In this paper, we conduct extensive experiments from a statistical perspective and discover an interesting phenomenon, Stripe Observation, which reveals that channel attention values quickly approach some constant vectors during training.
arXiv Detail & Related papers (2023-04-13T08:52:34Z)
- Improving Out-of-Distribution Generalization by Adversarial Training with Structured Priors [17.936426699670864]
We show that sample-wise Adversarial Training (AT) has limited improvement on Out-of-Distribution (OOD) generalization.
We propose two AT variants with low-rank structures to train OOD-robust models.
Our proposed approaches outperform Empirical Risk Minimization (ERM) and sample-wise AT.
arXiv Detail & Related papers (2022-10-13T07:37:42Z)
- Rethinking Uncertainty in Deep Learning: Whether and How it Improves Robustness [20.912492996647888]
Adversarial training (AT) suffers from poor performance both on clean examples and under other types of attacks.
Regularizers that encourage uncertain outputs, such as entropy (EntM) and label smoothing (LS), can maintain accuracy on clean examples and improve performance under weak attacks.
In this paper, we revisit uncertainty promotion regularizers, including EntM and LS, in the field of adversarial learning.
arXiv Detail & Related papers (2020-11-27T03:22:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.