Robust Encodings: A Framework for Combating Adversarial Typos
- URL: http://arxiv.org/abs/2005.01229v1
- Date: Mon, 4 May 2020 01:28:18 GMT
- Title: Robust Encodings: A Framework for Combating Adversarial Typos
- Authors: Erik Jones, Robin Jia, Aditi Raghunathan, and Percy Liang
- Abstract summary: NLP systems are easily fooled by small perturbations of inputs.
Existing procedures to defend against such perturbations are either heuristic and vulnerable to stronger attacks, or provide guaranteed robustness but are incompatible with models like BERT.
We introduce robust encodings (RobEn) that confer guaranteed robustness without making compromises on model architecture.
- Score: 85.70270979772388
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite excellent performance on many tasks, NLP systems are easily fooled by
small adversarial perturbations of inputs. Existing procedures to defend
against such perturbations are either (i) heuristic in nature and susceptible
to stronger attacks, or (ii) guaranteed to be robust against worst-case
attacks but incompatible with state-of-the-art models like BERT. In this
work, we introduce robust encodings (RobEn): a simple framework that confers
guaranteed robustness, without making compromises on model architecture. The
core component of RobEn is an encoding function, which maps sentences to a
smaller, discrete space of encodings. Systems using these encodings as a
bottleneck achieve guaranteed robustness with standard training, and the same
encodings can be used across multiple tasks. We identify two desiderata to
construct robust encoding functions: perturbations of a sentence should map to
a small set of encodings (stability), and models using encodings should still
perform well (fidelity). We instantiate RobEn to defend against a large family
of adversarial typos. Across six tasks from GLUE, our instantiation of RobEn
paired with BERT achieves an average robust accuracy of 71.3% against all
adversarial typos in the family considered, while previous work using a
typo-corrector achieves only 35.3% accuracy against a simple greedy attack.
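The abstract describes the encoding function only at a high level. Below is a minimal sketch of the idea, not the paper's construction: vocabulary words that a single interior-character typo could confuse are clustered together, and every token, clean or perturbed, is mapped to its cluster representative, so perturbations of a sentence collapse to a small set of encodings (stability) while distinct words mostly keep distinct encodings (fidelity). The `RobustEncoder` class, the simplified typo model, and the toy vocabulary are illustrative assumptions.

```python
# Minimal sketch of an encoding-function bottleneck in the spirit of RobEn.
# Not the paper's construction: the real instantiation chooses clusters by
# explicitly trading off stability against fidelity over a large vocabulary.
from string import ascii_lowercase

def interior_edits(word):
    """Single-character edits that keep the first character fixed
    (a simplified stand-in for the paper's family of adversarial typos)."""
    for i in range(1, len(word) - 1):
        yield word[:i] + word[i + 1:]                          # delete
        yield word[:i] + word[i + 1] + word[i] + word[i + 2:]  # swap with next
        for c in ascii_lowercase:
            yield word[:i] + c + word[i + 1:]                  # substitute

class RobustEncoder:
    """Maps tokens to cluster representatives so that typo-perturbed tokens
    share an encoding with their clean form (stability)."""
    def __init__(self, vocab):
        self.vocab = set(vocab)
        self.parent = {w: w for w in self.vocab}   # union-find over the vocabulary
        self.typo_to_word = {}
        for w in sorted(self.vocab):
            for t in interior_edits(w):
                if t in self.typo_to_word:                 # two words share a typo,
                    self._union(w, self.typo_to_word[t])   # so they share a cluster
                self.typo_to_word[t] = w
                if t in self.vocab:                        # a typo of w is itself a word
                    self._union(w, t)

    def _find(self, w):
        while self.parent[w] != w:
            self.parent[w] = self.parent[self.parent[w]]
            w = self.parent[w]
        return w

    def _union(self, a, b):
        ra, rb = self._find(a), self._find(b)
        if ra != rb:
            self.parent[rb] = ra

    def encode_token(self, token):
        if token in self.vocab:                     # clean word
            return self._find(token)
        if token in self.typo_to_word:              # recognizable typo of a known word
            return self._find(self.typo_to_word[token])
        return "<OOV>"                              # everything else shares one encoding

    def encode(self, sentence):
        return " ".join(self.encode_token(t) for t in sentence.lower().split())

# Toy usage: the encoded sentence, not the raw text, is what the downstream
# model sees, so the typo below cannot change the prediction.
enc = RobustEncoder(["the", "movie", "was", "awesome", "awful", "boring"])
print(enc.encode("the movie was awesome"))   # -> "the movie was awesome"
print(enc.encode("the movie was awesme"))    # -> same encoding as the clean sentence
```

In the paper's instantiation the clusters are instead chosen to balance the two desiderata directly over a large vocabulary, and the resulting encodings are fed to a standard model such as BERT with ordinary training.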
Related papers
- Unitary Multi-Margin BERT for Robust Natural Language Processing [0.0]
Recent developments in adversarial attacks on deep learning leave many mission-critical natural language processing (NLP) systems at risk of exploitation.
To address the lack of computationally efficient adversarial defense methods, this paper reports a novel, universal technique that drastically improves the robustness of Bidirectional Encoder Representations from Transformers (BERT) by combining unitary weights with the multi-margin loss.
Our model, the unitary multi-margin BERT (UniBERT), boosts post-attack classification accuracies significantly by 5.3% to 73.8% while maintaining competitive pre-attack accuracies.
arXiv Detail & Related papers (2024-10-16T17:30:58Z)
- Decoding at the Speed of Thought: Harnessing Parallel Decoding of Lexical Units for LLMs [57.27982780697922]
Large language models have demonstrated exceptional capability in natural language understanding and generation.
However, their generation speed is limited by the inherently sequential nature of their decoding process.
This paper introduces Lexical Unit Decoding, a novel decoding methodology implemented in a data-driven manner.
arXiv Detail & Related papers (2024-05-24T04:35:13Z)
- Fully Randomized Pointers [7.1754940591892735]
We propose Fully Randomized Pointers (FRP) as a stronger memory error defense that is resistant to even brute force attacks.
The key idea is to fully randomize pointer bits -- as much as possible -- while also preserving binary compatibility.
We show that FRP is secure, practical, and compatible at the binary level, while a hardware implementation can achieve low performance overheads.
arXiv Detail & Related papers (2024-05-21T05:54:27Z)
- Defending Large Language Models against Jailbreak Attacks via Semantic Smoothing [107.97160023681184]
Aligned large language models (LLMs) are vulnerable to jailbreaking attacks.
We propose SEMANTICSMOOTH, a smoothing-based defense that aggregates predictions of semantically transformed copies of a given input prompt.
arXiv Detail & Related papers (2024-02-25T20:36:03Z)
- Low-Weight High-Distance Error Correcting Fermionic Encodings [0.0]
We search for practical fermion-to-qubit encodings with error correcting properties.
We report multiple promising high-distance encodings which significantly improve the weights of stabilizers and logical operators.
arXiv Detail & Related papers (2024-02-23T15:32:57Z)
- Speculative Contrastive Decoding [55.378200871224074]
Large language models (LLMs) exhibit exceptional performance in language tasks, yet their auto-regressive inference is limited by high computational requirements and is sub-optimal due to exposure bias.
Inspired by speculative decoding and contrastive decoding, we introduce Speculative Contrastive Decoding (SCD), a straightforward yet powerful decoding approach.
arXiv Detail & Related papers (2023-11-15T14:15:30Z)
- Anti-LM Decoding for Zero-shot In-context Machine Translation [59.26037416204157]
This work introduces an Anti-Language Model objective with a decay factor designed to address the weaknesses of In-context Machine Translation.
We conduct experiments across 3 model types and sizes, 3 language directions, and for both greedy decoding and beam search.
arXiv Detail & Related papers (2023-11-14T17:09:43Z)
- Doubly Robust Instance-Reweighted Adversarial Training [107.40683655362285]
We propose a novel doubly-robust instance reweighted adversarial framework.
Our importance weights are obtained by optimizing the KL-divergence regularized loss function.
Our proposed approach outperforms related state-of-the-art baseline methods in terms of average robust performance.
arXiv Detail & Related papers (2023-08-01T06:16:18Z)
- On the Adversarial Robustness of Generative Autoencoders in the Latent Space [22.99128324197949]
We provide the first study on the adversarial robustness of generative autoencoders in the latent space.
Specifically, we empirically demonstrate the latent vulnerability of popular generative autoencoders through attacks in the latent space.
We identify a potential trade-off between the adversarial robustness and the degree of the disentanglement of the latent codes.
arXiv Detail & Related papers (2023-07-05T10:53:49Z)
- Double Backpropagation for Training Autoencoders against Adversarial Attack [15.264115499966413]
This paper focuses on adversarial attacks on autoencoders.
We propose to adopt double backpropagation (DBP) to secure autoencoders such as VAE and DRAW (a minimal sketch of the idea appears below).
arXiv Detail & Related papers (2020-03-04T05:12:27Z)
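The double-backpropagation entry above names the technique without spelling it out. The following is a minimal, hypothetical sketch of the core idea, penalizing the gradient of the reconstruction loss with respect to the input so that small input perturbations cannot change the output much. The toy architecture, the `lambda_dbp` weight, and the use of a plain deterministic autoencoder are placeholder assumptions; the paper applies the idea to VAE and DRAW.

```python
# Hypothetical sketch of double backpropagation (DBP) for a toy autoencoder.
# The gradient penalty flattens the reconstruction loss around each input,
# which is what makes small adversarial input perturbations less effective.
import torch
import torch.nn as nn

autoencoder = nn.Sequential(                     # placeholder architecture
    nn.Linear(784, 64), nn.ReLU(), nn.Linear(64, 784)
)
optimizer = torch.optim.Adam(autoencoder.parameters(), lr=1e-3)
lambda_dbp = 0.1                                 # penalty weight (assumed)

def dbp_step(x_batch):
    x = x_batch.clone().requires_grad_(True)     # perturbable copy of the input
    recon = autoencoder(x)
    rec_loss = ((recon - x_batch) ** 2).mean()   # reconstruct the clean input
    # First backward pass: gradient of the loss w.r.t. the input itself.
    grad_x, = torch.autograd.grad(rec_loss, x, create_graph=True)
    penalty = grad_x.pow(2).sum(dim=1).mean()
    # Second backward pass (hence "double backpropagation") updates the weights.
    loss = rec_loss + lambda_dbp * penalty
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Example call: dbp_step(torch.rand(32, 784))
```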
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the accuracy of the information presented and is not responsible for any consequences arising from its use.