RobustSentEmbed: Robust Sentence Embeddings Using Adversarial Self-Supervised Contrastive Learning
- URL: http://arxiv.org/abs/2403.11082v1
- Date: Sun, 17 Mar 2024 04:29:45 GMT
- Title: RobustSentEmbed: Robust Sentence Embeddings Using Adversarial Self-Supervised Contrastive Learning
- Authors: Javad Rafiei Asl, Prajwal Panzade, Eduardo Blanco, Daniel Takabi, Zhipeng Cai
- Abstract summary: RobustSentEmbed is a self-supervised sentence embedding framework designed to improve robustness in diverse text representation tasks.
Our framework achieves a significant reduction in the success rate of various adversarial attacks, notably reducing the BERTAttack success rate by almost half.
- Score: 11.347789553984741
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Pre-trained language models (PLMs) have consistently demonstrated outstanding performance across a diverse spectrum of natural language processing tasks. Nevertheless, despite their success with unseen data, current PLM-based representations often exhibit poor robustness in adversarial settings. In this paper, we introduce RobustSentEmbed, a self-supervised sentence embedding framework designed to improve both generalization and robustness in diverse text representation tasks and against a broad range of adversarial attacks. Through the generation of high-risk adversarial perturbations and their utilization in a novel objective function, RobustSentEmbed adeptly learns high-quality and robust sentence embeddings. Our experiments confirm the superiority of RobustSentEmbed over state-of-the-art representations. Specifically, our framework achieves a significant reduction in the success rate of various adversarial attacks, notably reducing the BERTAttack success rate by almost half (from 75.51% to 38.81%). The framework also yields improvements of 1.59% and 0.23% in semantic textual similarity tasks and various transfer tasks, respectively.
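The abstract describes the general recipe of pairing adversarial perturbations with a contrastive objective. The sketch below is a minimal, hypothetical PyTorch illustration of that recipe for sentence embeddings; the encoder choice ("bert-base-uncased"), mean pooling, the FGSM-style perturbation, and the InfoNCE loss are illustrative assumptions and do not reproduce the paper's exact objective or hyperparameters.
```python
# Hypothetical sketch: adversarial self-supervised contrastive learning
# for sentence embeddings (not the authors' exact method).
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")
encoder.train()  # keep dropout active so repeated passes give slightly different views


def embed(inputs_embeds, attention_mask):
    # Mean-pool token states into one sentence embedding per input.
    out = encoder(inputs_embeds=inputs_embeds, attention_mask=attention_mask)
    mask = attention_mask.unsqueeze(-1).float()
    return (out.last_hidden_state * mask).sum(1) / mask.sum(1)


def info_nce(anchor, positive, temperature=0.05):
    # In-batch contrastive loss: the i-th pair is the positive, the rest are negatives.
    anchor = F.normalize(anchor, dim=-1)
    positive = F.normalize(positive, dim=-1)
    logits = anchor @ positive.t() / temperature
    labels = torch.arange(anchor.size(0), device=logits.device)
    return F.cross_entropy(logits, labels)


def adversarial_contrastive_loss(sentences, epsilon=1e-3):
    batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    inputs_embeds = encoder.get_input_embeddings()(batch["input_ids"])

    # Clean view of each sentence.
    z_clean = embed(inputs_embeds, batch["attention_mask"])

    # FGSM-style perturbation of the input embeddings against the contrastive loss.
    delta = torch.zeros_like(inputs_embeds, requires_grad=True)
    adv_view = embed(inputs_embeds + delta, batch["attention_mask"])
    grad = torch.autograd.grad(info_nce(z_clean.detach(), adv_view), delta)[0]
    delta = epsilon * grad.sign()

    # Final objective: pull each clean embedding toward its adversarial view.
    z_adv = embed(inputs_embeds + delta.detach(), batch["attention_mask"])
    return info_nce(z_clean, z_adv)


# Example usage over a toy batch of unlabeled sentences.
loss = adversarial_contrastive_loss(["a cat sits on the mat", "stocks fell sharply today"])
loss.backward()  # gradients flow into the encoder for a normal optimizer step
```
In practice, the perturbation step and the contrastive loss would sit inside the regular training loop over unlabeled text, so the encoder learns embeddings that are stable under both dropout noise and adversarial shifts.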
Related papers
- White-box Multimodal Jailbreaks Against Large Vision-Language Models [61.97578116584653]
We propose a more comprehensive strategy that jointly attacks both text and image modalities to exploit a broader spectrum of vulnerability within Large Vision-Language Models.
Our attack method begins by optimizing an adversarial image prefix from random noise to generate diverse harmful responses in the absence of text input.
An adversarial text suffix is integrated and co-optimized with the adversarial image prefix to maximize the probability of eliciting affirmative responses to various harmful instructions.
arXiv Detail & Related papers (2024-05-28T07:13:30Z)
- TIMA: Text-Image Mutual Awareness for Balancing Zero-Shot Adversarial Robustness and Generalization Ability [8.896239176376488]
This work addresses the challenge of achieving zero-shot adversarial robustness while preserving zero-shot generalization in large-scale foundation models.
We propose a novel Text-Image Mutual Awareness (TIMA) method that strikes a balance between zero-shot adversarial robustness and generalization.
arXiv Detail & Related papers (2024-05-27T22:10:17Z)
- Few-Shot Adversarial Prompt Learning on Vision-Language Models [62.50622628004134]
The vulnerability of deep neural networks to imperceptible adversarial perturbations has attracted widespread attention.
Previous efforts achieved zero-shot adversarial robustness by aligning adversarial visual features with text supervision.
We propose a few-shot adversarial prompt framework in which adapting input sequences with limited data yields significant improvements in adversarial robustness.
arXiv Detail & Related papers (2024-03-21T18:28:43Z)
- Lost In Translation: Generating Adversarial Examples Robust to Round-Trip Translation [66.33340583035374]
We present a comprehensive study on the robustness of current text adversarial attacks to round-trip translation.
We demonstrate that 6 state-of-the-art text-based adversarial attacks do not maintain their efficacy after round-trip translation.
We introduce an intervention-based solution to this problem, by integrating Machine Translation into the process of adversarial example generation.
arXiv Detail & Related papers (2023-07-24T04:29:43Z)
- Rethinking Textual Adversarial Defense for Pre-trained Language Models [79.18455635071817]
A literature review shows that pre-trained language models (PrLMs) are vulnerable to adversarial attacks.
We propose a novel metric (Degree of Anomaly) to enable current adversarial attack approaches to generate more natural and imperceptible adversarial examples.
We show that our universal defense framework achieves after-attack accuracy comparable to, or even higher than, that of other attack-specific defenses.
arXiv Detail & Related papers (2022-07-21T07:51:45Z)
- Hear No Evil: Towards Adversarial Robustness of Automatic Speech Recognition via Multi-Task Learning [13.735883484044166]
We investigate the impact of performing multi-task learning on the adversarial robustness of ASR models in the speech domain.
Our approach shows considerable absolute improvements in adversarially targeted WER ranging from 17.25 up to 59.90.
Ours is the first in-depth study that uncovers adversarial robustness gains from multi-task learning for ASR.
arXiv Detail & Related papers (2022-04-05T17:40:19Z)
- Perturbations in the Wild: Leveraging Human-Written Text Perturbations for Realistic Adversarial Attack and Defense [19.76930957323042]
ANTHRO inductively extracts over 600K human-written text perturbations in the wild and leverages them for realistic adversarial attacks.
We find that adversarial texts generated by ANTHRO achieve the best trade-off between (1) attack success rate, (2) semantic preservation of the original text, and (3) stealthiness, i.e., being indistinguishable from human writing.
arXiv Detail & Related papers (2022-03-19T16:00:01Z)
- Characterizing the adversarial vulnerability of speech self-supervised learning [95.03389072594243]
We make the first attempt to investigate the adversarial vulnerability of this paradigm under attacks from both zero-knowledge and limited-knowledge adversaries.
The experimental results illustrate that the paradigm proposed by SUPERB is seriously vulnerable to limited-knowledge adversaries.
arXiv Detail & Related papers (2021-11-08T08:44:04Z)
- Adversarial GLUE: A Multi-Task Benchmark for Robustness Evaluation of Language Models [86.02610674750345]
Adversarial GLUE (AdvGLUE) is a new multi-task benchmark to explore and evaluate the vulnerabilities of modern large-scale language models under various types of adversarial attacks.
We apply 14 adversarial attack methods to GLUE tasks to construct AdvGLUE, which is further validated by humans for reliable annotations.
All the language models and robust training methods we tested perform poorly on AdvGLUE, with scores lagging far behind the benign accuracy.
arXiv Detail & Related papers (2021-11-04T12:59:55Z)
This list is automatically generated from the titles and abstracts of the papers listed on this site.
This site does not guarantee the quality of the information presented (including all content) and is not responsible for any consequences arising from its use.