Adversarial Semantic Collisions
- URL: http://arxiv.org/abs/2011.04743v1
- Date: Mon, 9 Nov 2020 20:42:01 GMT
- Title: Adversarial Semantic Collisions
- Authors: Congzheng Song, Alexander M. Rush, Vitaly Shmatikov
- Abstract summary: We study semantic collisions: texts that are semantically unrelated but judged as similar by NLP models.
We develop gradient-based approaches for generating semantic collisions.
We show how to generate semantic collisions that evade perplexity-based filtering.
- Score: 129.55896108684433
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study semantic collisions: texts that are semantically unrelated but
judged as similar by NLP models. We develop gradient-based approaches for
generating semantic collisions and demonstrate that state-of-the-art models for
many tasks that rely on analyzing the meaning and similarity of texts
(including paraphrase identification, document retrieval, response suggestion,
and extractive summarization) are vulnerable to semantic collisions. For
example, given a target query, inserting a crafted collision into an irrelevant
document can shift its retrieval rank from 1000 to top 3. We show how to
generate semantic collisions that evade perplexity-based filtering and discuss
other potential mitigations. Our code is available at
https://github.com/csong27/collision-bert.
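To make the idea of perplexity-based filtering concrete, here is a minimal sketch of such a filter. The abstract does not say which language model the filter uses, so this example swaps in a toy bigram model with add-one smoothing. The names `train_bigram`, `perplexity`, `passes_filter`, the tiny corpus, and the threshold are all hypothetical and chosen only for illustration; none of them come from the paper or its repository.

```python
import math
from collections import Counter


def train_bigram(corpus):
    """Count bigrams and their left contexts over whitespace-tokenized sentences."""
    contexts, bigrams, vocab = Counter(), Counter(), set()
    for sentence in corpus:
        tokens = ["<s>"] + sentence.lower().split()
        vocab.update(tokens[1:])
        contexts.update(tokens[:-1])          # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))
    return contexts, bigrams, vocab


def perplexity(text, model):
    """Per-token perplexity of `text` under the add-one-smoothed bigram model."""
    contexts, bigrams, vocab = model
    tokens = ["<s>"] + text.lower().split()
    if len(tokens) < 2:
        raise ValueError("text must contain at least one token")
    vocab_size = len(vocab) + 1               # +1 slot for unseen words
    log_prob = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        p = (bigrams[(prev, cur)] + 1) / (contexts[prev] + vocab_size)
        log_prob += math.log(p)
    return math.exp(-log_prob / (len(tokens) - 1))


def passes_filter(text, model, threshold):
    """Accept text only if its perplexity does not exceed the threshold."""
    return perplexity(text, model) <= threshold


# Toy corpus and threshold: assumptions for illustration only.
corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "a cat chased the dog",
]
model = train_bigram(corpus)
natural = "the cat sat on the mat"
scrambled = "mat the on sat cat the"

print(perplexity(natural, model), passes_filter(natural, model, 8.0))
print(perplexity(scrambled, model), passes_filter(scrambled, model, 8.0))
```

In this toy setting the scrambled word order scores a higher perplexity than the fluent sentence and is rejected at the chosen threshold. A collision that evades such a filter would have to keep its perplexity below the threshold while still being scored as similar to the target by the victim model.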
Related papers
- Disentangling Dense Embeddings with Sparse Autoencoders [0.0]
Sparse autoencoders (SAEs) have shown promise in extracting interpretable features from complex neural networks.
We present one of the first applications of SAEs to dense text embeddings from large language models.
We show that the resulting sparse representations maintain semantic fidelity while offering interpretability.
arXiv Detail & Related papers (2024-08-01T15:46:22Z)
- HQA-Attack: Toward High Quality Black-Box Hard-Label Adversarial Attack on Text [40.58680960214544]
Black-box hard-label adversarial attack on text is a practical and challenging task.
We propose a framework to generate high quality adversarial examples under the black-box hard-label attack scenarios, named HQA-Attack.
arXiv Detail & Related papers (2024-02-02T10:06:43Z)
- Verifying the Robustness of Automatic Credibility Assessment [79.08422736721764]
Text classification methods have been widely investigated as a way to detect content of low credibility.
In some cases insignificant changes in input text can mislead the models.
We introduce BODEGA: a benchmark for testing both victim models and attack methods on misinformation detection tasks.
arXiv Detail & Related papers (2023-03-14T16:11:47Z)
- Randomized Substitution and Vote for Textual Adversarial Example Detection [6.664295299367366]
A line of work has shown that natural text processing models are vulnerable to adversarial examples.
We propose a novel textual adversarial example detection method, termed Randomized Substitution and Vote (RS&V).
Empirical evaluations on three benchmark datasets demonstrate that RS&V could detect the textual adversarial examples more successfully than the existing detection methods.
arXiv Detail & Related papers (2021-09-13T04:17:58Z)
- Semantic-Preserving Adversarial Text Attacks [85.32186121859321]
We propose a Bigram and Unigram based adaptive Semantic Preservation Optimization (BU-SPO) method to examine the vulnerability of deep models.
Our method achieves the highest attack success rates and semantics rates by changing the smallest number of words compared with existing methods.
arXiv Detail & Related papers (2021-08-23T09:05:18Z)
- NewsCLIPpings: Automatic Generation of Out-of-Context Multimodal Media [93.51739200834837]
We propose a dataset where both image and text are unmanipulated but mismatched.
We introduce several strategies for automatic retrieval of suitable images for the given captions.
Our large-scale automatically generated NewsCLIPpings dataset requires models to jointly analyze both modalities.
arXiv Detail & Related papers (2021-04-13T01:53:26Z)
- Logic Constrained Pointer Networks for Interpretable Textual Similarity [11.142649867439406]
We introduce a novel pointer network based model with a sentinel gating function to align constituent chunks.
We improve this base model with a loss function to equally penalize misalignments in both sentences, ensuring the alignments are bidirectional.
The model achieves an F1 score of 97.73 and 96.32 on the benchmark SemEval datasets for the chunk alignment task.
arXiv Detail & Related papers (2020-07-15T13:01:44Z)
- Extractive Summarization as Text Matching [123.09816729675838]
This paper creates a paradigm shift with regard to the way we build neural extractive summarization systems.
We formulate the extractive summarization task as a semantic text matching problem.
We have driven the state-of-the-art extractive result on CNN/DailyMail to a new level (44.41 in ROUGE-1).
arXiv Detail & Related papers (2020-04-19T08:27:57Z)
- Human Correspondence Consensus for 3D Object Semantic Understanding [56.34297279246823]
In this paper, we introduce a new dataset named CorresPondenceNet.
Based on this dataset, we are able to learn dense semantic embeddings with a novel geodesic consistency loss.
We show that CorresPondenceNet could not only boost fine-grained understanding of heterogeneous objects but also cross-object registration and partial object matching.
arXiv Detail & Related papers (2019-12-29T04:24:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.