Related papers: IAE: Irony-based Adversarial Examples for Sentiment Analysis Systems

IAE: Irony-based Adversarial Examples for Sentiment Analysis Systems

URL: http://arxiv.org/abs/2411.07850v1
Date: Tue, 12 Nov 2024 15:01:47 GMT
Title: IAE: Irony-based Adversarial Examples for Sentiment Analysis Systems
Authors: Xiaoyin Yi, Jiacheng Huang,
Abstract summary: We propose Irony-based Adversarial Examples (IAE), a method that transforms straightforward sentences into ironic ones to create adversarial text. IAE exploits the rhetorical device of irony, where the intended meaning is opposite to the literal interpretation. We demonstrate that the performance of several state-of-the-art deep learning models on sentiment analysis tasks significantly deteriorates when subjected to IAE attacks.
Score: 4.118390893942461
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Adversarial examples, which are inputs deliberately perturbed with imperceptible changes to induce model errors, have raised serious concerns for the reliability and security of deep neural networks (DNNs). While adversarial attacks have been extensively studied in continuous data domains such as images, the discrete nature of text presents unique challenges. In this paper, we propose Irony-based Adversarial Examples (IAE), a method that transforms straightforward sentences into ironic ones to create adversarial text. This approach exploits the rhetorical device of irony, where the intended meaning is opposite to the literal interpretation, requiring a deeper understanding of context to detect. The IAE method is particularly challenging due to the need to accurately locate evaluation words, substitute them with appropriate collocations, and expand the text with suitable ironic elements while maintaining semantic coherence. Our research makes the following key contributions: (1) We introduce IAE, a strategy for generating textual adversarial examples using irony. This method does not rely on pre-existing irony corpora, making it a versatile tool for creating adversarial text in various NLP tasks. (2) We demonstrate that the performance of several state-of-the-art deep learning models on sentiment analysis tasks significantly deteriorates when subjected to IAE attacks. This finding underscores the susceptibility of current NLP systems to adversarial manipulation through irony. (3) We compare the impact of IAE on human judgment versus NLP systems, revealing that humans are less susceptible to the effects of irony in text.

Related papers

CL-ISR: A Contrastive Learning and Implicit Stance Reasoning Framework for Misleading Text Detection on Social Media [0.5999777817331317]
This paper proposes a novel framework - CL-ISR (Contrastive Learning and Implicit Stance Reasoning) to improve the detection accuracy of misleading texts on social media.<n>First, we use the contrastive learning algorithm to improve the model's learning ability of semantic differences between truthful and misleading texts.<n>Second, we introduce the implicit stance reasoning module, to explore the potential stance tendencies in the text and their relationships with related topics.
arXiv Detail & Related papers (2025-06-05T14:52:28Z)
Who Writes What: Unveiling the Impact of Author Roles on AI-generated Text Detection [44.05134959039957]
We investigate how sociolinguistic attributes-gender, CEFR proficiency, academic field, and language environment-impact state-of-the-art AI text detectors. Our results reveal significant biases: CEFR proficiency and language environment consistently affected detector accuracy, while gender and academic field showed detector-dependent effects. These findings highlight the crucial need for socially aware AI text detection to avoid unfairly penalizing specific demographic groups.
arXiv Detail & Related papers (2025-02-18T07:49:31Z)
Illusions of Relevance: Using Content Injection Attacks to Deceive Retrievers, Rerankers, and LLM Judges [52.96987928118327]
We find that embedding models for retrieval, rerankers, and large language model (LLM) relevance judges are vulnerable to content injection attacks. We identify two primary threats: (1) inserting unrelated or harmful content within passages that still appear deceptively "relevant", and (2) inserting entire queries or key query terms into passages to boost their perceived relevance. Our study systematically examines the factors that influence an attack's success, such as the placement of injected content and the balance between relevant and non-relevant material.
arXiv Detail & Related papers (2025-01-30T18:02:15Z)
Irony Detection, Reasoning and Understanding in Zero-shot Learning [0.5755004576310334]
Irony is a powerful figurative language (FL) on social media that can potentially mislead various NLP tasks. Large language models, such as ChatGPT, are increasingly able to capture implicit and contextual information. We propose a prompt engineering design framework IDADP to achieve higher irony detection accuracy, improved understanding of irony, and more effective explanations.
arXiv Detail & Related papers (2025-01-28T12:13:07Z)
Bias-Free Sentiment Analysis through Semantic Blinding and Graph Neural Networks [0.0]
The SProp GNN relies exclusively on syntactic structures and word-level emotional cues to predict emotions in text. By semantically blinding the model to information about specific words, it is robust to biases such as political or gender bias. The SProp GNN shows performance superior to lexicon-based alternatives on two different prediction tasks, and across two languages.
arXiv Detail & Related papers (2024-11-19T13:23:53Z)
Intruding with Words: Towards Understanding Graph Injection Attacks at the Text Level [21.003091265006102]
Graph Neural Networks (GNNs) excel across various applications but remain vulnerable to adversarial attacks. In this paper, we pioneer the exploration of Graph Injection Attacks (GIAs) at the text level. We show that text interpretability, a factor previously overlooked at the embedding level, plays a crucial role in attack strength.
arXiv Detail & Related papers (2024-05-26T02:12:02Z)
Forging the Forger: An Attempt to Improve Authorship Verification via Data Augmentation [52.72682366640554]
Authorship Verification (AV) is a text classification task concerned with inferring whether a candidate text has been written by one specific author or by someone else. It has been shown that many AV systems are vulnerable to adversarial attacks, where a malicious author actively tries to fool the classifier by either concealing their writing style, or by imitating the style of another author.
arXiv Detail & Related papers (2024-03-17T16:36:26Z)
Hidding the Ghostwriters: An Adversarial Evaluation of AI-Generated Student Essay Detection [29.433764586753956]
Large language models (LLMs) have exhibited remarkable capabilities in text generation tasks. The utilization of these models carries inherent risks, including but not limited to plagiarism, the dissemination of fake news, and issues in educational exercises. This paper aims to bridge this gap by constructing AIG-ASAP, an AI-generated student essay dataset.
arXiv Detail & Related papers (2024-02-01T08:11:56Z)
Lost In Translation: Generating Adversarial Examples Robust to Round-Trip Translation [66.33340583035374]
We present a comprehensive study on the robustness of current text adversarial attacks to round-trip translation. We demonstrate that 6 state-of-the-art text-based adversarial attacks do not maintain their efficacy after round-trip translation. We introduce an intervention-based solution to this problem, by integrating Machine Translation into the process of adversarial example generation.
arXiv Detail & Related papers (2023-07-24T04:29:43Z)
In and Out-of-Domain Text Adversarial Robustness via Label Smoothing [64.66809713499576]
We study the adversarial robustness provided by various label smoothing strategies in foundational models for diverse NLP tasks. Our experiments show that label smoothing significantly improves adversarial robustness in pre-trained models like BERT, against various popular attacks. We also analyze the relationship between prediction confidence and robustness, showing that label smoothing reduces over-confident errors on adversarial examples.
arXiv Detail & Related papers (2022-12-20T14:06:50Z)
Estimating the Adversarial Robustness of Attributions in Text with Transformers [44.745873282080346]
We establish a novel definition of attribution robustness (AR) in text classification, based on Lipschitz continuity. We then propose our novel TransformerExplanationAttack (TEA), a strong adversary that provides a tight estimation for attribution in text classification.
arXiv Detail & Related papers (2022-12-18T20:18:59Z)
Learning-based Hybrid Local Search for the Hard-label Textual Attack [53.92227690452377]
We consider a rarely investigated but more rigorous setting, namely hard-label attack, in which the attacker could only access the prediction label. Based on this observation, we propose a novel hard-label attack, called Learning-based Hybrid Local Search (LHLS) algorithm. Our LHLS significantly outperforms existing hard-label attacks regarding the attack performance as well as adversary quality.
arXiv Detail & Related papers (2022-01-20T14:16:07Z)
ADC: Adversarial attacks against object Detection that evade Context consistency checks [55.8459119462263]
We show that even context consistency checks can be brittle to properly crafted adversarial examples. We propose an adaptive framework to generate examples that subvert such defenses. Our results suggest that how to robustly model context and check its consistency, is still an open problem.
arXiv Detail & Related papers (2021-10-24T00:25:09Z)
Randomized Substitution and Vote for Textual Adversarial Example Detection [6.664295299367366]
A line of work has shown that natural text processing models are vulnerable to adversarial examples. We propose a novel textual adversarial example detection method, termed Randomized Substitution and Vote (RS&V) Empirical evaluations on three benchmark datasets demonstrate that RS&V could detect the textual adversarial examples more successfully than the existing detection methods.
arXiv Detail & Related papers (2021-09-13T04:17:58Z)
Exploring Robustness of Unsupervised Domain Adaptation in Semantic Segmentation [74.05906222376608]
We propose adversarial self-supervision UDA (or ASSUDA) that maximizes the agreement between clean images and their adversarial examples by a contrastive loss in the output space. This paper is rooted in two observations: (i) the robustness of UDA methods in semantic segmentation remains unexplored, which pose a security concern in this field; and (ii) although commonly used self-supervision (e.g., rotation and jigsaw) benefits image tasks such as classification and recognition, they fail to provide the critical supervision signals that could learn discriminative representation for segmentation tasks.
arXiv Detail & Related papers (2021-05-23T01:50:44Z)

This list is automatically generated from the titles and abstracts of the papers in this site.