The Sensitivity of Word Embeddings-based Author Detection Models to
Semantic-preserving Adversarial Perturbations
- URL: http://arxiv.org/abs/2102.11917v1
- Date: Tue, 23 Feb 2021 19:55:45 GMT
- Title: The Sensitivity of Word Embeddings-based Author Detection Models to
Semantic-preserving Adversarial Perturbations
- Authors: Jeremiah Duncan, Fabian Fallas, Chris Gropp, Emily Herron, Maria
Mahbub, Paula Olaya, Eduardo Ponce, Tabitha K. Samuel, Daniel Schultz,
Sudarshan Srinivasan, Maofeng Tang, Viktor Zenkov, Quan Zhou, Edmon Begoli
- Abstract summary: Authorship analysis is an important subject in the field of natural language processing.
This paper explores the limitations and sensitivity of established approaches to adversarial manipulations of inputs.
- Score: 3.7552532139404797
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Authorship analysis is an important subject in the field of natural language
processing. It allows the detection of the most likely writer of articles,
news, books, or messages. This technique has multiple uses in tasks such as
authorship attribution, plagiarism detection, style analysis, and identifying
sources of misinformation. The focus of this paper is to explore the limitations and
sensitivity of established approaches to adversarial manipulations of inputs.
To this end, and using those established techniques, we first developed an
experimental framework for author detection and input perturbations. Next, we
experimentally evaluated the performance of the authorship detection model
against a collection of semantic-preserving adversarial perturbations of input
narratives. Finally, we compared and analyzed the effects of different
perturbation strategies and of input and model configurations on the author
detection model.
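The abstract does not specify how the semantic-preserving perturbations were generated. A common instance of this family, consistent with the paper's word-embeddings setting, is embedding-based synonym substitution. The Python sketch below illustrates the idea only; the GloVe model, the swap rate, and the perturb function are illustrative assumptions, not the paper's implementation.

```python
# A minimal sketch of one semantic-preserving perturbation strategy:
# replacing a fraction of words with their nearest neighbors in an
# embedding space. Model choice and swap_rate are assumptions.
import random
import gensim.downloader as api

model = api.load("glove-wiki-gigaword-50")  # pre-trained word embeddings

def perturb(text, swap_rate=0.15, topn=5, seed=0):
    """Replace a fraction of known words with close embedding neighbors."""
    rng = random.Random(seed)
    out = []
    for tok in text.split():
        word = tok.lower()
        if word in model.key_to_index and rng.random() < swap_rate:
            # Nearest neighbors approximate synonyms, so the swap
            # roughly preserves the sentence's meaning.
            neighbors = [w for w, _ in model.most_similar(word, topn=topn)]
            out.append(rng.choice(neighbors))
        else:
            out.append(tok)
    return " ".join(out)

original = "The detective carefully examined the mysterious letter."
print(perturb(original))
```

Sensitivity can then be measured by comparing the detector's attribution accuracy on original versus perturbed narratives.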
Related papers
- Unsupervised Model Diagnosis [49.36194740479798]
This paper proposes Unsupervised Model Diagnosis (UMO) to produce semantic counterfactual explanations without any user guidance.
Our approach identifies and visualizes changes in semantics, and then matches these changes to attributes from wide-ranging text sources.
arXiv Detail & Related papers (2024-10-08T17:59:03Z) - Navigating the Shadows: Unveiling Effective Disturbances for Modern AI Content Detectors [24.954755569786396]
AI-text detection has emerged to distinguish between human and machine-generated content.
Recent research indicates that these detection systems often lack robustness and struggle to effectively differentiate perturbed texts.
Our work simulates real-world scenarios in both informal and professional writing, exploring the out-of-the-box performance of current detectors.
arXiv Detail & Related papers (2024-06-13T08:37:01Z) - Who Writes the Review, Human or AI? [0.36498648388765503]
This study proposes a methodology to accurately distinguish AI-generated and human-written book reviews.
Our approach utilizes transfer learning, enabling the model to identify generated text across different topics.
The experimental results demonstrate that it is feasible to detect the original source of text, achieving an accuracy rate of 96.86%.
arXiv Detail & Related papers (2024-05-30T17:38:44Z) - Leveraging the power of transformers for guilt detection in text [50.65526700061155]
This research explores the applicability of three transformer-based language models for detecting guilt in text.
Our proposed model outperformed the BERT and RoBERTa models by two and one points, respectively.
arXiv Detail & Related papers (2024-01-15T01:40:39Z) - Assaying on the Robustness of Zero-Shot Machine-Generated Text Detectors [57.7003399760813]
We explore advanced Large Language Models (LLMs) and their specialized variants, contributing to this field in several ways.
We uncover a significant correlation between topics and detection performance.
These investigations shed light on the adaptability and robustness of these detection methods across diverse topics.
arXiv Detail & Related papers (2023-12-20T10:53:53Z) - How Well Do Text Embedding Models Understand Syntax? [50.440590035493074]
The ability of text embedding models to generalize across a wide range of syntactic contexts remains under-explored.
Our findings reveal that existing text embedding models have not sufficiently addressed these syntactic understanding challenges.
We propose strategies to augment the generalization ability of text embedding models in diverse syntactic scenarios.
arXiv Detail & Related papers (2023-11-14T08:51:00Z) - An Information-Theoretic Approach for Detecting Edits in AI-Generated Text [7.013432243663526]
We propose a method to determine whether a given article was written entirely by a generative language model or perhaps contains edits by a different author, possibly a human.
We demonstrate the effectiveness of the method in detecting edits through extensive evaluations using real data.
Our analysis raises several interesting research questions at the intersection of information theory and data science.
arXiv Detail & Related papers (2023-08-24T12:49:21Z) - Explainable Contextual Anomaly Detection using Quantile Regression
Forests [14.80211278818555]
We develop connections between dependency-based traditional anomaly detection methods and contextual anomaly detection methods.
Based on resulting insights, we propose a novel approach to inherently interpretable contextual anomaly detection.
Our method outperforms state-of-the-art anomaly detection methods in terms of accuracy and interpretability.
arXiv Detail & Related papers (2023-02-22T09:39:59Z) - TraSE: Towards Tackling Authorial Style from a Cognitive Science
Perspective [4.123763595394021]
Authorship attribution experiments with over 27,000 authors and 1.4 million samples in a cross-domain scenario resulted in 90% attribution accuracy.
A qualitative analysis is performed on TraSE using physical human characteristics, like age, to validate its claim on capturing cognitive traits.
arXiv Detail & Related papers (2022-06-21T19:55:07Z) - Towards Unbiased Visual Emotion Recognition via Causal Intervention [63.74095927462]
We propose a novel Interventional Emotion Recognition Network (IERN) to alleviate the negative effects brought by dataset bias.
A series of designed tests validate the effectiveness of IERN, and experiments on three emotion benchmarks demonstrate that IERN outperforms other state-of-the-art approaches.
arXiv Detail & Related papers (2021-07-26T10:40:59Z) - On the Transferability of Adversarial Attacks against Neural Text
Classifier [121.6758865857686]
We investigate the transferability of adversarial examples for text classification models.
We propose a genetic algorithm to find an ensemble of models that can induce adversarial examples to fool almost all existing models.
We derive word replacement rules that can be used for model diagnostics from these adversarial examples.
arXiv Detail & Related papers (2020-11-17T10:45:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.