Related papers: Tuning for TraceTarnish: Techniques, Trends, and Testing Tangible Traits

URL: http://arxiv.org/abs/2512.03465v1
Date: Wed, 03 Dec 2025 05:39:40 GMT
Title: Tuning for TraceTarnish: Techniques, Trends, and Testing Tangible Traits
Authors: Robert Dilworth
Abstract summary: The attack script $\textit{TraceTarnish}$ uses adversarial stylometry principles to anonymize the authorship of text-based messages. Stylometric cues--function-word frequencies, content-word distributions, and the Type-Token Ratio--serve as reliable indicators of compromise.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In this study, we more rigorously evaluated our attack script $\textit{TraceTarnish}$, which leverages adversarial stylometry principles to anonymize the authorship of text-based messages. To ensure the efficacy and utility of our attack, we sourced, processed, and analyzed Reddit comments--comments that were later alchemized into $\textit{TraceTarnish}$ data--to gain valuable insights. The transformed $\textit{TraceTarnish}$ data was then further augmented by $\textit{StyloMetrix}$ to manufacture stylometric features--features that were culled using the Information Gain criterion, leaving only the most informative, predictive, and discriminative ones. Our results found that function words and function word types ($L\_FUNC\_A$ $\&$ $L\_FUNC\_T$); content words and content word types ($L\_CONT\_A$ $\&$ $L\_CONT\_T$); and the Type-Token Ratio ($ST\_TYPE\_TOKEN\_RATIO\_LEMMAS$) yielded significant Information-Gain readings. The identified stylometric cues--function-word frequencies, content-word distributions, and the Type-Token Ratio--serve as reliable indicators of compromise (IoCs), revealing when a text has been deliberately altered to mask its true author. Similarly, these features could function as forensic beacons, alerting defenders to the presence of an adversarial stylometry attack; granted, in the absence of the original message, this signal may go largely unnoticed, as it appears to depend on a pre- and post-transformation comparison. "In trying to erase a trace, you often imprint a larger one." Armed with this understanding, we framed $\textit{TraceTarnish}$'s operations and outputs around these five isolated features, using them to conceptualize and implement enhancements that further strengthen the attack.
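
The abstract describes ranking stylometric features by Information Gain and singles out the Type-Token Ratio as one of the informative ones. The sketch below is only an illustration of that idea, not the authors' pipeline: it computes a whitespace-token Type-Token Ratio (the paper's feature is lemma-based and produced by StyloMetrix, which is not used here) and the Information Gain of a single threshold split on that feature. The function names, the toy texts, the labels (0 = original, 1 = transformed), and the mean-value threshold are all assumptions made for the example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(values, labels, threshold):
    """Information gain from splitting a numeric feature at `threshold`.

    Hypothetical helper: the paper does not specify how features were
    discretized, so a single binary split is assumed here.
    """
    n = len(labels)
    if n == 0:
        return 0.0
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    remainder = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - remainder


def type_token_ratio(text):
    """Unique tokens divided by total tokens, using naive whitespace tokens.

    Assumption: no lemmatization, unlike ST_TYPE_TOKEN_RATIO_LEMMAS.
    """
    tokens = text.lower().split()
    return len(set(tokens)) / len(tokens) if tokens else 0.0


if __name__ == "__main__":
    # Toy, made-up texts: label 0 = original, label 1 = transformed.
    texts = [
        "the cat sat on the mat and the dog sat on the mat too",
        "i think that it is what it is and that is that",
        "a feline rested upon one rug while some hound lounged nearby",
        "my view holds this matter stays fixed regardless of opinion",
    ]
    labels = [0, 0, 1, 1]

    ttr_values = [type_token_ratio(t) for t in texts]
    threshold = sum(ttr_values) / len(ttr_values)  # assumed split point

    print("TTR per text:", ttr_values)
    print("Information gain of TTR split:",
          information_gain(ttr_values, labels, threshold))
```

A feature whose split separates original from transformed texts cleanly scores close to the full label entropy, while a feature that leaves both classes mixed on each side scores near zero. As the abstract notes, this kind of signal depends on having both the pre- and post-transformation text to compare.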

Related papers

StegoStylo: Squelching Stylometric Scrutiny through Steganographic Stitching [0.0]
Stylometry supports copyright and plagiarism investigations, aids detection of harmful content, and provides historical context for literary works. While stylometry is employed as a tool for authorship verification--confirming whether a text truly originates from a claimed author--it can also be weaponized for malicious purposes. This paper explores how adversarial stylometry combined with steganography can counteract stylometric analysis.
arXiv Detail & Related papers (2026-01-14T00:49:20Z)
Feedback Descent: Open-Ended Text Optimization via Pairwise Comparison [48.89195616081196]
Feedback Descent is a framework that optimizes text artifacts -- prompts, code, and molecules -- through structured textual feedback. We show that in-context learning can transform structured feedback into gradient-like directional information, enabling targeted edits. In the DOCKSTRING molecule discovery benchmark, Feedback Descent identifies novel drug-like molecules surpassing the $99.9$th percentile of a database with more than $260,000$ compounds across six protein targets.
arXiv Detail & Related papers (2025-11-11T07:14:13Z)
When Personalization Tricks Detectors: The Feature-Inversion Trap in Machine-Generated Text Detection [64.23509202768945]
We introduce dataset, the first benchmark for evaluating detector robustness in personalized settings. Our experimental results demonstrate large performance gaps across detectors in personalized settings. We propose method, a simple and reliable way to predict detector performance changes in personalized settings.
arXiv Detail & Related papers (2025-10-14T13:10:23Z)
Diversity Boosts AI-Generated Text Detection [51.56484100374058]
DivEye is a novel framework that captures how unpredictability fluctuates across a text using surprisal-based features. Our method outperforms existing zero-shot detectors by up to 33.2% and achieves competitive performance with fine-tuned baselines.
arXiv Detail & Related papers (2025-09-23T10:21:22Z)
Towards Bridging Review Sparsity in Recommendation with Textual Edge Graph Representation [28.893058826607735]
We propose a unified framework that imputes missing reviews by jointly modeling semantic and structural signals. Experiments on the Amazon and Goodreads datasets show that TWISTER consistently outperforms traditional numeric, graph-based, and LLM baselines. In summary, TWISTER generates reviews that are more helpful, authentic, and specific, while smoothing structural signals for improved recommendations.
arXiv Detail & Related papers (2025-08-02T00:53:40Z)
Towards Generalized and Training-Free Text-Guided Semantic Manipulation [123.80467566483038]
Text-guided semantic manipulation refers to semantically editing an image generated from a source prompt to match a target prompt. We propose a novel $\textit{GTF}$ for text-guided semantic manipulation, which has the following attractive capabilities. Our experiments demonstrate the efficacy of our approach, highlighting its potential to advance the state-of-the-art in semantics manipulation.
arXiv Detail & Related papers (2025-04-24T05:54:56Z)
Breaking BERT: Gradient Attack on Twitter Sentiment Analysis for Targeted Misclassification [0.0]
Bidirectional Encoder Representations from Transformers (BERT) has been widely adapted in sentiment analysis. BERT is susceptible to adversarial attacks. This paper aims to scrutinize the inherent vulnerabilities of such models in Twitter sentiment analysis.
arXiv Detail & Related papers (2025-04-02T04:21:19Z)
Discourse Features Enhance Detection of Document-Level Machine-Generated Content [53.41994768824785]
Machine-generated content poses challenges such as academic plagiarism and the spread of misinformation. Existing MGC detectors often focus solely on surface-level information, overlooking implicit and structural features. We introduce novel methodologies and datasets to overcome these challenges.
arXiv Detail & Related papers (2024-12-17T08:47:41Z)
Integrating Bidirectional Long Short-Term Memory with Subword Embedding for Authorship Attribution [2.3429306644730854]
Manifold word-based stylistic markers have been successfully used in deep learning methods to deal with the intrinsic problem of authorship attribution. The proposed method was experimentally evaluated against numerous state-of-the-art methods across the public corpora of CCAT50, IMDb62, Blog50, and Twitter50.
arXiv Detail & Related papers (2023-06-26T11:35:47Z)
Scaling up sign spotting through sign language dictionaries [99.50956498009094]
The focus of this work is $\textit{sign spotting}$ - given a video of an isolated sign, our task is to identify $\textit{whether}$ and $\textit{where}$ it has been signed in a continuous, co-articulated sign language video. We train a model using multiple types of available supervision by: (1) $\textit{watching}$ existing footage which is sparsely labelled using mouthing cues; (2) $\textit{reading}$ associated subtitles which provide additional translations of the signed content. We validate the effectiveness of our approach on low
arXiv Detail & Related papers (2022-05-09T10:00:03Z)
Towards Variable-Length Textual Adversarial Attacks [68.27995111870712]
It is non-trivial to conduct textual adversarial attacks on natural language processing tasks due to the discreteness of data. In this paper, we propose variable-length textual adversarial attacks (VL-Attack). Our method can achieve $33.18$ BLEU score on IWSLT14 German-English translation, achieving an improvement of $1.47$ over the baseline model.
arXiv Detail & Related papers (2021-04-16T14:37:27Z)

This list is automatically generated from the titles and abstracts of the papers on this site.