On the Insecurity of Keystroke-Based AI Authorship Detection: Timing-Forgery Attacks Against Motor-Signal Verification
- URL: http://arxiv.org/abs/2601.17280v1
- Date: Sat, 24 Jan 2026 03:39:38 GMT
- Title: On the Insecurity of Keystroke-Based AI Authorship Detection: Timing-Forgery Attacks Against Motor-Signal Verification
- Authors: David Condrey,
- Abstract summary: Recent proposals advocate using keystroke timing signals to distinguish human-composed text from AI-generated content.<n>We show that this class of defenses is insecure against two practical attack classes.<n>We formalize a non-identifiability result: when the detector observes only timing, the mutual information between features and content provenance is zero for copy-type attacks.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent proposals advocate using keystroke timing signals, specifically the coefficient of variation ($δ$) of inter-keystroke intervals, to distinguish human-composed text from AI-generated content. We demonstrate that this class of defenses is insecure against two practical attack classes: the copy-type attack, in which a human transcribes LLM-generated text producing authentic motor signals, and timing-forgery attacks, in which automated agents sample inter-keystroke intervals from empirical human distributions. Using 13,000 sessions from the SBU corpus and three timing-forgery variants (histogram sampling, statistical impersonation, and generative LSTM), we show all attacks achieve $\ge$99.8% evasion rates against five classifiers. While detectors achieve AUC=1.000 against fully-automated injection, they classify $\ge$99.8% of attack samples as human with mean confidence $\ge$0.993. We formalize a non-identifiability result: when the detector observes only timing, the mutual information between features and content provenance is zero for copy-type attacks. Although composition and transcription produce statistically distinguishable motor patterns (Cohen's d=1.28), both yield $δ$ values 2-4x above detection thresholds, rendering the distinction security-irrelevant. These systems confirm a human operated the keyboard, but not whether that human originated the text. Securing provenance requires architectures that bind the writing process to semantic content.
Related papers
- PADBen: A Comprehensive Benchmark for Evaluating AI Text Detectors Against Paraphrase Attacks [2.540711742769252]
We investigate why iteratively-paraphrased text evades detection systems designed for AIGT identification.<n>We introduce PADBen, the first benchmark systematically evaluating detector robustness against paraphrase attack scenarios.
arXiv Detail & Related papers (2025-11-01T05:59:46Z) - Your Language Model Can Secretly Write Like Humans: Contrastive Paraphrase Attacks on LLM-Generated Text Detectors [77.82885394684202]
We propose textbfContrastive textbfParaphrase textbfAttack (CoPA), a training-free method that effectively deceives text detectors.<n>CoPA constructs an auxiliary machine-like word distribution as a contrast to the human-like distribution generated by large language models.<n>Our theoretical analysis suggests the superiority of the proposed attack.
arXiv Detail & Related papers (2025-05-21T10:08:39Z) - Trigger without Trace: Towards Stealthy Backdoor Attack on Text-to-Image Diffusion Models [70.03122709795122]
Backdoor attacks targeting text-to-image diffusion models have advanced rapidly.<n>Current backdoor samples often exhibit two key abnormalities compared to benign samples.<n>We propose Trigger without Trace (TwT) by explicitly mitigating these consistencies.
arXiv Detail & Related papers (2025-03-22T10:41:46Z) - T2IShield: Defending Against Backdoors on Text-to-Image Diffusion Models [70.03122709795122]
We propose a comprehensive defense method named T2IShield to detect, localize, and mitigate backdoor attacks.
We find the "Assimilation Phenomenon" on the cross-attention maps caused by the backdoor trigger.
For backdoor sample detection, T2IShield achieves a detection F1 score of 88.9$%$ with low computational cost.
arXiv Detail & Related papers (2024-07-05T01:53:21Z) - OrderBkd: Textual backdoor attack through repositioning [0.0]
Third-party datasets and pre-trained machine learning models pose a threat to NLP systems.
Existing backdoor attacks involve poisoning the data samples such as insertion of tokens or sentence paraphrasing.
Our main difference from the previous work is that we use the reposition of a two words in a sentence as a trigger.
arXiv Detail & Related papers (2024-02-12T14:53:37Z) - Paraphrasing evades detectors of AI-generated text, but retrieval is an
effective defense [56.077252790310176]
We present a paraphrase generation model (DIPPER) that can paraphrase paragraphs, condition on surrounding context, and control lexical diversity and content reordering.
Using DIPPER to paraphrase text generated by three large language models (including GPT3.5-davinci-003) successfully evades several detectors, including watermarking.
We introduce a simple defense that relies on retrieving semantically-similar generations and must be maintained by a language model API provider.
arXiv Detail & Related papers (2023-03-23T16:29:27Z) - Can AI-Generated Text be Reliably Detected? [50.95804851595018]
Large Language Models (LLMs) perform impressively well in various applications.<n>The potential for misuse of these models in activities such as plagiarism, generating fake news, and spamming has raised concern about their responsible use.<n>We stress-test the robustness of these AI text detectors in the presence of an attacker.
arXiv Detail & Related papers (2023-03-17T17:53:19Z) - Generative Adversarial Network-Driven Detection of Adversarial Tasks in
Mobile Crowdsensing [5.675436513661266]
Crowdsensing systems are vulnerable to various attacks as they build on non-dedicated and ubiquitous properties.
Previous works suggest that GAN-based attacks exhibit more crucial devastation than empirically designed attack samples.
This paper aims to detect intelligently designed illegitimate sensing service requests by integrating a GAN-based model.
arXiv Detail & Related papers (2022-02-16T00:23:25Z) - Sample Efficient Detection and Classification of Adversarial Attacks via
Self-Supervised Embeddings [40.332149464256496]
Adrial robustness of deep models is pivotal in ensuring safe deployment in real world settings.
We propose a self-supervised method to detect adversarial attacks and classify them to their respective threat models.
We use a SimCLR encoder in our experiments, since we show the SimCLR embedding distance is a good proxy for human perceptibility.
arXiv Detail & Related papers (2021-08-30T16:39:52Z) - Hidden Backdoors in Human-Centric Language Models [12.694861859949585]
We create covert and natural triggers for textual backdoor attacks.
We deploy our hidden backdoors through two state-of-the-art trigger embedding methods.
We demonstrate that the proposed hidden backdoors can be effective across three downstream security-critical NLP tasks.
arXiv Detail & Related papers (2021-05-01T04:41:00Z) - Towards Variable-Length Textual Adversarial Attacks [68.27995111870712]
It is non-trivial to conduct textual adversarial attacks on natural language processing tasks due to the discreteness of data.
In this paper, we propose variable-length textual adversarial attacks(VL-Attack)
Our method can achieve $33.18$ BLEU score on IWSLT14 German-English translation, achieving an improvement of $1.47$ over the baseline model.
arXiv Detail & Related papers (2021-04-16T14:37:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.