Unveiling Unicode's Unseen Underpinnings in Undermining Authorship Attribution
- URL: http://arxiv.org/abs/2508.15840v3
- Date: Thu, 30 Oct 2025 16:25:05 GMT
- Title: Unveiling Unicode's Unseen Underpinnings in Undermining Authorship Attribution
- Authors: Robert Dilworth
- Abstract summary: The content of a message -- necessarily open for public consumption -- exposes an attack vector: stylometric analysis. In this paper, we dissect the technique of stylometry, discuss an antithetical counter-strategy in adversarial stylometry, and devise enhancements through Unicode steganography.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: When using a public communication channel -- whether formal or informal, such as commenting or posting on social media -- end users have no expectation of privacy: they compose a message and broadcast it for the world to see. Even if an end user takes utmost precautions to anonymize their online presence -- using an alias or pseudonym; masking their IP address; spoofing their geolocation; concealing their operating system and user agent; deploying encryption; registering with a disposable phone number or email; disabling non-essential settings; revoking permissions; and blocking cookies and fingerprinting -- one obvious element still lingers: the message itself. Assuming they avoid lapses in judgment or accidental self-exposure, there should be little evidence to validate their actual identity, right? Wrong. The content of their message -- necessarily open for public consumption -- exposes an attack vector: stylometric analysis, or author profiling. In this paper, we dissect the technique of stylometry, discuss an antithetical counter-strategy in adversarial stylometry, and devise enhancements through Unicode steganography.
Related papers
- Stop Tracking Me! Proactive Defense Against Attribute Inference Attack in LLMs [61.15237978606501]
Large language models can infer private user attributes from user-generated text. Existing anonymization-based defenses are coarse-grained, lacking word-level precision in anonymizing privacy-leaking elements. We propose a unified defense framework that combines fine-grained anonymization (TRACE) with inference-preventing optimization (RPS).
arXiv Detail & Related papers (2026-02-12T03:37:50Z) - Federated Anonymous Blocklisting across Service Providers and its Application to Group Messaging [1.7616042687330637]
In Anonymous Blocklisting schemes, users must prove during authentication that none of their previous pseudonyms has been blocked. We propose an alternative, Federated Anonymous Blocklisting (FAB), in which the centralised Service Provider is replaced by small distributed Realms.
arXiv Detail & Related papers (2025-11-05T14:11:46Z) - User Perceptions and Attitudes Toward Untraceability in Messaging Platforms [4.724825031148412]
We study user perceptions of "untraceability", i.e., preventing third parties from tracing who communicates with whom. We find that users associate the concept of untraceability with a wide range of privacy enhancing technologies. We discuss the gap between users' perceptions of untraceability and the threat models addressed by untraceable communication protocols.
arXiv Detail & Related papers (2025-06-12T18:19:50Z) - Privacy-Preserving Biometric Verification with Handwritten Random Digit String [49.77172854374479]
Handwriting verification has stood as a steadfast identity authentication method for decades. However, this technique risks potential privacy breaches due to the inclusion of personal information in handwritten biometrics such as signatures. We propose using the Random Digit String (RDS) for privacy-preserving handwriting verification.
arXiv Detail & Related papers (2025-03-17T03:47:25Z) - Personalized Language Model Learning on Text Data Without User Identifiers [79.36212347601223]
We propose to let each mobile device maintain a user-specific distribution to dynamically generate user embeddings. To prevent the cloud from tracking users via uploaded embeddings, the local distributions of different users should either be derived from a linearly dependent space. Evaluation on both public and industrial datasets reveals a remarkable improvement in accuracy from incorporating anonymous user embeddings.
arXiv Detail & Related papers (2025-01-10T15:46:19Z) - Fingerprinting and Tracing Shadows: The Development and Impact of Browser Fingerprinting on Digital Privacy [55.2480439325792]
Browser fingerprinting is a growing technique for identifying and tracking users online without traditional methods like cookies.
This paper gives an overview by examining the various fingerprinting techniques and analyzes the entropy and uniqueness of the collected data.
arXiv Detail & Related papers (2024-11-18T20:32:31Z) - Careless Whisper: Exploiting Silent Delivery Receipts to Monitor Users on Mobile Instant Messengers [1.5496023883771977]
This paper highlights that delivery receipts can pose significant privacy risks to users. We use specifically crafted messages that trigger delivery receipts allowing any user to be pinged without their knowledge or consent. We argue for a design change to address this issue.
arXiv Detail & Related papers (2024-11-17T22:58:28Z) - Pudding: Private User Discovery in Anonymity Networks [9.474649136535705]
Pudding is a novel private user discovery protocol.
It hides contact relationships between users, prevents impersonation, and conceals which usernames are registered on the network.
Pudding can be deployed on Loopix and Nym without changes to the underlying anonymity network protocol.
arXiv Detail & Related papers (2023-11-17T19:06:08Z) - Defending Against Authorship Identification Attacks [9.148691357200216]
Authorship identification has proven unsettlingly effective in inferring the identity of the author of an unsigned document.
The presented work offers a comprehensive review of the advancements in this research area spanning over the past two decades and beyond.
arXiv Detail & Related papers (2023-10-02T19:03:11Z) - Unsupervised Text Deidentification [101.2219634341714]
We propose an unsupervised deidentification method that masks words that leak personally-identifying information.
Motivated by K-anonymity based privacy, we generate redactions that ensure a minimum reidentification rank.
arXiv Detail & Related papers (2022-10-20T18:54:39Z) - Protecting Anonymous Speech: A Generative Adversarial Network Methodology for Removing Stylistic Indicators in Text [2.9005223064604078]
We develop a new approach to authorship anonymization by constructing a generative adversarial network.
Our fully automatic method achieves comparable results to other methods in terms of content preservation and fluency.
Our approach is able to generalize well to an open-set context and anonymize sentences from authors it has not encountered before.
arXiv Detail & Related papers (2021-10-18T17:45:56Z) - Backdoor Attack against Speaker Verification [86.43395230456339]
We show that it is possible to inject the hidden backdoor for infecting speaker verification models by poisoning the training data.
We also demonstrate that existing backdoor attacks cannot be directly adopted in attacking speaker verification.
arXiv Detail & Related papers (2020-10-22T11:10:08Z) - Mind the GAP: Security & Privacy Risks of Contact Tracing Apps [75.7995398006171]
Google and Apple have jointly provided an API for exposure notification in order to implement decentralized contact tracing apps using Bluetooth Low Energy.
We demonstrate that in real-world scenarios the GAP design is vulnerable to (i) profiling and possibly de-anonymizing persons, and (ii) relay-based wormhole attacks that basically can generate fake contacts.
arXiv Detail & Related papers (2020-06-10T16:05:05Z)