JAMDEC: Unsupervised Authorship Obfuscation using Constrained Decoding
over Small Language Models
- URL: http://arxiv.org/abs/2402.08761v1
- Date: Tue, 13 Feb 2024 19:54:29 GMT
- Title: JAMDEC: Unsupervised Authorship Obfuscation using Constrained Decoding
over Small Language Models
- Authors: Jillian Fisher, Ximing Lu, Jaehun Jung, Liwei Jiang, Zaid Harchaoui,
Yejin Choi
- Abstract summary: We propose an unsupervised inference-time approach to authorship obfuscation.
We introduce JAMDEC, a user-controlled, inference-time algorithm for authorship obfuscation.
Our approach builds on small language models such as GPT2-XL in order to help avoid disclosing the original content to proprietary LLMs' APIs.
- Score: 53.83273575102087
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The permanence of online content, combined with increasingly
powerful authorship identification techniques, calls for stronger
computational methods to protect the identity and privacy of online authors
when needed, e.g., blind reviews for scientific papers, anonymous online
reviews, or anonymous interactions in mental health forums. In this paper, we
propose an unsupervised inference-time approach to authorship obfuscation that
addresses the task's unique challenges: the lack of supervision data for
diverse authorships and domains, and the need for a sufficient level of
revision, beyond simple paraphrasing, to obfuscate authorship, all while
preserving the original content and fluency.
We introduce JAMDEC, a user-controlled, inference-time algorithm for
authorship obfuscation that can, in principle, be applied to any text and
authorship. Our approach builds on small language models such as GPT2-XL in
order to help avoid disclosing the original content to proprietary LLMs' APIs,
while also reducing the performance gap between small and large language models
via algorithmic enhancement. The key idea behind our approach is to boost the
creative power of smaller language models through constrained decoding, while
also allowing for user-specified controls and flexibility. Experimental results
demonstrate that our approach based on GPT2-XL outperforms previous
state-of-the-art methods based on comparably small models, while performing
competitively against GPT3.5 175B, a proprietary model that is two orders of
magnitude larger.
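As a concrete illustration of the constrained-decoding idea, below is a minimal sketch using the lexically constrained beam search built into Hugging Face `transformers` with a small GPT-2 model. The prompt, keyword list, and decoding settings are placeholder assumptions for illustration, not JAMDEC's actual algorithm (which the paper defines).

```python
# Minimal sketch: lexically constrained beam search over a small LM.
# Illustrates the general mechanism only -- NOT JAMDEC's actual algorithm.
# Model choice, prompt, and keywords below are placeholder assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")  # "gpt2-xl" for the paper's setting
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "Paraphrase: The study finds that sleep improves memory.\nRewrite:"
# Content keywords the rewrite must keep (leading spaces match GPT-2's BPE).
force_words_ids = tok([" sleep", " memory"], add_special_tokens=False).input_ids

inputs = tok(prompt, return_tensors="pt")
out = model.generate(
    **inputs,
    force_words_ids=force_words_ids,  # constrained beam search over keywords
    num_beams=8,                      # constraints require beam search
    max_new_tokens=40,
    no_repeat_ngram_size=3,
)
print(tok.decode(out[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```

The constraint forces every completed beam to contain the specified keywords, which preserves content while the beam search explores alternative phrasings.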
Related papers
- A Bayesian Approach to Harnessing the Power of LLMs in Authorship Attribution [57.309390098903]
Authorship attribution aims to identify the origin or author of a document.
Large Language Models (LLMs), with their deep reasoning capabilities and ability to maintain long-range textual associations, offer a promising alternative.
Our results on the IMDb and blog datasets show an impressive 85% accuracy in one-shot authorship classification across ten authors.
arXiv Detail & Related papers (2024-10-29T04:14:23Z)
- TAROT: Task-Oriented Authorship Obfuscation Using Policy Optimization Methods [5.239989658197324]
Authorship obfuscation aims to disguise the identity of an author within a text.
This alteration needs to balance privacy and utility.
We propose TAROT: Task-Oriented Authorship Obfuscation Using Policy Optimization.
arXiv Detail & Related papers (2024-07-31T14:24:01Z)
- AuthAttLyzer-V2: Unveiling Code Authorship Attribution using Enhanced Ensemble Learning Models & Generating Benchmark Dataset [0.0]
Source Code Authorship Attribution (SCAA) is crucial for software classification because it provides insights into the origin and behavior of software.
This paper presents AuthAttLyzer-V2, a new source code feature extractor for SCAA, focusing on lexical, semantic, syntactic, and N-gram features.
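As a hedged sketch of the N-gram piece of such a feature extractor (the paper's full lexical, semantic, and syntactic features are richer, and its ensemble models are its own), character n-grams over source code can feed an ensemble classifier. The snippets and labels below are invented for illustration:

```python
# Toy sketch: character n-gram features + ensemble classifier for code
# authorship attribution. Data and labels are illustrative assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import RandomForestClassifier

sources = [
    "for(int i=0;i<n;i++){sum+=a[i];}",              # compact style: author A
    "for (int i = 0; i < n; ++i) { s += v[i]; }",    # spaced style: author B
]
labels = ["A", "B"]

# Character n-grams capture formatting and naming habits in code.
vec = TfidfVectorizer(analyzer="char", ngram_range=(2, 4))
X = vec.fit_transform(sources)

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, labels)
print(clf.predict(vec.transform(["for(int j=0;j<n;j++){sum+=b[j];}"])))
```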
arXiv Detail & Related papers (2024-06-28T13:04:16Z)
- Evaluating Copyright Takedown Methods for Language Models [100.38129820325497]
Language models (LMs) derive their capabilities from extensive training on diverse data, including potentially copyrighted material.
This paper introduces the first evaluation of the feasibility and side effects of copyright takedowns for LMs.
We examine several strategies, including adding system prompts, decoding-time filtering interventions, and unlearning approaches.
arXiv Detail & Related papers (2024-06-26T18:09:46Z)
- Keep It Private: Unsupervised Privatization of Online Text [13.381890596224867]
We introduce an automatic text privatization framework that fine-tunes a large language model via reinforcement learning to produce rewrites that balance soundness, sense, and privacy.
We evaluate it extensively on a large-scale test set of short- to medium-length English Reddit posts from 68k authors.
arXiv Detail & Related papers (2024-05-16T17:12:18Z)
- Fortifying Ethical Boundaries in AI: Advanced Strategies for Enhancing Security in Large Language Models [3.9490749767170636]
Large language models (LLMs) have revolutionized text generation, translation, and question-answering tasks.
Despite their widespread use, LLMs present challenges such as ethical dilemmas when models are compelled to respond inappropriately.
This paper addresses these challenges by introducing a multi-pronged approach that includes: 1) filtering sensitive vocabulary from user input to prevent unethical responses; 2) detecting role-playing to halt interactions that could lead to 'prison break' scenarios; and 3) extending these methodologies to various LLM derivatives, such as Multi-Modal Large Language Models (MLLMs).
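As a toy illustration of prong 1), a blocklist-based input filter might look like the sketch below; the blocklist, refusal message, and function name are placeholder assumptions, not the paper's method:

```python
# Toy sketch of sensitive-vocabulary filtering on user input.
# The blocklist here is an illustrative assumption; real systems
# would use curated lexicons and learned classifiers.
SENSITIVE = {"bomb", "exploit"}

def filter_input(user_text: str) -> str:
    tokens = user_text.split()
    if any(t.lower().strip(".,!?") in SENSITIVE for t in tokens):
        return "[request declined: sensitive content detected]"
    return user_text

print(filter_input("How do I bake bread?"))  # passes through unchanged
```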
arXiv Detail & Related papers (2024-01-27T08:09:33Z)
- Fine-tuning Language Models for Factuality [96.5203774943198]
The capabilities of large pre-trained language models (LLMs) have led to their widespread use, sometimes even as a replacement for traditional search engines.
Yet language models are prone to making convincing but factually inaccurate claims, often referred to as 'hallucinations'.
In this work, we fine-tune language models to be more factual, without human labeling.
arXiv Detail & Related papers (2023-11-14T18:59:15Z)
- Contrastive Decoding Improves Reasoning in Large Language Models [55.16503283583076]
We show that Contrastive Decoding achieves large out-of-the-box improvements over greedy decoding on a variety of reasoning tasks.
We show that Contrastive Decoding leads LLaMA-65B to outperform LLaMA 2, GPT-3.5 and PaLM 2-L on the HellaSwag commonsense reasoning benchmark.
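The contrastive-decoding rule itself is simple to state: score each next token by the difference between an expert model's and an amateur model's log-probabilities, keeping only tokens the expert itself finds sufficiently likely. A minimal single-step sketch, with small GPT-2 variants standing in for the paper's much larger models:

```python
# Minimal sketch of one contrastive-decoding step. Greedy over the
# contrastive score with the standard plausibility cutoff (alpha).
# Model names are stand-ins, not the paper's LLaMA-65B setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
expert = AutoModelForCausalLM.from_pretrained("gpt2-medium")  # "expert" LM
amateur = AutoModelForCausalLM.from_pretrained("gpt2")        # "amateur" LM

def contrastive_step(input_ids, alpha=0.1):
    with torch.no_grad():
        log_p_exp = expert(input_ids).logits[0, -1].log_softmax(-1)
        log_p_ama = amateur(input_ids).logits[0, -1].log_softmax(-1)
    # Plausibility constraint: keep only tokens the expert finds likely.
    cutoff = torch.log(torch.tensor(alpha)) + log_p_exp.max()
    score = log_p_exp - log_p_ama  # contrastive objective
    score[log_p_exp < cutoff] = float("-inf")
    return score.argmax()

ids = tok("The capital of France is", return_tensors="pt").input_ids
for _ in range(5):
    ids = torch.cat([ids, contrastive_step(ids).view(1, 1)], dim=-1)
print(tok.decode(ids[0]))
```

Both models must share a tokenizer so the token-wise log-probability difference is well defined, which is why the sketch uses two GPT-2 sizes.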
arXiv Detail & Related papers (2023-09-17T00:29:32Z)
- Towards General Visual-Linguistic Face Forgery Detection [95.73987327101143]
Deepfakes are realistic face manipulations that can pose serious threats to security, privacy, and trust.
Existing methods mostly treat this task as binary classification, which uses digital labels or mask signals to train the detection model.
We propose a novel paradigm named Visual-Linguistic Face Forgery Detection (VLFFD), which uses fine-grained sentence-level prompts as annotations.
arXiv Detail & Related papers (2023-07-31T10:22:33Z)
- Improving Authorship Verification using Linguistic Divergence [6.673132899229721]
We propose an unsupervised solution to the Authorship Verification task that utilizes pre-trained deep language models.
The proposed metric measures the difference between two authors by comparing each against pre-trained language models.
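The summary leaves the metric abstract; one illustrative way to compare texts against a pre-trained LM is via their per-token cross-entropy, sketched below. This is an assumption for illustration, not the paper's exact metric:

```python
# Illustrative sketch: LM-based "distance" between two texts via
# mean per-token cross-entropy under a pre-trained model.
# NOT the paper's exact metric, which is defined in the paper itself.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2")

def nll(text):
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # labels=ids makes the model return mean per-token cross-entropy.
        return lm(ids, labels=ids).loss.item()

known    = "I reckon the weather shall turn foul ere nightfall."
disputed = "The weather will probably get bad before tonight."
# A small gap suggests the two texts sit similarly under the LM.
print(abs(nll(known) - nll(disputed)))
```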
arXiv Detail & Related papers (2021-03-12T03:01:17Z)