Anchored Decoding: Provably Reducing Copyright Risk for Any Language Model
- URL: http://arxiv.org/abs/2602.07120v1
- Date: Fri, 06 Feb 2026 19:00:14 GMT
- Title: Anchored Decoding: Provably Reducing Copyright Risk for Any Language Model
- Authors: Jacqueline He, Jonathan Hayase, Wen-tau Yih, Sewoong Oh, Luke Zettlemoyer, Pang Wei Koh,
- Abstract summary: Modern language models (LMs) tend to memorize portions of their training data and emit verbatim spans. We propose Anchored Decoding, a plug-and-play inference-time method for suppressing verbatim copying. We evaluate our methods across six model pairs on long-form evaluations of copyright risk and utility.
- Score: 99.16364381244445
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Modern language models (LMs) tend to memorize portions of their training data and emit verbatim spans. When the underlying sources are sensitive or copyright-protected, such reproduction raises issues of consent and compensation for creators and compliance risks for developers. We propose Anchored Decoding, a plug-and-play inference-time method for suppressing verbatim copying: it enables decoding from any risky LM trained on mixed-license data by keeping generation in bounded proximity to a permissively trained safe LM. Anchored Decoding adaptively allocates a user-chosen information budget over the generation trajectory and enforces per-step constraints that yield a sequence-level guarantee, enabling a tunable risk-utility trade-off. To make Anchored Decoding practically useful, we introduce a new permissively trained safe model (TinyComma 1.8B), as well as Anchored$_{\mathrm{Byte}}$ Decoding, a byte-level variant of our method that enables cross-vocabulary fusion via the ByteSampler framework (Hayase et al., 2025). We evaluate our methods across six model pairs on long-form evaluations of copyright risk and utility. Anchored and Anchored$_{\mathrm{Byte}}$ Decoding define a new Pareto frontier, preserving near-original fluency and factuality while eliminating up to 75% of the measurable copying gap (averaged over six copying metrics) between the risky baseline and a safe reference, at a modest inference overhead.
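To make the described mechanism concrete, below is a minimal sketch of one plausible realization, assuming a geometric interpolation between the risky and safe next-token distributions and an even split of the remaining budget across the remaining steps. Here `risky_logits_fn` and `safe_logits_fn` are hypothetical stand-ins for the two models' next-token logits; the exact per-step constraint and allocation rule used in the paper may differ.

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def kl(p, q, eps=1e-12):
    """KL(p || q) in nats."""
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))

def anchored_step(risky_logits, safe_logits, step_budget, iters=30):
    """Return the most risky-leaning interpolated distribution whose KL
    divergence from the safe distribution fits this step's budget."""
    p_safe = softmax(safe_logits)
    lo, hi = 0.0, 1.0  # interpolation weight toward the risky model
    for _ in range(iters):  # bisection: divergence grows with the weight
        lam = 0.5 * (lo + hi)
        mixed = softmax(lam * risky_logits + (1.0 - lam) * safe_logits)
        if kl(mixed, p_safe) <= step_budget:
            lo = lam
        else:
            hi = lam
    mixed = softmax(lo * risky_logits + (1.0 - lo) * safe_logits)
    return mixed, kl(mixed, p_safe)

def anchored_decode(risky_logits_fn, safe_logits_fn, prompt_ids,
                    total_budget, max_new_tokens, seed=0):
    """Sampling loop that spends a sequence-level information budget
    adaptively, carrying unspent budget forward to later steps."""
    rng = np.random.default_rng(seed)
    ids = list(prompt_ids)
    remaining = total_budget
    for t in range(max_new_tokens):
        step_budget = remaining / (max_new_tokens - t)  # even split of what is left
        dist, spent = anchored_step(risky_logits_fn(ids), safe_logits_fn(ids), step_budget)
        remaining -= spent
        ids.append(int(rng.choice(len(dist), p=dist)))
    return ids
```

Because each step's realized KL charge is deducted from the remaining budget, the per-step charges sum to at most the sequence-level budget, mirroring the abstract's claim that per-step constraints yield a sequence-level guarantee.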
Related papers
- SafeRedir: Prompt Embedding Redirection for Robust Unlearning in Image Generation Models [67.84174763413178]
We introduce SafeRedir, a lightweight inference-time framework for robust unlearning via prompt embedding redirection. We show that SafeRedir achieves effective unlearning capability, high semantic and perceptual preservation, robust image quality, and enhanced resistance to adversarial attacks.
arXiv Detail & Related papers (2026-01-13T15:01:38Z) - From Bits to Rounds: Parallel Decoding with Exploration for Diffusion Language Models [19.97248408121574]
Diffusion Language Models (DLMs) offer comparable accuracy with faster inference speed via parallel decoding. High-confidence tokens carry negligible information and strictly relying on them limits the effective progress made in each decoding round. We propose Explore-Then-Exploit (ETE), a training-free decoding strategy that maximizes information throughput and decoding efficiency.
arXiv Detail & Related papers (2025-11-26T06:38:37Z) - Certified Mitigation of Worst-Case LLM Copyright Infringement [46.571805194176825]
"copyright takedown" methods are aimed at preventing models from generating content substantially similar to copyrighted ones.<n>We propose BloomScrub, a remarkably simple yet highly effective inference-time approach that provides certified copyright takedown.<n>Our results suggest that lightweight, inference-time methods can be surprisingly effective for copyright prevention.
arXiv Detail & Related papers (2025-04-22T17:16:53Z) - Towards Copyright Protection for Knowledge Bases of Retrieval-augmented Language Models via Reasoning [58.57194301645823]
Large language models (LLMs) are increasingly integrated into real-world personalized applications. The valuable and often proprietary nature of the knowledge bases used in RAG introduces the risk of unauthorized usage by adversaries. Existing methods for protecting these knowledge bases, which can broadly be characterized as watermarking techniques, typically involve poisoning or backdoor attacks. We propose a method for "harmless" copyright protection of knowledge bases.
arXiv Detail & Related papers (2025-02-10T09:15:56Z) - Copyright-Protected Language Generation via Adaptive Model Fusion [15.48692649098646]
Copyright-Protecting Model Fusion (CP-Fuse) is a novel approach that combines models trained on disjoint sets of copyrighted material during inference (a rough per-step fusion sketch appears after this list). We show that CP-Fuse significantly reduces the reproduction of protected material without compromising the quality of text and code generation.
arXiv Detail & Related papers (2024-12-09T16:13:17Z) - Root Defence Strategies: Ensuring Safety of LLM at the Decoding Level [10.476222570886483]
Large language models (LLMs) have demonstrated immense utility across various industries. As LLMs advance, the risk of harmful outputs increases due to incorrect or malicious instruction prompts. This paper examines LLMs' capability to recognize harmful outputs, revealing and quantifying their proficiency in assessing the danger of previously generated tokens.
arXiv Detail & Related papers (2024-10-09T12:09:30Z) - HexaCoder: Secure Code Generation via Oracle-Guided Synthetic Training Data [60.75578581719921]
Large language models (LLMs) have shown great potential for automatic code generation.
Recent studies highlight that much LLM-generated code contains serious security vulnerabilities.
We introduce HexaCoder, a novel approach to enhance the ability of LLMs to generate secure code.
arXiv Detail & Related papers (2024-09-10T12:01:43Z) - Strong Copyright Protection for Language Models via Adaptive Model Fusion [15.48692649098646]
Copyright-Protecting Fusion (CP-Fuse) is an algorithm that adaptively combines language models to minimize the reproduction of protected materials.
Our results show that CP-Fuse significantly reduces the memorization of copyrighted content while maintaining high-quality text and code generation.
arXiv Detail & Related papers (2024-07-29T15:32:30Z) - CPR: Retrieval Augmented Generation for Copyright Protection [101.15323302062562]
We introduce CopyProtected generation with Retrieval (CPR), a new method for RAG with strong copyright protection guarantees.
CPR allows conditioning the output of diffusion models on a set of retrieved images.
We prove that CPR satisfies Near Access Freeness (NAF), which bounds the amount of information an attacker may be able to extract from the generated images (a hedged statement of the NAF condition appears after this list).
arXiv Detail & Related papers (2024-03-27T18:09:55Z) - Enhancing Multiple Reliability Measures via Nuisance-extended Information Bottleneck [77.37409441129995]
In practical scenarios where training data is limited, many predictive signals in the data can arise from biases in data acquisition.
We consider an adversarial threat model under a mutual information constraint to cover a wider class of perturbations in training.
We propose an autoencoder-based training to implement the objective, as well as practical encoder designs to facilitate the proposed hybrid discriminative-generative training.
arXiv Detail & Related papers (2023-03-24T16:03:21Z)
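For the adaptive model fusion entries above (CP-Fuse), the sketch below illustrates one hypothetical per-step fusion rule, not the paper's exact objective: the two models' next-token distributions are combined geometrically, and the model that assigns an unusually high cumulative log-probability to the generated prefix (a possible sign of memorized text) receives the smaller weight.

```python
import numpy as np

def log_softmax(x):
    z = x - x.max()
    return z - np.log(np.exp(z).sum())

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def fused_next_token_dist(logits_a, logits_b, cum_logprob_a, cum_logprob_b, tau=1.0):
    """Weighted geometric combination of two models' next-token
    distributions. The model with the higher cumulative log-probability
    on the prefix gets the smaller weight, so neither model can dominate
    the continuation. The softmin weighting is an illustrative heuristic."""
    weights = softmax(-np.array([cum_logprob_a, cum_logprob_b]) / tau)
    fused_logprobs = weights[0] * log_softmax(logits_a) + weights[1] * log_softmax(logits_b)
    return softmax(fused_logprobs)
```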
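The CPR entry above relies on Near Access Freeness (NAF). As a hedged reference point (following Vyas et al., 2023, rather than a definition taken from the CPR paper): a model p is k_x-near access-free with respect to a safe model safe_C, trained without access to the copyrighted work C, if for every prompt x

    Δ( p(· | x) ‖ safe_C(· | x) ) ≤ k_x,

and with the max-divergence this reads p(y | x) ≤ 2^{k_x} · safe_C(y | x) for all outputs y, bounding how much extra information about C an attacker can extract relative to the safe model.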
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.