Related papers: Linearly Controlled Language Generation with Performative Guarantees

URL: http://arxiv.org/abs/2405.15454v3
Date: Tue, 09 Sep 2025 07:03:01 GMT
Title: Linearly Controlled Language Generation with Performative Guarantees
Authors: Emily Cheng, Carmen Amo Alonso,
Abstract summary: We use a common model of concept semantics as linearly represented in an LM's latent space. We take the view that natural language generation traces a trajectory in this continuous semantic space. We propose a lightweight, gradient-free intervention that dynamically steers trajectories away from regions corresponding to undesired meanings.
Score: 4.447467536572626
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The increasing prevalence of Large Language Models (LMs) in critical applications highlights the need for controlled language generation strategies that are not only computationally efficient but that also enjoy performance guarantees. To achieve this, we use a common model of concept semantics as linearly represented in an LM's latent space. In particular, we take the view that natural language generation traces a trajectory in this continuous semantic space, realized by the language model's hidden activations. This view permits a control-theoretic treatment of text generation in latent space, in which we propose a lightweight, gradient-free intervention that dynamically steers trajectories away from regions corresponding to undesired meanings. In particular, we propose to directly intervene on the activations of the token that is being generated in embedding space in an online fashion. Crucially, we do not simply steer activations towards a desirable region. Instead, our method relies on classical techniques from control theory to precisely control activations in a context-dependent way, and guarantees that they are brought into a specific pre-defined region of embedding space that corresponds to allowed semantics. Our intervention is computed in closed-form according to an optimal controller formulation, minimally impacting generation time. This control of the activations in embedding space allows for fine-grained steering of attributes of the generated sequence. We demonstrate the effectiveness of our approach on different objectives -- toxicity avoidance and sentiment control -- while maintaining text quality.
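To make the idea of a closed-form, gradient-free intervention concrete, here is a minimal sketch. It is not the paper's controller: it assumes the undesired concept is captured by a single linear probe (direction `w`, offset `b`) and that the allowed region is the half-space where `w . h + b <= 0`. Under those assumptions, the smallest change to a hidden state `h` that brings it into the allowed region is the Euclidean projection onto that half-space. The function name and all parameters are hypothetical.

```python
from typing import List, Sequence


def steer_into_halfspace(
    h: Sequence[float], w: Sequence[float], b: float = 0.0
) -> List[float]:
    """Return the closest point to h (in L2 norm) with w . h + b <= 0.

    Illustrative assumption: the "allowed semantics" region is one half-space
    defined by a linear probe (w, b). The abstract does not specify this form.
    """
    # Probe score: positive means h lies in the undesired region.
    score = sum(wi * hi for wi, hi in zip(w, h)) + b
    if score <= 0.0:
        # Already allowed: leave the activation untouched (context-dependent).
        return list(h)
    norm_sq = sum(wi * wi for wi in w)
    if norm_sq == 0.0:
        raise ValueError("probe direction w must be non-zero")
    # Closed-form minimal-norm correction: move along -w onto the boundary.
    scale = score / norm_sq
    return [hi - scale * wi for hi, wi in zip(h, w)]


steered = steer_into_halfspace([2.0, 1.0], [1.0, 0.0], b=-0.5)
untouched = steer_into_halfspace([-1.0, 3.0], [1.0, 0.0], b=0.0)
```

The correction is applied only when the probe score is positive, so activations that are already in the allowed region pass through unchanged. This differs from adding a fixed steering vector at every step.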

Related papers

Rethinking Multi-Condition DiTs: Eliminating Redundant Attention via Position-Alignment and Keyword-Scoping [61.459927600301654]
Multi-condition control is bottlenecked by the conventional "concatenate-and-attend" strategy. Our analysis reveals that much of this cross-modal interaction is spatially or semantically redundant. We propose Position-aligned and Keyword-scoped Attention (PKA), a highly efficient framework designed to eliminate these redundancies.
arXiv Detail & Related papers (2026-02-06T16:39:10Z)
Why Steering Works: Toward a Unified View of Language Model Parameter Dynamics [81.80010043113445]
Local weight fine-tuning, LoRA-based adaptation, and activation-based interventions are studied in isolation. We present a unified view that frames these interventions as dynamic weight updates induced by a control signal. Across methods, we observe a consistent trade-off between preference and utility: stronger control increases preference while predictably reducing utility.
arXiv Detail & Related papers (2026-02-02T17:04:36Z)
In-Distribution Steering: Balancing Control and Coherence in Language Model Generation [0.0815557531820863]
We introduce In-Distribution Steering (IDS), a novel method that adapts steering strength based on the input data distribution in representation space. IDS achieves strong accuracy on classification tasks while producing coherent text without collapse, making IDS particularly well suited for real-world applications.
arXiv Detail & Related papers (2025-10-15T08:31:37Z)
Discrete Diffusion for Reflective Vision-Language-Action Models in Autonomous Driving [55.13109926181247]
We introduce ReflectDrive, a learning-based framework that integrates a reflection mechanism for safe trajectory generation via discrete diffusion. Central to our approach is a safety-aware reflection mechanism that performs iterative self-correction without gradient. Our method begins with goal-conditioned trajectory generation to model multi-modal driving behaviors.
arXiv Detail & Related papers (2025-09-24T13:35:15Z)
SAE-SSV: Supervised Steering in Sparse Representation Spaces for Reliable Control of Language Models [41.553639748766784]
Large language models (LLMs) have demonstrated impressive capabilities in natural language understanding and generation. This paper introduces a novel supervised steering approach that operates in sparse, interpretable representation spaces.
arXiv Detail & Related papers (2025-05-22T03:46:57Z)
Constrained Language Generation with Discrete Diffusion Models [61.81569616239755]
We present Constrained Discrete Diffusion (CDD), a novel method for enforcing constraints on natural language by integrating discrete diffusion models with differentiable optimization. We show how this technique can be applied to satisfy a variety of natural language constraints, including (i) toxicity mitigation by preventing harmful content from emerging, (ii) character and sequence level lexical constraints, and (iii) novel molecule sequence generation with specific property adherence.
arXiv Detail & Related papers (2025-03-12T19:48:12Z)
Controlled LLM Decoding via Discrete Auto-regressive Biasing [9.843359827321194]
Controlled text generation allows for enforcing user-defined constraints on large language model outputs. We propose Discrete Auto-regressive Biasing, a controlled decoding algorithm that leverages gradients while operating entirely in the discrete text domain. Our method significantly improves constraint satisfaction while maintaining comparable or better fluency, all with even lower computational costs.
arXiv Detail & Related papers (2025-02-06T00:14:43Z)
Risk-Aware Distributional Intervention Policies for Language Models [15.027122089807053]
Language models are prone to occasionally undesirable generations, such as harmful or toxic content. This paper presents a new two-stage approach to detect and mitigate undesirable content generations.
arXiv Detail & Related papers (2025-01-27T04:00:38Z)
Mitigating Semantic Leakage in Cross-lingual Embeddings via Orthogonality Constraint [6.880579537300643]
Current disentangled representation learning methods suffer from semantic leakage. We propose a novel training objective, ORthogonAlity Constraint LEarning (ORACLE). ORACLE builds upon two components: intra-class clustering and inter-class separation. We demonstrate that training with the ORACLE objective effectively reduces semantic leakage and enhances semantic alignment within the embedding space.
arXiv Detail & Related papers (2024-09-24T02:01:52Z)
Learning Disentangled Semantic Spaces of Explanations via Invertible Neural Networks [10.880057430629126]
Disentangled latent spaces usually have better semantic separability and geometrical properties, which leads to better interpretability and more controllable data generation. In this work, we focus on a more general form of sentence disentanglement, targeting the localised modification and control of more general sentence semantic features. We introduce a flow-based invertible neural network (INN) mechanism integrated with a transformer-based language Autoencoder (AE) in order to deliver latent spaces with better separability properties.
arXiv Detail & Related papers (2023-05-02T18:27:13Z)
Language Model Detoxification in Dialogue with Contextualized Stance Control [18.30723730898435]
Previous work on Language Model detoxification has focused on reducing the toxicity of the generation itself (self-toxicity) without consideration of the context. We propose a novel control method to do context-dependent detoxification with the stance taken into consideration. Experimental results show that our proposed method can effectively learn the context-dependent stance control strategies while keeping a low self-toxicity of the underlying LM.
arXiv Detail & Related papers (2023-01-25T00:47:28Z)
Controllable Text Generation via Probability Density Estimation in the Latent Space [16.962510129437558]
We propose a novel control framework using probability density estimation in the latent space. Our method utilizes an invertible transformation function, the Normalizing Flow, that maps the complex distributions in the latent space to simple Gaussian distributions in the prior space. Experiments on single-attribute controls and multi-attribute control reveal that our method outperforms several strong baselines on attribute relevance and text quality.
arXiv Detail & Related papers (2022-12-16T07:11:18Z)
Language Detoxification with Attribute-Discriminative Latent Space [59.167432249229584]
Transformer-based Language Models (LMs) have achieved impressive results on natural language understanding tasks. They can also generate toxic text such as insults, threats, and profanity, limiting their real-world applications. We propose an effective yet efficient method for language detoxification using an attribute-discriminative latent space.
arXiv Detail & Related papers (2022-10-19T06:54:42Z)
Robust Unsupervised Cross-Lingual Word Embedding using Domain Flow Interpolation [48.32604585839687]
Previous adversarial approaches have shown promising results in inducing cross-lingual word embedding without parallel data. We propose to make use of a sequence of intermediate spaces for smooth bridging.
arXiv Detail & Related papers (2022-10-07T04:37:47Z)
COLD Decoding: Energy-based Constrained Text Generation with Langevin Dynamics [69.8062252611486]
COLD decoding is a flexible framework that can be applied directly to off-the-shelf left-to-right language models. Our experiments on constrained generation tasks point to the effectiveness of our approach, both in terms of automatic and human evaluation.
arXiv Detail & Related papers (2022-02-23T18:59:27Z)
Region-Based Semantic Factorization in GANs [67.90498535507106]
We present a highly efficient algorithm to factorize the latent semantics learned by Generative Adversarial Networks (GANs) concerning an arbitrary image region. Through an appropriately defined generalized Rayleigh quotient, we solve such a problem without any annotations or training. Experimental results on various state-of-the-art GAN models demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2022-02-19T17:46:02Z)
GTAE: Graph-Transformer based Auto-Encoders for Linguistic-Constrained Text Style Transfer [119.70961704127157]
Non-parallel text style transfer has attracted increasing research interest in recent years. Current approaches still lack the ability to preserve the content and even logic of original sentences. We propose a method called Graph Transformer based Auto-Encoder (GTAE), which models a sentence as a linguistic graph and performs feature extraction and style transfer at the graph level.
arXiv Detail & Related papers (2021-02-01T11:08:45Z)
APo-VAE: Text Generation in Hyperbolic Space [116.11974607497986]
In this paper, we investigate text generation in a hyperbolic latent space to learn continuous hierarchical representations. An Adversarial Poincare Variational Autoencoder (APo-VAE) is presented, where both the prior and variational posterior of latent variables are defined over a Poincare ball via wrapped normal distributions. Experiments in language modeling and dialog-response generation tasks demonstrate the winning effectiveness of the proposed APo-VAE model.
arXiv Detail & Related papers (2020-04-30T19:05:41Z)
Improve Variational Autoencoder for Text Generation with Discrete Latent Bottleneck [52.08901549360262]
Variational autoencoders (VAEs) are essential tools in end-to-end representation learning. VAEs tend to ignore latent variables with a strong auto-regressive decoder. We propose a principled approach to enforce an implicit latent feature matching in a more compact latent space.
arXiv Detail & Related papers (2020-04-22T14:41:37Z)
Discrete Variational Attention Models for Language Generation [51.88612022940496]
We propose a discrete variational attention model with categorical distribution over the attention mechanism owing to the discrete nature in languages. Thanks to the property of discreteness, the training of our proposed approach does not suffer from posterior collapse.
arXiv Detail & Related papers (2020-04-21T05:49:04Z)

This list is automatically generated from the titles and abstracts of the papers in this site.