Steered Generation via Gradient Descent on Sparse Features
- URL: http://arxiv.org/abs/2502.18644v1
- Date: Tue, 25 Feb 2025 21:06:14 GMT
- Title: Steered Generation via Gradient Descent on Sparse Features
- Authors: Sumanta Bhattacharyya, Pedram Rooshenas,
- Abstract summary: We modify the internal structure of large language models (LLMs) by training sparse autoencoders to learn a sparse representation of the query embedding.<n>We demonstrate that manipulating this sparse representation effectively transforms the output toward different stylistic and cognitive targets.
- Score: 1.534667887016089
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) encode a diverse range of linguistic features within their latent representations, which can be harnessed to steer their output toward specific target characteristics. In this paper, we modify the internal structure of LLMs by training sparse autoencoders to learn a sparse representation of the query embedding, allowing precise control over the model's attention distribution. We demonstrate that manipulating this sparse representation effectively transforms the output toward different stylistic and cognitive targets. Specifically, in an educational setting, we show that the cognitive complexity of LLM-generated feedback can be systematically adjusted by modifying the encoded query representation at a specific layer. To achieve this, we guide the learned sparse embedding toward the representation of samples from the desired cognitive complexity level, using gradient-based optimization in the latent space.
Related papers
- LF-Steering: Latent Feature Activation Steering for Enhancing Semantic Consistency in Large Language Models [16.37602070339033]
Large Language Models (LLMs) often generate inconsistent responses when prompted with semantically equivalent paraphrased inputs.<n>We propose LF-Steering, a novel activation steering approach to precisely identify latent feature representations responsible for semantic inconsistency.<n>Our method maps the hidden states of the relevant transformer layer into a sparsely activated, high-dimensional feature space based on a sparse autoencoder.
arXiv Detail & Related papers (2025-01-19T13:06:51Z) - Improving Autoregressive Visual Generation with Cluster-Oriented Token Prediction [52.09472099976885]
IAR is an Improved AutoRegressive Visual Generation Method.<n>We propose a Codebook Rearrangement strategy that uses balanced k-means clustering algorithm.<n>We also propose a Cluster-oriented Cross-entropy Loss that guides the model to correctly predict the cluster where the token is located.
arXiv Detail & Related papers (2025-01-01T15:58:51Z) - CoMMIT: Coordinated Instruction Tuning for Multimodal Large Language Models [68.64605538559312]
In this paper, we analyze the MLLM instruction tuning from both theoretical and empirical perspectives.
Inspired by our findings, we propose a measurement to quantitatively evaluate the learning balance.
In addition, we introduce an auxiliary loss regularization method to promote updating of the generation distribution of MLLMs.
arXiv Detail & Related papers (2024-07-29T23:18:55Z) - Large Language Models Understand Layout [6.732578061359833]
Large language models (LLMs) demonstrate extraordinary abilities in a wide range of natural language processing (NLP) tasks.
We show that, beyond text understanding capability, LLMs are capable of processing text layouts denoted by spatial markers.
We show that layout understanding ability is beneficial for building efficient visual question-answering (VQA) systems.
arXiv Detail & Related papers (2024-07-08T09:03:12Z) - One Token Can Help! Learning Scalable and Pluggable Virtual Tokens for Retrieval-Augmented Large Language Models [67.49462724595445]
Retrieval-augmented generation (RAG) is a promising way to improve large language models (LLMs)<n>We propose a novel method that involves learning scalable and pluggable virtual tokens for RAG.
arXiv Detail & Related papers (2024-05-30T03:44:54Z) - Graph-Induced Syntactic-Semantic Spaces in Transformer-Based Variational
AutoEncoders [5.037881619912574]
In this paper, we investigate latent space separation methods for structural syntactic injection in Transformer-based VAEs.
Specifically, we explore how syntactic structures can be leveraged in the encoding stage through the integration of graph-based and sequential models.
Our empirical evaluation, carried out on natural language sentences and mathematical expressions, reveals that the proposed end-to-end VAE architecture can result in a better overall organisation of the latent space.
arXiv Detail & Related papers (2023-11-14T22:47:23Z) - An Integral Projection-based Semantic Autoencoder for Zero-Shot Learning [0.46644955105516456]
Zero-shot Learning (ZSL) classification categorizes or predicts classes (labels) that are not included in the training set (unseen classes)
Recent works proposed different semantic autoencoder (SAE) models where the encoder embeds a visual feature space into the semantic space and the decoder reconstructs the original visual feature space.
We propose an integral projection-based semantic autoencoder (IP-SAE) where an encoder projects a visual feature space vectord with the semantic space into a latent representation space.
arXiv Detail & Related papers (2023-06-26T12:06:20Z) - Deciphering the Projection Head: Representation Evaluation
Self-supervised Learning [6.375931203397043]
Self-supervised learning (SSL) aims to learn intrinsic features without labels.
Projection head always plays an important role in improving the performance of the downstream task.
We propose a Representation Evaluation Design (RED) in SSL models in which a shortcut connection between the representation and the projection vectors is built.
arXiv Detail & Related papers (2023-01-28T13:13:53Z) - Adaptive Discrete Communication Bottlenecks with Dynamic Vector
Quantization [76.68866368409216]
We propose learning to dynamically select discretization tightness conditioned on inputs.
We show that dynamically varying tightness in communication bottlenecks can improve model performance on visual reasoning and reinforcement learning tasks.
arXiv Detail & Related papers (2022-02-02T23:54:26Z) - Closed-Form Factorization of Latent Semantics in GANs [65.42778970898534]
A rich set of interpretable dimensions has been shown to emerge in the latent space of the Generative Adversarial Networks (GANs) trained for synthesizing images.
In this work, we examine the internal representation learned by GANs to reveal the underlying variation factors in an unsupervised manner.
We propose a closed-form factorization algorithm for latent semantic discovery by directly decomposing the pre-trained weights.
arXiv Detail & Related papers (2020-07-13T18:05:36Z) - MetaSDF: Meta-learning Signed Distance Functions [85.81290552559817]
Generalizing across shapes with neural implicit representations amounts to learning priors over the respective function space.
We formalize learning of a shape space as a meta-learning problem and leverage gradient-based meta-learning algorithms to solve this task.
arXiv Detail & Related papers (2020-06-17T05:14:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.