Controlling Large Language Models Through Concept Activation Vectors
- URL: http://arxiv.org/abs/2501.05764v1
- Date: Fri, 10 Jan 2025 07:41:48 GMT
- Title: Controlling Large Language Models Through Concept Activation Vectors
- Authors: Hanyu Zhang, Xiting Wang, Chengao Li, Xiang Ao, Qing He
- Abstract summary: We propose Generation with Concept Activation Vector (GCAV), a lightweight model control framework.
GCAV ensures accurate control without requiring resource-extensive fine-tuning.
Our framework achieves state-of-the-art performance with granular control, allowing for fine-grained adjustments of both the steering layers and the steering magnitudes for individual samples.
- Score: 30.348768212571255
- License:
- Abstract: As large language models (LLMs) are widely deployed across various domains, the ability to control their generated outputs has become more critical. This control involves aligning LLM outputs with human values and ethical principles or customizing LLMs on specific topics or styles for individual users. Existing controlled generation methods either require significant computational resources and extensive trial-and-error or provide coarse-grained control. In this paper, we propose Generation with Concept Activation Vector (GCAV), a lightweight model control framework that ensures accurate control without requiring resource-extensive fine-tuning. Specifically, GCAV first trains a concept activation vector for specified concepts to be controlled, such as toxicity. During inference, GCAV steers the concept vector in LLMs, for example, by removing the toxicity concept vector from the activation layers. Control experiments from different perspectives, including toxicity reduction, sentiment control, linguistic style, and topic control, demonstrate that our framework achieves state-of-the-art performance with granular control, allowing for fine-grained adjustments of both the steering layers and the steering magnitudes for individual samples.
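The two steps described in the abstract (train a concept vector, then remove it from an activation at inference time) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' implementation: the activations are synthetic 4-dimensional vectors rather than real LLM hidden states, the concept vector is taken to be the weight vector of a small logistic-regression probe, and the per-sample steering magnitude is chosen in closed form so that the probe's concept probability drops to a hypothetical `target_prob`. The names `train_probe`, `steer`, and `target_prob` are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def train_probe(pos, neg, epochs=500, lr=0.5):
    """Fit a logistic-regression probe by batch gradient descent.

    Assumption: the probe's weight vector plays the role of the
    concept activation vector (e.g. for "toxicity").
    """
    dim = len(pos[0])
    w, b = [0.0] * dim, 0.0
    data = [(h, 1.0) for h in pos] + [(h, 0.0) for h in neg]
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for h, y in data:
            err = sigmoid(dot(w, h) + b) - y
            for i in range(dim):
                grad_w[i] += err * h[i]
            grad_b += err
        w = [wi - lr * g / len(data) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(data)
    return w, b


def steer(h, w, b, target_prob=0.1):
    """Subtract just enough of the unit concept vector from activation h
    so that the probe's probability falls to target_prob.

    Returns the steered activation and the magnitude that was removed.
    Activations already below the target are left unchanged.
    """
    norm = math.sqrt(dot(w, w))
    v = [wi / norm for wi in w]  # unit-length concept direction
    target_logit = math.log(target_prob / (1.0 - target_prob))
    eps = (dot(w, h) + b - target_logit) / norm  # per-sample magnitude
    if eps <= 0:
        return list(h), 0.0
    return [hi - eps * vi for hi, vi in zip(h, v)], eps


# Synthetic "activations": the concept is encoded along dimension 0.
random.seed(0)
DIM = 4


def sample(shift):
    return [random.gauss(shift if i == 0 else 0.0, 1.0) for i in range(DIM)]


toxic = [sample(2.0) for _ in range(50)]
benign = [sample(-2.0) for _ in range(50)]
w, b = train_probe(toxic, benign)

h = [2.5, 0.3, -0.1, 0.4]  # an activation the probe should flag
steered, eps = steer(h, w, b, target_prob=0.1)
before = sigmoid(dot(w, h) + b)
after = sigmoid(dot(w, steered) + b)
print(f"magnitude={eps:.3f} before={before:.3f} after={after:.3f}")
```

Because the magnitude is computed per activation, inputs that express the concept strongly are moved further than inputs that barely express it, which is one way to read the abstract's "steering magnitudes for individual samples". In a real model the same subtraction would be applied to hidden states at chosen layers, which this sketch does not attempt.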
Related papers
- DC-ControlNet: Decoupling Inter- and Intra-Element Conditions in Image Generation with Diffusion Models [55.42794740244581]
We introduce DC (Decouple)-ControlNet, a framework for multi-condition image generation.
The core idea behind DC-ControlNet is to decouple control conditions, transforming global control into a hierarchical system.
For interactions between elements, we introduce the Inter-Element Controller, which accurately handles multi-element interactions.
arXiv Detail & Related papers (2025-02-20T18:01:02Z)
- Steering Large Language Models using Conceptors: Improving Addition-Based Activation Engineering [0.0]
This paper explores activation engineering, where outputs of pre-trained LLMs are controlled by manipulating their activations at inference time.
We introduce conceptors - mathematical constructs that represent sets of activation vectors as ellipsoidal regions.
Our experiments demonstrate that conceptors outperform traditional methods across multiple steering tasks.
arXiv Detail & Related papers (2024-10-09T10:09:37Z)
- ControlVAR: Exploring Controllable Visual Autoregressive Modeling [48.66209303617063]
Conditional visual generation has witnessed remarkable progress with the advent of diffusion models (DMs).
Challenges such as expensive computational cost, high inference latency, and difficulties of integration with large language models (LLMs) have necessitated exploring alternatives to DMs.
This paper introduces ControlVAR, a novel framework that explores pixel-level controls in visual autoregressive modeling for flexible and efficient conditional generation.
arXiv Detail & Related papers (2024-06-14T06:35:33Z)
- Personalized Steering of Large Language Models: Versatile Steering Vectors Through Bi-directional Preference Optimization [34.05163996072159]
"Steering vectors" are extracted from the activations of human preference data.
This work proposes an innovative approach that could produce more effective steering vectors through bi-directional preference optimization.
Our method is designed to allow steering vectors to directly influence the generation probability of contrastive human preference data pairs.
arXiv Detail & Related papers (2024-05-28T05:10:40Z)
- Decision Transformer as a Foundation Model for Partially Observable Continuous Control [5.453548045211778]
The Decision Transformer (DT) architecture is used to predict the optimal action based on past observations, actions, and rewards.
DT exhibits remarkable zero-shot generalization abilities for completely new tasks.
These findings highlight the potential of DT as a foundational controller for general control applications.
arXiv Detail & Related papers (2024-04-03T02:17:34Z)
- LiFi: Lightweight Controlled Text Generation with Fine-Grained Control Codes [46.74968005604948]
We present LIFI, which offers a lightweight approach with fine-grained control for controlled text generation.
We evaluate LIFI on two conventional tasks -- sentiment control and topic control -- and one newly proposed task -- stylistic novel writing.
arXiv Detail & Related papers (2024-02-10T11:53:48Z)
- Fine-grained Controllable Video Generation via Object Appearance and Context [74.23066823064575]
We propose fine-grained controllable video generation (FACTOR) to achieve detailed control.
FACTOR aims to control objects' appearances and context, including their location and category.
Our method achieves controllability of object appearances without finetuning, which reduces the per-subject optimization efforts for the users.
arXiv Detail & Related papers (2023-12-05T17:47:33Z)
- ControlVAE: Model-Based Learning of Generative Controllers for Physics-Based Characters [28.446959320429656]
We introduce ControlVAE, a model-based framework for learning generative motion control policies based on variational autoencoders (VAE).
Our framework can learn a rich and flexible latent representation of skills and a skill-conditioned generative control policy from a diverse set of unorganized motion sequences.
We demonstrate the effectiveness of ControlVAE using a diverse set of tasks, which allows realistic and interactive control of the simulated characters.
arXiv Detail & Related papers (2022-10-12T10:11:36Z)
- Sparsity in Partially Controllable Linear Systems [56.142264865866636]
We study partially controllable linear dynamical systems specified by an underlying sparsity pattern.
Our results characterize those state variables which are irrelevant for optimal control.
arXiv Detail & Related papers (2021-10-12T16:41:47Z)
- Unsupervised Controllable Generation with Self-Training [90.04287577605723]
Controllable generation with GANs remains a challenging research problem.
We propose an unsupervised framework to learn a distribution of latent codes that control the generator through self-training.
Our framework exhibits better disentanglement compared to other variants such as the variational autoencoder.
arXiv Detail & Related papers (2020-07-17T21:50:35Z)
- Goal-Conditioned End-to-End Visuomotor Control for Versatile Skill Primitives [89.34229413345541]
We propose a conditioning scheme which avoids pitfalls by learning the controller and its conditioning in an end-to-end manner.
Our model predicts complex action sequences based directly on a dynamic image representation of the robot motion.
We report significant improvements in task success over representative MPC and IL baselines.
arXiv Detail & Related papers (2020-03-19T15:04:37Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.