Self-Control of LLM Behaviors by Compressing Suffix Gradient into Prefix Controller
- URL: http://arxiv.org/abs/2406.02721v2
- Date: Tue, 18 Jun 2024 15:58:38 GMT
- Title: Self-Control of LLM Behaviors by Compressing Suffix Gradient into Prefix Controller
- Authors: Min Cai, Yuchen Zhang, Shichang Zhang, Fan Yin, Difan Zou, Yisong Yue, Ziniu Hu
- Abstract summary: Self-Control is a novel method to control the behavior of large language models (LLMs) without explicit human annotations.
Self-Control computes the gradient of a model's self-judgment with respect to the model's hidden states.
Our experiments demonstrate Self-Control's efficacy across multiple domains, including emotional modulation, ensuring harmlessness, and enhancing complex reasoning.
- Score: 47.39322226776273
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose Self-Control, a novel method utilizing suffix gradients to control the behavior of large language models (LLMs) without explicit human annotations. Given a guideline expressed as a suffix string and the model's self-assessment of adherence, Self-Control computes the gradient of this self-judgment with respect to the model's hidden states, directly steering the auto-regressive generation process towards desired behaviors. To enhance efficiency, we introduce Self-Control_{prefix}, a compact module that encapsulates the learned representations from suffix gradients into a Prefix Controller, facilitating inference-time control for various LLM behaviors. Our experiments demonstrate Self-Control's efficacy across multiple domains, including emotional modulation, ensuring harmlessness, and enhancing complex reasoning. In particular, Self-Control_{prefix} enables plug-and-play control and can jointly control multiple attributes, improving model outputs without altering model parameters or increasing inference-time costs.
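The core idea in the abstract, taking the gradient of a self-judgment score with respect to hidden states and moving the hidden states along it, can be illustrated with a toy sketch. This is not the authors' implementation: in the real method the gradient would come from backpropagation through the LLM after appending the suffix, whereas here the "model" is a hypothetical linear readout (`yes_dir`, `no_dir`) that maps one hidden vector to "Yes"/"No" logits, and all names, vectors, and step sizes are assumptions chosen for illustration.

```python
import math


def _logits(hidden, yes_dir, no_dir):
    """Hypothetical linear readout: logits for answering "Yes" or "No"."""
    yes_logit = sum(h * w for h, w in zip(hidden, yes_dir))
    no_logit = sum(h * w for h, w in zip(hidden, no_dir))
    return yes_logit, no_logit


def self_judgment(hidden, yes_dir, no_dir):
    """Toy self-assessment score: log P("Yes") under a Yes/No softmax."""
    yes_logit, no_logit = _logits(hidden, yes_dir, no_dir)
    return -math.log1p(math.exp(no_logit - yes_logit))


def suffix_gradient(hidden, yes_dir, no_dir):
    """Analytic gradient of the toy score with respect to the hidden state."""
    yes_logit, no_logit = _logits(hidden, yes_dir, no_dir)
    p_no = 1.0 / (1.0 + math.exp(yes_logit - no_logit))
    return [p_no * (wy - wn) for wy, wn in zip(yes_dir, no_dir)]


def steer(hidden, yes_dir, no_dir, step=0.5, n_steps=5):
    """Gradient ascent on the hidden state; model weights stay untouched."""
    steered = list(hidden)
    for _ in range(n_steps):
        grad = suffix_gradient(steered, yes_dir, no_dir)
        steered = [h + step * g for h, g in zip(steered, grad)]
    return steered


# Illustrative values only (assumptions, not taken from the paper).
hidden = [-0.5, 1.0, 0.2]
yes_dir = [1.0, 0.0, -0.5]
no_dir = [-0.5, 1.0, 0.3]

before = self_judgment(hidden, yes_dir, no_dir)
steered = steer(hidden, yes_dir, no_dir)
after = self_judgment(steered, yes_dir, no_dir)
print(f"log P(Yes) before: {before:.3f}, after steering: {after:.3f}")
```

The sketch only edits the hidden vector, which mirrors the abstract's point that behavior is changed without altering model parameters. The Prefix Controller described in the abstract, which compresses such gradients into a reusable module, is not modeled here.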
Related papers
- GrAInS: Gradient-based Attribution for Inference-Time Steering of LLMs and VLMs [56.93583799109029]
GrAInS is an inference-time steering approach that operates across both language-only and vision-language models and tasks.
During inference, GrAInS adjusts hidden activations at transformer layers guided by token-level attribution signals, and normalizes activations to preserve representational scale.
It consistently outperforms both fine-tuning and existing steering baselines.
arXiv Detail & Related papers (2025-07-24T02:34:13Z) - Can Large Reasoning Models Self-Train? [58.953117118687096]
Scaling the performance of large language models increasingly depends on methods that reduce reliance on human supervision.
We propose an online self-training reinforcement learning algorithm that leverages the model's self-consistency to infer correctness signals and train without any ground-truth supervision.
arXiv Detail & Related papers (2025-05-27T17:16:00Z) - Training Language Models to Self-Correct via Reinforcement Learning [98.35197671595343]
Self-correction has been found to be largely ineffective in modern large language models (LLMs).
We develop a multi-turn online reinforcement learning approach, SCoRe, that significantly improves an LLM's self-correction ability using entirely self-generated data.
We find that SCoRe achieves state-of-the-art self-correction performance, improving the base models' self-correction by 15.6% and 9.1% respectively on MATH and HumanEval.
arXiv Detail & Related papers (2024-09-19T17:16:21Z) - Adding Conditional Control to Diffusion Models with Reinforcement Learning [59.295203871547336]
Diffusion models are powerful generative models that allow for precise control over the characteristics of the generated samples.
This work presents a novel method based on reinforcement learning (RL) to add additional controls, leveraging an offline dataset.
arXiv Detail & Related papers (2024-06-17T22:00:26Z) - ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback [20.910939141948123]
ControlNet++ is a novel approach that improves controllable generation by explicitly optimizing pixel-level cycle consistency between generated images and conditional controls.
It achieves improvements over ControlNet by 11.1% mIoU, 13.4% SSIM, and 7.6% RMSE, respectively, for segmentation mask, line-art edge, and depth conditions.
arXiv Detail & Related papers (2024-04-11T17:59:09Z) - PID Control-Based Self-Healing to Improve the Robustness of Large Language Models [23.418411870842178]
Minor perturbations can significantly reduce the performance of well-trained language models.
We construct a computationally efficient self-healing process to correct undesired model behavior.
The proposed PID control-based self-healing is a low cost framework that improves the robustness of pre-trained large language models.
arXiv Detail & Related papers (2024-03-31T23:46:51Z) - Fine-grained Controllable Video Generation via Object Appearance and
Context [74.23066823064575]
We propose fine-grained controllable video generation (FACTOR) to achieve detailed control.
FACTOR aims to control objects' appearances and context, including their location and category.
Our method achieves controllability of object appearances without finetuning, which reduces the per-subject optimization efforts for the users.
arXiv Detail & Related papers (2023-12-05T17:47:33Z) - Fine-Tuning Language Models Using Formal Methods Feedback [53.24085794087253]
We present a fully automated approach to fine-tune pre-trained language models for applications in autonomous systems.
The method synthesizes automaton-based controllers from pre-trained models guided by natural language task descriptions.
The results indicate an improvement in percentage of specifications satisfied by the controller from 60% to 90%.
arXiv Detail & Related papers (2023-10-27T16:24:24Z) - Focused Prefix Tuning for Controllable Text Generation [19.88484696133778]
We propose focused prefix tuning (FPT) to mitigate the problem and to enable the control to focus on the desired attribute.
Experimental results show that FPT can achieve better control accuracy and text fluency than baseline models in single-attribute control tasks.
arXiv Detail & Related papers (2023-06-01T06:00:43Z) - Learning Self-Regularized Adversarial Views for Self-Supervised Vision Transformers [105.89564687747134]
We propose a self-regularized AutoAugment method to learn views for self-supervised vision transformers.
First, we reduce the search cost of AutoView to nearly zero by learning views and network parameters simultaneously.
We also present a curated augmentation policy search space for self-supervised learning.
arXiv Detail & Related papers (2022-10-16T06:20:44Z) - Steady-State Error Compensation in Reference Tracking and Disturbance Rejection Problems for Reinforcement Learning-Based Control [0.9023847175654602]
Reinforcement learning (RL) is a promising, upcoming topic in automatic control applications.
Integral action state augmentation (IASA) for actor-critic-based RL controllers is introduced.
This augmentation does not require any expert knowledge, leaving the approach model free.
arXiv Detail & Related papers (2022-01-31T16:29:19Z) - ControlVAE: Tuning, Analytical Properties, and Performance Analysis [14.272917020105147]
ControlVAE is a new variational autoencoder framework.
It stabilizes the KL-divergence of VAE models to a specified value.
It can achieve a good trade-off between reconstruction quality and KL-divergence.
arXiv Detail & Related papers (2020-10-31T12:32:39Z) - Anticipating the Long-Term Effect of Online Learning in Control [75.6527644813815]
AntLer is a design algorithm for learning-based control laws that anticipates learning.
We show that AntLer approximates an optimal solution arbitrarily accurately with probability one.
arXiv Detail & Related papers (2020-07-24T07:00:14Z)
This list is automatically generated from the titles and abstracts of the papers on this site.