Extending Activation Steering to Broad Skills and Multiple Behaviours
- URL: http://arxiv.org/abs/2403.05767v1
- Date: Sat, 9 Mar 2024 02:30:04 GMT
- Title: Extending Activation Steering to Broad Skills and Multiple Behaviours
- Authors: Teun van der Weij, Massimo Poesio, Nandi Schoots
- Abstract summary: We investigate the efficacy of activation steering for broad skills and multiple behaviours.
We find that steering broader skills is competitive with steering narrower skills.
We steer models to become more or less myopic and wealth-seeking.
- Score: 5.40770929004319
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Current large language models have dangerous capabilities, which are likely
to become more problematic in the future. Activation steering techniques can be
used to reduce risks from these capabilities. In this paper, we investigate the
efficacy of activation steering for broad skills and multiple behaviours.
First, by comparing the effects of reducing performance on general coding
ability and Python-specific ability, we find that steering broader skills is
competitive with steering narrower skills. Second, we steer models to become more
or less myopic and wealth-seeking, among other behaviours. In our experiments,
combining steering vectors for multiple different behaviours into one steering
vector is largely unsuccessful. On the other hand, injecting individual
steering vectors at different places in a model simultaneously is promising.
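The two multi-behaviour strategies contrasted in the abstract can be sketched on a toy model. The following is a minimal illustration, not the authors' implementation: the residual stack, the synthetic "contrastive" inputs, the mean-difference recipe for building a steering vector, and all names (`forward`, `steering_vector`, `make_inputs`, the layer indices 2 and 4) are assumptions made for this example.

```python
import math
import random

random.seed(0)
D_MODEL, N_LAYERS = 8, 6

# Hypothetical toy "model": a stack of small residual layers, not a real LLM.
WEIGHTS = [[[random.gauss(0, 0.1) for _ in range(D_MODEL)] for _ in range(D_MODEL)]
           for _ in range(N_LAYERS)]

def add(u, v):
    """Element-wise sum of two vectors."""
    return [a + b for a, b in zip(u, v)]

def layer_step(w, h):
    """One residual update: h + tanh(W h)."""
    return [hi + math.tanh(sum(wij * hj for wij, hj in zip(row, h)))
            for hi, row in zip(h, w)]

def forward(x, injections=None, stop_at=N_LAYERS):
    """Run layers [0, stop_at); `injections` maps layer index -> vector added before that layer."""
    injections = injections or {}
    h = list(x)
    for layer in range(stop_at):
        if layer in injections:
            h = add(h, injections[layer])
        h = layer_step(WEIGHTS[layer], h)
    return h

def mean(vectors):
    """Coordinate-wise mean of a list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def steering_vector(pos_inputs, neg_inputs, layer):
    """Mean activation difference entering `layer` (one common recipe, assumed here)."""
    pos = mean([forward(x, stop_at=layer) for x in pos_inputs])
    neg = mean([forward(x, stop_at=layer) for x in neg_inputs])
    return [p - n for p, n in zip(pos, neg)]

def make_inputs(axis, sign, n=16):
    """Synthetic inputs shifted by +/-2 along one coordinate, standing in for contrastive prompts."""
    out = []
    for _ in range(n):
        x = [random.gauss(0, 1) for _ in range(D_MODEL)]
        x[axis] += 2 * sign
        out.append(x)
    return out

# Two made-up behaviours, A and B, each with positive and negative examples.
pos_a, neg_a = make_inputs(0, +1), make_inputs(0, -1)
pos_b, neg_b = make_inputs(1, +1), make_inputs(1, -1)

v_a = steering_vector(pos_a, neg_a, layer=2)
v_b_same = steering_vector(pos_b, neg_b, layer=2)
v_b_late = steering_vector(pos_b, neg_b, layer=4)

x = [random.gauss(0, 1) for _ in range(D_MODEL)]
baseline = forward(x)
combined = forward(x, {2: add(v_a, v_b_same)})  # one merged vector at a single layer
separate = forward(x, {2: v_a, 4: v_b_late})    # one vector per behaviour, at different layers
```

The sketch only shows where each strategy intervenes in the forward pass. It says nothing about which works better; that comparison is the empirical question the paper addresses.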
Related papers
- Human Insights Driven Latent Space for Different Driving Perspectives: A Unified Encoder for Efficient Multi-Task Inference [43.474068248379815]
We propose a shared encoder trained on multiple computer vision tasks critical for urban navigation.
We introduce a multi-scale feature network for pose estimation to improve depth learning.
Our findings demonstrate that a shared backbone trained on diverse visual tasks is capable of providing overall perception capabilities.
arXiv Detail & Related papers (2024-09-16T08:54:03Z)
- Analyzing the Generalization and Reliability of Steering Vectors [8.253773195379166]
We show that steering vectors have substantial limitations both in- and out-of-distribution.
In-distribution, steerability is highly variable across different inputs.
Out-of-distribution, while steering vectors often generalise well, for several concepts they are brittle to reasonable changes in the prompt.
arXiv Detail & Related papers (2024-07-17T08:32:03Z)
- Personalized Steering of Large Language Models: Versatile Steering Vectors Through Bi-directional Preference Optimization [34.05163996072159]
"Steering vectors" are extracted from the activations of human preference data.
This work proposes an innovative approach that could produce more effective steering vectors through bi-directional preference optimization.
Our method is designed to allow steering vectors to directly influence the generation probability of contrastive human preference data pairs.
arXiv Detail & Related papers (2024-05-28T05:10:40Z)
- Reinforcement Learning for Versatile, Dynamic, and Robust Bipedal Locomotion Control [106.32794844077534]
This paper presents a study on using deep reinforcement learning to create dynamic locomotion controllers for bipedal robots.
We develop a general control solution that can be used for a range of dynamic bipedal skills, from periodic walking and running to aperiodic jumping and standing.
This work pushes the limits of agility for bipedal robots through extensive real-world experiments.
arXiv Detail & Related papers (2024-01-30T10:48:43Z)
- Improving Activation Steering in Language Models with Mean-Centring [10.101141087916133]
We find that taking the average of activations associated with a target dataset, and subtracting the mean of all training activations, results in effective steering vectors.
We also apply mean-centring to extract function vectors, more effectively triggering the execution of a range of natural language tasks by a significant margin.
arXiv Detail & Related papers (2023-12-06T18:27:07Z)
- Learning and Adapting Agile Locomotion Skills by Transferring Experience [71.8926510772552]
We propose a framework for training complex robotic skills by transferring experience from existing controllers to jumpstart learning new tasks.
We show that our method enables learning complex agile jumping behaviors, navigating to goal locations while walking on hind legs, and adapting to new environments.
arXiv Detail & Related papers (2023-04-19T17:37:54Z)
- Learning energy-efficient driving behaviors by imitating experts [75.12960180185105]
This paper examines the role of imitation learning in bridging the gap between control strategies and realistic limitations in communication and sensing.
We show that imitation learning can succeed in deriving policies that, if adopted by 5% of vehicles, may boost the energy-efficiency of networks with varying traffic conditions by 15% using only local observations.
arXiv Detail & Related papers (2022-06-28T17:08:31Z)
- ASE: Large-Scale Reusable Adversarial Skill Embeddings for Physically Simulated Characters [123.88692739360457]
General-purpose motor skills enable humans to perform complex tasks.
These skills also provide powerful priors for guiding their behaviors when learning new tasks.
We present a framework for learning versatile and reusable skill embeddings for physically simulated characters.
arXiv Detail & Related papers (2022-05-04T06:13:28Z)
- Advanced Skills through Multiple Adversarial Motion Priors in Reinforcement Learning [10.445369597014533]
We present an approach to augment the concept of adversarial motion prior-based reinforcement learning.
We show that multiple styles and skills can be learned simultaneously without notable performance differences.
Our approach is validated in several real-world experiments with a wheeled-legged quadruped robot.
arXiv Detail & Related papers (2022-03-23T09:24:06Z)
- Reinforcement Learning for Robust Parameterized Locomotion Control of Bipedal Robots [121.42930679076574]
We present a model-free reinforcement learning framework for training robust locomotion policies in simulation.
Domain randomization is used to encourage the policies to learn behaviors that are robust across variations in system dynamics.
We demonstrate this on versatile walking behaviors such as tracking a target walking velocity, walking height, and turning yaw.
arXiv Detail & Related papers (2021-03-26T07:14:01Z)
- Learning Agile Robotic Locomotion Skills by Imitating Animals [72.36395376558984]
Reproducing the diverse and agile locomotion skills of animals has been a longstanding challenge in robotics.
We present an imitation learning system that enables legged robots to learn agile locomotion skills by imitating real-world animals.
arXiv Detail & Related papers (2020-04-02T02:56:16Z)
This list is automatically generated from the titles and abstracts of the papers on this site.