Controlling Large Language Model Agents with Entropic Activation Steering
- URL: http://arxiv.org/abs/2406.00244v2
- Date: Thu, 10 Oct 2024 20:47:12 GMT
- Title: Controlling Large Language Model Agents with Entropic Activation Steering
- Authors: Nate Rahn, Pierluca D'Oro, Marc G. Bellemare
- Abstract summary: We introduce Entropic Activation Steering (EAST), an activation steering method for in-context learning agents.
We show that EAST can effectively manipulate an LLM agent's exploration by directly affecting the high-level actions parsed from the outputs of the LLM.
We also reveal how applying this control modulates the uncertainty exhibited in the LLM's thoughts, guiding the agent towards more exploratory actions.
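The distinction between token-level randomness and action-level exploration is concrete: exploration is measured over the discrete high-level actions parsed from the agent's sampled outputs, not over individual tokens. Below is a minimal sketch of that measurement, assuming a hypothetical two-armed bandit task; the `parse_action` regex and the sampled completions are illustrative, not the paper's code.

```python
# Hedged sketch: measuring exploration at the level of parsed actions.
import math
import re
from collections import Counter

def parse_action(completion: str) -> str:
    """Map a free-form LLM completion to a high-level action label."""
    match = re.search(r"\b(arm [AB])\b", completion)
    return match.group(1) if match else "invalid"

def action_entropy(completions: list[str]) -> float:
    """Shannon entropy (nats) of the distribution over parsed actions."""
    counts = Counter(parse_action(c) for c in completions)
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total) for n in counts.values())

# E.g. 10 samples from the agent at one state of the bandit task:
samples = ["I pick arm A"] * 8 + ["I pick arm B"] * 2
print(action_entropy(samples))  # ~0.50 nats; 0.0 would mean no exploration
```

Raising the sampling temperature adds randomness token by token, but it need not change this action-level entropy, which is why the paper intervenes on activations instead.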
- Abstract: The rise of large language models (LLMs) has prompted increasing interest in their use as in-context learning agents. At the core of agentic behavior is the capacity for exploration, or the ability to actively gather information about the environment. But how do LLM agents explore, and how can we control their exploratory behaviors? To answer these questions, we take a representation-level perspective and introduce Entropic Activation Steering (EAST), an activation steering method for in-context LLM agents. Firstly, we demonstrate that EAST can effectively manipulate an LLM agent's exploration by directly affecting the high-level actions parsed from the outputs of the LLM, in contrast to token-level temperature sampling. Secondly, we reveal how applying this control modulates the uncertainty exhibited in the LLM's thoughts, guiding the agent towards more exploratory actions. Finally, we demonstrate that the steering vectors obtained by EAST generalize across task variants. In total, these results show that LLM agents explicitly encode uncertainty over their actions in their representation space. Our work paves the way for a new understanding of the functioning of LLM agents and for effective control of their decision-making behaviors.
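The recipe the abstract describes (derive a direction from activations weighted by action-level entropy, then add it back during decoding) can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the model (gpt2), the intervention layer, the steering strength, the bandit-style prompts, and the placeholder entropy weights are all assumptions.

```python
# Hedged sketch of an EAST-style activation steering intervention.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "gpt2"  # stand-in model; the paper studies larger LLM agents
LAYER = 6       # assumed intervention layer
ALPHA = 4.0     # assumed steering strength

tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, output_hidden_states=True)
model.eval()

@torch.no_grad()
def last_token_activation(prompt: str) -> torch.Tensor:
    """Hidden state at the last token, at the output of block LAYER."""
    ids = tok(prompt, return_tensors="pt")
    out = model(**ids)
    return out.hidden_states[LAYER + 1][0, -1]  # (hidden_dim,)

# Entropy-weighted average of activations: states where the agent's parsed
# action distribution was high-entropy contribute more to the direction.
# These prompts and entropy values are placeholders for real rollouts.
prompts = [
    "Observation: arm A paid 0, arm B paid 1. Action:",
    "Observation: arm A paid 1, arm B paid 0. Action:",
]
entropies = torch.tensor([0.69, 0.10])

acts = torch.stack([last_token_activation(p) for p in prompts])
weights = entropies / entropies.sum()
steering_vec = (weights[:, None] * acts).sum(dim=0)

# Add the scaled vector to the residual stream during decoding. The hook
# path `transformer.h` is GPT-2 specific; other architectures differ.
def steer_hook(module, inputs, output):
    return (output[0] + ALPHA * steering_vec,) + output[1:]

handle = model.transformer.h[LAYER].register_forward_hook(steer_hook)
ids = tok("Observation: arm A paid 1, arm B paid 0. Action:", return_tensors="pt")
steered = model.generate(**ids, max_new_tokens=8, do_sample=False)
handle.remove()
print(tok.decode(steered[0]))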
Related papers
- Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search [57.28671084993782]
Large language models (LLMs) have demonstrated remarkable reasoning capabilities across diverse domains.
Recent studies have shown that increasing test-time computation enhances LLMs' reasoning capabilities.
We propose a two-stage training paradigm: 1) a small-scale format tuning stage to internalize the COAT reasoning format and 2) a large-scale self-improvement stage leveraging reinforcement learning.
arXiv Detail & Related papers (2025-02-04T17:26:58Z)
- Identifying and Manipulating Personality Traits in LLMs Through Activation Engineering [0.0]
Building on the novel approach of "activation engineering", this study develops a method for identifying and adjusting activation directions related to personality traits.
arXiv Detail & Related papers (2024-12-10T23:15:25Z)
- CogSteer: Cognition-Inspired Selective Layer Intervention for Efficiently Steering Large Language Models [37.476241509187304]
Large Language Models (LLMs) achieve remarkable performance through pretraining on extensive data.
The lack of interpretability in their underlying mechanisms limits the ability to effectively steer LLMs for specific applications.
In this work, we investigate the mechanisms of LLMs from a cognitive perspective using eye movement measures.
arXiv Detail & Related papers (2024-10-23T09:40:15Z)
- The Strong Pull of Prior Knowledge in Large Language Models and Its Impact on Emotion Recognition [74.04775677110179]
In-context Learning (ICL) has emerged as a powerful paradigm for performing natural language tasks with Large Language Models (LLMs).
We show that LLMs have strong yet inconsistent priors in emotion recognition that ossify their predictions.
Our results suggest that caution is needed when using ICL with larger LLMs for affect-centered tasks outside their pre-training domain.
arXiv Detail & Related papers (2024-03-25T19:07:32Z)
- Empowering Large Language Model Agents through Action Learning [85.39581419680755]
Large Language Model (LLM) Agents have recently garnered increasing interest, yet they are limited in their ability to learn from trial and error.
We argue that the capacity to learn new actions from experience is fundamental to the advancement of learning in LLM agents.
We introduce LearnAct, a framework with an iterative learning strategy that creates and improves actions in the form of Python functions.
arXiv Detail & Related papers (2024-02-24T13:13:04Z)
- Formally Specifying the High-Level Behavior of LLM-Based Agents [24.645319505305316]
LLMs have emerged as promising tools for solving challenging problems without the need for task-specific finetuned models.
Currently, the design and implementation of such agents are ad hoc, as the wide variety of tasks to which LLM-based agents may be applied means there can be no one-size-fits-all approach to agent design.
We propose a minimalistic generation framework that simplifies the process of building agents.
arXiv Detail & Related papers (2023-10-12T17:24:15Z)
- ExpeL: LLM Agents Are Experiential Learners [57.13685954854463]
We introduce the Experiential Learning (ExpeL) agent to allow learning from agent experiences without requiring parametric updates.
Our agent autonomously gathers experiences and extracts knowledge using natural language from a collection of training tasks.
At inference, the agent recalls its extracted insights and past experiences to make informed decisions.
arXiv Detail & Related papers (2023-08-20T03:03:34Z)
- AgentBench: Evaluating LLMs as Agents [88.45506148281379]
Large Language Models (LLMs) are becoming increasingly smart and autonomous, targeting real-world pragmatic missions beyond traditional NLP tasks.
We present AgentBench, a benchmark that currently consists of 8 distinct environments to assess LLM-as-Agent's reasoning and decision-making abilities.
arXiv Detail & Related papers (2023-08-07T16:08:11Z)
- Do Embodied Agents Dream of Pixelated Sheep: Embodied Decision Making using Language Guided World Modelling [101.59430768507997]
Reinforcement learning (RL) agents typically learn tabula rasa, without prior knowledge of the world.
We propose using few-shot large language models (LLMs) to hypothesize an Abstract World Model (AWM).
Hypothesizing an AWM with LLMs and then verifying the AWM based on agent experience increases sample efficiency over contemporary methods by an order of magnitude.
arXiv Detail & Related papers (2023-01-28T02:04:07Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and accepts no responsibility for any consequences of its use.