Behavioral Exploration: Learning to Explore via In-Context Adaptation
- URL: http://arxiv.org/abs/2507.09041v1
- Date: Fri, 11 Jul 2025 21:36:19 GMT
- Title: Behavioral Exploration: Learning to Explore via In-Context Adaptation
- Authors: Andrew Wagenmaker, Zhiyuan Zhou, Sergey Levine
- Abstract summary: We train a long-context generative model to predict expert actions conditioned on a context of past observations and a measure of how ``exploratory'' the expert's behaviors are relative to this context. This enables the model to not only mimic the behavior of an expert, but also, by feeding its past history of interactions into its context, to select expert behaviors different from those previously selected. We demonstrate the effectiveness of our method in both simulated locomotion and manipulation settings, as well as on real-world robotic manipulation tasks.
- Score: 53.92981562916783
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Developing autonomous agents that quickly explore an environment and adapt their behavior online is a canonical challenge in robotics and machine learning. While humans are able to achieve such fast online exploration and adaptation, often acquiring new information and skills in only a handful of interactions, existing algorithmic approaches tend to rely on random exploration and slow, gradient-based behavior updates. How can we endow autonomous agents with such capabilities on par with humans? Taking inspiration from recent progress on both in-context learning and large-scale behavioral cloning, in this work we propose behavioral exploration: training agents to internalize what it means to explore and adapt in-context over the space of ``expert'' behaviors. To achieve this, given access to a dataset of expert demonstrations, we train a long-context generative model to predict expert actions conditioned on a context of past observations and a measure of how ``exploratory'' the expert's behaviors are relative to this context. This enables the model to not only mimic the behavior of an expert, but also, by feeding its past history of interactions into its context, to select expert behaviors different from those previously selected, thereby allowing for fast online adaptation and targeted, ``expert-like'' exploration. We demonstrate the effectiveness of our method in both simulated locomotion and manipulation settings, as well as on real-world robotic manipulation tasks, illustrating its ability to learn adaptive, exploratory behavior.
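To make the recipe above concrete, here is a minimal sketch of the training idea: a long-context policy conditioned on past observations and a scalar exploration score, trained with ordinary behavioral cloning. This is an illustrative sketch, not the paper's code; the names (`ContextPolicy`, `bc_step`), the discrete action space, and the exploration-score encoding are all assumptions.

```python
import torch
import torch.nn as nn

class ContextPolicy(nn.Module):
    """Long-context policy: predicts the expert's next action from a history
    of observations plus a scalar "how exploratory" conditioning signal."""
    def __init__(self, obs_dim: int, n_actions: int, d_model: int = 128,
                 n_heads: int = 4, n_layers: int = 2):
        super().__init__()
        self.embed_obs = nn.Linear(obs_dim, d_model)
        self.embed_expl = nn.Linear(1, d_model)  # exploration score -> token
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, n_actions)

    def forward(self, obs_seq: torch.Tensor, expl_score: torch.Tensor):
        # obs_seq: (B, T, obs_dim); expl_score: (B, 1), e.g. in [0, 1]
        cond = self.embed_expl(expl_score).unsqueeze(1)      # (B, 1, d_model)
        tokens = torch.cat([cond, self.embed_obs(obs_seq)], dim=1)
        h = self.encoder(tokens)
        return self.head(h[:, -1])  # logits for the next action

def bc_step(policy, optimizer, obs_seq, expl_score, expert_action):
    """One behavioral-cloning step; expl_score is a label, computed offline,
    for how exploratory the expert's behavior is relative to the context."""
    loss = nn.functional.cross_entropy(policy(obs_seq, expl_score), expert_action)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

At deployment, the agent's own interaction history is fed in as the context and the exploration score is set high, which, per the abstract, steers the policy toward expert-like behaviors it has not yet tried.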
Related papers
- Reasoning in visual navigation of end-to-end trained agents: a dynamical systems approach [23.52028824411467]
We present a large-scale experimental study involving navigation episodes in a real environment with a physical robot. We analyze the type of reasoning emerging from end-to-end training. We show in a post-hoc analysis that the value function learned by the agent relates to long-term planning.
arXiv Detail & Related papers (2025-03-11T11:16:47Z) - Life, uh, Finds a Way: Systematic Neural Search [2.163881720692685]
We tackle the challenge of rapidly adapting an agent's behavior to solve continuous problems in novel settings.
Instead of focusing on deep reinforcement learning, we propose viewing behavior as the physical manifestation of a search procedure.
We describe an algorithm that implicitly enumerates behaviors by regulating the tight feedback loop between execution of behaviors and mutation of the graph.
arXiv Detail & Related papers (2024-10-02T09:06:54Z) - RLIF: Interactive Imitation Learning as Reinforcement Learning [56.997263135104504]
We show how off-policy reinforcement learning can enable improved performance under assumptions that are similar to, but potentially even more practical than, those of interactive imitation learning.
Our proposed method uses reinforcement learning with user intervention signals themselves as rewards.
This relaxes the assumption that intervening experts in interactive imitation learning should be near-optimal and enables the algorithm to learn behaviors that improve over a potentially suboptimal human expert (see the first sketch after the paper list).
arXiv Detail & Related papers (2023-11-21T21:05:21Z) - Leveraging Human Feedback to Evolve and Discover Novel Emergent Behaviors in Robot Swarms [14.404339094377319]
We seek to leverage human input to automatically discover a taxonomy of collective behaviors that can emerge from a particular multi-agent system.
Our proposed approach adapts to user preferences by learning a similarity space over swarm collective behaviors.
We test our approach in simulation on two robot capability models and show that our methods consistently discover a richer set of emergent behaviors than prior work.
arXiv Detail & Related papers (2023-04-25T15:18:06Z) - Choreographer: Learning and Adapting Skills in Imagination [60.09911483010824]
We present Choreographer, a model-based agent that exploits its world model to learn and adapt skills in imagination.
Our method decouples the exploration and skill-learning processes, allowing it to discover skills in the latent state space of the model.
Choreographer is able to learn skills both from offline data and by collecting data simultaneously with an exploration policy.
arXiv Detail & Related papers (2022-11-23T23:31:14Z) - Chain of Thought Imitation with Procedure Cloning [129.62135987416164]
We propose procedure cloning, which applies supervised sequence prediction to imitate the series of expert computations.
We show that imitating the intermediate computations of an expert's behavior enables procedure cloning to learn policies exhibiting significant generalization to unseen environment configurations (see the second sketch after the paper list).
arXiv Detail & Related papers (2022-05-22T13:14:09Z) - Learning Complex Spatial Behaviours in ABM: An Experimental Observational Study [0.0]
This paper explores how Reinforcement Learning can be applied to create emergent agent behaviours.
Running a series of simulations, we demonstrate that agents trained using the Proximal Policy Optimisation algorithm behave in ways that exhibit properties of real-world intelligent adaptive behaviours.
arXiv Detail & Related papers (2022-01-04T11:56:11Z) - Hierarchical Affordance Discovery using Intrinsic Motivation [69.9674326582747]
We propose an algorithm using intrinsic motivation to guide the learning of affordances for a mobile robot.
This algorithm is capable of autonomously discovering, learning, and adapting interrelated affordances without pre-programmed actions.
Once learned, these affordances may be used by the algorithm to plan sequences of actions to perform tasks of varying difficulty.
arXiv Detail & Related papers (2020-09-23T07:18:21Z) - Human Trajectory Forecasting in Crowds: A Deep Learning Perspective [89.4600982169]
We present an in-depth analysis of existing deep learning-based methods for modelling social interactions.
We propose two knowledge-based data-driven methods to effectively capture these social interactions.
We develop TrajNet++, a large-scale interaction-centric benchmark, a significant yet missing component in the field of human trajectory forecasting.
arXiv Detail & Related papers (2020-07-07T17:19:56Z)
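First, as referenced in the RLIF entry above, a minimal sketch of the intervention-as-reward idea: the learner receives a negative reward whenever the human intervenes and zero otherwise, so any standard off-policy RL algorithm is driven to minimize interventions. The logged-transition format below is a hypothetical illustration, not RLIF's actual code.

```python
def intervention_reward(intervened: bool) -> float:
    """RLIF-style reward: -1 whenever the human intervenes, 0 otherwise,
    so an off-policy RL algorithm learns to minimize interventions."""
    return -1.0 if intervened else 0.0

# Relabel logged transitions (state, action, did-the-human-intervene)
# for consumption by any standard off-policy RL algorithm.
logged = [("s0", "a0", False), ("s1", "a1", True), ("s2", "a2", False)]
relabeled = [(s, a, intervention_reward(i)) for s, a, i in logged]
print(relabeled)  # [('s0', 'a0', 0.0), ('s1', 'a1', -1.0), ('s2', 'a2', 0.0)]
```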
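Second, as referenced in the procedure-cloning entry, a hedged sketch of training on an expert's computation trace rather than on its final action alone: a sequence model is trained, with ordinary autoregressive cross-entropy, to reproduce the intermediate computations and only then the action. The toy vocabulary, GRU model, and trace are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

# Toy vocabulary: the expert's intermediate search steps plus final actions.
vocab = ["<s>", "expand_A", "expand_B", "goal_found", "act_left", "act_right"]
tok = {t: i for i, t in enumerate(vocab)}

class ProcedureCloner(nn.Module):
    """Autoregressively predicts the expert's computation trace, then its action."""
    def __init__(self, obs_dim: int = 4, d: int = 32, v: int = len(vocab)):
        super().__init__()
        self.obs_proj = nn.Linear(obs_dim, d)  # condition on the observation
        self.emb = nn.Embedding(v, d)
        self.rnn = nn.GRU(d, d, batch_first=True)
        self.out = nn.Linear(d, v)

    def forward(self, obs, prev_tokens):
        h0 = self.obs_proj(obs).unsqueeze(0)        # (1, B, d) initial hidden state
        x, _ = self.rnn(self.emb(prev_tokens), h0)  # (B, T, d)
        return self.out(x)                          # next-token logits

# One expert trace: the search procedure, then the action it justifies.
trace = ["<s>", "expand_A", "expand_B", "goal_found", "act_left"]
ids = torch.tensor([[tok[t] for t in trace]])
obs = torch.randn(1, 4)

model = ProcedureCloner()
logits = model(obs, ids[:, :-1])  # teacher forcing on the trace prefix
loss = nn.functional.cross_entropy(logits.reshape(-1, len(vocab)),
                                   ids[:, 1:].reshape(-1))
loss.backward()  # the whole computation sequence supervises the policy
```

At test time, such a model would generate the intermediate steps autoregressively and read the action off the end of the sequence.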