Adaptive Language-Guided Abstraction from Contrastive Explanations
- URL: http://arxiv.org/abs/2409.08212v2
- Date: Fri, 13 Sep 2024 21:24:48 GMT
- Title: Adaptive Language-Guided Abstraction from Contrastive Explanations
- Authors: Andi Peng, Belinda Z. Li, Ilia Sucholutsky, Nishanth Kumar, Julie A. Shah, Jacob Andreas, Andreea Bobu,
- Abstract summary: It is necessary to determine which features of the environment are relevant before determining how these features should be used to compute reward.
End-to-end methods for joint feature and reward learning often yield brittle reward functions that are sensitive to spurious state features.
This paper describes a method named ALGAE which alternates between using language models to iteratively identify human-meaningful features.
- Score: 53.48583372522492
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Many approaches to robot learning begin by inferring a reward function from a set of human demonstrations. To learn a good reward, it is necessary to determine which features of the environment are relevant before determining how these features should be used to compute reward. End-to-end methods for joint feature and reward learning (e.g., using deep networks or program synthesis techniques) often yield brittle reward functions that are sensitive to spurious state features. By contrast, humans can often generalizably learn from a small number of demonstrations by incorporating strong priors about what features of a demonstration are likely meaningful for a task of interest. How do we build robots that leverage this kind of background knowledge when learning from new demonstrations? This paper describes a method named ALGAE (Adaptive Language-Guided Abstraction from [Contrastive] Explanations) which alternates between using language models to iteratively identify human-meaningful features needed to explain demonstrated behavior, then standard inverse reinforcement learning techniques to assign weights to these features. Experiments across a variety of both simulated and real-world robot environments show that ALGAE learns generalizable reward functions defined on interpretable features using only small numbers of demonstrations. Importantly, ALGAE can recognize when features are missing, then extract and define those features without any human input -- making it possible to quickly and efficiently acquire rich representations of user behavior.
Related papers
- Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models [55.19497659895122]
We introduce methods for discovering and applying sparse feature circuits.
These are causally implicatedworks of human-interpretable features for explaining language model behaviors.
arXiv Detail & Related papers (2024-03-28T17:56:07Z) - A Generalized Acquisition Function for Preference-based Reward Learning [12.158619866176487]
Preference-based reward learning is a popular technique for teaching robots and autonomous systems how a human user wants them to perform a task.
Previous works have shown that actively synthesizing preference queries to maximize information gain about the reward function parameters improves data efficiency.
We show that it is possible to optimize for learning the reward function up to a behavioral equivalence class, such as inducing the same ranking over behaviors, distribution over choices, or other related definitions of what makes two rewards similar.
arXiv Detail & Related papers (2024-03-09T20:32:17Z) - FIND: A Function Description Benchmark for Evaluating Interpretability
Methods [86.80718559904854]
This paper introduces FIND (Function INterpretation and Description), a benchmark suite for evaluating automated interpretability methods.
FIND contains functions that resemble components of trained neural networks, and accompanying descriptions of the kind we seek to generate.
We evaluate methods that use pretrained language models to produce descriptions of function behavior in natural language and code.
arXiv Detail & Related papers (2023-09-07T17:47:26Z) - Learning Reward Functions for Robotic Manipulation by Observing Humans [92.30657414416527]
We use unlabeled videos of humans solving a wide range of manipulation tasks to learn a task-agnostic reward function for robotic manipulation policies.
The learned rewards are based on distances to a goal in an embedding space learned using a time-contrastive objective.
arXiv Detail & Related papers (2022-11-16T16:26:48Z) - Inducing Structure in Reward Learning by Learning Features [31.413656752926208]
We introduce a novel type of human input for teaching features and an algorithm that utilizes it to learn complex features from the raw state space.
We demonstrate our method in settings where all features have to be learned from scratch, as well as where some of the features are known.
arXiv Detail & Related papers (2022-01-18T16:02:29Z) - Generative Adversarial Reward Learning for Generalized Behavior Tendency
Inference [71.11416263370823]
We propose a generative inverse reinforcement learning for user behavioral preference modelling.
Our model can automatically learn the rewards from user's actions based on discriminative actor-critic network and Wasserstein GAN.
arXiv Detail & Related papers (2021-05-03T13:14:25Z) - Visual Imitation Made Easy [102.36509665008732]
We present an alternate interface for imitation that simplifies the data collection process while allowing for easy transfer to robots.
We use commercially available reacher-grabber assistive tools both as a data collection device and as the robot's end-effector.
We experimentally evaluate on two challenging tasks: non-prehensile pushing and prehensile stacking, with 1000 diverse demonstrations for each task.
arXiv Detail & Related papers (2020-08-11T17:58:50Z) - Feature Expansive Reward Learning: Rethinking Human Input [31.413656752926208]
We introduce a new type of human input in which the person guides the robot from states where the feature being taught is highly expressed to states where it is not.
We propose an algorithm for learning the feature from the raw state space and integrating it into the reward function.
arXiv Detail & Related papers (2020-06-23T17:59:34Z) - Active Preference-Based Gaussian Process Regression for Reward Learning [42.697198807877925]
One common approach is to learn reward functions from collected expert demonstrations.
We present a preference-based learning approach, where as an alternative, the human feedback is only in the form of comparisons between trajectories.
Our approach enables us to tackle both inflexibility and data-inefficiency problems within a preference-based learning framework.
arXiv Detail & Related papers (2020-05-06T03:29:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.