Learning to Look: Seeking Information for Decision Making via Policy Factorization
- URL: http://arxiv.org/abs/2410.18964v1
- Date: Thu, 24 Oct 2024 17:58:11 GMT
- Title: Learning to Look: Seeking Information for Decision Making via Policy Factorization
- Authors: Shivin Dass, Jiaheng Hu, Ben Abbatematteo, Peter Stone, Roberto Martín-Martín
- Abstract summary: We propose DISaM, a dual-policy solution composed of an information-seeking policy and an information-receiving policy.
We demonstrate the capabilities of our dual policy solution in five manipulation tasks that require information-seeking behaviors.
- Score: 36.87799092971961
- Abstract: Many robot manipulation tasks require active or interactive exploration behavior in order to be performed successfully. Such tasks are ubiquitous in embodied domains, where agents must actively search for the information necessary for each stage of a task, e.g., moving the head of the robot to find information relevant to manipulation, or in multi-robot domains, where one scout robot may search for the information that another robot needs to make informed decisions. We identify these tasks with a new type of problem, factorized Contextual Markov Decision Processes, and propose DISaM, a dual-policy solution composed of an information-seeking policy that explores the environment to find the relevant contextual information and an information-receiving policy that exploits the context to achieve the manipulation goal. This factorization allows us to train both policies separately, using the information-receiving one to provide reward to train the information-seeking policy. At test time, the dual agent balances exploration and exploitation based on the uncertainty the manipulation policy has on what the next best action is. We demonstrate the capabilities of our dual policy solution in five manipulation tasks that require information-seeking behaviors, both in simulation and in the real-world, where DISaM significantly outperforms existing methods. More information at https://robin-lab.cs.utexas.edu/learning2look/.
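The test-time switching described above lends itself to a short sketch. The Python below is a minimal illustration under stated assumptions, not the authors' implementation: uncertainty is approximated by disagreement across an ensemble of information-receiving policies, and `seek_policy`, `receive_ensemble`, and the threshold are hypothetical names.
```python
import numpy as np

class DualPolicyAgent:
    """Sketch of a DISaM-style test-time controller (hypothetical API).

    The information-seeking policy explores until the information-receiving
    (manipulation) policy is confident about its next best action.
    """

    def __init__(self, seek_policy, receive_ensemble, threshold=0.1):
        self.seek_policy = seek_policy            # explores to gather context
        self.receive_ensemble = receive_ensemble  # manipulation policy ensemble
        self.threshold = threshold                # uncertainty cutoff (assumed)

    def uncertainty(self, obs):
        # Assumption: uncertainty = mean variance of the actions proposed by
        # the ensemble; the abstract states only that switching is driven by
        # the manipulation policy's uncertainty over its next best action.
        actions = np.stack([pi(obs) for pi in self.receive_ensemble])
        return float(actions.var(axis=0).mean())

    def act(self, obs):
        if self.uncertainty(obs) > self.threshold:
            return self.seek_policy(obs)   # explore: look for information
        actions = np.stack([pi(obs) for pi in self.receive_ensemble])
        return actions.mean(axis=0)        # exploit: confident manipulation
```
During training, the same factorization lets the information-receiving policy supply the reward used to train the information-seeking one, as the abstract notes.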
Related papers
- A Survey of Embodied Learning for Object-Centric Robotic Manipulation [27.569063968870868]
Embodied learning for object-centric robotic manipulation is a rapidly developing and challenging area in AI.
Unlike data-driven machine learning methods, embodied learning focuses on robot learning through physical interaction with the environment.
arXiv Detail & Related papers (2024-08-21T11:32:09Z)
- Learning active tactile perception through belief-space control [21.708391958446274]
We propose a method that autonomously learns tactile exploration policies by developing a generative world model.
We evaluate our method on three simulated tasks where the goal is to estimate a desired object property.
We find that our method is able to discover policies that efficiently gather information about the desired property in an intuitive manner.
arXiv Detail & Related papers (2023-11-30T21:54:42Z)
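The entry above learns tactile exploration via belief-space control. A small discrete sketch of the underlying idea, with a hypothetical `models` table standing in for the learned generative world model (all names are assumptions):
```python
import numpy as np

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def expected_info_gain(belief, lik):
    """Expected entropy reduction of the belief for one touch action.
    lik[o, h] = p(observation o | hypothesis h), as predicted by a
    learned generative world model (hypothetical here)."""
    gain = 0.0
    for o in range(lik.shape[0]):
        p_o = lik[o] @ belief                # marginal probability of o
        if p_o > 0:
            post = lik[o] * belief / p_o     # Bayes posterior update
            gain += p_o * (entropy(belief) - entropy(post))
    return gain

def choose_touch(belief, models):
    """Pick the touch whose predicted observations are most informative
    about the object property; `models` maps actions to likelihood tables."""
    return max(models, key=lambda a: expected_info_gain(belief, models[a]))
```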
- Explaining the Decisions of Deep Policy Networks for Robotic Manipulations [27.526882375069963]
We present an explicit analysis of deep policy models through input attribution methods to explain how and to what extent each input feature affects the decisions of the robot policy models.
To the best of our knowledge, this is the first report to identify the dynamic changes of input attributions of multi-modal sensor inputs in deep policy networks online for robotic manipulation.
arXiv Detail & Related papers (2023-10-30T10:44:12Z)
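For the input-attribution entry above, a generic gradient-x-input saliency sketch in PyTorch shows the flavor of the analysis; the paper's exact attribution method may differ:
```python
import torch

def gradient_x_input(policy, obs):
    """Attribute one action dimension of a policy network to its inputs.
    A generic saliency sketch, not necessarily the paper's method."""
    obs = obs.clone().detach().requires_grad_(True)
    action = policy(obs)               # forward pass through the policy
    action[..., 0].sum().backward()    # gradient w.r.t. the first action dim
    return (obs.grad * obs).detach()   # per-feature contribution estimate
```
Computing this at every control step yields the time-varying, online attributions of multi-modal sensor inputs that the entry highlights.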
- AVIS: Autonomous Visual Information Seeking with Large Language Model Agent [123.75169211547149]
We propose an autonomous information seeking visual question answering framework, AVIS.
Our method leverages a Large Language Model (LLM) to dynamically strategize the utilization of external tools.
AVIS achieves state-of-the-art results on knowledge-intensive visual question answering benchmarks such as Infoseek and OK-VQA.
arXiv Detail & Related papers (2023-06-13T20:50:22Z)
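The AVIS entry above describes an LLM that dynamically strategizes tool use. A heavily simplified sketch of such a loop, where `call_llm` and the `tools` registry are placeholders rather than AVIS's actual API:
```python
def answer_with_tools(question, tools, call_llm, max_steps=5):
    """Sketch of an LLM-driven tool-use loop: the model repeatedly picks
    the next external tool to call until it can answer. Hypothetical API."""
    context = [f"Question: {question}"]
    for _ in range(max_steps):
        choice = call_llm(
            f"Pick the next tool from {sorted(tools)}, or reply ANSWER:<text>.\n"
            + "\n".join(context)
        )
        if choice.startswith("ANSWER:"):
            return choice[len("ANSWER:"):].strip()
        name, _, arg = choice.partition(" ")
        if name in tools:
            context.append(f"{choice} -> {tools[name](arg)}")  # gather evidence
    return call_llm("Give the final answer.\n" + "\n".join(context))
```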
- Exploration via Planning for Information about the Optimal Trajectory [67.33886176127578]
We develop a method that allows us to plan for exploration while taking the task and the current knowledge into account.
We demonstrate that our method learns strong policies with half as many samples as strong exploration baselines.
arXiv Detail & Related papers (2022-10-06T20:28:55Z)
- Verifying Learning-Based Robotic Navigation Systems [61.01217374879221]
We show how modern verification engines can be used for effective model selection.
Specifically, we use verification to detect and rule out policies that may demonstrate suboptimal behavior.
Our work is the first to demonstrate the use of verification backends for recognizing suboptimal DRL policies in real-world robots.
arXiv Detail & Related papers (2022-05-26T17:56:43Z)
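The verification entry above rules out candidate policies with a verification backend; real engines prove such properties exactly, but a toy interval-propagation sketch conveys the model-selection idea (the network and property encodings here are assumptions):
```python
import numpy as np

def output_bounds(layers, lo, hi):
    """Bound a ReLU network's outputs over an input box via interval
    propagation; `layers` is a list of (W, b) pairs. A loose stand-in
    for a real verification engine."""
    for i, (W, b) in enumerate(layers):
        Wp, Wn = np.maximum(W, 0.0), np.minimum(W, 0.0)
        lo, hi = Wp @ lo + Wn @ hi + b, Wp @ hi + Wn @ lo + b
        if i < len(layers) - 1:                  # ReLU on hidden layers only
            lo, hi = np.maximum(lo, 0.0), np.maximum(hi, 0.0)
    return lo, hi

def select_policies(candidates, box_lo, box_hi, is_bad):
    """Keep only candidate policies whose certified output bounds exclude
    the undesirable behavior encoded by `is_bad(lo, hi)`."""
    return [net for net in candidates
            if not is_bad(*output_bounds(net, box_lo, box_hi))]
```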
- The Rational Selection of Goal Operations and the Integration of Search Strategies with Goal-Driven Autonomy [3.169249926144497]
The link between cognition and control must manage the problem of converting continuous values from the real world to symbolic representations (and back).
To generate effective behaviors, reasoning must include a capacity to replan, acquire and update new information, detect and respond to anomalies, and perform various operations on system goals.
This paper examines an agent's choices when multiple goal operations co-occur and interact, and it establishes a method of choosing between them.
arXiv Detail & Related papers (2022-01-21T20:53:49Z)
- Learning When and What to Ask: a Hierarchical Reinforcement Learning Framework [17.017688226277834]
We formulate a hierarchical reinforcement learning framework for learning to decide when to request additional information from humans.
Results on a simulated human-assisted navigation problem demonstrate the effectiveness of our framework.
arXiv Detail & Related papers (2021-10-14T01:30:36Z)
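The when-to-ask framework above can be sketched as a high-level controller that learns, per state, whether to act autonomously or pay a small cost to request help; the environment API and constants below are illustrative:
```python
import numpy as np

def episode(env, act_low, ask_human, q, ask_cost=0.2, eps=0.1,
            alpha=0.1, gamma=0.99):
    """One episode of a two-level loop: option 0 = act autonomously,
    option 1 = request human help. `q` is a (num_states, 2) table;
    all names are hypothetical."""
    s, done = env.reset(), False
    while not done:
        o = np.random.randint(2) if np.random.rand() < eps else int(q[s].argmax())
        a = ask_human(s) if o == 1 else act_low(s)
        s2, r, done = env.step(a)
        r -= ask_cost * o                      # asking helps but is not free
        # One-step Q-learning update for the high-level controller.
        q[s, o] += alpha * (r + gamma * q[s2].max() * (not done) - q[s, o])
        s = s2
```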
- Towards Coordinated Robot Motions: End-to-End Learning of Motion Policies on Transform Trees [63.31965375413414]
We propose to solve multi-task problems through learning structured policies from human demonstrations.
Our structured policy is inspired by RMPflow, a framework for combining subtask policies on different spaces.
We derive an end-to-end learning objective function that is suitable for the multi-task problem.
arXiv Detail & Related papers (2020-12-24T22:46:22Z)
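The transform-tree entry above combines subtask policies defined on different spaces. In the spirit of RMPflow's resolve operation, each task-space command is pulled back through its task-map Jacobian and the results are fused with metric weights; curvature terms are omitted in this simplified sketch:
```python
import numpy as np

def combine_subtask_policies(jacobians, metrics, accels):
    """Fuse subtask accelerations a_i (with importance metrics M_i and task
    Jacobians J_i) into one configuration-space acceleration:
    qdd = pinv(sum J_i^T M_i J_i) @ sum J_i^T M_i a_i."""
    n = jacobians[0].shape[1]
    A, b = np.zeros((n, n)), np.zeros(n)
    for J, M, a in zip(jacobians, metrics, accels):
        A += J.T @ M @ J              # pulled-back metric
        b += J.T @ M @ a              # pulled-back command
    return np.linalg.pinv(A) @ b      # least-squares resolution
```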
- Human-in-the-Loop Imitation Learning using Remote Teleoperation [72.2847988686463]
We build a data collection system tailored to 6-DoF manipulation settings.
We develop an algorithm to train the policy iteratively on new data collected by the system.
We demonstrate that agents trained on data collected by our intervention-based system and algorithm outperform agents trained on an equivalent number of samples collected by non-interventional demonstrators.
arXiv Detail & Related papers (2020-12-12T05:30:35Z)
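The intervention-based system above alternates autonomous execution with remote human takeovers and retrains on the intervention data. A minimal sketch of that loop, with an illustrative environment/operator API:
```python
def train_with_interventions(policy, env, human, fit, rounds=10, horizon=200):
    """DAgger-flavored sketch: the operator's takeovers are logged as
    (observation, action) pairs and the policy is refit each round.
    `human(obs)` returns an action when intervening, else None; `env.step`
    returns (obs, done). All names are hypothetical."""
    dataset = []
    for _ in range(rounds):
        obs = env.reset()
        for _ in range(horizon):
            human_action = human(obs)
            if human_action is not None:       # operator takes over
                dataset.append((obs, human_action))
                action = human_action
            else:
                action = policy(obs)           # autonomous execution
            obs, done = env.step(action)
            if done:
                break
        policy = fit(dataset)                  # retrain on aggregated data
    return policy
```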