Robots That Ask For Help: Uncertainty Alignment for Large Language Model
Planners
- URL: http://arxiv.org/abs/2307.01928v2
- Date: Mon, 4 Sep 2023 16:06:48 GMT
- Authors: Allen Z. Ren, Anushri Dixit, Alexandra Bodrova, Sumeet Singh, Stephen
Tu, Noah Brown, Peng Xu, Leila Takayama, Fei Xia, Jake Varley, Zhenjia Xu,
Dorsa Sadigh, Andy Zeng, Anirudha Majumdar
- Abstract summary: KnowNo is a framework for measuring and aligning the uncertainty of LLM-based planners, so that they ask for help when they are unsure.
KnowNo builds on the theory of conformal prediction to provide statistical guarantees on task completion.
- Score: 85.03486419424647
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) exhibit a wide range of promising capabilities
-- from step-by-step planning to commonsense reasoning -- that may provide
utility for robots, but remain prone to confidently hallucinated predictions.
In this work, we present KnowNo, which is a framework for measuring and
aligning the uncertainty of LLM-based planners such that they know when they
don't know and ask for help when needed. KnowNo builds on the theory of
conformal prediction to provide statistical guarantees on task completion while
minimizing human help in complex multi-step planning settings. Experiments
across a variety of simulated and real robot setups that involve tasks with
different modes of ambiguity (e.g., from spatial to numeric uncertainties, from
human preferences to Winograd schemas) show that KnowNo performs favorably over
modern baselines (which may involve ensembles or extensive prompt tuning) in
terms of improving efficiency and autonomy, while providing formal assurances.
KnowNo can be used with LLMs out of the box without model-finetuning, and
suggests a promising lightweight approach to modeling uncertainty that can
complement and scale with the growing capabilities of foundation models.
Website: https://robot-help.github.io
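The abstract's idea of using conformal prediction to decide when to ask for help can be illustrated with a minimal sketch of standard split conformal prediction over multiple-choice options. This is not the authors' implementation: the function names (`conformal_threshold`, `prediction_set`, `plan_or_ask`), the score `1 - p(true option)`, and the toy numbers are assumptions chosen for illustration. Under the usual exchangeability assumption, the prediction set contains the correct option with probability at least `1 - epsilon`; the robot acts only when the set is a singleton and otherwise asks.

```python
import math

def conformal_threshold(cal_true_probs, epsilon):
    """Return the calibrated score threshold qhat.

    cal_true_probs: model probability assigned to the correct option
    on each calibration example (hypothetical input format).
    """
    n = len(cal_true_probs)
    # Nonconformity score: low model confidence in the truth = high score.
    scores = sorted(1.0 - p for p in cal_true_probs)
    # Finite-sample corrected rank, ceil((n + 1) * (1 - epsilon)).
    k = math.ceil((n + 1) * (1.0 - epsilon))
    if k > n:
        return 1.0  # too few calibration points: keep every option
    return scores[k - 1]

def prediction_set(option_probs, qhat):
    """Keep every option whose score 1 - p does not exceed qhat."""
    return [opt for opt, p in option_probs.items() if p >= 1.0 - qhat]

def plan_or_ask(option_probs, qhat):
    """Act on a singleton set; otherwise ask a human to choose."""
    options = prediction_set(option_probs, qhat)
    if len(options) == 1:
        return ("act", options)
    return ("ask", options)

# Toy calibration data (made-up numbers, not from the paper).
cal = [0.5, 0.75, 0.875, 0.25, 0.625, 0.9375, 0.375]
qhat = conformal_threshold(cal, epsilon=0.25)

confident = plan_or_ask({"A": 0.7, "B": 0.2, "C": 0.1}, qhat)
ambiguous = plan_or_ask({"A": 0.45, "B": 0.4, "C": 0.15}, qhat)
```

With these toy numbers the first query yields a single-option set and the second a two-option set, so only the second triggers a help request. A lower `epsilon` gives a stronger coverage target but larger sets, and therefore more requests for help.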
Related papers
- Closed-Loop Long-Horizon Robotic Planning via Equilibrium Sequence Modeling [23.62433580021779]
We advocate a self-refining scheme that iteratively refines a draft plan until an equilibrium is reached.
A nested equilibrium sequence modeling procedure is devised for efficient closed-loop planning.
Our method is evaluated on the VirtualHome-Env benchmark, showing advanced performance with better scaling for inference.
arXiv Detail & Related papers (2024-10-02T11:42:49Z)
- WorkArena++: Towards Compositional Planning and Reasoning-based Common Knowledge Work Tasks [85.95607119635102]
Large language models (LLMs) can mimic human-like intelligence.
WorkArena++ is designed to evaluate the planning, problem-solving, logical/arithmetic reasoning, retrieval, and contextual understanding abilities of web agents.
arXiv Detail & Related papers (2024-07-07T07:15:49Z)
- Large Language Models Must Be Taught to Know What They Don't Know [97.90008709512921]
We show that fine-tuning on a small dataset of correct and incorrect answers can create an uncertainty estimate with good generalization and small computational overhead.
We also investigate the mechanisms that enable reliable uncertainty estimation, finding that many models can be used as general-purpose uncertainty estimators.
arXiv Detail & Related papers (2024-06-12T16:41:31Z)
- Safe Task Planning for Language-Instructed Multi-Robot Systems using Conformal Prediction [11.614036749291216]
We introduce a new distributed multi-robot planner, S-ATLAS for Safe plAnning for Teams of Language-instructed AgentS, that is capable of achieving user-defined mission success rates.
We show, both theoretically and empirically, that the proposed planner can achieve user-specified task success rates while minimizing the overall number of help requests.
arXiv Detail & Related papers (2024-02-23T15:02:44Z)
- Introspective Planning: Aligning Robots' Uncertainty with Inherent Task Ambiguity [0.659529078336196]
Large language models (LLMs) exhibit advanced reasoning skills, enabling robots to comprehend natural language instructions.
LLM hallucination may result in robots executing plans that are misaligned with user goals or, in extreme cases, unsafe.
This paper explores the concept of introspective planning as a systematic method for guiding LLMs in forming uncertainty-aware plans for robotic task execution.
arXiv Detail & Related papers (2024-02-09T16:40:59Z)
- Automated Process Planning Based on a Semantic Capability Model and SMT [50.76251195257306]
In research of manufacturing systems and autonomous robots, the term capability is used for a machine-interpretable specification of a system function.
We present an approach that combines these two topics: starting from a semantic capability model, an AI planning problem is automatically generated.
arXiv Detail & Related papers (2023-12-14T10:37:34Z)
- Interactive Planning Using Large Language Models for Partially Observable Robotics Tasks [54.60571399091711]
Large Language Models (LLMs) have achieved impressive results in creating robotic agents for performing open vocabulary tasks.
We present an interactive planning technique for partially observable tasks using LLMs.
arXiv Detail & Related papers (2023-12-11T22:54:44Z)
- Grounded Decoding: Guiding Text Generation with Grounded Models for Embodied Agents [111.15288256221764]
The Grounded Decoding project aims to solve complex, long-horizon tasks in a robotic setting by combining the knowledge of a language model with that of grounded models.
We frame this as a problem similar to probabilistic filtering: decode a sequence that both has high probability under the language model and high probability under a set of grounded model objectives.
We demonstrate how such grounded models can be obtained across three simulation and real-world domains, and that the proposed decoding strategy is able to solve complex, long-horizon tasks in a robotic setting by leveraging the knowledge of both models.
arXiv Detail & Related papers (2023-03-01T22:58:50Z)
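
The Grounded Decoding entry above describes decoding a sequence that is probable under both the language model and a set of grounded objectives. A minimal sketch of that idea, assuming greedy token-by-token selection on the product of the two probabilities, is given below. The function `grounded_greedy_decode` and the toy models `toy_lm` and `toy_grounded` are hypothetical stand-ins and are not taken from the paper.

```python
import math

def grounded_greedy_decode(lm_probs, grounded_probs, max_steps, eos="<eos>"):
    """Greedily pick the token maximizing log p_LM + log p_grounded.

    lm_probs(prefix) -> dict mapping candidate token to LM probability.
    grounded_probs(prefix, token) -> probability under the grounded model.
    Both callables are assumed interfaces for this sketch.
    """
    prefix = []
    for _ in range(max_steps):
        best_token, best_score = None, -math.inf
        for token, p_lm in lm_probs(prefix).items():
            p_g = grounded_probs(prefix, token)
            if p_lm <= 0.0 or p_g <= 0.0:
                continue  # a token ruled out by either model is skipped
            score = math.log(p_lm) + math.log(p_g)
            if score > best_score:
                best_token, best_score = token, score
        if best_token is None or best_token == eos:
            break
        prefix.append(best_token)
    return prefix

def toy_lm(prefix):
    """Toy language model that slightly prefers 'apple' as the object."""
    if not prefix:
        return {"pick": 0.9, "<eos>": 0.1}
    if prefix == ["pick"]:
        return {"apple": 0.6, "sponge": 0.4}
    return {"<eos>": 1.0}

def toy_grounded(prefix, token):
    """Toy grounded model: the apple is unlikely to be present in the scene."""
    return 0.05 if token == "apple" else 1.0

plan = grounded_greedy_decode(toy_lm, toy_grounded, max_steps=5)
```

In this toy scene the language model alone would choose "apple", but the grounded model down-weights it, so the decoded plan is `["pick", "sponge"]`.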
This list is automatically generated from the titles and abstracts of the papers on this site.