LLMs for Robotic Object Disambiguation
- URL: http://arxiv.org/abs/2401.03388v1
- Date: Sun, 7 Jan 2024 04:46:23 GMT
- Title: LLMs for Robotic Object Disambiguation
- Authors: Connie Jiang, Yiqing Xu, David Hsu
- Abstract summary: Our study reveals the LLM's aptitude for solving complex decision-making challenges.
A pivotal focus of our research is the object disambiguation capability of LLMs.
We have developed a few-shot prompt engineering system to improve the LLM's ability to pose disambiguating queries.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The advantages of pre-trained large language models (LLMs) are apparent in a
variety of language processing tasks. But can a language model's knowledge be
further harnessed to effectively disambiguate objects and navigate
decision-making challenges within the realm of robotics? Our study reveals the
LLM's aptitude for solving complex decision-making challenges that have
traditionally been modeled as Partially Observable Markov Decision Processes (POMDPs).
A pivotal focus of our research is the object disambiguation capability of
LLMs. We detail the integration of an LLM into a tabletop environment
disambiguation task, a decision-making problem where the robot's task is to
discern and retrieve a user's desired object from an arbitrarily large and
complex cluster of objects. Despite multiple query attempts with zero-shot
prompt engineering (details can be found in the Appendix), the LLM struggled to
inquire about features not explicitly provided in the scene description. In
response, we have developed a few-shot prompt engineering system to improve the
LLM's ability to pose disambiguating queries. The result is a model capable of
both using given features when they are available and inferring new relevant
features when necessary, to successfully generate and navigate down a precise
decision tree to the correct object--even when faced with identical options.
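The few-shot approach described above can be sketched as follows. This is a minimal illustrative example, not the authors' code: the example scenes, the prompt wording, and the `build_prompt` helper are all assumptions; the resulting prompt would be sent to any LLM completion API.

```python
# Sketch of few-shot prompting for object disambiguation: the examples
# demonstrate asking a single clarifying question, including inferring
# features (e.g. spatial position) that the scene description does not
# state explicitly -- the failure mode of zero-shot prompting noted above.

FEW_SHOT_EXAMPLES = """\
Scene: a red mug, a blue mug, a red plate
User: "Hand me the mug."
Question: "Do you want the red mug or the blue mug?"

Scene: two identical white bowls, one on the left, one on the right
User: "Pass the bowl."
Question: "Do you want the bowl on the left or the one on the right?"
"""

def build_prompt(scene: str, request: str) -> str:
    """Prepend few-shot examples so the model learns to pose one
    disambiguating query per turn, narrowing the decision tree."""
    return (
        "You are a robot choosing one object for the user.\n"
        "If the request is ambiguous, ask ONE clarifying question.\n\n"
        f"{FEW_SHOT_EXAMPLES}\n"
        f"Scene: {scene}\n"
        f'User: "{request}"\n'
        "Question:"
    )

# Build a prompt for a scene containing identical options.
prompt = build_prompt("a green cup, a green cup, a yellow cup",
                      "Give me the cup.")
```

In a full system, the model's question would be shown to the user, the answer appended to the prompt, and the loop repeated until a single object remains.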
Related papers
- UniMEL: A Unified Framework for Multimodal Entity Linking with Large Language Models (arXiv, 2024-07-23)
  Multimodal Entity Linking (MEL) is a crucial task that aims at linking ambiguous mentions within multimodal contexts to referent entities in a multimodal knowledge base, such as Wikipedia. Existing methods overcomplicate the MEL task and overlook visual semantic information, which makes them costly and hard to scale. The authors propose UniMEL, a unified framework which establishes a new paradigm for processing multimodal entity linking tasks using Large Language Models.
- Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More? (arXiv, 2024-06-19)
  Long-context language models (LCLMs) have the potential to revolutionize our approach to tasks traditionally reliant on external tools like retrieval systems or databases. The authors introduce LOFT, a benchmark of real-world tasks requiring context up to millions of tokens, designed to evaluate LCLMs' performance on in-context retrieval and reasoning. Their findings reveal LCLMs' surprising ability to rival state-of-the-art retrieval and RAG systems, despite never having been explicitly trained for these tasks.
- Exploring Unseen Environments with Robots using Large Language and Vision Models through a Procedurally Generated 3D Scene Representation (arXiv, 2024-03-30)
  This work focuses on solving the object goal navigation problem by mimicking human cognition. It introduces a comprehensive framework capable of exploring an unfamiliar environment in search of an object. A challenging task in using LLMs to generate high-level sub-goals is efficiently representing the environment around the robot.
- Unmemorization in Large Language Models via Self-Distillation and Deliberate Imagination (arXiv, 2024-02-15)
  Large Language Models (LLMs) struggle with crucial issues of privacy violation and unwanted exposure of sensitive data. The authors introduce a novel approach termed deliberate imagination in the context of LLM unlearning. Their results demonstrate the usefulness of this approach across different models and sizes, including with parameter-efficient fine-tuning.
- Rethinking Interpretability in the Era of Large Language Models (arXiv, 2024-01-30)
  Large language models (LLMs) have demonstrated remarkable capabilities across a wide array of tasks. The capability to explain in natural language allows LLMs to expand the scale and complexity of patterns that can be conveyed to a human. These new capabilities raise new challenges, such as hallucinated explanations and immense computational costs.
- Interactive Planning Using Large Language Models for Partially Observable Robotics Tasks (arXiv, 2023-12-11)
  Large Language Models (LLMs) have achieved impressive results in creating robotic agents for performing open-vocabulary tasks. The authors present an interactive planning technique for partially observable tasks using LLMs.
- LanguageMPC: Large Language Models as Decision Makers for Autonomous Driving (arXiv, 2023-10-04)
  This work employs Large Language Models (LLMs) as a decision-making component for complex autonomous driving scenarios. Extensive experiments demonstrate that the proposed method not only consistently surpasses baseline approaches in single-vehicle tasks, but also helps handle complex driving behaviors, including multi-vehicle coordination.
- Selective Perception: Optimizing State Descriptions with Reinforcement Learning for Language Model Actors (arXiv, 2023-07-21)
  Large language models (LLMs) are being applied as actors for sequential decision-making tasks in domains such as robotics and games. Previous work does little to explore what environment state information is provided to LLM actors via language. The authors propose Brief Language INputs for DEcision-making Responses (BLINDER), a method for automatically selecting concise state descriptions.
- Check Your Facts and Try Again: Improving Large Language Models with External Knowledge and Automated Feedback (arXiv, 2023-02-24)
  Large language models (LLMs) are able to generate human-like, fluent responses for many downstream tasks. This paper proposes LLM-Augmenter, a system which augments a black-box LLM with a set of plug-and-play modules.
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.