LLMs for Robotic Object Disambiguation
- URL: http://arxiv.org/abs/2401.03388v1
- Date: Sun, 7 Jan 2024 04:46:23 GMT
- Title: LLMs for Robotic Object Disambiguation
- Authors: Connie Jiang, Yiqing Xu, David Hsu
- Abstract summary: Our study reveals the LLM's aptitude for solving complex decision-making challenges.
A pivotal focus of our research is the object disambiguation capability of LLMs.
We have developed a few-shot prompt engineering system to improve the LLM's ability to pose disambiguating queries.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The advantages of pre-trained large language models (LLMs) are apparent in a
variety of language processing tasks. But can a language model's knowledge be
further harnessed to effectively disambiguate objects and navigate
decision-making challenges within the realm of robotics? Our study reveals the
LLM's aptitude for solving complex decision-making challenges that have
traditionally been modeled as Partially Observable Markov Decision Processes (POMDPs).
A pivotal focus of our research is the object disambiguation capability of
LLMs. We detail the integration of an LLM into a tabletop environment
disambiguation task, a decision-making problem where the robot's task is to
discern and retrieve a user's desired object from an arbitrarily large and
complex cluster of objects. Despite multiple query attempts with zero-shot
prompt engineering (details can be found in the Appendix), the LLM struggled to
inquire about features not explicitly provided in the scene description. In
response, we have developed a few-shot prompt engineering system to improve the
LLM's ability to pose disambiguating queries. The result is a model capable of
both using given features when they are available and inferring new relevant
features when necessary, to successfully generate and navigate down a precise
decision tree to the correct object--even when faced with identical options.
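The few-shot approach described above can be sketched as follows. This is a minimal illustrative example, not the authors' code: the example scenes, the prompt wording, and the `build_prompt` helper are all assumptions; the resulting prompt would be sent to any LLM completion API.

```python
# Sketch of few-shot prompting for object disambiguation: the examples
# demonstrate asking a single clarifying question, including inferring
# features (e.g. spatial position) that the scene description does not
# state explicitly -- the failure mode of zero-shot prompting noted above.

FEW_SHOT_EXAMPLES = """\
Scene: a red mug, a blue mug, a red plate
User: "Hand me the mug."
Question: "Do you want the red mug or the blue mug?"

Scene: two identical white bowls, one on the left, one on the right
User: "Pass the bowl."
Question: "Do you want the bowl on the left or the one on the right?"
"""

def build_prompt(scene: str, request: str) -> str:
    """Prepend few-shot examples so the model learns to pose one
    disambiguating query per turn, narrowing the decision tree."""
    return (
        "You are a robot choosing one object for the user.\n"
        "If the request is ambiguous, ask ONE clarifying question.\n\n"
        f"{FEW_SHOT_EXAMPLES}\n"
        f"Scene: {scene}\n"
        f'User: "{request}"\n'
        "Question:"
    )

# Build a prompt for a scene containing identical options.
prompt = build_prompt("a green cup, a green cup, a yellow cup",
                      "Give me the cup.")
```

In a full system, the model's question would be shown to the user, the answer appended to the prompt, and the loop repeated until a single object remains.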
Related papers
- UniMEL: A Unified Framework for Multimodal Entity Linking with Large Language Models (arXiv, 2024-07-23)
  Multimodal Entity Linking (MEL) is a crucial task that aims at linking ambiguous mentions within multimodal contexts to referent entities in a multimodal knowledge base, such as Wikipedia. Existing methods overcomplicate the MEL task and overlook visual semantic information, which makes them costly and hard to scale. The authors propose UniMEL, a unified framework which establishes a new paradigm for processing multimodal entity linking tasks using Large Language Models.
- Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More? (arXiv, 2024-06-19)
  Long-context language models (LCLMs) have the potential to revolutionize our approach to tasks traditionally reliant on external tools like retrieval systems or databases. The authors introduce LOFT, a benchmark of real-world tasks requiring context up to millions of tokens, designed to evaluate LCLMs' performance on in-context retrieval and reasoning. Their findings reveal LCLMs' surprising ability to rival state-of-the-art retrieval and RAG systems, despite never having been explicitly trained for these tasks.
- Exploring Unseen Environments with Robots using Large Language and Vision Models through a Procedurally Generated 3D Scene Representation (arXiv, 2024-03-30)
  This work focuses on solving the object goal navigation problem by mimicking human cognition. It introduces a comprehensive framework capable of exploring an unfamiliar environment in search of an object. A challenging task in using LLMs to generate high-level sub-goals is efficiently representing the environment around the robot.
- Unmemorization in Large Language Models via Self-Distillation and Deliberate Imagination (arXiv, 2024-02-15)
  Large Language Models (LLMs) struggle with crucial issues of privacy violation and unwanted exposure of sensitive data. The authors introduce a novel approach termed deliberate imagination in the context of LLM unlearning. Their results demonstrate the usefulness of this approach across different models and sizes, including with parameter-efficient fine-tuning.
- Rethinking Interpretability in the Era of Large Language Models (arXiv, 2024-01-30)
  Large language models (LLMs) have demonstrated remarkable capabilities across a wide array of tasks. The capability to explain in natural language allows LLMs to expand the scale and complexity of patterns that can be conveyed to a human. These new capabilities raise new challenges, such as hallucinated explanations and immense computational costs.
- Interactive Planning Using Large Language Models for Partially Observable Robotics Tasks (arXiv, 2023-12-11)
  Large Language Models (LLMs) have achieved impressive results in creating robotic agents for performing open-vocabulary tasks. The authors present an interactive planning technique for partially observable tasks using LLMs.
- LanguageMPC: Large Language Models as Decision Makers for Autonomous Driving (arXiv, 2023-10-04)
  This work employs Large Language Models (LLMs) as a decision-making component for complex autonomous driving scenarios. Extensive experiments demonstrate that the proposed method not only consistently surpasses baseline approaches in single-vehicle tasks, but also helps handle complex driving behaviors, including multi-vehicle coordination.
- Selective Perception: Optimizing State Descriptions with Reinforcement Learning for Language Model Actors (arXiv, 2023-07-21)
  Large language models (LLMs) are being applied as actors for sequential decision-making tasks in domains such as robotics and games. Previous work does little to explore what environment state information is provided to LLM actors via language. The authors propose Brief Language INputs for DEcision-making Responses (BLINDER), a method for automatically selecting concise state descriptions.
- Check Your Facts and Try Again: Improving Large Language Models with External Knowledge and Automated Feedback (arXiv, 2023-02-24)
  Large language models (LLMs) are able to generate human-like, fluent responses for many downstream tasks. This paper proposes LLM-Augmenter, a system which augments a black-box LLM with a set of plug-and-play modules.
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.