Evaluation of Habitat Robotics using Large Language Models
- URL: http://arxiv.org/abs/2507.06157v1
- Date: Tue, 08 Jul 2025 16:39:39 GMT
- Title: Evaluation of Habitat Robotics using Large Language Models
- Authors: William Li, Lei Hamilton, Kaise Al-natour, Sanjeev Mohindra
- Abstract summary: We evaluate the effectiveness of Large Language Models at solving embodied robotic tasks using the Meta PARTNR benchmark. Our results indicate that reasoning models like OpenAI o3-mini outperform non-reasoning models like OpenAI GPT-4o and Llama 3.
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: This paper focuses on evaluating the effectiveness of Large Language Models at solving embodied robotic tasks using the Meta PARTNR benchmark. Meta PARTNR provides simplified environments and robotic interactions within randomized indoor kitchen scenes. Each randomized kitchen scene is given a task where two robotic agents cooperatively work together to solve it. We evaluated multiple frontier models on Meta PARTNR environments. Our results indicate that reasoning models like OpenAI o3-mini outperform non-reasoning models like OpenAI GPT-4o and Llama 3 when operating in PARTNR's robotic embodied environments. o3-mini outperformed the other models across centralized, decentralized, full-observability, and partial-observability configurations. This provides a promising avenue of research for embodied robotics development.
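To make the evaluation grid concrete, the sketch below enumerates the four planner/observability configurations named in the abstract and tallies per-model success rates. This is a minimal illustration under stated assumptions: the model list, the `run_episode` stub, and the episode count are hypothetical placeholders, not the actual Meta PARTNR / habitat-lab API.

```python
# Hedged sketch of the evaluation grid described in the abstract.
# run_episode() is a placeholder stub, NOT the real PARTNR interface;
# in a real run it would launch a randomized kitchen scene with two
# cooperating agents and score task completion.
from dataclasses import dataclass
from itertools import product

MODELS = ["o3-mini", "gpt-4o", "llama-3"]    # frontier models under test
PLANNERS = ["centralized", "decentralized"]  # who coordinates the two agents
OBSERVABILITY = ["full", "partial"]          # how much world state agents see

@dataclass
class EpisodeResult:
    model: str
    planner: str
    observability: str
    success: bool

def run_episode(model: str, planner: str, observability: str) -> EpisodeResult:
    """Placeholder episode runner; always 'fails' in this sketch."""
    return EpisodeResult(model, planner, observability, success=False)

def evaluate(n_episodes: int = 10) -> dict:
    """Run every model through all four configurations; return success rates."""
    rates = {}
    for model, planner, obs in product(MODELS, PLANNERS, OBSERVABILITY):
        results = [run_episode(model, planner, obs) for _ in range(n_episodes)]
        rates[(model, planner, obs)] = sum(r.success for r in results) / n_episodes
    return rates

if __name__ == "__main__":
    for config, rate in sorted(evaluate().items()):
        print(config, f"success rate = {rate:.2f}")
```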
Related papers
- Is Single-View Mesh Reconstruction Ready for Robotics? [63.29645501232935]
This paper evaluates single-view mesh reconstruction models for creating digital twin environments in robot manipulation. We establish benchmarking criteria for 3D reconstruction in robotics contexts. Despite success on computer vision benchmarks, existing approaches fail to meet robotics-specific requirements.
arXiv Detail & Related papers (2025-05-23T14:35:56Z)
- PointArena: Probing Multimodal Grounding Through Language-Guided Pointing [79.80132157576978]
Pointing serves as a fundamental and intuitive mechanism for grounding language within visual contexts. We introduce PointArena, a comprehensive platform for evaluating multimodal pointing across diverse reasoning scenarios.
arXiv Detail & Related papers (2025-05-15T06:04:42Z)
- RoBridge: A Hierarchical Architecture Bridging Cognition and Execution for General Robotic Manipulation [90.81956345363355]
RoBridge is a hierarchical intelligent architecture for general robotic manipulation. It consists of a high-level cognitive planner (HCP) based on a large-scale pre-trained vision-language model (VLM). It unleashes the procedural skill of reinforcement learning, effectively bridging the gap between cognition and execution.
arXiv Detail & Related papers (2025-05-03T06:17:18Z)
- M2R2: MultiModal Robotic Representation for Temporal Action Segmentation [9.64001633229156]
We introduce a novel pretraining strategy that enables the reuse of learned features across multiple TAS models. Our method achieves state-of-the-art performance on the REASSEMBLE dataset, outperforming existing robotic action segmentation models by 46.6%.
arXiv Detail & Related papers (2025-04-25T19:36:17Z)
- REMAC: Self-Reflective and Self-Evolving Multi-Agent Collaboration for Long-Horizon Robot Manipulation [57.628771707989166]
We propose an adaptive multi-agent planning framework, termed REMAC, that enables efficient, scene-agnostic multi-robot long-horizon task planning and execution. REMAC incorporates two key modules: a self-reflection module performing pre-condition and post-condition checks in the loop to evaluate progress and refine plans, and a self-evolvement module dynamically adapting plans based on scene-specific reasoning.
arXiv Detail & Related papers (2025-03-28T03:51:40Z)
- GR00T N1: An Open Foundation Model for Generalist Humanoid Robots [133.23509142762356]
General-purpose robots need a versatile body and an intelligent mind. Recent advancements in humanoid robots have shown great promise as a hardware platform for building generalist autonomy. We introduce GR00T N1, an open foundation model for humanoid robots.
arXiv Detail & Related papers (2025-03-18T21:06:21Z)
- D-RMGPT: Robot-assisted collaborative tasks driven by large multimodal models [0.0]
Detection-Robot Management GPT (D-RMGPT) is a robot-assisted assembly planner based on Large Multimodal Models (LMMs).
It can assist inexperienced operators in assembly tasks without requiring any markers or previous training.
It achieves an assembly success rate of 83% while reducing the assembly time for inexperienced operators by 33% compared to the manual process.
arXiv Detail & Related papers (2024-08-21T16:34:21Z)
- Wonderful Team: Zero-Shot Physical Task Planning with Visual LLMs [0.0]
Wonderful Team is a framework for executing high-level robotic planning in a zero-shot regime. We show that Wonderful Team's performance on real-world semantic and physical planning tasks often exceeds methods that rely on separate vision systems.
arXiv Detail & Related papers (2024-07-26T21:18:57Z)
- Interactive Planning Using Large Language Models for Partially Observable Robotics Tasks [54.60571399091711]
Large Language Models (LLMs) have achieved impressive results in creating robotic agents for performing open vocabulary tasks.
We present an interactive planning technique for partially observable tasks using LLMs.
arXiv Detail & Related papers (2023-12-11T22:54:44Z)
- Transferring Foundation Models for Generalizable Robotic Manipulation [82.12754319808197]
We propose a novel paradigm that effectively leverages the language-reasoning segmentation mask generated by internet-scale foundation models. Our approach can effectively and robustly perceive object pose and enable sample-efficient generalization learning. Demos can be found in our submitted video, and more comprehensive ones can be found in link1 or link2.
arXiv Detail & Related papers (2023-06-09T07:22:12Z)
- Distributed Reinforcement Learning for Robot Teams: A Review [10.92709534981466]
Recent advances in sensing, actuation, and computation have opened the door to multi-robot systems.
The community has leveraged model-free multi-agent reinforcement learning to devise efficient, scalable controllers for multi-robot systems.
Recent findings: decentralized multi-robot systems (MRS) face fundamental challenges, such as non-stationarity and partial observability.
arXiv Detail & Related papers (2022-04-07T15:34:19Z)
- Few-Shot Visual Grounding for Natural Human-Robot Interaction [0.0]
We propose a software architecture that segments a target object from a crowded scene, indicated verbally by a human user.
At the core of our system, we employ a multi-modal deep neural network for visual grounding.
We evaluate the performance of the proposed model on real RGB-D data collected from public scene datasets.
arXiv Detail & Related papers (2021-03-17T15:24:02Z)