MORE: Mobile Manipulation Rearrangement Through Grounded Language Reasoning
- URL: http://arxiv.org/abs/2505.03035v1
- Date: Mon, 05 May 2025 21:26:03 GMT
- Title: MORE: Mobile Manipulation Rearrangement Through Grounded Language Reasoning
- Authors: Mohammad Mohammadi, Daniel Honerkamp, Martin Büchner, Matteo Cassinelli, Tim Welschehold, Fabien Despinoy, Igor Gilitschenski, Abhinav Valada,
- Abstract summary: MORE is a novel approach for enhancing the capabilities of language models to solve zero-shot mobile manipulation planning tasks.<n>We evaluate MORE on 81 diverse rearrangement tasks from the BEHAVIOR-1K benchmark, where it becomes the first approach to successfully solve a significant share of the benchmark.
- Score: 13.535721260188694
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Autonomous long-horizon mobile manipulation encompasses a multitude of challenges, including scene dynamics, unexplored areas, and error recovery. Recent works have leveraged foundation models for scene-level robotic reasoning and planning. However, the performance of these methods degrades when dealing with a large number of objects and large-scale environments. To address these limitations, we propose MORE, a novel approach for enhancing the capabilities of language models to solve zero-shot mobile manipulation planning for rearrangement tasks. MORE leverages scene graphs to represent environments, incorporates instance differentiation, and introduces an active filtering scheme that extracts task-relevant subgraphs of object and region instances. These steps yield a bounded planning problem, effectively mitigating hallucinations and improving reliability. Additionally, we introduce several enhancements that enable planning across both indoor and outdoor environments. We evaluate MORE on 81 diverse rearrangement tasks from the BEHAVIOR-1K benchmark, where it becomes the first approach to successfully solve a significant share of the benchmark, outperforming recent foundation model-based approaches. Furthermore, we demonstrate the capabilities of our approach in several complex real-world tasks, mimicking everyday activities. We make the code publicly available at https://more-model.cs.uni-freiburg.de.
Related papers
- Unlocking Smarter Device Control: Foresighted Planning with a World Model-Driven Code Execution Approach [83.21177515180564]
We propose a framework that prioritizes natural language understanding and structured reasoning to enhance the agent's global understanding of the environment.<n>Our method outperforms previous approaches, particularly achieving a 44.4% relative improvement in task success rate.
arXiv Detail & Related papers (2025-05-22T09:08:47Z) - Scaling Autonomous Agents via Automatic Reward Modeling And Planning [52.39395405893965]
Large language models (LLMs) have demonstrated remarkable capabilities across a range of tasks.<n>However, they still struggle with problems requiring multi-step decision-making and environmental feedback.<n>We propose a framework that can automatically learn a reward model from the environment without human annotations.
arXiv Detail & Related papers (2025-02-17T18:49:25Z) - MALMM: Multi-Agent Large Language Models for Zero-Shot Robotics Manipulation [52.739500459903724]
Large Language Models (LLMs) have demonstrated remarkable planning abilities across various domains, including robotics manipulation and navigation.
We propose a novel multi-agent LLM framework that distributes high-level planning and low-level control code generation across specialized LLM agents.
We evaluate our approach on nine RLBench tasks, including long-horizon tasks, and demonstrate its ability to solve robotics manipulation in a zero-shot setting.
arXiv Detail & Related papers (2024-11-26T17:53:44Z) - ET-Plan-Bench: Embodied Task-level Planning Benchmark Towards Spatial-Temporal Cognition with Foundation Models [38.89166693142495]
ET-Plan-Bench is a benchmark for embodied task planning using Large Language Models (LLMs)<n>It features a controllable and diverse set of embodied tasks varying in different levels of difficulties and complexities.<n>Our benchmark distinguishes itself as a large-scale, quantifiable, highly automated, and fine-grained diagnostic framework.
arXiv Detail & Related papers (2024-10-02T19:56:38Z) - A Meta-Engine Framework for Interleaved Task and Motion Planning using Topological Refinements [51.54559117314768]
Task And Motion Planning (TAMP) is the problem of finding a solution to an automated planning problem.
We propose a general and open-source framework for modeling and benchmarking TAMP problems.
We introduce an innovative meta-technique to solve TAMP problems involving moving agents and multiple task-state-dependent obstacles.
arXiv Detail & Related papers (2024-08-11T14:57:57Z) - Concrete Subspace Learning based Interference Elimination for Multi-task
Model Fusion [86.6191592951269]
Merging models fine-tuned from common extensively pretrained large model but specialized for different tasks has been demonstrated as a cheap and scalable strategy to construct a multitask model that performs well across diverse tasks.
We propose the CONtinuous relaxation dis (Concrete) subspace learning method to identify a common lowdimensional subspace and utilize its shared information track interference problem without sacrificing performance.
arXiv Detail & Related papers (2023-12-11T07:24:54Z) - AI planning in the imagination: High-level planning on learned abstract
search spaces [68.75684174531962]
We propose a new method, called PiZero, that gives an agent the ability to plan in an abstract search space that the agent learns during training.
We evaluate our method on multiple domains, including the traveling salesman problem, Sokoban, 2048, the facility location problem, and Pacman.
arXiv Detail & Related papers (2023-08-16T22:47:16Z) - MANER: Multi-Agent Neural Rearrangement Planning of Objects in Cluttered
Environments [8.15681999722805]
This paper proposes a learning-based framework for multi-agent object rearrangement planning.
It addresses the challenges of task sequencing and path planning in complex environments.
arXiv Detail & Related papers (2023-06-10T23:53:28Z) - Effective Baselines for Multiple Object Rearrangement Planning in
Partially Observable Mapped Environments [5.32429768581469]
This paper aims to enable home-assistive intelligent agents to efficiently plan for rearrangement under partial observability.
We investigate monolithic and modular deep reinforcement learning (DRL) methods for planning in our setting.
We find that monolithic DRL methods do not succeed at long-horizon planning needed for multi-object rearrangement.
We also show that our greedy modular agents are empirically optimal when the objects that need to be rearranged are uniformly distributed in the environment.
arXiv Detail & Related papers (2023-01-24T08:03:34Z) - Learning Efficient Abstract Planning Models that Choose What to Predict [28.013014215441505]
We show that existing symbolic operator learning approaches fall short in many robotics domains.
This is primarily because they attempt to learn operators that exactly predict all observed changes in the abstract state.
We propose to learn operators that 'choose what to predict' by only modelling changes necessary for abstract planning to achieve specified goals.
arXiv Detail & Related papers (2022-08-16T13:12:59Z) - Compositional Multi-Object Reinforcement Learning with Linear Relation
Networks [38.59852895970774]
We focus on models that can learn manipulation tasks in fixed multi-object settings and extrapolate this skill zero-shot without any drop in performance when the number of objects changes.
Our approach, which scales linearly in $K$, allows agents to extrapolate and generalize zero-shot to any new object number.
arXiv Detail & Related papers (2022-01-31T17:53:30Z) - Goal-Aware Prediction: Learning to Model What Matters [105.43098326577434]
One of the fundamental challenges in using a learned forward dynamics model is the mismatch between the objective of the learned model and that of the downstream planner or policy.
We propose to direct prediction towards task relevant information, enabling the model to be aware of the current task and encouraging it to only model relevant quantities of the state space.
We find that our method more effectively models the relevant parts of the scene conditioned on the goal, and as a result outperforms standard task-agnostic dynamics models and model-free reinforcement learning.
arXiv Detail & Related papers (2020-07-14T16:42:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.