Embodied Chain of Action Reasoning with Multi-Modal Foundation Model for Humanoid Loco-manipulation
- URL: http://arxiv.org/abs/2504.09532v1
- Date: Sun, 13 Apr 2025 11:37:32 GMT
- Title: Embodied Chain of Action Reasoning with Multi-Modal Foundation Model for Humanoid Loco-manipulation
- Authors: Yu Hao, Geeta Chandra Raju Bethala, Niraj Pudasaini, Hao Huang, Shuaihang Yuan, Congcong Wen, Baoru Huang, Anh Nguyen, Yi Fang,
- Abstract summary: We introduce a novel framework that applies the embodied chain of action reasoning methodology to autonomously plan actions from textual instructions for humanoid loco-manipulation.<n>Our method integrates humanoid-specific chain of thought methodology, including detailed affordance and body movement analysis, which provides a breakdown of the task into a sequence of locomotion and manipulation actions.<n>We evaluate our method's efficacy on the decoupled upper and lower body control and demonstrate the effectiveness of the chain of robotic action reasoning strategies in comprehending human instructions.
- Score: 17.449836528972867
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Enabling humanoid robots to autonomously perform loco-manipulation tasks in complex, unstructured environments poses significant challenges. This entails equipping robots with the capability to plan actions over extended horizons while leveraging multi-modality to bridge gaps between high-level planning and actual task execution. Recent advancements in multi-modal foundation models have showcased substantial potential in enhancing planning and reasoning abilities, particularly in the comprehension and processing of semantic information for robotic control tasks. In this paper, we introduce a novel framework based on foundation models that applies the embodied chain of action reasoning methodology to autonomously plan actions from textual instructions for humanoid loco-manipulation. Our method integrates humanoid-specific chain of thought methodology, including detailed affordance and body movement analysis, which provides a breakdown of the task into a sequence of locomotion and manipulation actions. Moreover, we incorporate spatial reasoning based on the observation and target object properties to effectively navigate where target position may be unseen or occluded. Through rigorous experimental setups on object rearrangement, manipulations and loco-manipulation tasks on a real-world environment, we evaluate our method's efficacy on the decoupled upper and lower body control and demonstrate the effectiveness of the chain of robotic action reasoning strategies in comprehending human instructions.
Related papers
- Learning Coordinated Bimanual Manipulation Policies using State Diffusion and Inverse Dynamics Models [22.826115023573205]
We infuse the predictive nature of human manipulation strategies into robot imitation learning.<n>We train a diffusion model to predict future states and compute robot actions that achieve the predicted states.<n>Our framework consistently outperforms state-of-the-art state-to-action mapping policies.
arXiv Detail & Related papers (2025-03-30T01:25:35Z) - CAIMAN: Causal Action Influence Detection for Sample Efficient Loco-manipulation [17.94272840532448]
We present CAIMAN, a novel framework for learning loco-manipulation that relies solely on sparse task rewards.<n>We employ a hierarchical control strategy, combining a low-level locomotion policy with a high-level policy that prioritizes task-relevant velocity commands.<n>We demonstrate the framework's superior sample efficiency, adaptability to diverse environments, and successful transfer to hardware without fine-tuning.
arXiv Detail & Related papers (2025-02-02T16:16:53Z) - Grounding Language Models in Autonomous Loco-manipulation Tasks [3.8363685417355557]
We propose a novel framework that learns, selects, and plans behaviors based on tasks in different scenarios.
We leverage the planning and reasoning features of the large language model (LLM), constructing a hierarchical task graph.
Experiments in simulation and real-world using the CENTAURO robot show that the language model based planner can efficiently adapt to new loco-manipulation tasks.
arXiv Detail & Related papers (2024-09-02T15:27:48Z) - Autonomous Behavior Planning For Humanoid Loco-manipulation Through Grounded Language Model [6.9268843428933025]
Large language models (LLMs) have demonstrated powerful planning and reasoning capabilities for comprehension and processing of semantic information.
We propose a novel language-model based framework that enables robots to autonomously plan behaviors and low-level execution under given textual instructions.
arXiv Detail & Related papers (2024-08-15T17:33:32Z) - Embodied Instruction Following in Unknown Environments [66.60163202450954]
We propose an embodied instruction following (EIF) method for complex tasks in the unknown environment.
We build a hierarchical embodied instruction following framework including the high-level task planner and the low-level exploration controller.
For the task planner, we generate the feasible step-by-step plans for human goal accomplishment according to the task completion process and the known visual clues.
arXiv Detail & Related papers (2024-06-17T17:55:40Z) - Unified Task and Motion Planning using Object-centric Abstractions of
Motion Constraints [56.283944756315066]
We propose an alternative TAMP approach that unifies task and motion planning into a single search.
Our approach is based on an object-centric abstraction of motion constraints that permits leveraging the computational efficiency of off-the-shelf AI search to yield physically feasible plans.
arXiv Detail & Related papers (2023-12-29T14:00:20Z) - ThinkBot: Embodied Instruction Following with Thought Chain Reasoning [66.09880459084901]
Embodied Instruction Following (EIF) requires agents to complete human instruction by interacting objects in complicated surrounding environments.
We propose ThinkBot that reasons the thought chain in human instruction to recover the missing action descriptions.
Our ThinkBot outperforms the state-of-the-art EIF methods by a sizable margin in both success rate and execution efficiency.
arXiv Detail & Related papers (2023-12-12T08:30:09Z) - Hierarchical generative modelling for autonomous robots [8.023920215148486]
We show how a humanoid robot can autonomously complete a complex task that requires a holistic use of locomotion, manipulation, and grasping.
Specifically, we demonstrate the ability of a humanoid robot that can retrieve and transport a box, open and walk through a door to reach the destination, approach and kick a football, while showing robust performance in presence of body damage and ground irregularities.
arXiv Detail & Related papers (2023-08-15T13:51:03Z) - Dexterous Manipulation from Images: Autonomous Real-World RL via Substep
Guidance [71.36749876465618]
We describe a system for vision-based dexterous manipulation that provides a "programming-free" approach for users to define new tasks.
Our system includes a framework for users to define a final task and intermediate sub-tasks with image examples.
experimental results with a four-finger robotic hand learning multi-stage object manipulation tasks directly in the real world.
arXiv Detail & Related papers (2022-12-19T22:50:40Z) - Active Exploration for Robotic Manipulation [40.39182660794481]
This paper proposes a model-based active exploration approach that enables efficient learning in sparse-reward robotic manipulation tasks.
We evaluate our proposed algorithm in simulation and on a real robot, trained from scratch with our method.
arXiv Detail & Related papers (2022-10-23T18:07:51Z) - Procedure Planning in Instructional Videosvia Contextual Modeling and
Model-based Policy Learning [114.1830997893756]
This work focuses on learning a model to plan goal-directed actions in real-life videos.
We propose novel algorithms to model human behaviors through Bayesian Inference and model-based Imitation Learning.
arXiv Detail & Related papers (2021-10-05T01:06:53Z) - Bottom-Up Skill Discovery from Unsegmented Demonstrations for
Long-Horizon Robot Manipulation [55.31301153979621]
We tackle real-world long-horizon robot manipulation tasks through skill discovery.
We present a bottom-up approach to learning a library of reusable skills from unsegmented demonstrations.
Our method has shown superior performance over state-of-the-art imitation learning methods in multi-stage manipulation tasks.
arXiv Detail & Related papers (2021-09-28T16:18:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.