Zero-shot adaptable task planning for autonomous construction robots: a comparative study of lightweight single and multi-AI agent systems
- URL: http://arxiv.org/abs/2601.14091v1
- Date: Tue, 20 Jan 2026 15:54:33 GMT
- Title: Zero-shot adaptable task planning for autonomous construction robots: a comparative study of lightweight single and multi-AI agent systems
- Authors: Hossein Naderi, Alireza Shojaei, Lifu Huang, Philip Agee, Kereshmeh Afsari, Abiola Akanmu,
- Abstract summary: This study explores the potential of foundation models to enhance the adaptability and generalizability of task planning in construction robots.<n>Four models are proposed and implemented using lightweight, open-source large language models (LLMs) and vision language models (VLMs)<n>Results show that the four-agent team outperforms the state-of-the-art GPT-4o in most metrics while being ten times more cost-effective.
- Score: 23.277959576616407
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Robots are expected to play a major role in the future construction industry but face challenges due to high costs and difficulty adapting to dynamic tasks. This study explores the potential of foundation models to enhance the adaptability and generalizability of task planning in construction robots. Four models are proposed and implemented using lightweight, open-source large language models (LLMs) and vision language models (VLMs). These models include one single agent and three multi-agent teams that collaborate to create robot action plans. The models are evaluated across three construction roles: Painter, Safety Inspector, and Floor Tiling. Results show that the four-agent team outperforms the state-of-the-art GPT-4o in most metrics while being ten times more cost-effective. Additionally, teams with three and four agents demonstrate the improved generalizability. By discussing how agent behaviors influence outputs, this study enhances the understanding of AI teams and supports future research in diverse unstructured environments beyond construction.
Related papers
- Youtu-LLM: Unlocking the Native Agentic Potential for Lightweight Large Language Models [78.73992315826035]
We introduce Youtu-LLM, a lightweight language model that harmonizes high computational efficiency with native agentic intelligence.<n>Youtu-LLM is pre-trained from scratch to systematically cultivate reasoning and planning capabilities.
arXiv Detail & Related papers (2025-12-31T04:25:11Z) - Towards Embodied Agentic AI: Review and Classification of LLM- and VLM-Driven Robot Autonomy and Interaction [0.4786416643636131]
Foundation models, including large language models (LLMs) and vision-language models (VLMs) have recently enabled novel approaches to robot autonomy and human-robot interfaces.<n>In parallel, vision-language-action models (VLAs) or large behavior models (LBMs) are increasing the dexterity and capabilities of robotic systems.
arXiv Detail & Related papers (2025-08-07T11:48:03Z) - Evaluation of Habitat Robotics using Large Language Models [0.1333283959406959]
We evaluate the effectiveness of Large Language Models at solving embodied robotic tasks using the Meta PARTNER benchmark.<n>Our results indicate that reasoning models like OpenAI o3-mini outperform non-reasoning models like OpenAI GPT-4o and Llama 3.
arXiv Detail & Related papers (2025-07-08T16:39:39Z) - Multi-Agent Systems for Robotic Autonomy with LLMs [7.113794752528622]
The framework includes three core agents: Task Analyst, Robot Designer, and Reinforcement Learning Designer.<n>Results demonstrate that the proposed system can design feasible robots with control strategies when appropriate task inputs are provided.
arXiv Detail & Related papers (2025-05-09T03:52:37Z) - REMAC: Self-Reflective and Self-Evolving Multi-Agent Collaboration for Long-Horizon Robot Manipulation [57.628771707989166]
We propose an adaptive multi-agent planning framework, termed REMAC, that enables efficient, scene-agnostic multi-robot long-horizon task planning and execution.<n>ReMAC incorporates two key modules: a self-reflection module performing pre-conditions and post-condition checks in the loop to evaluate progress and refine plans, and a self-evolvement module dynamically adapting plans based on scene-specific reasoning.
arXiv Detail & Related papers (2025-03-28T03:51:40Z) - MoRE: Unlocking Scalability in Reinforcement Learning for Quadruped Vision-Language-Action Models [34.138699712315]
This paper introduces a novel vision--action (VLA) model, mixture of robotic experts (MoRE) for quadruped robots.<n>MoRE integrates multiple low-rank adaptation modules as distinct experts within a dense multi-modal large language model.<n>Experiments demonstrate that MoRE outperforms all baselines across six different skills and exhibits superior generalization capabilities in out-of-distribution scenarios.
arXiv Detail & Related papers (2025-03-11T03:13:45Z) - $π_0$: A Vision-Language-Action Flow Model for General Robot Control [77.32743739202543]
We propose a novel flow matching architecture built on top of a pre-trained vision-language model (VLM) to inherit Internet-scale semantic knowledge.
We evaluate our model in terms of its ability to perform tasks in zero shot after pre-training, follow language instructions from people, and its ability to acquire new skills via fine-tuning.
arXiv Detail & Related papers (2024-10-31T17:22:30Z) - EMOS: Embodiment-aware Heterogeneous Multi-robot Operating System with LLM Agents [33.77674812074215]
We introduce a novel multi-agent framework designed to enable effective collaboration among heterogeneous robots.<n>We propose a self-prompted approach, where agents comprehend robot URDF files and call robot kinematics tools to generate descriptions of their physics capabilities.<n>The Habitat-MAS benchmark is designed to assess how a multi-agent framework handles tasks that require embodiment-aware reasoning.
arXiv Detail & Related papers (2024-10-30T03:20:01Z) - QuadrupedGPT: Towards a Versatile Quadruped Agent in Open-ended Worlds [51.05639500325598]
We introduce QuadrupedGPT, designed to follow diverse commands with agility comparable to that of a pet.<n>Our agent shows proficiency in handling diverse tasks and intricate instructions, representing a significant step toward the development of versatile quadruped agents.
arXiv Detail & Related papers (2024-06-24T12:14:24Z) - An Interactive Agent Foundation Model [49.77861810045509]
We propose an Interactive Agent Foundation Model that uses a novel multi-task agent training paradigm for training AI agents.
Our training paradigm unifies diverse pre-training strategies, including visual masked auto-encoders, language modeling, and next-action prediction.
We demonstrate the performance of our framework across three separate domains -- Robotics, Gaming AI, and Healthcare.
arXiv Detail & Related papers (2024-02-08T18:58:02Z) - RoboAgent: Generalization and Efficiency in Robot Manipulation via
Semantic Augmentations and Action Chunking [54.776890150458385]
We develop an efficient system for training universal agents capable of multi-task manipulation skills.
We are able to train a single agent capable of 12 unique skills, and demonstrate its generalization over 38 tasks.
On average, RoboAgent outperforms prior methods by over 40% in unseen situations.
arXiv Detail & Related papers (2023-09-05T03:14:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.