Are Open-Vocabulary Models Ready for Detection of MEP Elements on Construction Sites
- URL: http://arxiv.org/abs/2501.09267v2
- Date: Sat, 12 Apr 2025 20:48:26 GMT
- Title: Are Open-Vocabulary Models Ready for Detection of MEP Elements on Construction Sites
- Authors: Abdalwhab Abdalwhab, Ali Imran, Sina Heydarian, Ivanka Iordanova, David St-Onge
- Abstract summary: Ground robots equipped with advanced vision systems could automate tasks such as monitoring mechanical, electrical, and plumbing (MEP) systems. The present research evaluates the applicability of open-vocabulary vision-language models compared to fine-tuned, lightweight, closed-set object detectors.
- Score: 3.053513975262358
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The construction industry has long explored robotics and computer vision, yet their deployment on construction sites remains very limited. These technologies have the potential to revolutionize traditional workflows by enhancing accuracy, efficiency, and safety in construction management. Ground robots equipped with advanced vision systems could automate tasks such as monitoring mechanical, electrical, and plumbing (MEP) systems. The present research evaluates the applicability of open-vocabulary vision-language models compared to fine-tuned, lightweight, closed-set object detectors for detecting MEP components using a mobile ground robotic platform. A dataset collected with cameras mounted on a ground robot was manually annotated and analyzed to compare model performance. The results demonstrate that, despite the versatility of vision-language models, fine-tuned lightweight models still largely outperform them in specialized environments and for domain-specific tasks.
Related papers
- Perspective on Utilizing Foundation Models for Laboratory Automation in Materials Research [6.793869699081147]
This review explores the potential of foundation models to advance laboratory automation in the materials and chemical sciences. It emphasizes the dual roles of these models: cognitive functions for experimental planning and data analysis, and physical functions for hardware operations. Recent advancements have demonstrated the feasibility of using large language models (LLMs) and multimodal robotic systems to handle complex and dynamic laboratory tasks.
arXiv Detail & Related papers (2025-06-14T02:22:28Z) - Is Single-View Mesh Reconstruction Ready for Robotics? [63.29645501232935]
This paper evaluates single-view mesh reconstruction models for creating digital twin environments in robot manipulation. We establish benchmarking criteria for 3D reconstruction in robotics contexts. Despite success on computer vision benchmarks, existing approaches fail to meet robotics-specific requirements.
arXiv Detail & Related papers (2025-05-23T14:35:56Z) - Deploying Foundation Model-Enabled Air and Ground Robots in the Field: Challenges and Opportunities [65.98704516122228]
The integration of foundation models (FMs) into robotics has enabled robots to understand natural language and reason about the semantics in their environments. This paper addresses the deployment of FM-enabled robots in the field, where missions often require a robot to operate in large-scale and unstructured environments. We present the first demonstration of large-scale LLM-enabled robot planning in unstructured environments with several kilometers of missions.
arXiv Detail & Related papers (2025-05-14T15:28:43Z) - Multi-Agent Systems for Robotic Autonomy with LLMs [7.113794752528622]
The framework includes three core agents: Task Analyst, Robot Designer, and Reinforcement Learning Designer. Results demonstrate that the proposed system can design feasible robots with control strategies when appropriate task inputs are provided.
arXiv Detail & Related papers (2025-05-09T03:52:37Z) - M2R2: MulitModal Robotic Representation for Temporal Action Segmentation [9.64001633229156]
We introduce a novel pretraining strategy that enables the reuse of learned features across multiple TAS models.
Our method achieves state-of-the-art performance on the REASSEMBLE dataset, outperforming existing robotic action segmentation models by 46.6%.
arXiv Detail & Related papers (2025-04-25T19:36:17Z) - An LLM-enabled Multi-Agent Autonomous Mechatronics Design Framework [49.633199780510864]
This work proposes a multi-agent autonomous mechatronics design framework, integrating expertise across mechanical design, optimization, electronics, and software engineering.
Operating primarily through a language-driven workflow, the framework incorporates structured human feedback to ensure robust performance under real-world constraints.
A fully functional autonomous vessel was developed with optimized propulsion, cost-effective electronics, and advanced control.
arXiv Detail & Related papers (2025-04-20T16:57:45Z) - On the Exploration of LM-Based Soft Modular Robot Design [26.847859137653487]
Large language models (LLMs) have demonstrated promising capabilities in modeling real-world knowledge.
In this paper, we explore the potential of using LLMs to aid in the design of soft modular robots.
Our model performs well in evaluations for designing soft modular robots with uni- and bi-directional and stair-descending capabilities.
arXiv Detail & Related papers (2024-11-01T04:03:05Z) - Tiny Robotics Dataset and Benchmark for Continual Object Detection [6.4036245876073234]
This work introduces a novel benchmark to evaluate the continual learning capabilities of object detection systems in tiny robotic platforms.
Our contributions include: (i) Tiny Robotics Object Detection (TiROD), a comprehensive dataset collected using a small mobile robot, designed to test the adaptability of object detectors across various domains and classes; (ii) an evaluation of state-of-the-art real-time object detectors combined with different continual learning strategies on this dataset; and (iii) the public release of the data and code needed to replicate the results, to foster continued advances in this field.
arXiv Detail & Related papers (2024-09-24T16:21:27Z) - Foundation Models for Autonomous Robots in Unstructured Environments [15.517532442044962]
The study systematically reviews applications of foundation models in two fields: robotics and unstructured environments.
Findings showed that linguistic capabilities of LLMs have been utilized more than other features for improving perception in human-robot interactions.
The use of LLMs demonstrated more applications in project management and safety in construction, and natural hazard detection in disaster management.
arXiv Detail & Related papers (2024-07-19T13:26:52Z) - LAECIPS: Large Vision Model Assisted Adaptive Edge-Cloud Collaboration for IoT-based Embodied Intelligence System [22.779285672925425]
Embodied intelligence (EI) enables manufacturing systems to flexibly perceive, reason, adapt, and operate within dynamic shop floor environments. We propose LAECIPS, a large vision model-assisted adaptive edge-cloud collaboration framework for IoT-based embodied intelligence systems. LAECIPS decouples large vision models in the cloud from lightweight models on the edge, enabling plug-and-play model adaptation and continual learning.
arXiv Detail & Related papers (2024-04-16T12:12:06Z) - AutoRT: Embodied Foundation Models for Large Scale Orchestration of Robotic Agents [109.3804962220498]
AutoRT is a system to scale up the deployment of operational robots in completely unseen scenarios with minimal human supervision.
We demonstrate AutoRT proposing instructions to over 20 robots across multiple buildings and collecting 77k real robot episodes via both teleoperation and autonomous robot policies.
We experimentally show that such "in-the-wild" data collected by AutoRT is significantly more diverse, and that AutoRT's use of LLMs allows for instruction-following data collection robots that can align to human preferences.
arXiv Detail & Related papers (2024-01-23T18:45:54Z) - Toward General-Purpose Robots via Foundation Models: A Survey and Meta-Analysis [82.59451639072073]
General-purpose robots operate seamlessly in any environment, with any object, and utilize various skills to complete diverse tasks.
As a community, we have been constraining most robotic systems by designing them for specific tasks, training them on specific datasets, and deploying them within specific environments.
Motivated by the impressive open-set performance and content generation capabilities of web-scale, large-capacity pre-trained models, we devote this survey to exploring how foundation models can be applied to general-purpose robotics.
arXiv Detail & Related papers (2023-12-14T10:02:55Z) - Interactive Planning Using Large Language Models for Partially Observable Robotics Tasks [54.60571399091711]
Large Language Models (LLMs) have achieved impressive results in creating robotic agents for performing open vocabulary tasks.
We present an interactive planning technique for partially observable tasks using LLMs.
arXiv Detail & Related papers (2023-12-11T22:54:44Z) - RoboLLM: Robotic Vision Tasks Grounded on Multimodal Large Language Models [4.4173427917548524]
Multimodal Large Language Models (MLLMs) have emerged as novel backbones for various downstream tasks.
We introduce the RoboLLM framework, equipped with a BEiT-3 backbone, to address all visual perception tasks in the ARMBench challenge.
arXiv Detail & Related papers (2023-10-16T09:30:45Z) - WALL-E: Embodied Robotic WAiter Load Lifting with Large Language Model [92.90127398282209]
This paper investigates the potential of integrating the most recent Large Language Models (LLMs) with existing visual grounding and robotic grasping systems.
We introduce the WALL-E (Embodied Robotic WAiter load lifting with Large Language model) as an example of this integration.
We deploy this LLM-empowered system on the physical robot to provide a more user-friendly interface for the instruction-guided grasping task.
arXiv Detail & Related papers (2023-08-30T11:35:21Z) - RT-1: Robotics Transformer for Real-World Control at Scale [98.09428483862165]
We present a model class, dubbed Robotics Transformer, that exhibits promising scalable model properties.
We verify our conclusions in a study of different model classes and their ability to generalize as a function of the data size, model size, and data diversity based on a large-scale data collection on real robots performing real-world tasks.
arXiv Detail & Related papers (2022-12-13T18:55:15Z) - MetaGraspNet: A Large-Scale Benchmark Dataset for Vision-driven Robotic Grasping via Physics-based Metaverse Synthesis [78.26022688167133]
We present a large-scale benchmark dataset for vision-driven robotic grasping via physics-based metaverse synthesis.
The proposed dataset contains 100,000 images and 25 different object types.
We also propose a new layout-weighted performance metric alongside the dataset for evaluating object detection and segmentation performance.
arXiv Detail & Related papers (2021-12-29T17:23:24Z)
This list is automatically generated from the titles and abstracts of the papers on this site.