Related papers: Edge-Optimized Vision-Language Models for Underground Infrastructure Assessment

URL: http://arxiv.org/abs/2602.03742v1
Date: Tue, 03 Feb 2026 17:03:46 GMT
Title: Edge-Optimized Vision-Language Models for Underground Infrastructure Assessment
Authors: Johny J. Lopez, Md Meftahul Ferdaus, Mahdi Abdelguerfi,
Abstract summary: This paper presents a novel two-stage pipeline for end-to-end summarization of underground deficiencies. It combines our lightweight RAPID-SCAN segmentation model with a fine-tuned Vision-Language Model deployed on an edge computing platform. Our results show the potential of edge-deployable integrated AI systems to bridge the gap between automated defect detection and actionable insights for infrastructure maintenance.
Score: 1.5124107808802705
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Autonomous inspection of underground infrastructure, such as sewer and culvert systems, is critical to public safety and urban sustainability. Although robotic platforms equipped with visual sensors can efficiently detect structural deficiencies, the automated generation of human-readable summaries from these detections remains a significant challenge, especially on resource-constrained edge devices. This paper presents a novel two-stage pipeline for end-to-end summarization of underground deficiencies, combining our lightweight RAPID-SCAN segmentation model with a fine-tuned Vision-Language Model (VLM) deployed on an edge computing platform. The first stage employs RAPID-SCAN (Resource-Aware Pipeline Inspection and Defect Segmentation using Compact Adaptive Network), achieving 0.834 F1-score with only 0.64M parameters for efficient defect segmentation. The second stage utilizes a fine-tuned Phi-3.5 VLM that generates concise, domain-specific summaries in natural language from the segmentation outputs. We introduce a curated dataset of inspection images with manually verified descriptions for VLM fine-tuning and evaluation. To enable real-time performance, we employ post-training quantization with hardware-specific optimization, achieving significant reductions in model size and inference latency without compromising summarization quality. We deploy and evaluate our complete pipeline on a mobile robotic platform, demonstrating its effectiveness in real-world inspection scenarios. Our results show the potential of edge-deployable integrated AI systems to bridge the gap between automated defect detection and actionable insights for infrastructure maintenance, paving the way for more scalable and autonomous inspection solutions.

Related papers

AI-Based Culvert-Sewer Inspection [0.0]
Culverts and sewer pipes are critical components of drainage systems, and their failure can lead to serious risks to public safety and the environment. This thesis proposes three methods to significantly enhance defect segmentation and handle data scarcity. ForTRESS is a novel architecture that combines depthwise separable convolutions, adaptive Kolmogorov-Arnold Networks (KAN), and multi-scale attention mechanisms.
arXiv Detail & Related papers (2026-01-21T16:33:33Z)
Real-Time Detection and Tracking of Foreign Object Intrusions in Power Systems via Feature-Based Edge Intelligence [4.60587070358843]
This paper presents a novel framework for real-time foreign object intrusion (FOI) detection and tracking in power transmission systems. The framework integrates: (1) a YOLOv7 segmentation model for fast and robust object localization, (2) a ConvNeXt-based feature extractor trained with triplet loss to generate discriminative embeddings, and (3) a feature-assisted IoU tracker. To enable scalable field deployment, the pipeline is optimized for deployment on low-cost edge hardware using mixed-precision inference.
arXiv Detail & Related papers (2025-09-16T17:17:03Z)
Edge-Based Multimodal Sensor Data Fusion with Vision Language Models (VLMs) for Real-time Autonomous Vehicle Accident Avoidance [12.513296074529727]
This paper proposes the Real-time Edge-based Autonomous Co-pilot Trajectory planner (REACT) for autonomous driving. REACT is a V2X-integrated trajectory optimization framework for AD based on a fine-tuned lightweight Vision-Language Model (VLM). Evaluated on the DeepAccident benchmark, REACT achieves state-of-the-art performance, a 77% collision rate reduction, a 48.2% Video Panoptic Quality (VPQ), and a 0.57-second inference latency on the Jetson AGX Orin.
arXiv Detail & Related papers (2025-08-01T20:16:04Z)
Towards Edge-Based Idle State Detection in Construction Machinery Using Surveillance Cameras [0.0]
Underused construction machinery leads to increased operational costs and project delays. This paper presents the Edge-IMI framework for detecting idle construction machinery. The proposed solution consists of three components: object detection, tracking, and idle state identification.
arXiv Detail & Related papers (2025-06-01T08:43:33Z)
VAE-based Feature Disentanglement for Data Augmentation and Compression in Generalized GNSS Interference Classification [42.14439854721613]
We propose variational autoencoders (VAEs) for disentanglement to extract essential latent features that enable accurate classification of interferences. Our proposed VAE achieves a data compression rate ranging from 512 to 8,192 and an accuracy of up to 99.92%.
arXiv Detail & Related papers (2025-04-14T13:38:00Z)
Prior2Former -- Evidential Modeling of Mask Transformers for Assumption-Free Open-World Panoptic Segmentation [74.55677741919035]
We propose Prior2Former (P2F), the first approach for segmentation vision transformers rooted in evidential learning. P2F extends the mask vision transformer architecture by incorporating a Beta prior for computing model uncertainty in pixel-wise binary mask assignments. Unlike most segmentation models addressing unknown classes, P2F operates without access to OOD data samples or contrastive training on void (i.e., unlabeled) classes.
arXiv Detail & Related papers (2025-04-07T08:53:14Z)
Efficient Detection Framework Adaptation for Edge Computing: A Plug-and-play Neural Network Toolbox Enabling Edge Deployment [59.61554561979589]
Edge computing has emerged as a key paradigm for deploying deep learning-based object detection in time-sensitive scenarios. Existing edge detection methods face challenges: difficulty balancing detection precision with lightweight models, limited adaptability, and insufficient real-world validation. We propose the Edge Detection Toolbox (ED-TOOLBOX), which utilizes generalizable plug-and-play components to adapt object detection models for edge environments.
arXiv Detail & Related papers (2024-12-24T07:28:10Z)
Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing addresses this by shifting data analysis to the edge. Existing methods struggle to balance high model performance with low resource consumption. We propose a novel co-design framework to optimize neural network architecture.
arXiv Detail & Related papers (2024-10-29T19:02:54Z)
PCF-Lift: Panoptic Lifting by Probabilistic Contrastive Fusion [80.79938369319152]
We design a new pipeline coined PCF-Lift based on our Probabilistic Contrastive Fusion (PCF). PCF-Lift significantly outperforms the state-of-the-art methods on widely used benchmarks including the ScanNet dataset and the Messy Room dataset (4.4% improvement of scene-level PQ).
arXiv Detail & Related papers (2024-10-14T16:06:59Z)
Multi-Exit Semantic Segmentation Networks [78.44441236864057]
We propose a framework for converting state-of-the-art segmentation models to MESS networks: specially trained CNNs that employ parametrised early exits along their depth to save computation during inference on easier samples. We co-optimise the number, placement and architecture of the attached segmentation heads, along with the exit policy, to adapt to the device capabilities and application-specific requirements.
arXiv Detail & Related papers (2021-06-07T11:37:03Z)
Risk-Averse MPC via Visual-Inertial Input and Recurrent Networks for Online Collision Avoidance [95.86944752753564]
We propose an online path planning architecture that extends the model predictive control (MPC) formulation to consider future location uncertainties. Our algorithm combines an object detection pipeline with a recurrent neural network (RNN) which infers the covariance of state estimates. The robustness of our methods is validated on complex quadruped robot dynamics and can be generally applied to most robotic platforms.
arXiv Detail & Related papers (2020-07-28T07:34:30Z)

This list is automatically generated from the titles and abstracts of the papers on this site.