Related papers: Visual Language Model as a Judge for Object Detection in Industrial Diagrams

URL: http://arxiv.org/abs/2510.03376v1
Date: Fri, 03 Oct 2025 13:52:09 GMT
Title: Visual Language Model as a Judge for Object Detection in Industrial Diagrams
Authors: Sanjukta Ghosh,
Abstract summary: This paper introduces a framework that employs Visual Language Models (VLMs) to assess object detection results and guide their refinement. The approach exploits the multimodal capabilities of VLMs to identify missing or inconsistent detections, thereby enabling automated quality assessment and improving overall detection performance on complex industrial diagrams.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Industrial diagrams such as piping and instrumentation diagrams (P&IDs) are essential for the design, operation, and maintenance of industrial plants. Converting these diagrams into digital form is an important step toward building digital twins and enabling intelligent industrial automation. A central challenge in this digitalization process is accurate object detection. Although recent advances have significantly improved object detection algorithms, there remains a lack of methods to automatically evaluate the quality of their outputs. This paper addresses this gap by introducing a framework that employs Visual Language Models (VLMs) to assess object detection results and guide their refinement. The approach exploits the multimodal capabilities of VLMs to identify missing or inconsistent detections, thereby enabling automated quality assessment and improving overall detection performance on complex industrial diagrams.
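The abstract describes a judge that flags missing or inconsistent detections and feeds that back into refinement. The following is a minimal sketch of such a loop, not the paper's implementation. The names Detection, Verdict, refine, and toy_judge are hypothetical, and toy_judge is a simple score-threshold rule standing in for a real VLM call, which the abstract does not specify.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Detection:
    """One detected symbol (hypothetical structure, not from the paper)."""
    label: str
    box: Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max)
    score: float


@dataclass
class Verdict:
    """What a judge returns: rejected indices and proposed additions."""
    inconsistent: List[int] = field(default_factory=list)
    missing: List[Detection] = field(default_factory=list)


# A judge is any callable that inspects the image and the detections.
Judge = Callable[[object, List[Detection]], Verdict]


def refine(image: object, detections: List[Detection], judge: Judge):
    """Drop detections the judge rejects and append the ones it reports missing."""
    verdict = judge(image, detections)
    rejected = set(verdict.inconsistent)
    kept = [d for i, d in enumerate(detections) if i not in rejected]
    return kept + list(verdict.missing), verdict


def toy_judge(image: object, detections: List[Detection]) -> Verdict:
    """Stand-in for a VLM: rejects low-score detections, proposes nothing.

    A real judge would send the diagram and the detections to a VLM and
    parse its answer; that step is an assumption left out of this sketch.
    """
    low = [i for i, d in enumerate(detections) if d.score < 0.5]
    return Verdict(inconsistent=low)


detections = [
    Detection("valve", (0, 0, 10, 10), 0.9),
    Detection("pump", (20, 20, 40, 40), 0.2),
]
refined, verdict = refine(None, detections, toy_judge)
print([d.label for d in refined])
```

Passing the judge in as a callable keeps the refinement step independent of any particular model, so the toy rule can be swapped for an actual VLM client without changing refine.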

Related papers

NeRF-Based defect detection [6.72800891299482]
This paper introduces an automated defect detection framework built on Neural Radiance Fields (NeRF) and the concept of digital twins. The system utilizes UAVs to capture images and reconstruct 3D models of machinery, producing both a standard reference model and a current-state model for comparison.
arXiv Detail & Related papers (2025-03-31T22:27:51Z)
A Data-Centric Revisit of Pre-Trained Vision Models for Robot Learning [67.72413262980272]
Pre-trained vision models (PVMs) are fundamental to modern robotics, yet their optimal configuration remains unclear. We develop SlotMIM, a method that induces object-centric representations by introducing a semantic bottleneck. Our approach achieves significant improvements over prior work in image recognition, scene understanding, and robot learning evaluations.
arXiv Detail & Related papers (2025-03-10T06:18:31Z)
Anomaly Detection for Industrial Applications, Its Challenges, Solutions, and Future Directions: A Review [4.139740414165092]
Anomaly detection from images captured using camera sensors is one of the mainstream applications at the industrial level. The traditional anomaly detection workflow is based on manual inspection by human operators. Recent vision-based approaches can automatically extract, process, and interpret features using computer vision.
arXiv Detail & Related papers (2025-01-20T07:24:39Z)
Accelerating Manufacturing Scale-Up from Material Discovery Using Agentic Web Navigation and Retrieval-Augmented AI for Process Engineering Schematics Design [2.368662284133926]
Process Flow Diagrams (PFDs) and Process and Instrumentation Diagrams (PIDs) are critical tools for industrial process design, control, and safety. The generation of precise and regulation-compliant diagrams remains a significant challenge, particularly in scaling breakthroughs from material discovery to industrial production in an era of automation and digitalization. This paper introduces an autonomous agentic framework to address these challenges through a two-stage approach involving knowledge acquisition and generation.
arXiv Detail & Related papers (2024-12-08T13:36:42Z)
Uncertainty Estimation for 3D Object Detection via Evidential Learning [63.61283174146648]
We introduce a framework for quantifying uncertainty in 3D object detection by leveraging an evidential learning loss on Bird's Eye View representations in the 3D detector. We demonstrate both the efficacy and importance of these uncertainty estimates on identifying out-of-distribution scenes, poorly localized objects, and missing (false negative) detections.
arXiv Detail & Related papers (2024-10-31T13:13:32Z)
AIDE: An Automatic Data Engine for Object Detection in Autonomous Driving [68.73885845181242]
We propose an Automatic Data Engine (AIDE) that automatically identifies issues, efficiently curates data, improves the model through auto-labeling, and verifies the model through generation of diverse scenarios. We further establish a benchmark for open-world detection on AV datasets to comprehensively evaluate various learning paradigms, demonstrating our method's superior performance at a reduced cost.
arXiv Detail & Related papers (2024-03-26T04:27:56Z)
Unsupervised Domain Adaption of Object Detectors: A Survey [87.08473838767235]
Recent advances in deep learning have led to the development of accurate and efficient models for various computer vision applications. Learning highly accurate models relies on the availability of datasets with a large number of annotated images. Due to this, model performance drops drastically when evaluated on label-scarce datasets having visually distinct images.
arXiv Detail & Related papers (2021-05-27T23:34:06Z)
Anomaly Detection Based on Selection and Weighting in Latent Space [73.01328671569759]
We propose a novel selection-and-weighting-based anomaly detection framework called SWAD. Experiments on both benchmark and real-world datasets have shown the effectiveness and superiority of SWAD.
arXiv Detail & Related papers (2021-03-08T10:56:38Z)
Cognitive Visual Inspection Service for LCD Manufacturing Industry [80.63336968475889]
This paper discloses a novel visual inspection system for liquid crystal displays (LCDs), currently a dominant type in the FPD industry. The system is based on two cornerstones: a robust, high-performance defect recognition model and a cognitive visual inspection service architecture.
arXiv Detail & Related papers (2021-01-11T08:14:35Z)
Industrial object, machine part and defect recognition towards fully automated industrial monitoring employing deep learning. The case of multilevel VGG19 [0.0]
Modern industry requires modern solutions for monitoring the automatic production of goods. We propose a modified version of the Visual Geometry Group (VGG) network, called Multipath VGG19, which allows for more local and global feature extraction. Specifically, top classification performance was achieved in five of the six image datasets, while the average classification improvement was 6.95%.
arXiv Detail & Related papers (2020-11-23T10:05:50Z)

This list is automatically generated from the titles and abstracts of the papers on this site.