A Comprehensive End-to-End Computer Vision Framework for Restoration and
Recognition of Low-Quality Engineering Drawings
- URL: http://arxiv.org/abs/2312.13620v1
- Date: Thu, 21 Dec 2023 07:22:25 GMT
- Title: A Comprehensive End-to-End Computer Vision Framework for Restoration and
Recognition of Low-Quality Engineering Drawings
- Authors: Lvyang Yang, Jiankang Zhang, Huaiqiang Li, Longfei Ren, Chen Yang,
Jingyu Wang, Dongyuan Shi
- Abstract summary: This paper focuses on restoring and recognizing low-quality engineering drawings.
An end-to-end framework is proposed to improve the quality of the drawings and identify the graphical symbols on them.
Experiments on real-world electrical diagrams show that the proposed framework achieves an accuracy of 98.98% and a recall of 99.33%.
- Score: 19.375278164300987
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The digitization of engineering drawings is crucial for efficient reuse,
distribution, and archiving. Existing computer vision approaches for digitizing
engineering drawings typically assume the input drawings have high quality.
However, in reality, engineering drawings are often blurred and distorted due
to improper scanning, storage, and transmission, which may jeopardize the
effectiveness of existing approaches. This paper focuses on restoring and
recognizing low-quality engineering drawings, where an end-to-end framework is
proposed to improve the quality of the drawings and identify the graphical
symbols on them. The framework uses K-means clustering to classify different
engineering drawing patches into simple and complex texture patches based on
their gray level co-occurrence matrix statistics. Computer vision operations
and a modified Enhanced Super-Resolution Generative Adversarial Network
(ESRGAN) model are then used to improve the quality of the two types of
patches, respectively. A modified Faster Region-based Convolutional Neural
Network (Faster R-CNN) model is used to recognize the quality-enhanced
graphical symbols. Additionally, a multi-stage task-driven collaborative
learning strategy is proposed to train the modified ESRGAN and Faster R-CNN
models to improve the resolution of engineering drawings in the direction that
facilitates graphical symbol recognition, rather than human visual perception.
A synthetic data generation method is also proposed to construct
quality-degraded samples for training the framework. Experiments on real-world
electrical diagrams show that the proposed framework achieves an accuracy of
98.98% and a recall of 99.33%, demonstrating its superiority over previous
approaches. Moreover, the framework is integrated into a widely used power
system software application to showcase its practicality.
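The patch-classification step described in the abstract (K-means over gray level co-occurrence matrix statistics) can be illustrated with a minimal sketch. The abstract does not say which GLCM statistics, pixel offset, quantization, or initialization the authors use, so everything below is an assumption made for illustration: the three statistics (contrast, energy, homogeneity), the horizontal one-pixel offset, the 8 gray levels, the farthest-point initialization, and the rule that the cluster with the higher mean contrast is the "complex" one. The function names are hypothetical and are not taken from the paper.

```python
import numpy as np


def glcm_features(patch, levels=8):
    """Contrast, energy and homogeneity of a horizontal-neighbour GLCM.

    Assumes `patch` is a 2-D uint8 grayscale array. The choice of
    statistics and of the offset is illustrative, not the paper's.
    """
    # Quantize 8-bit gray values into `levels` bins (0 .. levels-1).
    q = (patch.astype(np.int64) * levels) // 256
    # Count co-occurrences of each pixel with its right-hand neighbour.
    glcm = np.zeros((levels, levels), dtype=np.float64)
    np.add.at(glcm, (q[:, :-1].ravel(), q[:, 1:].ravel()), 1.0)
    glcm /= glcm.sum()
    i, j = np.indices((levels, levels))
    contrast = np.sum(glcm * (i - j) ** 2)
    energy = np.sum(glcm ** 2)
    homogeneity = np.sum(glcm / (1.0 + np.abs(i - j)))
    return np.array([contrast, energy, homogeneity])


def two_means(features, iters=50):
    """Plain K-means (Lloyd's algorithm) with k = 2.

    Uses a deterministic farthest-point initialization (an assumption,
    chosen here only so the sketch is reproducible).
    """
    c0 = features[0]
    c1 = features[np.argmax(np.linalg.norm(features - c0, axis=1))]
    centers = np.stack([c0, c1]).astype(np.float64)
    for _ in range(iters):
        # Assign every feature vector to its nearest center.
        dists = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute centers; keep the old center if a cluster is empty.
        new_centers = np.stack([
            features[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(2)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


def split_patches(patches):
    """Return a boolean array: True where a patch falls in the 'complex' cluster."""
    feats = np.stack([glcm_features(p) for p in patches])
    labels, centers = two_means(feats)
    # Heuristic (assumption): the cluster with higher mean contrast is "complex".
    complex_cluster = int(np.argmax(centers[:, 0]))
    return labels == complex_cluster


def demo_patches(n=10, size=32, seed=0):
    """Synthetic stand-ins: n single-line patches followed by n noise patches."""
    rng = np.random.default_rng(seed)
    patches = []
    for k in range(n):
        p = np.full((size, size), 255, dtype=np.uint8)
        pos = int(rng.integers(1, size - 1))
        if k % 2 == 0:
            p[pos, :] = 0  # one horizontal stroke on a white background
        else:
            p[:, pos] = 0  # one vertical stroke on a white background
        patches.append(p)
    for _ in range(n):
        patches.append(rng.integers(0, 256, size=(size, size), dtype=np.uint8))
    return patches


if __name__ == "__main__":
    flags = split_patches(demo_patches())
    print("complex-texture flags:", flags.astype(int))
```

In a pipeline like the one the abstract outlines, the patches flagged as simple would go to conventional image operations and the complex ones to the super-resolution model. The features here are left unscaled for brevity; with real drawings it may be worth standardizing them before clustering.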
Related papers
- Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing addresses this by shifting data analysis to the edge.
Existing methods struggle to balance high model performance with low resource consumption.
We propose a novel co-design framework to optimize neural network architecture.
arXiv Detail & Related papers (2024-10-29T19:02:54Z)
- $R^2$-Mesh: Reinforcement Learning Powered Mesh Reconstruction via Geometry and Appearance Refinement [5.810659946867557]
Mesh reconstruction based on Neural Radiance Fields (NeRF) is popular in a variety of applications such as computer graphics, virtual reality, and medical imaging.
We propose a novel algorithm that progressively generates and optimizes meshes from multi-view images.
Our method delivers highly competitive and robust performance in both mesh rendering quality and geometric quality.
arXiv Detail & Related papers (2024-08-19T16:33:17Z)
- Distance Weighted Trans Network for Image Completion [52.318730994423106]
We propose a new architecture that relies on Distance-based Weighted Transformer (DWT) to better understand the relationships between an image's components.
CNNs are used to augment the local texture information of coarse priors.
DWT blocks are used to recover certain coarse textures and coherent visual structures.
arXiv Detail & Related papers (2023-10-11T12:46:11Z)
- HAT-GAE: Self-Supervised Graph Auto-encoders with Hierarchical Adaptive Masking and Trainable Corruption [0.76146285961466]
We propose a novel auto-encoder model for graph representation learning.
Our model incorporates a hierarchical adaptive masking mechanism to incrementally increase the difficulty of training.
We demonstrate the superiority of our proposed method over state-of-the-art graph representation learning models.
arXiv Detail & Related papers (2023-01-28T02:43:54Z)
- Cyclegan Network for Sheet Metal Welding Drawing Translation [0.0]
This paper proposes an automatic translation method for welded structural engineering drawings based on Cyclic Generative Adversarial Networks (CycleGAN).
The CycleGAN network model of unpaired transfer learning is used to learn the feature mapping of real welding engineering drawings.
After training with our model, the PSNR, SSIM and MSE of welding engineering drawings reach about 44.89%, 99.58% and 2.11, respectively.
arXiv Detail & Related papers (2022-09-28T13:55:36Z)
- A Proper Orthogonal Decomposition approach for parameters reduction of Single Shot Detector networks [0.0]
We propose a dimensionality reduction framework based on Proper Orthogonal Decomposition, a classical model order reduction technique.
We have applied this framework to the SSD300 architecture using the PASCAL VOC dataset, demonstrating a reduction of the network dimension and a remarkable speedup in the fine-tuning of the network in a transfer learning context.
arXiv Detail & Related papers (2022-07-27T14:43:14Z)
- Cognitive Visual Inspection Service for LCD Manufacturing Industry [80.63336968475889]
This paper discloses a novel visual inspection system for liquid crystal display (LCD), which is currently a dominant type in the FPD industry.
The system is based on two cornerstones: a robust, high-performance defect recognition model and a cognitive visual inspection service architecture.
arXiv Detail & Related papers (2021-01-11T08:14:35Z)
- Learning Deformable Image Registration from Optimization: Perspective, Modules, Bilevel Training and Beyond [62.730497582218284]
We develop a new deep learning based framework to optimize a diffeomorphic model via multi-scale propagation.
We conduct two groups of image registration experiments on 3D volume datasets including image-to-atlas registration on brain MRI data and image-to-image registration on liver CT data.
arXiv Detail & Related papers (2020-04-30T03:23:45Z)
- An Emerging Coding Paradigm VCM: A Scalable Coding Approach Beyond Feature and Signal [99.49099501559652]
Video Coding for Machine (VCM) aims to bridge the gap between visual feature compression and classical video coding.
We employ a conditional deep generation network to reconstruct video frames with the guidance of a learned motion pattern.
By learning to extract sparse motion patterns via a predictive model, the network elegantly leverages the feature representation to generate the appearance of to-be-coded frames.
arXiv Detail & Related papers (2020-01-09T14:18:18Z)
- Towards Coding for Human and Machine Vision: A Scalable Image Coding Approach [104.02201472370801]
We come up with a novel image coding framework by leveraging both the compressive and the generative models.
By introducing advanced generative models, we train a flexible network to reconstruct images from compact feature representations and the reference pixels.
Experimental results demonstrate the superiority of our framework in both human visual quality and facial landmark detection.
arXiv Detail & Related papers (2020-01-09T10:37:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.