High-Fidelity Visual Structural Inspections through Transformers and
Learnable Resizers
- URL: http://arxiv.org/abs/2210.12175v1
- Date: Fri, 21 Oct 2022 18:08:26 GMT
- Title: High-Fidelity Visual Structural Inspections through Transformers and
Learnable Resizers
- Authors: Kareem Eltouny, Seyedomid Sajedi, Xiao Liang
- Abstract summary: Recent advances in unmanned aerial vehicles (UAVs) and artificial intelligence have made visual inspections faster, safer, and more reliable.
High-resolution segmentation is extremely challenging due to high computational memory demands.
We propose a hybrid strategy that can adapt to different inspection tasks by managing the global and local semantics trade-off.
- Score: 2.126862120884775
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Visual inspection is the predominant technique for evaluating the condition
of civil infrastructure. Recent advances in unmanned aerial vehicles (UAVs)
and artificial intelligence have made visual inspections faster, safer, and
more reliable. Camera-equipped UAVs are becoming the new industry standard,
collecting massive amounts of visual data for human inspectors.
Meanwhile, there has been significant research on autonomous visual inspections
using deep learning algorithms, including semantic segmentation. While UAVs can
capture high-resolution images of buildings' façades, high-resolution
segmentation is extremely challenging due to high computational memory
demands. Typically, images are uniformly downsized at the price of losing fine
local details. Conversely, breaking images into multiple smaller patches
can cause a loss of global contextual information. We propose a hybrid
strategy that can adapt to different inspection tasks by managing the global
and local semantics trade-off. The framework comprises a compound,
high-resolution deep learning architecture equipped with an attention-based
segmentation model and learnable downsampler-upsampler modules designed for
optimal efficiency and information retention. The framework also utilizes
vision transformers on a grid of image crops aiming for high precision learning
without downsizing. An augmented inference technique is used to boost the
performance and reduce the possible loss of context due to grid cropping.
Comprehensive experiments have been performed on synthetic environments
generated with 3D physics-based graphics models in the Quake City dataset. The
proposed framework
is evaluated using several metrics on three segmentation tasks: component type,
component damage state, and global damage (crack, rebar, spalling).
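The grid-cropping and augmented-inference ideas in the abstract can be illustrated with a minimal sketch: tile a large image into overlapping crops, run a per-crop predictor, and average the overlapping predictions to soften seam artifacts. The tile size, overlap, and averaging scheme below are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

def grid_inference(image, model, tile=64, overlap=16):
    """Run `model` on overlapping tiles and average the overlaps.

    A minimal sketch of grid-crop inference with overlap averaging;
    the paper's augmented-inference scheme may differ in detail.
    """
    h, w = image.shape[:2]
    stride = tile - overlap
    out = np.zeros((h, w), dtype=np.float64)
    weight = np.zeros((h, w), dtype=np.float64)
    for y in range(0, max(h - overlap, 1), stride):
        for x in range(0, max(w - overlap, 1), stride):
            # clamp the window to the image so edge tiles keep full size
            y1, x1 = min(y + tile, h), min(x + tile, w)
            y0, x0 = max(y1 - tile, 0), max(x1 - tile, 0)
            pred = model(image[y0:y1, x0:x1])
            out[y0:y1, x0:x1] += pred
            weight[y0:y1, x0:x1] += 1.0
    # each pixel is the mean of every tile prediction that covered it
    return out / np.maximum(weight, 1.0)
```

In practice the per-tile averaging would be applied to per-class logits rather than a single channel, but the tiling logic is the same.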
Related papers
- Scaling Multi-Camera 3D Object Detection through Weak-to-Strong Eliciting [32.66151412557986]
We present a weak-to-strong eliciting framework aimed at enhancing surround refinement while maintaining robust monocular perception.
Our framework employs weakly tuned experts trained on distinct subsets, and each is inherently biased toward specific camera configurations and scenarios.
For MC3D-Det joint training, an elaborate dataset merging strategy is designed to solve the problem of inconsistent camera numbers and camera parameters.
arXiv Detail & Related papers (2024-04-10T03:11:10Z)
- Low-Resolution Self-Attention for Semantic Segmentation [96.81482872022237]
We introduce the Low-Resolution Self-Attention (LRSA) mechanism to capture global context at a significantly reduced computational cost.
Our approach involves computing self-attention in a fixed low-resolution space regardless of the input image's resolution.
We demonstrate the effectiveness of our LRSA approach by building the LRFormer, a vision transformer with an encoder-decoder structure.
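As a rough illustration of the low-resolution attention idea (not the LRFormer implementation), the sketch below pools a feature map to a fixed grid, runs plain self-attention there, and upsamples the result, so the attention cost depends on the pooled size rather than the input resolution.

```python
import numpy as np

def low_res_self_attention(feat, pool=8):
    """Illustrative low-resolution self-attention: average-pool to a
    fixed `pool` x `pool` grid, attend there, then nearest-neighbour
    upsample back. Attention cost is O(pool^4), independent of h and w.
    """
    h, w, c = feat.shape
    # average-pool the (h, w, c) map to (pool*pool, c) tokens
    ys = np.linspace(0, h, pool + 1).astype(int)
    xs = np.linspace(0, w, pool + 1).astype(int)
    low = np.stack([
        feat[ys[i]:ys[i + 1], xs[j]:xs[j + 1]].mean(axis=(0, 1))
        for i in range(pool) for j in range(pool)
    ])
    # plain scaled dot-product self-attention in the low-res space
    scores = low @ low.T / np.sqrt(c)
    attn = np.exp(scores - scores.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)
    out_low = (attn @ low).reshape(pool, pool, c)
    # nearest-neighbour upsample back to the input resolution
    yi = np.minimum((np.arange(h) * pool) // h, pool - 1)
    xi = np.minimum((np.arange(w) * pool) // w, pool - 1)
    return out_low[yi][:, xi]
```

A real implementation would use learned query/key/value projections; they are omitted here to keep the resolution argument visible.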
arXiv Detail & Related papers (2023-10-08T06:10:09Z)
- High-Resolution Vision Transformers for Pixel-Level Identification of Structural Components and Damage [1.8923948104852863]
We develop a semantic segmentation network based on vision transformers and Laplacian pyramid scaling networks.
The proposed framework has been evaluated through comprehensive experiments on a dataset of bridge inspection report images.
arXiv Detail & Related papers (2023-08-06T03:34:25Z)
- CarPatch: A Synthetic Benchmark for Radiance Field Evaluation on Vehicle Components [77.33782775860028]
We introduce CarPatch, a novel synthetic benchmark of vehicles.
In addition to a set of images annotated with their intrinsic and extrinsic camera parameters, the corresponding depth maps and semantic segmentation masks have been generated for each view.
Global and part-based metrics have been defined and used to evaluate, compare, and better characterize some state-of-the-art techniques.
arXiv Detail & Related papers (2023-07-24T11:59:07Z)
- Exploring Resolution and Degradation Clues as Self-supervised Signal for Low Quality Object Detection [77.3530907443279]
We propose a novel self-supervised framework to detect objects in degraded low resolution images.
Our method achieves superior performance compared with existing methods under various degradation conditions.
arXiv Detail & Related papers (2022-08-05T09:36:13Z)
- Vision Transformers: From Semantic Segmentation to Dense Prediction [139.15562023284187]
We explore the global context learning potentials of vision transformers (ViTs) for dense visual prediction.
Our motivation is that through learning global context at full receptive field layer by layer, ViTs may capture stronger long-range dependency information.
We formulate a family of Hierarchical Local-Global (HLG) Transformers, characterized by local attention within windows and global-attention across windows in a pyramidal architecture.
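The "local attention within windows" half of the HLG design can be sketched simply: split the feature map into non-overlapping windows and run self-attention independently inside each. This is an illustrative toy, with the cross-window global attention and learned projections omitted.

```python
import numpy as np

def window_attention(feat, win=4):
    """Local (within-window) self-attention sketch: the (h, w, c) map is
    split into non-overlapping `win` x `win` windows and attention runs
    independently inside each window, so cost grows linearly in h*w.
    """
    h, w, c = feat.shape
    assert h % win == 0 and w % win == 0, "sketch assumes divisible sizes"
    out = np.empty_like(feat, dtype=np.float64)
    for y in range(0, h, win):
        for x in range(0, w, win):
            # win*win tokens from one window
            tokens = feat[y:y + win, x:x + win].reshape(-1, c)
            scores = tokens @ tokens.T / np.sqrt(c)
            attn = np.exp(scores - scores.max(axis=1, keepdims=True))
            attn /= attn.sum(axis=1, keepdims=True)
            out[y:y + win, x:x + win] = (attn @ tokens).reshape(win, win, c)
    return out
```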
arXiv Detail & Related papers (2022-07-19T15:49:35Z)
- A hierarchical semantic segmentation framework for computer vision-based bridge damage detection [3.7642333932730634]
Computer vision-based damage detection using remote cameras and unmanned aerial vehicles (UAVs) enables efficient and low-cost bridge health monitoring.
This paper introduces a semantic segmentation framework that imposes the hierarchical semantic relationship between component category and damage types.
In this way, the damage detection model can focus on learning features from possibly damaged regions only and avoid the effects of other irrelevant regions.
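The hierarchical constraint between component category and damage type can be shown with a toy post-processing step: damage is only kept on pixels whose component class can plausibly be damaged. The class ids and the hard masking below are illustrative assumptions, not the paper's formulation.

```python
import numpy as np

def hierarchical_mask(damage_logits, component_mask, damageable=(1, 2)):
    """Toy hierarchical masking: keep the argmax damage class only on
    pixels whose component label is in `damageable`; elsewhere emit
    class 0 ("no damage"). Class ids here are made up for illustration.
    """
    valid = np.isin(component_mask, list(damageable))
    damage_pred = damage_logits.argmax(axis=-1)
    return np.where(valid, damage_pred, 0)
```

In a trained model this relationship would typically be imposed in the loss or architecture rather than as a hard post-hoc mask, but the effect on the prediction map is the same.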
arXiv Detail & Related papers (2022-07-18T18:42:54Z)
- Towards Model Generalization for Monocular 3D Object Detection [57.25828870799331]
We present an effective unified camera-generalized paradigm (CGP) for Mono3D object detection.
We also propose the 2D-3D geometry-consistent object scaling strategy (GCOS) to bridge the gap via an instance-level augment.
Our method called DGMono3D achieves remarkable performance on all evaluated datasets and surpasses the SoTA unsupervised domain adaptation scheme.
arXiv Detail & Related papers (2022-05-23T23:05:07Z)
- Simple Open-Vocabulary Object Detection with Vision Transformers [51.57562920090721]
We propose a strong recipe for transferring image-text models to open-vocabulary object detection.
We use a standard Vision Transformer architecture with minimal modifications, contrastive image-text pre-training, and end-to-end detection fine-tuning.
We provide the adaptation strategies and regularizations needed to attain very strong performance on zero-shot text-conditioned and one-shot image-conditioned object detection.
arXiv Detail & Related papers (2022-05-12T17:20:36Z)
- Sci-Net: a Scale Invariant Model for Building Detection from Aerial Images [0.0]
We propose a Scale-invariant neural network (Sci-Net) that is able to segment buildings present in aerial images at different spatial resolutions.
Specifically, we modified the U-Net architecture and fused it with dense Atrous Spatial Pyramid Pooling (ASPP) to extract fine-grained multi-scale representations.
arXiv Detail & Related papers (2021-11-12T16:45:20Z)
- Fast and Robust Structural Damage Analysis of Civil Infrastructure Using UAV Imagery [0.0]
We propose an end-to-end method for automated structural inspection damage analysis.
Using automated object detection and segmentation we accurately localize defects, bridge utilities and elements.
Our technique not only enables fast and robust damage analysis of UAV imagery, as we show herein, but is also effective for analyzing manually acquired images.
arXiv Detail & Related papers (2021-10-10T14:24:26Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.