High-Resolution Vision Transformers for Pixel-Level Identification of
Structural Components and Damage
- URL: http://arxiv.org/abs/2308.03006v1
- Date: Sun, 6 Aug 2023 03:34:25 GMT
- Title: High-Resolution Vision Transformers for Pixel-Level Identification of
Structural Components and Damage
- Authors: Kareem Eltouny, Seyedomid Sajedi, and Xiao Liang
- Abstract summary: We develop a semantic segmentation network based on vision transformers and Laplacian pyramid scaling networks.
The proposed framework has been evaluated through comprehensive experiments on a dataset of bridge inspection report images.
- Score: 1.8923948104852863
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visual inspection is predominantly used to evaluate the state of civil
structures, but recent developments in unmanned aerial vehicles (UAVs) and
artificial intelligence have increased the speed, safety, and reliability of
the inspection process. In this study, we develop a semantic segmentation
network based on vision transformers and Laplacian pyramid scaling networks
for efficiently parsing high-resolution visual inspection images. The massive
amount of high-resolution images collected during inspections can slow down
investigation efforts. While there have been extensive studies
dedicated to the use of deep learning models for damage segmentation,
processing high-resolution visual data can pose major computational
difficulties. Traditionally, images are either uniformly downsampled or
partitioned to cope with computational demands. However, these approaches risk
losing fine local details, such as thin cracks, or global contextual
information. Inspired by super-resolution architectures, our vision transformer
model learns to resize high-resolution images and masks to retain both the
valuable local features and the global semantics without sacrificing
computational efficiency. The proposed framework has been evaluated through
comprehensive experiments on a dataset of bridge inspection report images using
multiple metrics for pixel-wise material detection.
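The resizing idea in the abstract builds on the classical Laplacian pyramid, which decomposes an image into band-pass detail levels plus a coarse base, so fine structures like thin cracks survive downscaling alongside global context. The paper's scaling networks are learned end-to-end; the fixed, non-learned pyramid they generalize can be sketched in NumPy (all function names here are illustrative, not from the paper's code):

```python
import numpy as np

def blur(img, kernel=np.array([1, 4, 6, 4, 1]) / 16.0):
    """Separable 5-tap binomial (approx. Gaussian) blur with edge padding."""
    pad = len(kernel) // 2
    smooth = lambda r: np.convolve(np.pad(r, pad, mode="edge"), kernel, mode="valid")
    return np.apply_along_axis(smooth, 1, np.apply_along_axis(smooth, 0, img))

def down(img):
    """Blur, then drop every other row and column."""
    return blur(img)[::2, ::2]

def up(img, shape):
    """Nearest-neighbor upsample to a target shape, then blur."""
    big = np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)[:shape[0], :shape[1]]
    return blur(big)

def build_laplacian_pyramid(img, levels=3):
    """Each level stores the detail lost by one downsampling step."""
    pyramid, current = [], img.astype(np.float64)
    for _ in range(levels):
        smaller = down(current)
        pyramid.append(current - up(smaller, current.shape))
        current = smaller
    pyramid.append(current)  # coarse residual base
    return pyramid

def reconstruct(pyramid):
    """Exact inverse: add each detail level back while upsampling."""
    current = pyramid[-1]
    for detail in reversed(pyramid[:-1]):
        current = detail + up(current, detail.shape)
    return current
```

Because each level stores exactly the residual of one down/up round trip, reconstruction is lossless regardless of the blur kernel; a learned resizer replaces the fixed kernels while keeping the same coarse-plus-detail structure.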
Related papers
- PGNeXt: High-Resolution Salient Object Detection via Pyramid Grafting Network [24.54269823691119]
We present an advanced study on more challenging high-resolution salient object detection (HRSOD) from both dataset and network framework perspectives.
To compensate for the lack of HRSOD datasets, we collect a large-scale high-resolution salient object detection dataset, called UHRSD.
All images are finely annotated at the pixel level, far exceeding previous low-resolution SOD datasets.
arXiv Detail & Related papers (2024-08-02T09:31:21Z)
- Efficient Visual State Space Model for Image Deblurring [83.57239834238035]
Convolutional neural networks (CNNs) and Vision Transformers (ViTs) have achieved excellent performance in image restoration.
We propose a simple yet effective visual state space model (EVSSM) for image deblurring.
arXiv Detail & Related papers (2024-05-23T09:13:36Z)
- Pixel-Inconsistency Modeling for Image Manipulation Localization [59.968362815126326]
Digital image forensics plays a crucial role in image authentication and manipulation localization.
This paper presents a generalized and robust manipulation localization model through the analysis of pixel inconsistency artifacts.
Experiments show that our method successfully extracts inherent pixel-inconsistency forgery fingerprints.
arXiv Detail & Related papers (2023-09-30T02:54:51Z)
- High-Fidelity Visual Structural Inspections through Transformers and Learnable Resizers [2.126862120884775]
Recent advances in unmanned aerial vehicles (UAVs) and artificial intelligence have made visual inspections faster, safer, and more reliable.
High-resolution segmentation is extremely challenging due to the high computational memory demands.
We propose a hybrid strategy that can adapt to different inspection tasks by managing the global and local semantics trade-off.
arXiv Detail & Related papers (2022-10-21T18:08:26Z)
- Deep Convolutional Pooling Transformer for Deepfake Detection [54.10864860009834]
We propose a deep convolutional Transformer to incorporate decisive image features both locally and globally.
Specifically, we apply convolutional pooling and re-attention to enrich the extracted features and enhance efficacy.
The proposed solution consistently outperforms several state-of-the-art baselines on both within- and cross-dataset experiments.
arXiv Detail & Related papers (2022-09-12T15:05:41Z)
- Exploring Resolution and Degradation Clues as Self-supervised Signal for Low Quality Object Detection [77.3530907443279]
We propose a novel self-supervised framework to detect objects in degraded low resolution images.
Our method achieves superior performance compared with existing methods under various degradation conditions.
arXiv Detail & Related papers (2022-08-05T09:36:13Z)
- Learning Enriched Features for Fast Image Restoration and Enhancement [166.17296369600774]
This paper presents a holistic goal of maintaining spatially-precise high-resolution representations through the entire network.
We learn an enriched set of features that combines contextual information from multiple scales, while simultaneously preserving the high-resolution spatial details.
Our approach achieves state-of-the-art results for a variety of image processing tasks, including defocus deblurring, image denoising, super-resolution, and image enhancement.
arXiv Detail & Related papers (2022-04-19T17:59:45Z)
- Leveraging Image Complexity in Macro-Level Neural Network Design for Medical Image Segmentation [3.974175960216864]
We show that image complexity can be used as a guideline in choosing what is best for a given dataset.
For high-complexity datasets, a shallow network running on the original images may yield better segmentation results than a deep network running on downsampled images.
arXiv Detail & Related papers (2021-12-21T09:49:47Z)
- Sci-Net: a Scale Invariant Model for Building Detection from Aerial Images [0.0]
We propose a Scale-invariant neural network (Sci-Net) that is able to segment buildings present in aerial images at different spatial resolutions.
Specifically, we modified the U-Net architecture and fused it with dense Atrous Spatial Pyramid Pooling (ASPP) to extract fine-grained multi-scale representations.
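The ASPP fusion in Sci-Net rests on atrous (dilated) convolution: spreading kernel taps apart enlarges the receptive field without adding parameters, and pooling the same kernel over several dilation rates mixes scales. A toy NumPy sketch of that mechanism, assuming 'same' zero padding (function names `dilated_conv2d` and `aspp` are illustrative, not from Sci-Net's code):

```python
import numpy as np

def dilated_conv2d(img, kernel, dilation=1):
    """'Same'-padded 2D filtering with a dilated (atrous) kernel.

    Dilation inserts (dilation - 1) zeros between kernel taps, so a 3x3
    kernel at rate d covers a (2d+1) x (2d+1) window with only 9 weights.
    """
    kh, kw = kernel.shape
    # Effective kernel extent once the taps are spread apart.
    eh, ew = (kh - 1) * dilation + 1, (kw - 1) * dilation + 1
    padded = np.pad(img, ((eh // 2, eh // 2), (ew // 2, ew // 2)))
    out = np.zeros(img.shape, dtype=np.float64)
    for i in range(kh):
        for j in range(kw):
            di, dj = i * dilation, j * dilation
            out += kernel[i, j] * padded[di:di + img.shape[0], dj:dj + img.shape[1]]
    return out

def aspp(img, rates=(1, 2, 4)):
    """Toy atrous spatial pyramid pooling: average one 3x3 averaging
    kernel applied at several dilation rates to mix spatial scales."""
    k = np.full((3, 3), 1.0 / 9.0)
    return np.mean([dilated_conv2d(img, k, d) for d in rates], axis=0)
```

In a real ASPP block each rate has its own learned kernel and the branches are concatenated rather than averaged; the receptive-field arithmetic is the same.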
arXiv Detail & Related papers (2021-11-12T16:45:20Z)
- Unsupervised Image Decomposition with Phase-Correlation Networks [28.502280038100167]
Phase-Correlation Decomposition Network (PCDNet) is a novel model that decomposes a scene into its object components.
In our experiments, we show how PCDNet outperforms state-of-the-art methods for unsupervised object discovery and segmentation.
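PCDNet's core primitive is classical phase correlation: normalizing the cross-power spectrum of two images to unit magnitude keeps only phase, so its inverse FFT collapses to a sharp peak at the translation between them. A minimal NumPy sketch of that primitive alone, not the full PCDNet model (the function name is illustrative):

```python
import numpy as np

def phase_correlation(a, b):
    """Estimate the integer (row, col) shift d such that b ~= np.roll(a, d).

    Dividing the cross-power spectrum by its magnitude discards amplitude,
    so the inverse FFT is a delta-like peak at the translation offset.
    """
    fa = np.fft.fft2(a)
    fb = np.fft.fft2(b)
    cross = fb * np.conj(fa)
    cross /= np.abs(cross) + 1e-12  # keep phase only; guard near-zero bins
    corr = np.fft.ifft2(cross).real
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # Offsets past half the image size wrap around to negative shifts.
    return tuple(p if p <= s // 2 else p - s for p, s in zip(peak, corr.shape))
```

For exact circular shifts the peak location is exact; PCDNet uses this kind of correlation against learned object prototypes to localize and compose scene components.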
arXiv Detail & Related papers (2021-10-07T13:57:33Z)
- Learning Enriched Features for Real Image Restoration and Enhancement [166.17296369600774]
Convolutional neural networks (CNNs) have achieved dramatic improvements over conventional approaches for image restoration tasks.
We present a novel architecture with the collective goals of maintaining spatially-precise high-resolution representations through the entire network.
Our approach learns an enriched set of features that combines contextual information from multiple scales, while simultaneously preserving the high-resolution spatial details.
arXiv Detail & Related papers (2020-03-15T11:04:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.