Defect Transformer: An Efficient Hybrid Transformer Architecture for
Surface Defect Detection
- URL: http://arxiv.org/abs/2207.08319v1
- Date: Sun, 17 Jul 2022 23:37:48 GMT
- Title: Defect Transformer: An Efficient Hybrid Transformer Architecture for
Surface Defect Detection
- Authors: Junpu Wang, Guili Xu, Fuju Yan, Jinjin Wang and Zhengsheng Wang
- Abstract summary: We propose an efficient hybrid transformer architecture, termed Defect Transformer (DefT), for surface defect detection.
DefT incorporates CNN and transformer into a unified model to capture local and non-local relationships collaboratively.
Experiments on three datasets demonstrate the superiority and efficiency of our method compared with other CNN- and transformer-based networks.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Surface defect detection is an extremely crucial step to ensure the quality
of industrial products. Nowadays, convolutional neural networks (CNNs) based on
encoder-decoder architecture have achieved tremendous success in various defect
detection tasks. However, due to the intrinsic locality of convolution, they
commonly exhibit a limitation in explicitly modeling long-range interactions,
critical for pixel-wise defect detection in complex cases, e.g., cluttered
background and illegible pseudo-defects. Recent transformers are especially
skilled at learning global image dependencies but capture only limited local
structural information, which is necessary for precise defect localization. To overcome these
limitations, we propose an efficient hybrid transformer architecture, termed
Defect Transformer (DefT), for surface defect detection, which incorporates CNN
and transformer into a unified model to capture local and non-local
relationships collaboratively. Specifically, in the encoder module, a
convolutional stem block is first adopted to retain more detailed spatial
information. Then, patch aggregation blocks are used to generate a
multi-scale representation with four hierarchies, each of which is followed by a
series of DefT blocks. Each DefT block includes a locally position-aware
block for local position encoding, a lightweight multi-pooling self-attention
module to model multi-scale global contextual relationships with good
computational efficiency, and a convolutional feed-forward network for feature transformation
and further location information learning. Finally, a simple but effective
decoder module is proposed to gradually recover spatial details from the skip
connections in the encoder. Extensive experiments on three datasets demonstrate
the superiority and efficiency of our method compared with other CNN- and
transformer-based networks.
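The lightweight multi-pooling self-attention described above reduces attention cost by drawing keys and values from pooled (downsampled) copies of the feature map rather than from every pixel. The abstract gives no pseudocode, so the following NumPy sketch is only a hypothetical illustration of that general idea; the function name, shapes, and pooling ratios are assumptions, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def avg_pool2d(x, k):
    # x: (H, W, C); non-overlapping average pooling with stride k.
    H, W, C = x.shape
    Hk, Wk = H // k, W // k
    return x[:Hk * k, :Wk * k].reshape(Hk, k, Wk, k, C).mean(axis=(1, 3))

def multi_pooling_self_attention(x, ratios=(1, 2, 4)):
    # x: (H, W, C) feature map. Queries come from every spatial position;
    # keys/values come from the map average-pooled at several ratios, so
    # the attention matrix scales with the (much smaller) pooled token
    # count instead of (H*W)^2, while still mixing multi-scale context.
    H, W, C = x.shape
    q = x.reshape(H * W, C)                          # one query per pixel
    pooled = [avg_pool2d(x, r).reshape(-1, C) for r in ratios]
    kv = np.concatenate(pooled, axis=0)              # multi-scale key/value tokens
    attn = softmax(q @ kv.T / np.sqrt(C), axis=-1)   # (H*W, num_tokens)
    out = attn @ kv                                  # aggregate pooled context
    return out.reshape(H, W, C)
```

For an 8x8 map with ratios (1, 2, 4) this attends over 64 + 16 + 4 = 84 tokens per query; a real module would additionally use learned query/key/value projections per head.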
Related papers
- Distance Weighted Trans Network for Image Completion
We propose a new architecture that relies on Distance-based Weighted Transformer (DWT) to better understand the relationships between an image's components.
CNNs are used to augment the local texture information of coarse priors.
DWT blocks are used to recover certain coarse textures and coherent visual structures.
arXiv Detail & Related papers (2023-10-11T12:46:11Z)
- Global Context Aggregation Network for Lightweight Saliency Detection of Surface Defects
We develop a Global Context Aggregation Network (GCANet) for lightweight saliency detection of surface defects on the encoder-decoder structure.
First, we introduce a novel transformer encoder on the top layer of the lightweight backbone, which captures global context information through a novel Depth-wise Self-Attention (DSA) module.
The experimental results on three public defect datasets demonstrate that the proposed network achieves a better trade-off between accuracy and running efficiency compared with other 17 state-of-the-art methods.
arXiv Detail & Related papers (2023-09-22T06:19:11Z)
- CINFormer: Transformer network with multi-stage CNN feature injection for surface defect segmentation
We propose a transformer network with multi-stage CNN feature injection for surface defect segmentation.
CINFormer presents a simple yet effective feature integration mechanism that injects the multi-level CNN features of the input image into different stages of the transformer network in the encoder.
In addition, CINFormer presents a Top-K self-attention module to focus on tokens with more important information about the defects.
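A Top-K self-attention of this kind keeps, for each query, only the K largest attention logits and masks the rest before the softmax, concentrating attention on the most informative tokens. The sketch below is a generic NumPy illustration of that mechanism under assumed flat token shapes; it is not CINFormer's actual module, which operates on projected image tokens.

```python
import numpy as np

def topk_self_attention(q, k, v, K=4):
    # q, k, v: (N, C) token matrices. For each query row, keep only the K
    # largest attention logits and set the rest to -inf, so the softmax
    # distributes weight over at most K (hopefully defect-relevant) tokens.
    C = q.shape[1]
    logits = q @ k.T / np.sqrt(C)                    # (N, N) scaled dot products
    kth = np.sort(logits, axis=-1)[:, -K][:, None]   # K-th largest logit per row
    masked = np.where(logits >= kth, logits, -np.inf)
    e = np.exp(masked - masked.max(axis=-1, keepdims=True))
    attn = e / e.sum(axis=-1, keepdims=True)         # rows sum to 1 over <= K tokens
    return attn @ v
```

With K=1 this degenerates to copying, for each query, the value of its single best-matching key, which makes the masking behavior easy to check.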
arXiv Detail & Related papers (2023-09-22T06:12:02Z)
- Feature Shrinkage Pyramid for Camouflaged Object Detection with Transformers
Vision transformers have recently shown strong global context modeling capabilities in camouflaged object detection.
They suffer from two major limitations: less effective locality modeling and insufficient feature aggregation in decoders.
We propose a novel transformer-based Feature Shrinkage Pyramid Network (FSPNet), which aims to hierarchically decode locality-enhanced neighboring transformer features.
arXiv Detail & Related papers (2023-03-26T20:50:58Z)
- CAINNFlow: Convolutional block Attention modules and Invertible Neural Networks Flow for anomaly detection and localization tasks
In this study, we design a complex function model with alternating CBAM embedded in stacked $3\times3$ full convolutions, which is able to retain and effectively extract spatial structure information.
Experiments show that CAINNFlow achieves advanced levels of accuracy and inference efficiency based on CNN and Transformer backbone networks as feature extractors.
arXiv Detail & Related papers (2022-06-04T13:45:08Z)
- Error Correction Code Transformer
We propose to extend for the first time the Transformer architecture to the soft decoding of linear codes at arbitrary block lengths.
We encode each channel's output dimension to high dimension for better representation of the bits information to be processed separately.
The proposed approach demonstrates the extreme power and flexibility of Transformers and outperforms existing state-of-the-art neural decoders by large margins at a fraction of their time complexity.
arXiv Detail & Related papers (2022-03-27T15:25:58Z)
- Vision Transformer with Convolutions Architecture Search
We propose an architecture search method, Vision Transformer with Convolutions Architecture Search (VTCAS).
The high-performance backbone network searched by VTCAS introduces the desirable features of convolutional neural networks into the Transformer architecture.
It enhances the robustness of the neural network for object recognition, especially in the low illumination indoor scene.
arXiv Detail & Related papers (2022-03-20T02:59:51Z)
- Full Transformer Framework for Robust Point Cloud Registration with Deep Information Interaction
Recent Transformer-based methods have achieved advanced performance in point cloud registration.
Recent CNNs fail to model global relations due to their local receptive fields.
The shallow-wide architecture of Transformers and the lack of positional encoding lead to indistinct feature extraction.
arXiv Detail & Related papers (2021-12-17T08:40:52Z)
- Transformers Solve the Limited Receptive Field for Monocular Depth Prediction
We propose TransDepth, an architecture which benefits from both convolutional neural networks and transformers.
This is the first paper that applies transformers to pixel-wise prediction problems involving continuous labels.
arXiv Detail & Related papers (2021-03-22T18:00:13Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.