Laplacian-Former: Overcoming the Limitations of Vision Transformers in
Local Texture Detection
- URL: http://arxiv.org/abs/2309.00108v1
- Date: Thu, 31 Aug 2023 19:56:14 GMT
- Title: Laplacian-Former: Overcoming the Limitations of Vision Transformers in
Local Texture Detection
- Authors: Reza Azad, Amirhossein Kazerouni, Babak Azad, Ehsan Khodapanah Aghdam,
Yury Velichko, Ulas Bagci, Dorit Merhof
- Abstract summary: Vision Transformer (ViT) models have demonstrated a breakthrough in a wide range of computer vision tasks.
These models struggle to capture high-frequency components of images, which can limit their ability to detect local textures and edge information.
We propose a new technique, Laplacian-Former, that enhances the self-attention map by adaptively re-calibrating the frequency information in a Laplacian pyramid.
- Score: 3.784298636620067
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Vision Transformer (ViT) models have demonstrated a breakthrough in a wide
range of computer vision tasks. However, compared to Convolutional Neural
Network (CNN) models, ViT models have been observed to struggle to
capture high-frequency components of images, which can limit their ability to
detect local textures and edge information. As abnormalities in human tissue,
such as tumors and lesions, may greatly vary in structure, texture, and shape,
high-frequency information such as texture is crucial for effective semantic
segmentation tasks. To address this limitation in ViT models, we propose a new
technique, Laplacian-Former, that enhances the self-attention map by adaptively
re-calibrating the frequency information in a Laplacian pyramid. More
specifically, our proposed method utilizes a dual attention mechanism comprising
efficient attention and frequency attention: the efficient attention mechanism
reduces the complexity of self-attention to linear while producing the same
output, and the frequency attention selectively intensifies the contribution of
shape and texture features. Furthermore, we introduce a novel efficient enhancement multi-scale
bridge that effectively transfers spatial information from the encoder to the
decoder while preserving the fundamental features. We demonstrate the efficacy
of Laplacian-Former on multi-organ and skin lesion segmentation tasks, with
+1.87% and +0.76% Dice score improvements over SOTA approaches, respectively. Our
implementation is publicly available at
https://github.com/mindflow-institue/Laplacian-Former
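
To make the two ingredients described in the abstract concrete, below is a minimal, self-contained PyTorch sketch (not taken from the authors' repository): it separates a feature map into high-frequency Laplacian bands, as a Laplacian pyramid does before the frequency re-calibration step, and implements a linear-complexity "efficient attention" in the style of Shen et al., which is assumed to be the formulation the abstract refers to. All function and variable names are illustrative assumptions, not the repository's API.

```python
import torch
import torch.nn.functional as F

def laplacian_bands(x: torch.Tensor, levels: int = 2):
    """Split a feature map x of shape (B, C, H, W) into high-frequency
    Laplacian bands plus a low-frequency residual."""
    bands, current = [], x
    for _ in range(levels):
        low = F.avg_pool2d(current, kernel_size=2)          # low-pass + downsample
        up = F.interpolate(low, size=current.shape[-2:],
                           mode="bilinear", align_corners=False)
        bands.append(current - up)                          # high-frequency detail (texture/edges)
        current = low
    return bands, current                                   # detail bands + coarse residual

def efficient_attention(q, k, v):
    """Linear-complexity attention: softmax is applied to queries (feature dim)
    and keys (token dim) separately, so the N x N attention matrix is never
    materialized; the output shape matches standard attention."""
    q = q.softmax(dim=-1)                 # (B, heads, N, d)
    k = k.softmax(dim=-2)                 # normalize over the N tokens
    context = k.transpose(-2, -1) @ v     # (B, heads, d, d) global context
    return q @ context                    # (B, heads, N, d)

# Toy usage
feat = torch.randn(1, 32, 64, 64)
details, coarse = laplacian_bands(feat)
print([tuple(d.shape) for d in details], tuple(coarse.shape))

q = k = v = torch.randn(1, 4, 64 * 64, 16)  # 4 heads, 4096 tokens, head dim 16
out = efficient_attention(q, k, v)
print(tuple(out.shape))                     # (1, 4, 4096, 16)
```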