DBAT: Dynamic Backward Attention Transformer for Material Segmentation with Cross-Resolution Patches
- URL: http://arxiv.org/abs/2305.03919v2
- Date: Wed, 28 Feb 2024 10:22:03 GMT
- Title: DBAT: Dynamic Backward Attention Transformer for Material Segmentation with Cross-Resolution Patches
- Authors: Yuwen Heng, Srinandan Dasmahapatra, Hansung Kim
- Abstract summary: We propose the Dynamic Backward Attention Transformer (DBAT) to aggregate cross-resolution features.
Experiments show that our DBAT achieves an accuracy of 86.85%, which is the best performance among state-of-the-art real-time models.
We further align features to semantic labels, performing network dissection, to infer that the proposed model can extract material-related features better than other methods.
- Score: 8.812837829361923
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The objective of dense material segmentation is to identify the material
categories for every image pixel. Recent studies adopt image patches to extract
material features. Although the trained networks can improve the segmentation
performance, their methods choose a fixed patch resolution which fails to take
into account the variation in pixel area covered by each material. In this
paper, we propose the Dynamic Backward Attention Transformer (DBAT) to
aggregate cross-resolution features. The DBAT takes cropped image patches as
input and gradually increases the patch resolution by merging adjacent patches
at each transformer stage, instead of fixing the patch resolution during
training. We explicitly gather the intermediate features extracted from
cross-resolution patches and merge them dynamically with predicted attention
masks. Experiments show that our DBAT achieves an accuracy of 86.85%, which is
the best performance among state-of-the-art real-time models. Like other
successful deep learning solutions with complex architectures, the DBAT also
suffers from lack of interpretability. To address this problem, this paper
examines the properties that the DBAT makes use of. By analysing the
cross-resolution features and the attention weights, this paper interprets how
the DBAT learns from image patches. We further align features to semantic
labels, performing network dissection, to infer that the proposed model can
extract material-related features better than other methods. We show that the
DBAT model is more robust to network initialisation and yields less variable
predictions compared to other models. The project code is available at
https://github.com/heng-yuwen/Dynamic-Backward-Attention-Transformer.
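The core operation described in the abstract, gathering intermediate features from cross-resolution patches and merging them with predicted attention masks, can be sketched as follows. This is an illustrative NumPy sketch, not the authors' implementation: the function name, array shapes, and the per-pixel softmax normalisation of the masks are all assumptions.

```python
import numpy as np

def dynamic_backward_merge(stage_features, mask_logits):
    """Merge per-stage feature maps with predicted attention masks.

    stage_features: list of S arrays, each (H, W, C) -- intermediate
        features, assumed already resized to a common spatial resolution.
    mask_logits: array (S, H, W) -- one predicted mask per stage.
    Returns the attention-weighted sum, shape (H, W, C).
    """
    feats = np.stack(stage_features, axis=0)             # (S, H, W, C)
    # Softmax over the stage axis so the masks sum to 1 at each pixel.
    e = np.exp(mask_logits - mask_logits.max(axis=0, keepdims=True))
    masks = e / e.sum(axis=0, keepdims=True)             # (S, H, W)
    return (feats * masks[..., None]).sum(axis=0)        # (H, W, C)

# Toy usage: 3 transformer stages, 4x4 feature maps with 2 channels.
rng = np.random.default_rng(0)
feats = [rng.standard_normal((4, 4, 2)) for _ in range(3)]
logits = rng.standard_normal((3, 4, 4))
merged = dynamic_backward_merge(feats, logits)
```

With all-zero logits the masks are uniform and the merge reduces to a plain average of the stages; the learned masks let each pixel weight the patch resolution that best matches the material it covers.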
Related papers
- Adaptive Patching for High-resolution Image Segmentation with Transformers [9.525013089622183]
Attention-based models are proliferating in the space of image analytics, including segmentation.
The standard method of feeding images to transformer encoders is to divide the image into patches and feed the patches to the model as a linear sequence of tokens.
For high-resolution images, e.g. microscopic pathology images, the quadratic compute and memory cost prohibits the use of an attention-based model, if we are to use smaller patch sizes that are favorable in segmentation.
We take inspiration from Adaptive Mesh Refinement (AMR) methods in HPC by adaptively patching the images, as a pre-processing step, based
arXiv Detail & Related papers (2024-04-15T12:06:00Z)
- Distance Weighted Trans Network for Image Completion [52.318730994423106]
We propose a new architecture that relies on Distance-based Weighted Transformer (DWT) to better understand the relationships between an image's components.
CNNs are used to augment the local texture information of coarse priors.
DWT blocks are used to recover certain coarse textures and coherent visual structures.
arXiv Detail & Related papers (2023-10-11T12:46:11Z)
- Pixel Adapter: A Graph-Based Post-Processing Approach for Scene Text Image Super-Resolution [22.60056946339325]
We propose the Pixel Adapter Module (PAM) based on graph attention to address pixel distortion caused by upsampling.
The PAM effectively captures local structural information by allowing each pixel to interact with its neighbors and update features.
We demonstrate that our proposed method generates high-quality super-resolution images, surpassing existing methods in recognition accuracy.
arXiv Detail & Related papers (2023-09-16T08:12:12Z)
- Accurate Image Restoration with Attention Retractable Transformer [50.05204240159985]
We propose Attention Retractable Transformer (ART) for image restoration.
ART presents both dense and sparse attention modules in the network.
We conduct extensive experiments on image super-resolution, denoising, and JPEG compression artifact reduction tasks.
arXiv Detail & Related papers (2022-10-04T07:35:01Z)
- Modeling Image Composition for Complex Scene Generation [77.10533862854706]
We present a method that achieves state-of-the-art results on layout-to-image generation tasks.
After compressing RGB images into patch tokens, we propose the Transformer with Focal Attention (TwFA) for exploring dependencies of object-to-object, object-to-patch and patch-to-patch.
arXiv Detail & Related papers (2022-06-02T08:34:25Z)
- Learning Enriched Features for Fast Image Restoration and Enhancement [166.17296369600774]
This paper presents a holistic goal of maintaining spatially-precise high-resolution representations through the entire network.
We learn an enriched set of features that combines contextual information from multiple scales, while simultaneously preserving the high-resolution spatial details.
Our approach achieves state-of-the-art results for a variety of image processing tasks, including defocus deblurring, image denoising, super-resolution, and image enhancement.
arXiv Detail & Related papers (2022-04-19T17:59:45Z)
- MAT: Mask-Aware Transformer for Large Hole Image Inpainting [79.67039090195527]
We present a novel model for large hole inpainting, which unifies the merits of transformers and convolutions.
Experiments demonstrate the state-of-the-art performance of the new model on multiple benchmark datasets.
arXiv Detail & Related papers (2022-03-29T06:36:17Z)
- HIPA: Hierarchical Patch Transformer for Single Image Super Resolution [62.7081074931892]
This paper presents HIPA, a novel Transformer architecture that progressively recovers the high resolution image using a hierarchical patch partition.
We build a cascaded model that processes an input image in multiple stages, where we start with tokens with small patch sizes and gradually merge to the full resolution.
Such a hierarchical patch mechanism not only explicitly enables feature aggregation at multiple resolutions but also adaptively learns patch-aware features for different image regions.
arXiv Detail & Related papers (2022-03-19T05:09:34Z)
- ImageNet-Patch: A Dataset for Benchmarking Machine Learning Robustness against Adversarial Patches [20.030925907337075]
ImageNet-Patch is a dataset to benchmark machine-learning models against adversarial patches.
It consists of a set of patches, optimized to generalize across different models, and readily applicable to ImageNet data after preprocessing them.
We showcase the usefulness of this dataset by testing the effectiveness of the computed patches against 127 models.
arXiv Detail & Related papers (2022-03-07T17:22:30Z)
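Several of the papers above build on the same standard patch-tokenisation step: dividing an image into fixed-size patches and feeding them to a transformer as a linear token sequence (whose length, and hence the quadratic attention cost, grows as the patch size shrinks). A minimal sketch of that step, with illustrative names and shapes not taken from any of the papers:

```python
import numpy as np

def patchify(image, patch):
    """Split an (H, W, C) image into non-overlapping patch tokens.

    Returns an (N, patch*patch*C) array with N = (H//patch) * (W//patch);
    each row is one flattened patch, in row-major patch order.
    """
    H, W, C = image.shape
    assert H % patch == 0 and W % patch == 0, "image must tile evenly"
    x = image.reshape(H // patch, patch, W // patch, patch, C)
    x = x.transpose(0, 2, 1, 3, 4)           # (H/p, W/p, p, p, C)
    return x.reshape(-1, patch * patch * C)  # linear token sequence

img = np.arange(8 * 8 * 3, dtype=np.float32).reshape(8, 8, 3)
tokens = patchify(img, 4)  # 4 tokens, each of length 4*4*3 = 48
```

Halving the patch size quadruples the number of tokens, which is why small patch sizes, favourable for segmentation, are expensive for high-resolution inputs.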
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.