Grid Partitioned Attention: Efficient Transformer Approximation with
Inductive Bias for High Resolution Detail Generation
- URL: http://arxiv.org/abs/2107.03742v1
- Date: Thu, 8 Jul 2021 10:37:23 GMT
- Title: Grid Partitioned Attention: Efficient Transformer Approximation with
Inductive Bias for High Resolution Detail Generation
- Authors: Nikolay Jetchev, G\"okhan Yildirim, Christian Bracher, Roland Vollgraf
- Abstract summary: We present Grid Partitioned Attention (GPA), a new approximate attention algorithm.
Our paper introduces the new attention layer, analyzes its complexity and how the trade-off between memory usage and model power can be tuned.
Our contributions are (i) the algorithm and code of the novel GPA layer, (ii) a novel deep attention-copying architecture, and (iii) new state-of-the-art experimental results on human pose morphing generation benchmarks.
- Score: 3.4373727078460665
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Attention is a general reasoning mechanism that can flexibly deal with image
information, but its memory requirements have so far made it impractical for
high resolution image generation. We present Grid Partitioned Attention (GPA),
a new approximate attention algorithm that leverages a sparse inductive bias
for higher computational and memory efficiency in image domains: each query
attends to only a few keys, and spatially close queries attend to close keys
due to spatial correlations. Our paper introduces the new attention layer and
analyzes its complexity and how the trade-off between memory usage and model
power can be tuned by its hyper-parameters. We show how such attention enables
novel deep learning architectures with copying modules that are especially
useful for conditional image generation tasks like pose morphing. Our
contributions are (i) the algorithm and code of the novel GPA layer, (ii) a
novel deep attention-copying architecture, and (iii) new state-of-the-art
experimental results on human pose morphing generation benchmarks.
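For intuition, below is a minimal PyTorch-style sketch of the block-local idea behind such spatially sparse attention: the feature map is partitioned into a coarse grid and each query attends only to the keys inside its own cell. The function name, tensor layout and the grid cell size are illustrative assumptions; this sketch is not the authors' exact GPA algorithm, which is defined in full in the paper and its released code.

```python
# Minimal sketch of grid-partitioned (block-local) attention over a flattened
# feature map of shape (B, H*W, C). Illustrative only, not the GPA algorithm.
import torch

def grid_partitioned_attention(q, k, v, height, width, grid=8):
    B, N, C = q.shape
    assert N == height * width
    assert height % grid == 0 and width % grid == 0

    def to_cells(x):
        # (B, H*W, C) -> (B, num_cells, grid*grid, C)
        x = x.view(B, height // grid, grid, width // grid, grid, C)
        x = x.permute(0, 1, 3, 2, 4, 5)
        return x.reshape(B, -1, grid * grid, C)

    qc, kc, vc = to_cells(q), to_cells(k), to_cells(v)

    # Dense attention *within* each grid cell only: cost is O(N * grid^2)
    # rather than O(N^2) for full attention over the whole image.
    attn = torch.softmax(qc @ kc.transpose(-2, -1) / C ** 0.5, dim=-1)
    out = attn @ vc  # (B, num_cells, grid*grid, C)

    # Undo the partitioning back to a flat (B, H*W, C) feature map.
    out = out.view(B, height // grid, width // grid, grid, grid, C)
    out = out.permute(0, 1, 3, 2, 4, 5).reshape(B, N, C)
    return out
```

In this sketch the cell size controls the memory/model-power trade-off mentioned in the abstract: larger cells let each query see more keys at a higher memory cost.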
Related papers
- Hybrid Dynamic Pruning: A Pathway to Efficient Transformer Inference [1.0919012968294923]
We introduce a novel algorithm-architecture co-design approach that accelerates transformers by exploiting head sparsity, block sparsity and approximation opportunities to reduce attention computation and memory accesses.
Observing the large redundancy in attention scores and attention heads, we propose a novel integer-based row-balanced block pruning scheme to prune unimportant blocks of the attention matrix at run time.
We also propose integer-based head pruning to detect and prune unimportant heads at an early stage at run time.
arXiv Detail & Related papers (2024-07-17T11:15:16Z) - Image-GS: Content-Adaptive Image Representation via 2D Gaussians [55.15950594752051]
We propose Image-GS, a content-adaptive image representation.
Using anisotropic 2D Gaussians as the basis, Image-GS shows high memory efficiency, supports fast random access, and offers a natural level of detail stack.
The general efficiency and fidelity of Image-GS are validated against several recent neural image representations and industry-standard texture compressors.
We hope this research offers insights for developing new applications that require adaptive quality and resource control, such as machine perception, asset streaming, and content generation.
arXiv Detail & Related papers (2024-07-02T00:45:21Z) - Memory-Constrained Semantic Segmentation for Ultra-High Resolution UAV
Imagery [35.96063342025938]
This paper explores the intricate problem of achieving efficient and effective segmentation of ultra-high resolution UAV imagery.
We propose a GPU memory-efficient and effective framework for local inference without accessing the context beyond local patches.
We present an efficient memory-based interaction scheme to correct potential semantic bias of the underlying high-resolution information.
arXiv Detail & Related papers (2023-10-07T07:44:59Z) - Learning Image Deraining Transformer Network with Dynamic Dual
Self-Attention [46.11162082219387]
This paper proposes an effective image deraining Transformer with dynamic dual self-attention (DDSA).
Specifically, we select only the most useful similarity values, based on an approximate top-k calculation, to achieve sparse attention (a generic top-k sketch is given after this list).
In addition, we also develop a novel spatial-enhanced feed-forward network (SEFN) to further obtain a more accurate representation for achieving high-quality derained results.
arXiv Detail & Related papers (2023-08-15T13:59:47Z) - Beyond Learned Metadata-based Raw Image Reconstruction [86.1667769209103]
Raw images have distinct advantages over sRGB images, e.g., linearity and fine-grained quantization levels.
However, they are not widely adopted by general users due to their substantial storage requirements.
We propose a novel framework that learns a compact representation in the latent space, serving as metadata.
arXiv Detail & Related papers (2023-06-21T06:59:07Z) - ASSET: Autoregressive Semantic Scene Editing with Transformers at High
Resolutions [28.956280590967808]
Our architecture is based on a transformer with a novel attention mechanism.
Our key idea is to sparsify the transformer's attention matrix at high resolutions, guided by dense attention extracted at lower image resolutions.
We present qualitative and quantitative results, along with user studies, demonstrating the effectiveness of our method.
arXiv Detail & Related papers (2022-05-24T17:39:53Z) - Learning Enriched Features for Fast Image Restoration and Enhancement [166.17296369600774]
This paper pursues the holistic goal of maintaining spatially precise, high-resolution representations throughout the entire network.
We learn an enriched set of features that combines contextual information from multiple scales, while simultaneously preserving the high-resolution spatial details.
Our approach achieves state-of-the-art results for a variety of image processing tasks, including defocus deblurring, image denoising, super-resolution, and image enhancement.
arXiv Detail & Related papers (2022-04-19T17:59:45Z) - MAT: Mask-Aware Transformer for Large Hole Image Inpainting [79.67039090195527]
We present a novel model for large hole inpainting, which unifies the merits of transformers and convolutions.
Experiments demonstrate the state-of-the-art performance of the new model on multiple benchmark datasets.
arXiv Detail & Related papers (2022-03-29T06:36:17Z) - Less is More: Pay Less Attention in Vision Transformers [61.05787583247392]
The Less attention vIsion Transformer (LIT) builds upon the fact that convolutions, fully-connected layers, and self-attention layers have almost equivalent mathematical expressions for processing image patch sequences.
The proposed LIT achieves promising performance on image recognition tasks, including image classification, object detection and instance segmentation.
arXiv Detail & Related papers (2021-05-29T05:26:07Z) - Learning Deformable Image Registration from Optimization: Perspective,
Modules, Bilevel Training and Beyond [62.730497582218284]
We develop a new deep learning based framework to optimize a diffeomorphic model via multi-scale propagation.
We conduct two groups of image registration experiments on 3D volume datasets including image-to-atlas registration on brain MRI data and image-to-image registration on liver CT data.
arXiv Detail & Related papers (2020-04-30T03:23:45Z)
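As referenced in the image deraining entry above, the following is a minimal, generic sketch of top-k sparse attention, in which only the k largest similarity values per query survive the softmax. The function and parameter names are illustrative assumptions; it does not reproduce the DDSA paper's exact top-k approximation.

```python
# Generic top-k sparse attention sketch for q, k, v of shape (B, N, C).
import torch

def topk_sparse_attention(q, k, v, k_top=16):
    B, N, C = q.shape
    scores = q @ k.transpose(-2, -1) / C ** 0.5   # (B, N, N) similarity matrix
    topk_vals, _ = scores.topk(k_top, dim=-1)     # k_top largest scores per query
    threshold = topk_vals[..., -1:]               # smallest score that is still kept
    scores = scores.masked_fill(scores < threshold, float("-inf"))
    attn = torch.softmax(scores, dim=-1)          # sparse attention weights
    return attn @ v
```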
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.