RecConv: Efficient Recursive Convolutions for Multi-Frequency Representations
- URL: http://arxiv.org/abs/2412.19628v1
- Date: Fri, 27 Dec 2024 13:13:52 GMT
- Title: RecConv: Efficient Recursive Convolutions for Multi-Frequency Representations
- Authors: Mingshu Zhao, Yi Luo, Yong Ouyang,
- Abstract summary: RecConv is a decomposition strategy that efficiently constructs multi-frequency representations using small- Kernel convolutions.
RecNeXt-M3 outperforms RepViT-M1.1 by 1.9 $APbox$ on COCO with similar FLOPs.
- Score: 8.346566205092433
- License:
- Abstract: Recent advances in vision transformers (ViTs) have demonstrated the advantage of global modeling capabilities, prompting widespread integration of large-kernel convolutions for enlarging the effective receptive field (ERF). However, the quadratic scaling of parameter count and computational complexity (FLOPs) with respect to kernel size poses significant efficiency and optimization challenges. This paper introduces RecConv, a recursive decomposition strategy that efficiently constructs multi-frequency representations using small-kernel convolutions. RecConv establishes a linear relationship between parameter growth and decomposing levels which determines the effective kernel size $k\times 2^\ell$ for a base kernel $k$ and $\ell$ levels of decomposition, while maintaining constant FLOPs regardless of the ERF expansion. Specifically, RecConv achieves a parameter expansion of only $\ell+2$ times and a maximum FLOPs increase of $5/3$ times, compared to the exponential growth ($4^\ell$) of standard and depthwise convolutions. RecNeXt-M3 outperforms RepViT-M1.1 by 1.9 $AP^{box}$ on COCO with similar FLOPs. This innovation provides a promising avenue towards designing efficient and compact networks across various modalities. Codes and models can be found at \url{https://github.com/suous/RecNeXt}.
Related papers
- Reparameterized Multi-Resolution Convolutions for Long Sequence Modelling [13.627888191693712]
We present a novel approach to parameterizing global convolutional kernels for long-sequence modelling.
Our experiments demonstrate state-of-the-art performance on the Long Range Arena, Sequential CIFAR, and Speech Commands tasks.
We also report improved performance on ImageNet classification by replacing 2D convolutions with 1D $textttMRConv$ layers.
arXiv Detail & Related papers (2024-08-18T12:20:03Z) - State-Free Inference of State-Space Models: The Transfer Function Approach [132.83348321603205]
State-free inference does not incur any significant memory or computational cost with an increase in state size.
We achieve this using properties of the proposed frequency domain transfer function parametrization.
We report improved perplexity in language modeling over a long convolutional Hyena baseline.
arXiv Detail & Related papers (2024-05-10T00:06:02Z) - FlashFFTConv: Efficient Convolutions for Long Sequences with Tensor
Cores [18.016204763652553]
Convolution models with long filters have demonstrated state-of-the-art reasoning abilities in many long-sequence tasks.
Fast Fourier Transform (FFT) allows long convolutions to run in $O(N logN)$ time in sequence length $N$ but has poor hardware utilization.
In this paper, we study how to optimize the FFT convolution.
arXiv Detail & Related papers (2023-11-10T07:33:35Z) - Fully $1\times1$ Convolutional Network for Lightweight Image
Super-Resolution [79.04007257606862]
Deep models have significant process on single image super-resolution (SISR) tasks, in particular large models with large kernel ($3times3$ or more)
$1times1$ convolutions bring substantial computational efficiency, but struggle with aggregating local spatial representations.
We propose a simple yet effective fully $1times1$ convolutional network, named Shift-Conv-based Network (SCNet)
arXiv Detail & Related papers (2023-07-30T06:24:03Z) - Dynamic PlenOctree for Adaptive Sampling Refinement in Explicit NeRF [6.135925201075925]
We propose the dynamic PlenOctree DOT, which adaptively refines the sample distribution to adjust to changing scene complexity.
Compared with POT, our DOT outperforms it by enhancing visual quality, reducing over $55.15$/$68.84%$ parameters, and providing 1.7/1.9 times FPS for NeRF-synthetic and Tanks $&$ Temples, respectively.
arXiv Detail & Related papers (2023-07-28T06:21:42Z) - FInC Flow: Fast and Invertible $k \times k$ Convolutions for Normalizing
Flows [2.156373334386171]
Invertible convolutions have been an essential element for building expressive normalizing flow-based generative models.
We propose a $k times k$ convolutional layer and Deep Normalizing Flow architecture.
arXiv Detail & Related papers (2023-01-23T04:31:03Z) - Global Vision Transformer Pruning with Hessian-Aware Saliency [93.33895899995224]
This work challenges the common design philosophy of the Vision Transformer (ViT) model with uniform dimension across all the stacked blocks in a model stage.
We derive a novel Hessian-based structural pruning criteria comparable across all layers and structures, with latency-aware regularization for direct latency reduction.
Performing iterative pruning on the DeiT-Base model leads to a new architecture family called NViT (Novel ViT), with a novel parameter that utilizes parameters more efficiently.
arXiv Detail & Related papers (2021-10-10T18:04:59Z) - On Function Approximation in Reinforcement Learning: Optimism in the
Face of Large State Spaces [208.67848059021915]
We study the exploration-exploitation tradeoff at the core of reinforcement learning.
In particular, we prove that the complexity of the function class $mathcalF$ characterizes the complexity of the function.
Our regret bounds are independent of the number of episodes.
arXiv Detail & Related papers (2020-11-09T18:32:22Z) - DO-Conv: Depthwise Over-parameterized Convolutional Layer [66.46704754669169]
We propose to augment a convolutional layer with an additional depthwise convolution, where each input channel is convolved with a different 2D kernel.
We show with extensive experiments that the mere replacement of conventional convolutional layers with DO-Conv layers boosts the performance of CNNs.
arXiv Detail & Related papers (2020-06-22T06:57:10Z) - XSepConv: Extremely Separated Convolution [60.90871656244126]
We propose a novel extremely separated convolutional block (XSepConv)
It fuses spatially separable convolutions into depthwise convolution to reduce both the computational cost and parameter size of large kernels.
XSepConv is designed to be an efficient alternative to vanilla depthwise convolution with large kernel sizes.
arXiv Detail & Related papers (2020-02-27T11:46:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.