LSNet: See Large, Focus Small
- URL: http://arxiv.org/abs/2503.23135v1
- Date: Sat, 29 Mar 2025 16:00:54 GMT
- Title: LSNet: See Large, Focus Small
- Authors: Ao Wang, Hui Chen, Zijia Lin, Jungong Han, Guiguang Ding,
- Abstract summary: We introduce LS (textbfLarge-textbfSmall) convolution, which combines large- kernel perception and small- kernel aggregation.<n>LSNet achieves superior performance and efficiency over existing lightweight networks in various vision tasks.
- Score: 67.05569159984691
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Vision network designs, including Convolutional Neural Networks and Vision Transformers, have significantly advanced the field of computer vision. Yet, their complex computations pose challenges for practical deployments, particularly in real-time applications. To tackle this issue, researchers have explored various lightweight and efficient network designs. However, existing lightweight models predominantly leverage self-attention mechanisms and convolutions for token mixing. This dependence brings limitations in effectiveness and efficiency in the perception and aggregation processes of lightweight networks, hindering the balance between performance and efficiency under limited computational budgets. In this paper, we draw inspiration from the dynamic heteroscale vision ability inherent in the efficient human vision system and propose a ``See Large, Focus Small'' strategy for lightweight vision network design. We introduce LS (\textbf{L}arge-\textbf{S}mall) convolution, which combines large-kernel perception and small-kernel aggregation. It can efficiently capture a wide range of perceptual information and achieve precise feature aggregation for dynamic and complex visual representations, thus enabling proficient processing of visual information. Based on LS convolution, we present LSNet, a new family of lightweight models. Extensive experiments demonstrate that LSNet achieves superior performance and efficiency over existing lightweight networks in various vision tasks. Codes and models are available at https://github.com/jameslahm/lsnet.
Related papers
- LWGANet: A Lightweight Group Attention Backbone for Remote Sensing Visual Tasks [20.924609707499915]
This article introduces LWGANet, a specialized lightweight backbone network tailored for RS visual tasks.
LWGA module, tailored for RS imagery, adeptly harnesses redundant features to extract a wide range of spatial information.
The results confirm LWGANet's widespread applicability and its ability to maintain an optimal balance between high performance and low complexity.
arXiv Detail & Related papers (2025-01-17T08:56:17Z) - Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing addresses this by shifting data analysis to the edge.
Existing methods struggle to balance high model performance with low resource consumption.
We propose a novel co-design framework to optimize neural network architecture.
arXiv Detail & Related papers (2024-10-29T19:02:54Z) - big.LITTLE Vision Transformer for Efficient Visual Recognition [34.015778625984055]
big.LITTLE Vision Transformer is an innovative architecture aimed at achieving efficient visual recognition.
System is composed of two distinct blocks: the big performance block and the LITTLE efficiency block.
When processing an image, our system determines the importance of each token and allocates them accordingly.
arXiv Detail & Related papers (2024-10-14T08:21:00Z) - Towards Scalable and Versatile Weight Space Learning [51.78426981947659]
This paper introduces the SANE approach to weight-space learning.
Our method extends the idea of hyper-representations towards sequential processing of subsets of neural network weights.
arXiv Detail & Related papers (2024-06-14T13:12:07Z) - NiNformer: A Network in Network Transformer with Token Mixing as a Gating Function Generator [1.3812010983144802]
The attention mechanism was utilized in computer vision as the Vision Transformer ViT.
It comes with the drawback of being expensive and requiring datasets of considerable size for effective optimization.
This paper introduces a new computational block as an alternative to the standard ViT block that reduces the compute burdens.
arXiv Detail & Related papers (2024-03-04T19:08:20Z) - Interpreting and Improving Attention From the Perspective of Large Kernel Convolution [51.06461246235176]
We introduce Large Kernel Convolutional Attention (LKCA), a novel formulation that reinterprets attention operations as a single large- Kernel convolution.
LKCA achieves competitive performance across various visual tasks, particularly in data-constrained settings.
arXiv Detail & Related papers (2024-01-11T08:40:35Z) - Cross-receptive Focused Inference Network for Lightweight Image
Super-Resolution [64.25751738088015]
Transformer-based methods have shown impressive performance in single image super-resolution (SISR) tasks.
Transformers that need to incorporate contextual information to extract features dynamically are neglected.
We propose a lightweight Cross-receptive Focused Inference Network (CFIN) that consists of a cascade of CT Blocks mixed with CNN and Transformer.
arXiv Detail & Related papers (2022-07-06T16:32:29Z) - Dynamic Spatial Sparsification for Efficient Vision Transformers and
Convolutional Neural Networks [88.77951448313486]
We present a new approach for model acceleration by exploiting spatial sparsity in visual data.
We propose a dynamic token sparsification framework to prune redundant tokens.
We extend our method to hierarchical models including CNNs and hierarchical vision Transformers.
arXiv Detail & Related papers (2022-07-04T17:00:51Z) - A Lightweight, Efficient and Explainable-by-Design Convolutional Neural
Network for Internet Traffic Classification [9.365794791156972]
This paper introduces a new Lightweight, Efficient and eXplainable-by-design convolutional neural network (LEXNet) for Internet traffic classification.
LEXNet relies on a new residual block (for lightweight and efficiency purposes) and prototype layer (for explainability)
Based on a commercial-grade dataset, our evaluation shows that LEXNet succeeds to maintain the same accuracy as the best performing state-of-the-art neural network.
arXiv Detail & Related papers (2022-02-11T10:21:34Z) - Lightweight Image Super-Resolution with Multi-scale Feature Interaction
Network [15.846394239848959]
We present a lightweight multi-scale feature interaction network (MSFIN)
For lightweight SISR, MSFIN expands the receptive field and adequately exploits the informative features of the low-resolution observed images.
Our proposed MSFIN can achieve comparable performance against the state-of-the-arts with a more lightweight model.
arXiv Detail & Related papers (2021-03-24T07:25:21Z) - AttendNets: Tiny Deep Image Recognition Neural Networks for the Edge via
Visual Attention Condensers [81.17461895644003]
We introduce AttendNets, low-precision, highly compact deep neural networks tailored for on-device image recognition.
AttendNets possess deep self-attention architectures based on visual attention condensers.
Results show AttendNets have significantly lower architectural and computational complexity when compared to several deep neural networks.
arXiv Detail & Related papers (2020-09-30T01:53:17Z) - DBQ: A Differentiable Branch Quantizer for Lightweight Deep Neural
Networks [4.358626952482686]
We present a novel non-uniform quantizer that can be seamlessly mapped onto efficient ternary-based dot product engines.
The proposed quantizer (DBQ) successfully tackles the daunting task of aggressively quantizing lightweight networks such as MobileNetV1, MobileNetV2, and ShuffleNetV2.
DBQ achieves state-of-the art results with minimal training overhead and provides the best (pareto-optimal) accuracy-complexity trade-off.
arXiv Detail & Related papers (2020-07-19T23:50:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.