Multiscale Low-Frequency Memory Network for Improved Feature Extraction
in Convolutional Neural Networks
- URL: http://arxiv.org/abs/2403.08157v1
- Date: Wed, 13 Mar 2024 00:48:41 GMT
- Title: Multiscale Low-Frequency Memory Network for Improved Feature Extraction
in Convolutional Neural Networks
- Authors: Fuzhi Wu, Jiasong Wu, Youyong Kong, Chunfeng Yang, Guanyu Yang,
Huazhong Shu, Guy Carrault, Lotfi Senhadji
- Abstract summary: We introduce a novel framework, the Multiscale Low-Frequency Memory (MLFM) Network.
The MLFM efficiently preserves low-frequency information, enhancing performance in targeted computer vision tasks.
Our work builds upon the existing CNN foundations and paves the way for future advancements in computer vision.
- Score: 13.815116154370834
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Deep learning and Convolutional Neural Networks (CNNs) have driven major
transformations in diverse research areas. However, their limitations in
handling low-frequency information present obstacles in certain tasks like
interpreting global structures or managing smooth transition images. Despite
the promising performance of transformer structures in numerous tasks, their
intricate optimization complexities highlight the persistent need for refined
CNN enhancements using limited resources. Responding to these complexities, we
introduce a novel framework, the Multiscale Low-Frequency Memory (MLFM)
Network, with the goal to harness the full potential of CNNs while keeping
their complexity unchanged. The MLFM efficiently preserves low-frequency
information, enhancing performance in targeted computer vision tasks. Central
to our MLFM is the Low-Frequency Memory Unit (LFMU), which stores various
low-frequency data and forms a parallel channel to the core network. A key
advantage of MLFM is its seamless compatibility with various prevalent
networks, requiring no alterations to their original core structure. Testing on
ImageNet demonstrated substantial accuracy improvements in multiple 2D CNNs,
including ResNet, MobileNet, EfficientNet, and ConvNeXt. Furthermore, we
showcase MLFM's versatility beyond traditional image classification by
successfully integrating it into image-to-image translation tasks, specifically
in semantic segmentation networks like FCN and U-Net. In conclusion, our work
signifies a pivotal stride in the journey of optimizing the efficacy and
efficiency of CNNs with limited resources. This research builds upon the
existing CNN foundations and paves the way for future advancements in computer
vision. Our code is available at https://github.com/AlphaWuSeu/MLFM.
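The abstract does not detail the LFMU's internals. As a rough illustration of the stated idea, a parallel channel that stores multiscale low-frequency copies of a feature map and fuses them back into the backbone, consider the following PyTorch sketch; the module name, pooling scales, and fusion layer are all assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LowFreqMemoryUnit(nn.Module):
    """Illustrative sketch only: keep multiscale low-pass copies of a
    feature map in a parallel channel and fuse them back into the
    backbone. Names and structure are assumptions, not the paper's."""

    def __init__(self, channels, pool_sizes=(2, 4, 8)):
        super().__init__()
        self.pool_sizes = pool_sizes
        # A 1x1 conv fuses the concatenated low-frequency maps back in.
        self.fuse = nn.Conv2d(channels * (len(pool_sizes) + 1), channels, 1)

    def forward(self, x):
        h, w = x.shape[-2:]
        branches = [x]
        for k in self.pool_sizes:
            # Average pooling acts as a crude low-pass filter; upsampling
            # restores the spatial size so all branches can be concatenated.
            low = F.avg_pool2d(x, kernel_size=k)
            branches.append(F.interpolate(low, size=(h, w), mode="bilinear",
                                          align_corners=False))
        return self.fuse(torch.cat(branches, dim=1))

# The unit wraps around a backbone stage without altering its structure.
feats = torch.randn(1, 64, 56, 56)
print(LowFreqMemoryUnit(64)(feats).shape)  # torch.Size([1, 64, 56, 56])
```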
Related papers
- CTA-Net: A CNN-Transformer Aggregation Network for Improving Multi-Scale Feature Extraction [14.377544481394013]
CTA-Net combines CNNs and ViTs, with transformers capturing long-range dependencies and CNNs extracting localized features.
This integration enables efficient processing of detailed local and broader contextual information.
Experiments on small-scale datasets with fewer than 100,000 samples show that CTA-Net achieves superior performance.
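As a generic illustration of this CNN-Transformer aggregation pattern, a dual-branch module can pair a convolution with self-attention and fuse the two results. Everything below (class name, fusion scheme) is an assumption for illustration, not CTA-Net's actual block.

```python
import torch
import torch.nn as nn

class DualBranchBlock(nn.Module):
    """Sketch of CNN + self-attention aggregation: a conv branch for
    local detail, an attention branch for long-range context, fused by
    a 1x1 conv. A generic illustration, not CTA-Net's block."""

    def __init__(self, channels, heads=4):
        super().__init__()
        self.local = nn.Conv2d(channels, channels, 3, padding=1)
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.fuse = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        local = self.local(x)                   # localized features
        seq = x.flatten(2).transpose(1, 2)      # (b, h*w, c) token sequence
        ctx, _ = self.attn(seq, seq, seq)       # long-range dependencies
        ctx = ctx.transpose(1, 2).reshape(b, c, h, w)
        return self.fuse(torch.cat([local, ctx], dim=1))

print(DualBranchBlock(32)(torch.randn(1, 32, 14, 14)).shape)  # (1, 32, 14, 14)
```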
arXiv Detail & Related papers (2024-10-15T09:27:26Z)
- TFDMNet: A Novel Network Structure Combines the Time Domain and Frequency Domain Features [34.91485245048524]
This paper proposes a novel Element-wise Multiplication Layer (EML) to replace convolution layers.
We also introduce a Weight Fixation mechanism to alleviate the problem of over-fitting.
Experimental results imply that TFDMNet achieves good performance on MNIST, CIFAR-10 and ImageNet databases.
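The EML idea rests on the convolution theorem: convolution in the spatial domain corresponds to element-wise multiplication in the frequency domain. The sketch below only verifies that correspondence numerically; it illustrates the motivation and is not TFDMNet's layer.

```python
import torch

# Convolution theorem: circular convolution in the spatial domain equals
# element-wise multiplication in the frequency domain.
x = torch.randn(32, 32, dtype=torch.float64)
k = torch.randn(32, 32, dtype=torch.float64)

# Frequency-domain path: FFT both, multiply element-wise, inverse FFT.
freq = torch.fft.ifft2(torch.fft.fft2(x) * torch.fft.fft2(k)).real

# Spatial-domain path: explicit circular convolution via rolled sums.
spatial = torch.zeros_like(x)
for i in range(32):
    for j in range(32):
        spatial += k[i, j] * torch.roll(x, shifts=(i, j), dims=(0, 1))

print(torch.allclose(freq, spatial))  # True
```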
arXiv Detail & Related papers (2024-01-29T08:18:21Z)
- Enhancing Small Object Encoding in Deep Neural Networks: Introducing Fast&Focused-Net with Volume-wise Dot Product Layer [0.0]
We introduce Fast&Focused-Net, a novel deep neural network architecture tailored for encoding small objects into fixed-length feature vectors.
Fast&Focused-Net employs a series of our newly proposed Volume-wise Dot Product (VDP) layers, designed to address several inherent limitations of CNNs.
For small object classification tasks, our network outperformed state-of-the-art methods on datasets such as CIFAR-10, CIFAR-100, STL-10, SVHN-Cropped, and Fashion-MNIST.
For larger image classification, the network can also be combined with a transformer encoder (ViT).
arXiv Detail & Related papers (2024-01-18T09:31:25Z)
- Revisiting Image Deblurring with an Efficient ConvNet [24.703240497171503]
We propose a lightweight CNN network that features a large effective receptive field (ERF) and demonstrates comparable or even better performance than Transformers.
Our key design is an efficient CNN block dubbed LaKD, equipped with a large kernel depth-wise convolution and spatial-channel mixing structure.
We achieve +0.17dB / +0.43dB PSNR over the state-of-the-art Restormer on defocus / motion deblurring benchmark datasets with 32% fewer parameters and 39% fewer MACs.
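A large-kernel depthwise convolution followed by pointwise channel mixing is a standard way to grow the effective receptive field cheaply. Below is a hedged sketch of such a block; the layout, kernel size, and residual connection are assumptions, not the paper's LaKD block.

```python
import torch
import torch.nn as nn

class LargeKernelDWBlock(nn.Module):
    """Sketch of a large-kernel depthwise block: a 13x13 depthwise conv
    grows the effective receptive field at low cost, then a 1x1 conv
    mixes channels. Layout assumed, not the paper's LaKD block."""

    def __init__(self, channels, kernel_size=13):
        super().__init__()
        # groups=channels makes the conv depthwise: one filter per channel.
        self.dw = nn.Conv2d(channels, channels, kernel_size,
                            padding=kernel_size // 2, groups=channels)
        self.pw = nn.Conv2d(channels, channels, 1)  # channel mixing
        self.act = nn.GELU()

    def forward(self, x):
        # Residual connection keeps the block easy to optimize.
        return x + self.pw(self.act(self.dw(x)))

blk = LargeKernelDWBlock(32)
print(blk(torch.randn(1, 32, 64, 64)).shape)  # torch.Size([1, 32, 64, 64])
```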
arXiv Detail & Related papers (2023-02-04T20:42:46Z)
- MCTNet: A Multi-Scale CNN-Transformer Network for Change Detection in Optical Remote Sensing Images [7.764449276074902]
We propose a hybrid network based on multi-scale CNN-transformer structure, termed MCTNet.
We show that our MCTNet achieves better detection performance than existing state-of-the-art CD methods.
arXiv Detail & Related papers (2022-10-14T07:54:28Z)
- Cross-receptive Focused Inference Network for Lightweight Image Super-Resolution [64.25751738088015]
Transformer-based methods have shown impressive performance in single image super-resolution (SISR) tasks.
However, the need for Transformers to incorporate contextual information when dynamically extracting features is often neglected.
We propose a lightweight Cross-receptive Focused Inference Network (CFIN) that consists of a cascade of CT Blocks mixed with CNN and Transformer.
arXiv Detail & Related papers (2022-07-06T16:32:29Z)
- CTCNet: A CNN-Transformer Cooperation Network for Face Image Super-Resolution [64.06360660979138]
We propose an efficient CNN-Transformer Cooperation Network (CTCNet) for face super-resolution tasks.
We first devise a novel Local-Global Feature Cooperation Module (LGCM), which is composed of a Facial Structure Attention Unit (FSAU) and a Transformer block.
We then design an efficient Feature Refinement Module (FRM) to enhance the encoded features.
arXiv Detail & Related papers (2022-04-19T06:38:29Z)
- Vision Transformer with Convolutions Architecture Search [72.70461709267497]
We propose an architecture search method, Vision Transformer with Convolutions Architecture Search (VTCAS).
The high-performance backbone network searched by VTCAS introduces the desirable features of convolutional neural networks into the Transformer architecture.
It enhances the robustness of the neural network for object recognition, especially in low-illumination indoor scenes.
arXiv Detail & Related papers (2022-03-20T02:59:51Z)
- Container: Context Aggregation Network [83.12004501984043]
Recent findings show that a simple MLP-based solution without any traditional convolutional or Transformer components can produce effective visual representations.
We present Container (CONText AggregatIon NEtwoRk), a general-purpose building block for multi-head context aggregation.
In contrast to Transformer-based methods that do not scale well to downstream tasks relying on larger input image resolutions, our efficient network, Container-Light, can be employed in object detection and instance segmentation networks.
arXiv Detail & Related papers (2021-06-02T18:09:11Z)
- Learning Frequency-aware Dynamic Network for Efficient Super-Resolution [56.98668484450857]
This paper explores a novel frequency-aware dynamic network for dividing the input into multiple parts according to its coefficients in the discrete cosine transform (DCT) domain.
In practice, the high-frequency part is processed with expensive operations, while the lower-frequency part is assigned cheap operations to relieve the computation burden.
Experiments conducted on benchmark SISR models and datasets show that the frequency-aware dynamic network can be employed for various SISR neural architectures.
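The underlying split is a lossless partition of DCT coefficients. Below is a small sketch of that routing idea; the 12x12 cutoff and the notion of two fixed "paths" are placeholders, not the paper's dynamic network.

```python
import numpy as np
from scipy.fft import dctn, idctn

# Transform a patch to the DCT domain, route the low-frequency
# coefficients to a cheap path and the high-frequency remainder to an
# expensive path. Threshold and paths are illustrative placeholders.
patch = np.random.rand(48, 48)
coeffs = dctn(patch, norm="ortho")

# Keep the top-left 12x12 coefficient block as the low-frequency part.
low_mask = np.zeros_like(coeffs, dtype=bool)
low_mask[:12, :12] = True

low_part = idctn(np.where(low_mask, coeffs, 0.0), norm="ortho")
high_part = idctn(np.where(low_mask, 0.0, coeffs), norm="ortho")

# The two parts sum back to the original patch: the split is lossless.
assert np.allclose(low_part + high_part, patch)
```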
arXiv Detail & Related papers (2021-03-15T12:54:26Z)
- Curriculum By Smoothing [52.08553521577014]
Convolutional Neural Networks (CNNs) have shown impressive performance in computer vision tasks such as image classification, detection, and segmentation.
We propose an elegant curriculum-based scheme that smooths the feature embeddings of a CNN using anti-aliasing or low-pass filters.
As the amount of information in the feature maps increases during training, the network is able to progressively learn better representations of the data.
arXiv Detail & Related papers (2020-03-03T07:27:44Z)
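This smoothing curriculum amounts to low-pass filtering CNN feature maps with a Gaussian kernel whose strength decays as training proceeds. A sketch of that mechanism follows; the kernel size and the sigma schedule are assumptions, not the paper's settings.

```python
import torch
import torch.nn.functional as F

def gaussian_kernel(sigma, size=5):
    """Normalized 2D Gaussian kernel used as a low-pass filter."""
    ax = torch.arange(size, dtype=torch.float32) - size // 2
    g = torch.exp(-ax ** 2 / (2 * sigma ** 2))
    k = torch.outer(g, g)
    return k / k.sum()

def smooth_features(feats, sigma):
    """Low-pass filter each channel of a CNN feature map (depthwise
    convolution with a shared Gaussian kernel)."""
    c = feats.shape[1]
    k = gaussian_kernel(sigma)[None, None].repeat(c, 1, 1, 1)
    return F.conv2d(feats, k, padding=2, groups=c)

feats = torch.randn(1, 16, 32, 32)
for sigma in (2.0, 1.0, 0.5):  # decreasing blur: the curriculum
    smoothed = smooth_features(feats, sigma)
```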
This list is automatically generated from the titles and abstracts of the papers on this site.