Scale Equalization for Multi-Level Feature Fusion
- URL: http://arxiv.org/abs/2402.01149v1
- Date: Fri, 2 Feb 2024 05:25:51 GMT
- Title: Scale Equalization for Multi-Level Feature Fusion
- Authors: Bum Jun Kim, Sang Woo Kim
- Abstract summary: We find that multi-level features from parallel branches are on different scales.
The scale disequilibrium is a universal and unwanted flaw that leads to detrimental gradient descent.
We propose injecting scale equalizers to achieve scale equilibrium across multi-level features after bilinear upsampling.
- Score: 8.541075075344438
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep neural networks have exhibited remarkable performance in a variety of
computer vision fields, especially in semantic segmentation tasks. Their
success is often attributed to multi-level feature fusion, which enables them
to understand both global and local information from an image. However, we
found that multi-level features from parallel branches are on different scales.
The scale disequilibrium is a universal and unwanted flaw that leads to
detrimental gradient descent, thereby degrading performance in semantic
segmentation. We discover that scale disequilibrium is caused by bilinear
upsampling, which is supported by both theoretical and empirical evidence.
Based on this observation, we propose injecting scale equalizers to achieve
scale equilibrium across multi-level features after bilinear upsampling. Our
proposed scale equalizers are easy to implement, applicable to any
architecture, hyperparameter-free, implementable without requiring extra
computational cost, and guarantee scale equilibrium for any dataset.
Experiments showed that adopting scale equalizers consistently improved the
mIoU index across various target datasets, including ADE20K, PASCAL VOC 2012,
and Cityscapes, as well as various decoder choices, including UPerHead,
PSPHead, ASPPHead, SepASPPHead, and FCNHead.
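The abstract does not spell out the equalizer's exact formulation. As a rough illustration only, the sketch below assumes a scale equalizer that standardizes each branch's feature map to unit standard deviation before fusion; the function name and the normalization choice are assumptions, not the paper's method.

```python
import numpy as np

def scale_equalizer(feature, eps=1e-6):
    # Hypothetical equalizer: rescale one branch's feature map so its
    # standard deviation is 1, putting all branches on a common scale.
    return feature / (feature.std() + eps)

rng = np.random.default_rng(0)

# Two parallel branches whose features ended up on very different scales,
# e.g. because one of them passed through bilinear upsampling.
coarse = 10.0 * rng.standard_normal((8, 8))
fine = 0.1 * rng.standard_normal((8, 8))

# Equalize scales before multi-level fusion (here, a simple sum).
fused = scale_equalizer(coarse) + scale_equalizer(fine)
```

After equalization both branches contribute at a comparable magnitude, so neither dominates the fused representation or the gradients flowing through it.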
Related papers
- Implicit Grid Convolution for Multi-Scale Image Super-Resolution [6.8410780175245165]
We propose a multi-scale framework that employs a single encoder in conjunction with Implicit Grid Convolution (IGConv).
Our framework achieves comparable performance to existing fixed-scale methods while reducing the training budget and stored parameters three-fold.
arXiv Detail & Related papers (2024-08-19T03:30:15Z)
- Multi-scale Unified Network for Image Classification [33.560003528712414]
CNNs face notable challenges in performance and computational efficiency when dealing with real-world, multi-scale image inputs.
We propose the Multi-scale Unified Network (MUSN), consisting of multi-scale subnets, a unified network, and a scale-invariant constraint.
MUSN yields an accuracy increase of up to 44.53% and reduces FLOPs by 7.01-16.13% in multi-scale scenarios.
arXiv Detail & Related papers (2024-03-27T06:40:26Z)
- Effective Invertible Arbitrary Image Rescaling [77.46732646918936]
Invertible Neural Networks (INNs) can significantly increase upscaling accuracy by jointly optimizing the downscaling and upscaling cycle.
In this work, a simple and effective invertible arbitrary rescaling network (IARN) is proposed to achieve arbitrary image rescaling by training only one model.
It is shown to achieve a state-of-the-art (SOTA) performance in bidirectional arbitrary rescaling without compromising perceptual quality in LR outputs.
arXiv Detail & Related papers (2022-09-26T22:22:30Z)
- Scale Attention for Learning Deep Face Representation: A Study Against Visual Scale Variation [69.45176408639483]
We reform the conv layer by resorting to the scale-space theory.
We build a novel style named SCale AttentioN Conv Neural Network (SCAN-CNN).
As a single-shot scheme, the inference is more efficient than multi-shot fusion.
arXiv Detail & Related papers (2022-09-19T06:35:04Z)
- BIMS-PU: Bi-Directional and Multi-Scale Point Cloud Upsampling [60.257912103351394]
We develop a new point cloud upsampling pipeline called BIMS-PU.
We decompose the up/downsampling procedure into several up/downsampling sub-steps by breaking the target sampling factor into smaller factors.
We show that our method achieves superior results to state-of-the-art approaches.
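The factor-decomposition idea above can be sketched as follows; this is a toy illustration, and the function name and greedy base-2 splitting are assumptions, not BIMS-PU's actual scheme.

```python
def decompose_factor(scale, base=2):
    # Greedily break a target sampling factor into smaller sub-factors,
    # so a single x4 upsampling step becomes two cascaded x2 sub-steps.
    factors = []
    while scale % base == 0 and scale > base:
        factors.append(base)
        scale //= base
    factors.append(scale)
    return factors
```

For example, a x8 upsampling would then be performed as three cascaded x2 sub-steps rather than one large jump.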
arXiv Detail & Related papers (2022-06-25T13:13:37Z)
- Multi-scale and Cross-scale Contrastive Learning for Semantic Segmentation [5.281694565226513]
We apply contrastive learning to enhance the discriminative power of the multi-scale features extracted by semantic segmentation networks.
By first mapping the encoder's multi-scale representations to a common feature space, we instantiate a novel form of supervised local-global constraint.
arXiv Detail & Related papers (2022-03-25T01:24:24Z)
- Exploiting Invariance in Training Deep Neural Networks [4.169130102668252]
Inspired by two basic mechanisms in animal visual systems, we introduce a feature transform technique that imposes invariance properties in the training of deep neural networks.
The resulting algorithm requires less parameter tuning, trains well with an initial learning rate 1.0, and easily generalizes to different tasks.
Tested on ImageNet, MS COCO, and Cityscapes datasets, our proposed technique requires fewer iterations to train, surpasses all baselines by a large margin, seamlessly works on both small and large batch size training, and applies to different computer vision tasks of image classification, object detection, and semantic segmentation.
arXiv Detail & Related papers (2021-03-30T19:18:31Z)
- Multi-scale Interactive Network for Salient Object Detection [91.43066633305662]
We propose the aggregate interaction modules to integrate the features from adjacent levels.
To obtain more efficient multi-scale features, the self-interaction modules are embedded in each decoder unit.
Experimental results on five benchmark datasets demonstrate that the proposed method without any post-processing performs favorably against 23 state-of-the-art approaches.
arXiv Detail & Related papers (2020-07-17T15:41:37Z)
- PSConv: Squeezing Feature Pyramid into One Compact Poly-Scale Convolutional Layer [76.44375136492827]
Convolutional Neural Networks (CNNs) are often scale-sensitive.
We bridge this regret by exploiting multi-scale features in a finer granularity.
The proposed convolution operation, named Poly-Scale Convolution (PSConv), mixes up a spectrum of dilation rates.
arXiv Detail & Related papers (2020-07-13T05:14:11Z)
- Crowd Counting via Hierarchical Scale Recalibration Network [61.09833400167511]
We propose a novel Hierarchical Scale Recalibration Network (HSRNet) to tackle the task of crowd counting.
HSRNet models rich contextual dependencies and recalibrates multiple kinds of scale-associated information.
Our approach can selectively ignore various kinds of noise and automatically focus on appropriate crowd scales.
arXiv Detail & Related papers (2020-03-07T10:06:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.