Long-Tailed Recognition via Weight Balancing
- URL: http://arxiv.org/abs/2203.14197v1
- Date: Sun, 27 Mar 2022 03:26:31 GMT
- Title: Long-Tailed Recognition via Weight Balancing
- Authors: Shaden Alshammari, Yu-Xiong Wang, Deva Ramanan, Shu Kong
- Abstract summary: Naive training produces models that are biased toward common classes, attaining higher accuracy on them than on rare classes.
We investigate three techniques to balance weights: L2-normalization, weight decay, and MaxNorm.
Our approach achieves the state-of-the-art accuracy on five standard benchmarks.
- Score: 66.03068252811993
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In the real open world, data tends to follow long-tailed class distributions,
motivating the well-studied long-tailed recognition (LTR) problem. Naive
training produces models that are biased toward common classes, attaining
higher accuracy on them than on rare classes. The key to addressing LTR is to balance various aspects
including data distribution, training losses, and gradients in learning. We
explore an orthogonal direction, weight balancing, motivated by the empirical
observation that the naively trained classifier has "artificially" larger
weights in norm for common classes (because there exists abundant data to train
them, unlike the rare classes). We investigate three techniques to balance
weights: L2-normalization, weight decay, and MaxNorm. We first point out that
L2-normalization "perfectly" balances per-class weights to be unit norm, but
such a hard constraint might prevent classes from learning better classifiers.
In contrast, weight decay penalizes larger weights more heavily and so learns
small balanced weights; the MaxNorm constraint encourages growing small weights
within a norm ball but caps all the weights by the radius. Our extensive study
shows that both help learn balanced weights and greatly improve the LTR
accuracy. Surprisingly, weight decay, although underexplored in LTR,
significantly improves over prior work. Therefore, we adopt a two-stage
training paradigm and propose a simple approach to LTR: (1) learning features
using the cross-entropy loss by tuning weight decay, and (2) learning
classifiers using class-balanced loss by tuning weight decay and MaxNorm. Our
approach achieves the state-of-the-art accuracy on five standard benchmarks,
serving as a future baseline for long-tailed recognition.
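The abstract names concrete mechanisms, so a small sketch helps. Below is a minimal PyTorch illustration of the three weight-balancing tools; the feature dimension, class count, and hyperparameter values are placeholders, not the paper's settings.

```python
# Minimal PyTorch sketch of the three weight-balancing tools discussed above.
# Dimensions and hyperparameters are placeholders, not the paper's settings.
import torch
import torch.nn as nn
import torch.nn.functional as F

classifier = nn.Linear(512, 100, bias=False)  # rows are per-class weight vectors

# (1) L2-normalization: hard constraint forcing every class weight to unit norm.
def l2_normalize_(fc: nn.Linear) -> None:
    with torch.no_grad():
        fc.weight.div_(fc.weight.norm(dim=1, keepdim=True).clamp_min(1e-12))

# (2) Weight decay: soft penalty that punishes larger weights more heavily,
# so training settles on small, roughly balanced per-class norms.
optimizer = torch.optim.SGD(classifier.parameters(), lr=0.01, weight_decay=5e-3)

# (3) MaxNorm: weights grow freely inside a norm ball of the given radius but
# are projected back onto the ball after every optimizer step.
def maxnorm_project_(fc: nn.Linear, radius: float = 1.0) -> None:
    with torch.no_grad():
        norms = fc.weight.norm(dim=1, keepdim=True)
        fc.weight.mul_((radius / norms).clamp(max=1.0))

# One training step combining (2) and (3):
features = torch.randn(32, 512)               # stand-in for backbone features
labels = torch.randint(0, 100, (32,))
loss = F.cross_entropy(classifier(features), labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
maxnorm_project_(classifier)                  # enforce the norm-ball constraint
```

In the two-stage paradigm, stage one trains the whole network with cross-entropy and a tuned weight decay; stage two freezes the features and retrains only the classifier with a class-balanced loss, weight decay, and the MaxNorm projection above.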
Related papers
- CLASSP: a Biologically-Inspired Approach to Continual Learning through Adjustment Suppression and Sparsity Promotion [0.0]
This paper introduces a new training method named Continual Learning through Adjustment Suppression and Sparsity Promotion (CLASSP).
CLASSP is based on two main principles observed in neuroscience, particularly in the context of synaptic transmission and Long-Term Potentiation.
When compared with Elastic Weight Consolidation (EWC), CLASSP demonstrates superior performance in terms of accuracy and memory footprint.
arXiv Detail & Related papers (2024-04-29T13:31:00Z)
- Why Do We Need Weight Decay in Modern Deep Learning? [24.81634291051533]
Weight decay is a widely used technique for training state-of-the-art deep networks, from image classification models to large language models.
In this work, we highlight that the role of weight decay in modern deep learning is different from its regularization effect studied in classical learning theory.
For deep networks on vision tasks trained with multipass SGD, we show how weight decay modifies the optimization dynamics, enhancing the implicit regularization of SGD (a toy sketch follows this entry).
arXiv Detail & Related papers (2023-10-06T17:58:21Z)
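As a reminder of the mechanism this entry refers to, here is a toy sketch (not the authors' code) of how weight decay enters a vanilla SGD update:

```python
# Toy illustration (not from the cited paper): weight decay in a vanilla SGD
# step. The decay term shrinks every weight toward zero on each update.
import torch

w = torch.randn(10, requires_grad=True)
lr, lam = 0.1, 1e-2                  # learning rate and decay coefficient
loss = (w ** 2).sum() * 0.5          # stand-in training loss
loss.backward()

with torch.no_grad():
    # Equivalent to torch.optim.SGD(..., weight_decay=lam):
    #   w <- w - lr * (grad + lam * w) = (1 - lr * lam) * w - lr * grad
    w -= lr * (w.grad + lam * w)
```

The paper's point is that in modern deep learning this shrinkage acts less like a classical capacity penalty and more as a lever on the optimization dynamics themselves.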
- InRank: Incremental Low-Rank Learning [85.6380047359139]
Gradient-based training implicitly regularizes neural networks towards low-rank solutions through a gradual increase of the rank during training.
Existing training algorithms do not exploit the low-rank property to improve computational efficiency.
We design a new training algorithm, Incremental Low-Rank Learning (InRank), which explicitly expresses cumulative weight updates as low-rank matrices (a simplified sketch follows this entry).
arXiv Detail & Related papers (2023-06-20T03:03:04Z)
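A simplified sketch of the general idea, parameterizing a layer's cumulative weight update as a product of low-rank factors; the class name is hypothetical, and InRank's incremental rank-growing schedule is omitted:

```python
# Hypothetical sketch: represent the cumulative update of a linear layer as a
# low-rank product dW = U @ V. InRank additionally grows the rank during
# training, which this simplification omits.
import torch
import torch.nn as nn

class LowRankUpdateLinear(nn.Module):
    def __init__(self, in_dim: int, out_dim: int, rank: int):
        super().__init__()
        # Frozen initial weights; only the low-rank factors are trained.
        self.w0 = nn.Parameter(torch.randn(out_dim, in_dim), requires_grad=False)
        self.u = nn.Parameter(torch.zeros(out_dim, rank))
        self.v = nn.Parameter(torch.randn(rank, in_dim) * 0.01)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x @ (self.w0 + self.u @ self.v).t()

layer = LowRankUpdateLinear(512, 256, rank=8)
out = layer(torch.randn(4, 512))     # shape (4, 256)
```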
- Exploring Weight Balancing on Long-Tailed Recognition Problem [32.01426831450348]
Recognition problems in long-tailed data, in which the sample size per class is heavily skewed, have gained importance.
Weight balancing, which combines classical regularization techniques with two-stage training, has been proposed.
We analyze weight balancing by focusing on neural collapse and the cone effect at each training stage (a small inspection sketch follows this entry).
arXiv Detail & Related papers (2023-05-26T01:45:19Z)
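The imbalance that both this analysis and the main paper start from is easy to inspect. A minimal sketch with a placeholder classifier (not the authors' code):

```python
# Minimal sketch (placeholder model, not the authors' code): inspect per-class
# classifier weight norms, the quantity that weight balancing targets.
import torch
import torch.nn as nn

classifier = nn.Linear(512, 100, bias=False)     # e.g., after naive LTR training
per_class_norms = classifier.weight.norm(dim=1)  # one norm per class

# After naive training on long-tailed data, head-class rows tend to carry
# larger norms than tail-class rows; balanced training should flatten this.
print(per_class_norms.max().item(), per_class_norms.min().item())
```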
- Adaptive Distribution Calibration for Few-Shot Learning with Hierarchical Optimal Transport [78.9167477093745]
We propose a novel distribution calibration method by learning the adaptive weight matrix between novel samples and base classes.
Experimental results on standard benchmarks demonstrate that our proposed plug-and-play model outperforms competing approaches.
arXiv Detail & Related papers (2022-10-09T02:32:57Z)
- Learning to Re-weight Examples with Optimal Transport for Imbalanced Classification [74.62203971625173]
Imbalanced data pose challenges for deep learning based classification models.
One of the most widely-used approaches for tackling imbalanced data is re-weighting.
We propose a novel re-weighting method based on optimal transport (OT) from a distributional point of view (a simplified sketch follows this entry).
arXiv Detail & Related papers (2022-08-05T01:23:54Z)
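In its simplest form, re-weighting assigns each class a weight inside the loss. The cited method learns these weights via optimal transport; the fixed inverse-frequency weights below are a deliberate simplification that only shows where such weights enter:

```python
# Simplest form of re-weighting: per-class weights in the cross-entropy loss.
# Inverse-frequency weights stand in for the OT-learned weights of the cited
# paper; this is a simplification, not that method.
import torch
import torch.nn.functional as F

class_counts = torch.tensor([1000., 100., 10.])  # hypothetical long-tailed counts
weights = class_counts.sum() / (len(class_counts) * class_counts)

logits = torch.randn(8, 3)                       # stand-in model outputs
labels = torch.randint(0, 3, (8,))
loss = F.cross_entropy(logits, labels, weight=weights)  # rare classes count more
```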
- Neural Collapse Inspired Attraction-Repulsion-Balanced Loss for Imbalanced Learning [97.81549071978789]
We propose Attraction-Repulsion-Balanced Loss (ARB-Loss) to balance the different components of the gradients.
We perform experiments on large-scale classification and segmentation datasets, and our ARB-Loss achieves state-of-the-art performance.
arXiv Detail & Related papers (2022-04-19T08:23:23Z)
- Distributional Robustness Loss for Long-tail Learning [20.800627115140465]
Real-world data is often unbalanced and long-tailed; deep models struggle to recognize rare classes in the presence of frequent ones.
We show that the feature extractor part of deep networks suffers greatly from this bias.
We propose a new loss based on robustness theory, which encourages the model to learn high-quality representations for both head and tail classes.
arXiv Detail & Related papers (2021-04-07T11:34:04Z)
- Overfitting in adversarially robust deep learning [86.11788847990783]
We show that overfitting to the training set does in fact harm robust performance to a very large degree in adversarially robust training.
We also show that effects such as the double descent curve do still occur in adversarially trained models, yet fail to explain the observed overfitting.
arXiv Detail & Related papers (2020-02-26T15:40:50Z)