Long-tailed Visual Recognition via Gaussian Clouded Logit Adjustment
- URL: http://arxiv.org/abs/2305.11733v1
- Date: Fri, 19 May 2023 15:11:06 GMT
- Title: Long-tailed Visual Recognition via Gaussian Clouded Logit Adjustment
- Authors: Mengke Li, Yiu-ming Cheung, Yang Lu
- Abstract summary: We observe that vanilla training on long-tailed data with cross-entropy loss makes the instance-rich head classes severely squeeze the spatial distribution of the tail classes.
The original cross-entropy loss can only propagate gradients for a short time because the softmax gradient rapidly approaches zero as the logit difference increases.
This behavior, called softmax saturation, is unfavorable for training on balanced data but can be exploited to adjust the validity of the samples in long-tailed data.
- Score: 40.95064819012895
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Long-tailed data is still a big challenge for deep neural networks, even
though they have achieved great success on balanced data. We observe that
vanilla training on long-tailed data with cross-entropy loss makes the
instance-rich head classes severely squeeze the spatial distribution of the
tail classes, which leads to difficulty in classifying tail class samples.
Furthermore, the original cross-entropy loss can only propagate gradients for a
short time, because the softmax gradient rapidly approaches zero as the logit
difference increases. This phenomenon is called softmax saturation. It is
unfavorable for training on balanced data, but it can be exploited to adjust
the validity of the samples in long-tailed data, thereby correcting the distorted
embedding space of long-tailed problems. To this end, this paper proposes the
Gaussian clouded logit adjustment, which perturbs the logits of different classes
with Gaussian noise of varied amplitude. We define the amplitude of the
perturbation as the cloud size and assign relatively large cloud sizes to tail
classes. A large cloud size reduces softmax saturation, thereby making tail class
samples more active and enlarging their embedding space. To alleviate the bias in
the classifier, we further propose a class-based effective number sampling
strategy with classifier re-training. Extensive experiments on benchmark
datasets validate the superior performance of the proposed method. Source code
is available at https://github.com/Keke921/GCLLoss.
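To make the described mechanism concrete: with softmax cross-entropy, the gradient with respect to the target logit is p_y - 1, so once the target logit dominates the others, p_y approaches 1 and the gradient all but vanishes (softmax saturation). Below is a minimal PyTorch sketch of a Gaussian clouded logit loss in that spirit. It is an illustrative reading of the abstract only: the cloud-size definition (a normalized log-ratio of class frequencies), the half-Gaussian noise, and perturbing only the target logit are assumptions, not the exact formulation of the paper or the GCLLoss repository.

```python
import torch
import torch.nn.functional as F

class GaussianCloudedLogitLoss(torch.nn.Module):
    """Illustrative sketch: cross-entropy on logits perturbed by class-dependent
    Gaussian "clouds"; rarer (tail) classes get larger cloud sizes so their
    gradients saturate more slowly. Not the paper's exact formulation."""

    def __init__(self, class_counts, max_cloud=1.0):
        super().__init__()
        counts = torch.as_tensor(class_counts, dtype=torch.float)
        # Assumed cloud-size definition: log-ratio to the largest class,
        # normalized so the rarest class gets cloud size `max_cloud`.
        cloud = torch.log(counts.max() / counts)
        cloud = max_cloud * cloud / cloud.max().clamp_min(1e-8)
        self.register_buffer("cloud_size", cloud)

    def forward(self, logits, target):
        # Half-Gaussian noise, scaled per class by the cloud size.
        noise = torch.randn_like(logits).abs() * self.cloud_size
        # Push only the target-class logit down by its random cloud amplitude,
        # so tail-class samples keep producing non-negligible gradients.
        one_hot = F.one_hot(target, num_classes=logits.size(1)).to(logits.dtype)
        return F.cross_entropy(logits - one_hot * noise, target)

# Usage with a toy long-tailed class distribution (head: 5000 samples, tail: 50):
criterion = GaussianCloudedLogitLoss(class_counts=[5000, 1000, 200, 50])
loss = criterion(torch.randn(8, 4), torch.randint(0, 4, (8,)))
```

The class-based effective number sampling mentioned in the abstract is a separate, second-stage ingredient used for classifier re-training. A plausible building block, assuming it follows the standard effective-number weighting of Cui et al. (2019), is:

```python
import torch

def effective_number_weights(class_counts, beta=0.999):
    """Effective-number weighting (Cui et al., 2019), assumed here as the basis
    of the class-based sampling used before classifier re-training."""
    counts = torch.as_tensor(class_counts, dtype=torch.float)
    effective_num = (1.0 - beta ** counts) / (1.0 - beta)
    weights = 1.0 / effective_num
    return weights / weights.sum()   # per-class sampling probabilities

# e.g. expand these per-class probabilities to per-sample weights for
# torch.utils.data.WeightedRandomSampler during the re-training stage.
```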
Related papers
- A Dynamical Model of Neural Scaling Laws [79.59705237659547]
We analyze a random feature model trained with gradient descent as a solvable model of network training and generalization.
Our theory shows how the gap between training and test loss can gradually build up over time due to repeated reuse of data.
arXiv Detail & Related papers (2024-02-02T01:41:38Z)
- Class-Imbalanced Semi-Supervised Learning for Large-Scale Point Cloud Semantic Segmentation via Decoupling Optimization [64.36097398869774]
Semi-supervised learning (SSL) has been an active research topic for large-scale 3D scene understanding.
The existing SSL-based methods suffer from severe training bias due to class imbalance and long-tail distributions of the point cloud data.
We introduce a new decoupling optimization framework, which disentangles feature representation learning and classifier learning in an alternating optimization manner to shift the biased decision boundary effectively.
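As a rough, generic illustration of such decoupling (not the paper's semi-supervised point cloud pipeline), the sketch below alternates epochs between updating the feature extractor with the classifier frozen and vice versa; the schedule and module names are assumptions.

```python
import torch

def train_decoupled(backbone, classifier, loader, epochs=10, lr=1e-3):
    """Illustrative alternating scheme (assumed, not the paper's method):
    even epochs update only the feature extractor, odd epochs only the classifier."""
    opt_feat = torch.optim.SGD(backbone.parameters(), lr=lr)
    opt_cls = torch.optim.SGD(classifier.parameters(), lr=lr)
    criterion = torch.nn.CrossEntropyLoss()
    for epoch in range(epochs):
        update_backbone = (epoch % 2 == 0)
        # Freeze whichever module is not being updated this epoch.
        for p in backbone.parameters():
            p.requires_grad_(update_backbone)
        for p in classifier.parameters():
            p.requires_grad_(not update_backbone)
        optimizer = opt_feat if update_backbone else opt_cls
        for x, y in loader:
            loss = criterion(classifier(backbone(x)), y)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```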
arXiv Detail & Related papers (2024-01-13T04:16:40Z)
- More is Better in Modern Machine Learning: when Infinite Overparameterization is Optimal and Overfitting is Obligatory [12.689249854199982]
We show that the test risk of RF regression decreases monotonically with both the number of features and the number of samples.
We then demonstrate that, for a large class of tasks characterized by power-law eigenstructure, training to near-zero training loss is obligatory.
arXiv Detail & Related papers (2023-11-24T18:27:41Z)
- KAKURENBO: Adaptively Hiding Samples in Deep Neural Network Training [2.8804804517897935]
We propose a method for hiding the least-important samples during the training of deep neural networks.
We adaptively find samples to exclude in a given epoch based on their contribution to the overall learning process.
Our method can reduce total training time by up to 22%, impacting accuracy by only 0.4% compared to the baseline.
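A toy version of this idea, under the assumption that the last observed per-sample loss serves as the importance proxy (the paper's actual criterion and its adaptive schedule are more involved), could look like:

```python
import torch

def visible_indices(per_sample_loss, hide_fraction=0.2):
    """Illustrative only: hide the `hide_fraction` of samples with the smallest
    loss (assumed importance proxy); the paper's criterion and its adaptive
    schedule are more involved."""
    n = per_sample_loss.numel()
    keep = n - int(hide_fraction * n)
    return torch.topk(per_sample_loss, k=keep).indices   # largest-loss samples

# Usage: build next epoch's loader from the kept indices only.
losses = torch.rand(1000)                                 # stand-in per-sample losses
sampler = torch.utils.data.SubsetRandomSampler(visible_indices(losses).tolist())
```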
arXiv Detail & Related papers (2023-10-16T06:19:29Z)
- Adjusting Logit in Gaussian Form for Long-Tailed Visual Recognition [37.62659619941791]
We study the problem of long-tailed visual recognition from the feature-level perspective.
Two novel logit adjustment methods are proposed to improve model performance at a modest computational overhead.
Experiments conducted on benchmark datasets demonstrate the superior performance of the proposed method over the state-of-the-art ones.
arXiv Detail & Related papers (2023-05-18T02:06:06Z)
- Bayes-optimal Learning of Deep Random Networks of Extensive-width [22.640648403570957]
We consider the problem of learning a target function corresponding to a deep, extensive-width, non-linear neural network with random Gaussian weights.
We compute closed-form expressions for the test errors of ridge regression, kernel regression, and random features regression.
We show numerically that when the number of samples grows faster than the dimension, ridge and kernel methods become suboptimal, while neural networks achieve test error close to zero from quadratically many samples.
arXiv Detail & Related papers (2023-02-01T11:14:08Z)
- Feature Generation for Long-tail Classification [36.186909933006675]
We show how to generate meaningful features by estimating the tail category's distribution.
We also present a qualitative analysis of generated features using t-SNE visualizations and analyze the nearest neighbors used to calibrate the tail class distributions.
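A heavily simplified sketch of the generation step, assuming a diagonal Gaussian per tail class and omitting the paper's calibration against neighboring classes, could be:

```python
import torch

def generate_tail_features(features, num_new, eps=1e-4):
    """Illustrative sketch: fit a diagonal Gaussian to one tail class's feature
    vectors (rows of `features`) and sample `num_new` synthetic features."""
    mean = features.mean(dim=0)
    std = features.std(dim=0) + eps            # avoid zero variance
    return mean + std * torch.randn(num_new, features.size(1))

# Usage: augment a 20-sample tail class with 80 generated feature vectors.
synthetic = generate_tail_features(torch.randn(20, 64), num_new=80)   # (80, 64)
```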
arXiv Detail & Related papers (2021-11-10T21:34:29Z)
- Cherry-Picking Gradients: Learning Low-Rank Embeddings of Visual Data via Differentiable Cross-Approximation [53.95297550117153]
We propose an end-to-end trainable framework that processes large-scale visual data tensors by looking at only a fraction of their entries.
The proposed approach is particularly useful for large-scale multidimensional grid data, and for tasks that require context over a large receptive field.
arXiv Detail & Related papers (2021-05-29T08:39:57Z)
- Carathéodory Sampling for Stochastic Gradient Descent [79.55586575988292]
We present an approach that is inspired by classical results of Tchakaloff and Carathéodory about measure reduction.
We adaptively select the descent steps where the measure reduction is carried out.
We combine this with Block Coordinate Descent so that measure reduction can be done very cheaply.
arXiv Detail & Related papers (2020-06-02T17:52:59Z)
- OSLNet: Deep Small-Sample Classification with an Orthogonal Softmax Layer [77.90012156266324]
This paper aims to find a subspace of neural networks that can facilitate a large decision margin.
We propose the Orthogonal Softmax Layer (OSL), which makes the weight vectors in the classification layer remain orthogonal during both the training and test processes.
Experimental results demonstrate that the proposed OSL has better performance than the methods used for comparison on four small-sample benchmark datasets.
arXiv Detail & Related papers (2020-04-20T02:41:01Z)
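The summary does not say how the orthogonality is maintained; one simple realization, given purely as an assumption rather than OSL's exact construction, is to fix the classifier weights to an orthogonal matrix so they stay orthogonal throughout training and testing:

```python
import torch
import torch.nn as nn

class OrthogonalClassifier(nn.Module):
    """Illustrative sketch: a linear classification layer whose weight vectors are
    orthogonal by construction and frozen, so orthogonality holds at both training
    and test time (an assumed realization, not necessarily OSL's exact design)."""

    def __init__(self, feat_dim, num_classes):
        super().__init__()
        assert num_classes <= feat_dim, "orthogonal rows need feat_dim >= num_classes"
        weight = torch.empty(num_classes, feat_dim)
        nn.init.orthogonal_(weight)                        # rows are mutually orthogonal
        self.weight = nn.Parameter(weight, requires_grad=False)

    def forward(self, features):
        return features @ self.weight.t()                  # plain linear scoring

# Usage: swap in for the final linear layer of a small-sample classifier.
head = OrthogonalClassifier(feat_dim=128, num_classes=10)
scores = head(torch.randn(4, 128))                         # shape (4, 10)
```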
This list is automatically generated from the titles and abstracts of the papers on this site.