GradTail: Learning Long-Tailed Data Using Gradient-based Sample
Weighting
- URL: http://arxiv.org/abs/2201.05938v2
- Date: Wed, 19 Jan 2022 02:27:03 GMT
- Title: GradTail: Learning Long-Tailed Data Using Gradient-based Sample Weighting
- Authors: Zhao Chen, Vincent Casser, Henrik Kretzschmar, Dragomir Anguelov
- Abstract summary: We show that an approach based on gradient dot product agreement can isolate long-tailed data early on during model training and improve performance by dynamically picking higher sample weights for that data.
We show that such upweighting leads to model improvements for both classification and regression models, the latter of which are relatively unexplored in the long-tail literature.
- Score: 15.418627530276598
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose GradTail, an algorithm that uses gradients to improve model
performance on the fly in the face of long-tailed training data distributions.
Unlike conventional long-tail classifiers which operate on converged - and
possibly overfit - models, we demonstrate that an approach based on gradient
dot product agreement can isolate long-tailed data early on during model
training and improve performance by dynamically picking higher sample weights
for that data. We show that such upweighting leads to model improvements for
both classification and regression models, the latter of which are relatively
unexplored in the long-tail literature, and that the long-tail examples found
by gradient alignment are consistent with our semantic expectations.
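The gradient dot-product agreement idea from the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: it assumes per-sample gradients are already available, and the linear weighting scheme, the `w_min`/`w_max` parameters, and all function names are our own assumptions.

```python
import numpy as np

def gradtail_weights(per_sample_grads, avg_grad, w_min=1.0, w_max=5.0):
    """Hypothetical sample weighting by gradient dot-product agreement.

    Samples whose gradient direction disagrees with the average gradient
    (low normalized dot product) are treated as long-tail candidates and
    upweighted. The linear mapping below is our assumption; GradTail's
    actual formulation may differ.
    """
    avg_dir = avg_grad / (np.linalg.norm(avg_grad) + 1e-12)
    norms = np.linalg.norm(per_sample_grads, axis=1, keepdims=True) + 1e-12
    dirs = per_sample_grads / norms
    agreement = dirs @ avg_dir  # normalized dot products, in [-1, 1]
    # Low agreement -> high weight, mapped linearly onto [w_min, w_max].
    return w_min + (w_max - w_min) * (1.0 - agreement) / 2.0

# Toy example: 3 samples with 2-dim gradients.
grads = np.array([[1.0, 0.0],    # roughly aligned with the mean direction
                  [0.9, 0.1],    # also aligned
                  [-1.0, 0.0]])  # opposed: a long-tail candidate
w = gradtail_weights(grads, grads.mean(axis=0))
```

Here the opposed sample receives the largest weight, matching the intuition that examples whose gradients conflict with the majority are the rare, under-represented ones.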
Related papers
- Minimum Tuning to Unlock Long Output from LLMs with High Quality Data as the Key [3.3339400603549265]
We show that it is possible to achieve notable performance improvement in tuned models with a small fraction of training data instances and compute.
Our findings suggest that, while out-of-the-box capacity for generating long output varies across models, tuning them with high-quality data and lightweight compute consistently yields notable improvement across all models we experimented on.
arXiv Detail & Related papers (2024-10-14T07:09:02Z)
- Multi-view Disparity Estimation Using a Novel Gradient Consistency Model [0.0]
This paper proposes the use of Gradient Consistency information to assess the validity of the linearisation.
This information is used to determine the weights applied to the data term as part of an analytically inspired Gradient Consistency Model.
We show that the Gradient Consistency Model outperforms standard coarse-to-fine schemes.
arXiv Detail & Related papers (2024-05-27T10:30:59Z)
- Orthogonal Uncertainty Representation of Data Manifold for Robust Long-Tailed Learning [52.021899899683675]
In scenarios with long-tailed distributions, the model's ability to identify tail classes is limited due to the under-representation of tail samples.
We propose an Orthogonal Uncertainty Representation (OUR) of feature embedding and an end-to-end training strategy to improve the long-tail phenomenon of model robustness.
arXiv Detail & Related papers (2023-10-16T05:50:34Z)
- The Languini Kitchen: Enabling Language Modelling Research at Different Scales of Compute [66.84421705029624]
We introduce an experimental protocol that enables model comparisons based on equivalent compute, measured in accelerator hours.
We pre-process an existing large, diverse, and high-quality dataset of books that surpasses existing academic benchmarks in quality, diversity, and document length.
This work also provides two baseline models: a feed-forward model derived from the GPT-2 architecture and a recurrent model in the form of a novel LSTM with ten-fold throughput.
arXiv Detail & Related papers (2023-09-20T10:31:17Z)
- Learning to Jump: Thinning and Thickening Latent Counts for Generative Modeling [69.60713300418467]
Learning to jump is a general recipe for generative modeling of various types of data.
We demonstrate when learning to jump is expected to perform comparably to learning to denoise, and when it is expected to perform better.
arXiv Detail & Related papers (2023-05-28T05:38:28Z)
- Merging Models with Fisher-Weighted Averaging [24.698591753644077]
We introduce a fundamentally different method for transferring knowledge across models that amounts to "merging" multiple models into one.
Our approach effectively involves computing a weighted average of the models' parameters.
We show that our merging procedure makes it possible to combine models in previously unexplored ways.
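The parameter-averaging idea described above can be sketched as follows. This is a hedged illustration based only on the abstract: the diagonal Fisher approximation and all names here are our assumptions, not the paper's exact procedure.

```python
import numpy as np

def fisher_merge(params_list, fisher_list):
    """Merge models by a per-parameter weighted average.

    Each model's parameters are weighted by a diagonal Fisher estimate
    (commonly approximated by squared gradients). The diagonal
    approximation is our assumption based on the abstract.
    """
    params = np.stack(params_list)           # (n_models, n_params)
    fisher = np.stack(fisher_list) + 1e-12   # avoid division by zero
    return (fisher * params).sum(axis=0) / fisher.sum(axis=0)

# Two toy "models": where a model's Fisher is large, the merge follows it.
theta_a, theta_b = np.array([1.0, 0.0]), np.array([0.0, 1.0])
f_a, f_b = np.array([10.0, 1.0]), np.array([1.0, 10.0])
merged = fisher_merge([theta_a, theta_b], [f_a, f_b])
```

In the toy example, the first merged parameter stays close to model A's value and the second close to model B's, because the Fisher weights indicate which model is more certain about each parameter.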
arXiv Detail & Related papers (2021-11-18T17:59:35Z)
- Back2Future: Leveraging Backfill Dynamics for Improving Real-time Predictions in Future [73.03458424369657]
In real-time forecasting in public health, data collection is a non-trivial and demanding task.
The 'backfill' phenomenon and its effect on model performance have barely been studied in the prior literature.
We formulate a novel problem and neural framework Back2Future that aims to refine a given model's predictions in real-time.
arXiv Detail & Related papers (2021-06-08T14:48:20Z)
- A Bayesian Perspective on Training Speed and Model Selection [51.15664724311443]
We show that a measure of a model's training speed can be used to estimate its marginal likelihood.
We verify our results in model selection tasks for linear models and for the infinite-width limit of deep neural networks.
Our results suggest a promising new direction towards explaining why neural networks trained with gradient descent are biased towards functions that generalize well.
arXiv Detail & Related papers (2020-10-27T17:56:14Z)
- Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)
- Progressive Growing of Neural ODEs [7.558546277131641]
We propose a progressive learning paradigm of NODEs for long-term time series forecasting.
Specifically, following the principle of curriculum learning, we gradually increase the complexity of data and network capacity as training progresses.
Our experiments with both synthetic data and real traffic data (PeMS Bay Area traffic data) show that our training methodology consistently improves the performance of vanilla NODEs by over 64%.
arXiv Detail & Related papers (2020-03-08T01:15:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.