Which Samples Should be Learned First: Easy or Hard?
- URL: http://arxiv.org/abs/2110.05481v2
- Date: Thu, 14 Oct 2021 14:58:56 GMT
- Title: Which Samples Should be Learned First: Easy or Hard?
- Authors: Xiaoling Zhou and Ou Wu
- Abstract summary: An effective weighting scheme for training samples is essential for learning tasks.
Some schemes take the easy-first mode on samples, whereas some others take the hard-first mode.
Factors including prior knowledge and data characteristics determine which samples should be learned first in a learning task.
- Score: 5.589137389571604
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: An effective weighting scheme for training samples is essential for learning
tasks. Numerous weighting schemes have been proposed. Some schemes take the
easy-first mode on samples, whereas some others take the hard-first mode.
This naturally raises an interesting yet practical question: which samples
should be learned first given a new learning task, easy or hard? To answer this
question, three aspects of research are carried out. First, a high-level
unified weighted loss is proposed, providing a more comprehensive view for
existing schemes. Theoretical analysis is subsequently conducted and
preliminary conclusions are obtained. Second, a flexible weighting scheme is
proposed to overcome the defects of existing schemes. The three modes, namely,
easy/medium/hard-first, can be flexibly switched in the proposed scheme. Third,
a wide range of experiments are conducted to further compare the weighting
schemes in different modes. On the basis of these works, reasonable answers are
obtained. Factors including prior knowledge and data characteristics determine
which samples should be learned first in a learning task.
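As a rough illustration of how such a mode-switchable weighting could work (the function below and its exponential weighting forms are illustrative assumptions, not the paper's actual unified loss):

```python
import numpy as np

def mode_weights(losses, mode="easy", gamma=1.0):
    """Illustrative per-sample weights derived from per-sample losses.

    easy-first   -> down-weight high-loss (hard) samples
    hard-first   -> up-weight high-loss samples
    medium-first -> peak weight near the median loss
    """
    losses = np.asarray(losses, dtype=float)
    if mode == "easy":
        w = np.exp(-gamma * losses)
    elif mode == "hard":
        w = np.exp(gamma * losses)
    elif mode == "medium":
        w = np.exp(-gamma * (losses - np.median(losses)) ** 2)
    else:
        raise ValueError(f"unknown mode: {mode}")
    return w / w.sum()  # normalize so the weights sum to 1

losses = [0.1, 0.5, 2.0]
easy = mode_weights(losses, "easy")    # largest weight on the 0.1-loss sample
hard = mode_weights(losses, "hard")    # largest weight on the 2.0-loss sample
```

Switching the `mode` argument is what makes the scheme flexible: the same loss vector yields opposite sample orderings without touching the rest of the training loop.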
Related papers
- Pruning Small Pre-Trained Weights Irreversibly and Monotonically Impairs "Difficult" Downstream Tasks in LLMs [71.56345106591789]
It has been believed that weights in large language models (LLMs) contain significant redundancy.
This paper presents a counter-argument: small-magnitude weights in pre-trained models encode vital knowledge essential for tackling difficult downstream tasks.
arXiv Detail & Related papers (2023-09-29T22:55:06Z)
- Understanding Difficulty-based Sample Weighting with a Universal Difficulty Measure [2.7413469516930578]
A large number of weighting methods essentially utilize the learning difficulty of training samples to calculate their weights.
The learning difficulties of the samples are determined by multiple factors including noise level, imbalance degree, margin, and uncertainty.
In this study, we theoretically prove that the generalization error of a sample can be used as a universal difficulty measure.
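As a loose sketch of this idea, a sample's generalization error can be approximated with a simple proxy, e.g. its mean loss across independently trained models (the proxy here is an assumption for illustration, not the paper's exact measure):

```python
import numpy as np

def estimate_difficulty(per_model_losses):
    """Difficulty proxy: mean per-sample loss across several
    independently trained models, used as a rough stand-in for
    the per-sample generalization error."""
    return np.mean(per_model_losses, axis=0)

# rows: independently trained models, columns: training samples
per_model_losses = np.array([
    [0.2, 1.1, 0.3],
    [0.3, 0.9, 0.2],
    [0.1, 1.3, 0.4],
])
difficulty = estimate_difficulty(per_model_losses)  # one score per sample
```

A score like this folds together the factors the summary lists (noise, imbalance, margin, uncertainty), since all of them ultimately show up as consistently higher loss for the affected samples.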
arXiv Detail & Related papers (2023-01-12T07:28:32Z)
- Momentum Contrastive Pre-training for Question Answering [54.57078061878619]
MCROSS introduces a momentum contrastive learning framework to align the answer probability between cloze-like and natural query-passage sample pairs.
Our method achieves noticeable improvement compared with all baselines in both supervised and zero-shot scenarios.
arXiv Detail & Related papers (2022-12-12T08:28:22Z)
- DiscrimLoss: A Universal Loss for Hard Samples and Incorrect Samples Discrimination [28.599571524763785]
Given data with label noise (i.e., incorrect data), deep neural networks would gradually memorize the label noise and impair model performance.
To relieve this issue, curriculum learning is proposed to improve model performance and generalization by ordering training samples in a meaningful sequence.
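A minimal sketch of the easy-first ordering step at the heart of curriculum learning (the difficulty scores are assumed to be given; how they are obtained is method-specific):

```python
def curriculum_order(samples, difficulty):
    """Order training samples from easiest to hardest (easy-first
    curriculum); reversing the result gives a hard-first ordering."""
    return [s for _, s in sorted(zip(difficulty, samples))]

samples = ["a", "b", "c"]
difficulty = [0.9, 0.1, 0.5]
ordered = curriculum_order(samples, difficulty)  # easiest sample first
```

In practice the training loop would draw batches from the front of this ordering early in training and gradually admit harder samples.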
arXiv Detail & Related papers (2022-08-21T13:38:55Z)
- Exploring the Learning Difficulty of Data: Theory and Measure [2.668651175000491]
This study presents a pilot theoretical analysis of the learning difficulty of samples.
A theoretical definition of learning difficulty is proposed on the basis of the bias-variance trade-off theory of generalization error.
Several classical weighting methods in machine learning can be well explained in terms of the explored properties.
arXiv Detail & Related papers (2022-05-16T02:28:12Z)
- CMW-Net: Learning a Class-Aware Sample Weighting Mapping for Robust Deep Learning [55.733193075728096]
Modern deep neural networks can easily overfit to biased training data containing corrupted labels or class imbalance.
Sample re-weighting methods are popularly used to alleviate this data bias issue.
We propose a meta-model capable of adaptively learning an explicit weighting scheme directly from data.
arXiv Detail & Related papers (2022-02-11T13:49:51Z)
- Generalized and Incremental Few-Shot Learning by Explicit Learning and Calibration without Forgetting [86.56447683502951]
We propose a three-stage framework that allows to explicitly and effectively address these challenges.
We evaluate the proposed framework on four challenging benchmark datasets for image and video few-shot classification.
arXiv Detail & Related papers (2021-08-18T14:21:43Z)
- Uniform Sampling over Episode Difficulty [55.067544082168624]
We propose a method to approximate episode sampling distributions based on their difficulty.
As the proposed sampling method is algorithm agnostic, we can leverage these insights to improve few-shot learning accuracies.
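One simple way to approximate a uniform distribution over difficulty (an illustrative sketch; the paper's actual sampling method may differ) is to bin episodes by difficulty, pick a bin uniformly at random, then pick an episode within that bin:

```python
import random

def sample_uniform_over_difficulty(episodes, difficulty, n_bins=3, rng=None):
    """Pick one episode by first choosing a difficulty bin uniformly,
    then an episode uniformly within that bin. This gives rare
    difficulty levels the same chance as common ones."""
    rng = rng or random.Random()
    lo, hi = min(difficulty), max(difficulty)
    width = (hi - lo) / n_bins or 1.0  # avoid zero width when all equal
    bins = [[] for _ in range(n_bins)]
    for ep, d in zip(episodes, difficulty):
        idx = min(int((d - lo) / width), n_bins - 1)
        bins[idx].append(ep)
    bins = [b for b in bins if b]      # skip empty bins
    return rng.choice(rng.choice(bins))
```

Because the method only re-weights which episodes are drawn, it can wrap any episodic few-shot learner without changing the learner itself, which is what makes it algorithm-agnostic.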
arXiv Detail & Related papers (2021-08-03T17:58:54Z)
- A Mathematical Foundation for Robust Machine Learning based on Bias-Variance Trade-off [3.3161271977874964]
Some samples are difficult to learn and some samples are noisy. The unequal contributions of samples have a considerable effect on training performance.
Numerous learning algorithms have been proposed, but their strategies for dealing with easy/hard/noisy samples differ.
This study attempts to construct a mathematical foundation for robust machine learning (RML) based on the bias-variance trade-off theory.
arXiv Detail & Related papers (2021-06-10T06:21:55Z)
- Evading the Simplicity Bias: Training a Diverse Set of Models Discovers Solutions with Superior OOD Generalization [93.8373619657239]
Neural networks trained with SGD were recently shown to rely preferentially on linearly-predictive features.
This simplicity bias can explain their lack of out-of-distribution (OOD) robustness.
We demonstrate that the simplicity bias can be mitigated and OOD generalization improved.
arXiv Detail & Related papers (2021-05-12T12:12:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.