Which Samples Should be Learned First: Easy or Hard?
- URL: http://arxiv.org/abs/2110.05481v2
- Date: Thu, 14 Oct 2021 14:58:56 GMT
- Title: Which Samples Should be Learned First: Easy or Hard?
- Authors: Xiaoling Zhou and Ou Wu
- Abstract summary: An effective weighting scheme for training samples is essential for learning tasks.
Some schemes take the easy-first mode on samples, whereas some others take the hard-first mode.
Factors including prior knowledge and data characteristics determine which samples should be learned first in a learning task.
- Score: 5.589137389571604
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: An effective weighting scheme for training samples is essential for learning
tasks. Numerous weighting schemes have been proposed. Some schemes take the
easy-first mode on samples, whereas some others take the hard-first mode.
This naturally raises an interesting yet practical question: which samples
should be learned first given a new learning task, easy or hard? To answer this
question, three aspects of research are carried out. First, a high-level
unified weighted loss is proposed, providing a more comprehensive view for
existing schemes. Theoretical analysis is subsequently conducted and
preliminary conclusions are obtained. Second, a flexible weighting scheme is
proposed to overcome the defects of existing schemes. The three modes, namely,
easy/medium/hard-first, can be flexibly switched in the proposed scheme. Third,
a wide range of experiments are conducted to further compare the weighting
schemes in different modes. On the basis of these works, reasonable answers are
obtained. Factors including prior knowledge and data characteristics determine
which samples should be learned first in a learning task.
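As a rough illustration of how such a mode-switchable weighting could work (the function below and its exponential weighting forms are illustrative assumptions, not the paper's actual unified loss):

```python
import numpy as np

def mode_weights(losses, mode="easy", gamma=1.0):
    """Illustrative per-sample weights derived from per-sample losses.

    easy-first   -> down-weight high-loss (hard) samples
    hard-first   -> up-weight high-loss samples
    medium-first -> peak weight near the median loss
    """
    losses = np.asarray(losses, dtype=float)
    if mode == "easy":
        w = np.exp(-gamma * losses)
    elif mode == "hard":
        w = np.exp(gamma * losses)
    elif mode == "medium":
        w = np.exp(-gamma * (losses - np.median(losses)) ** 2)
    else:
        raise ValueError(f"unknown mode: {mode}")
    return w / w.sum()  # normalize so the weights sum to 1

losses = [0.1, 0.5, 2.0]
easy = mode_weights(losses, "easy")    # largest weight on the 0.1-loss sample
hard = mode_weights(losses, "hard")    # largest weight on the 2.0-loss sample
```

Switching the `mode` argument is what makes the scheme flexible: the same loss vector yields opposite sample orderings without touching the rest of the training loop.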
Related papers
- Pruning Small Pre-Trained Weights Irreversibly and Monotonically Impairs "Difficult" Downstream Tasks in LLMs [71.56345106591789]
It has been believed that weights in large language models (LLMs) contain significant redundancy.
This paper presents a counter-argument: small-magnitude weights in pre-trained models encode vital knowledge essential for tackling difficult downstream tasks.
arXiv Detail & Related papers (2023-09-29T22:55:06Z)
- Understanding Difficulty-based Sample Weighting with a Universal Difficulty Measure [2.7413469516930578]
A large number of weighting methods essentially utilize the learning difficulty of training samples to calculate their weights.
The learning difficulties of the samples are determined by multiple factors including noise level, imbalance degree, margin, and uncertainty.
In this study, we theoretically prove that the generalization error of a sample can be used as a universal difficulty measure.
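As a loose sketch of this idea, a sample's generalization error can be approximated with a simple proxy, e.g. its mean loss across independently trained models (the proxy here is an assumption for illustration, not the paper's exact measure):

```python
import numpy as np

def estimate_difficulty(per_model_losses):
    """Difficulty proxy: mean per-sample loss across several
    independently trained models, used as a rough stand-in for
    the per-sample generalization error."""
    return np.mean(per_model_losses, axis=0)

# rows: independently trained models, columns: training samples
per_model_losses = np.array([
    [0.2, 1.1, 0.3],
    [0.3, 0.9, 0.2],
    [0.1, 1.3, 0.4],
])
difficulty = estimate_difficulty(per_model_losses)  # one score per sample
```

A score like this folds together the factors the summary lists (noise, imbalance, margin, uncertainty), since all of them ultimately show up as consistently higher loss for the affected samples.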
arXiv Detail & Related papers (2023-01-12T07:28:32Z)
- Momentum Contrastive Pre-training for Question Answering [54.57078061878619]
MCROSS introduces a momentum contrastive learning framework to align the answer probability between cloze-like and natural query-passage sample pairs.
Our method achieves noticeable improvement compared with all baselines in both supervised and zero-shot scenarios.
arXiv Detail & Related papers (2022-12-12T08:28:22Z)
- DiscrimLoss: A Universal Loss for Hard Samples and Incorrect Samples Discrimination [28.599571524763785]
Given data with label noise (i.e., incorrect data), deep neural networks would gradually memorize the label noise and impair model performance.
To relieve this issue, curriculum learning is proposed to improve model performance and generalization by ordering training samples in a meaningful sequence.
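A minimal sketch of the easy-first ordering step at the heart of curriculum learning (the difficulty scores are assumed to be given; how they are obtained is method-specific):

```python
def curriculum_order(samples, difficulty):
    """Order training samples from easiest to hardest (easy-first
    curriculum); reversing the result gives a hard-first ordering."""
    return [s for _, s in sorted(zip(difficulty, samples))]

samples = ["a", "b", "c"]
difficulty = [0.9, 0.1, 0.5]
ordered = curriculum_order(samples, difficulty)  # easiest sample first
```

In practice the training loop would draw batches from the front of this ordering early in training and gradually admit harder samples.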
arXiv Detail & Related papers (2022-08-21T13:38:55Z)
- Exploring the Learning Difficulty of Data: Theory and Measure [2.668651175000491]
This study presents a pilot theoretical analysis of the learning difficulty of samples.
A theoretical definition of learning difficulty is proposed on the basis of the bias-variance trade-off theory of generalization error.
Several classical weighting methods in machine learning can be well explained in terms of the explored properties.
arXiv Detail & Related papers (2022-05-16T02:28:12Z)
- CMW-Net: Learning a Class-Aware Sample Weighting Mapping for Robust Deep Learning [55.733193075728096]
Modern deep neural networks can easily overfit to biased training data containing corrupted labels or class imbalance.
Sample re-weighting methods are popularly used to alleviate this data bias issue.
We propose a meta-model capable of adaptively learning an explicit weighting scheme directly from data.
arXiv Detail & Related papers (2022-02-11T13:49:51Z)
- Generalized and Incremental Few-Shot Learning by Explicit Learning and Calibration without Forgetting [86.56447683502951]
We propose a three-stage framework that allows to explicitly and effectively address these challenges.
We evaluate the proposed framework on four challenging benchmark datasets for image and video few-shot classification.
arXiv Detail & Related papers (2021-08-18T14:21:43Z)
- Uniform Sampling over Episode Difficulty [55.067544082168624]
We propose a method to approximate episode sampling distributions based on their difficulty.
As the proposed sampling method is algorithm agnostic, we can leverage these insights to improve few-shot learning accuracies.
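One simple way to approximate a uniform distribution over difficulty (an illustrative sketch; the paper's actual sampling method may differ) is to bin episodes by difficulty, pick a bin uniformly at random, then pick an episode within that bin:

```python
import random

def sample_uniform_over_difficulty(episodes, difficulty, n_bins=3, rng=None):
    """Pick one episode by first choosing a difficulty bin uniformly,
    then an episode uniformly within that bin. This gives rare
    difficulty levels the same chance as common ones."""
    rng = rng or random.Random()
    lo, hi = min(difficulty), max(difficulty)
    width = (hi - lo) / n_bins or 1.0  # avoid zero width when all equal
    bins = [[] for _ in range(n_bins)]
    for ep, d in zip(episodes, difficulty):
        idx = min(int((d - lo) / width), n_bins - 1)
        bins[idx].append(ep)
    bins = [b for b in bins if b]      # skip empty bins
    return rng.choice(rng.choice(bins))
```

Because the method only re-weights which episodes are drawn, it can wrap any episodic few-shot learner without changing the learner itself, which is what makes it algorithm-agnostic.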
arXiv Detail & Related papers (2021-08-03T17:58:54Z)
- A Mathematical Foundation for Robust Machine Learning based on Bias-Variance Trade-off [3.3161271977874964]
Some samples are difficult to learn and some samples are noisy. The unequal contributions of samples have a considerable effect on training performance.
Numerous learning algorithms have been proposed, but their strategies for dealing with easy/hard/noisy samples differ.
This study attempts to construct a mathematical foundation for robust machine learning (RML) based on the bias-variance trade-off theory.
arXiv Detail & Related papers (2021-06-10T06:21:55Z)
- Evading the Simplicity Bias: Training a Diverse Set of Models Discovers Solutions with Superior OOD Generalization [93.8373619657239]
Neural networks trained with SGD were recently shown to rely preferentially on linearly-predictive features.
This simplicity bias can explain their lack of out-of-distribution (OOD) robustness.
We demonstrate that the simplicity bias can be mitigated and OOD generalization improved.
arXiv Detail & Related papers (2021-05-12T12:12:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.