Exploring the Learning Difficulty of Data: Theory and Measure
- URL: http://arxiv.org/abs/2205.07427v1
- Date: Mon, 16 May 2022 02:28:12 GMT
- Title: Exploring the Learning Difficulty of Data: Theory and Measure
- Authors: Weiyao Zhu, Ou Wu, Fengguang Su, and Yingjun Deng
- Abstract summary: This study conducts a pilot theoretical study of the learning difficulty of samples.
A theoretical definition of learning difficulty is proposed on the basis of the bias-variance trade-off theory of generalization error.
Several classical weighting methods in machine learning can be explained in terms of the explored properties.
- Score: 2.668651175000491
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As learning difficulty is crucial for machine learning (e.g.,
difficulty-based weighting learning strategies), previous literature has
proposed a number of learning difficulty measures. However, no comprehensive
investigation of learning difficulty is available to date, with the result that
nearly all existing measures are defined heuristically, without a rigorous
theoretical foundation. In addition, there is no formal definition of easy and
hard samples, even though these notions are crucial in many studies. This study
presents a pilot theoretical study of the learning difficulty of samples. First,
a theoretical definition of learning difficulty is proposed on the basis of the
bias-variance trade-off theory of generalization error. Theoretical definitions
of easy and hard samples are then established on the basis of the proposed
definition, and a practical measure of learning difficulty, inspired by the
formal definition, is given as well. Second, the properties of learning
difficulty-based weighting strategies are explored; several classical weighting
methods in machine learning can then be explained in terms of these properties.
Third, the proposed measure is evaluated to verify its reasonableness and
superiority with respect to several main difficulty factors. The comparisons in
these experiments indicate that the proposed measure consistently and
significantly outperforms the other measures.
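To make the abstract's idea concrete, below is a minimal sketch of how a bias-variance-style per-sample difficulty score could be estimated in practice. The bootstrap-ensemble approximation, the logistic-regression base model, and the bias-plus-variance combination are all illustrative assumptions, not the authors' exact estimator.

```python
# Minimal sketch: approximating per-sample learning difficulty with a
# bias/variance-style decomposition over bootstrap-trained models.
# NOTE: illustrative assumption, not the paper's exact estimator.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
B = 20  # number of bootstrap models
rng = np.random.default_rng(0)

# Each model's predicted probability for the true class of every sample.
probs = np.empty((B, len(y)))
for b in range(B):
    idx = rng.integers(0, len(y), size=len(y))  # bootstrap resample
    model = LogisticRegression(max_iter=1000).fit(X[idx], y[idx])
    probs[b] = model.predict_proba(X)[np.arange(len(y)), y]

bias = 1.0 - probs.mean(axis=0)    # how far the average prediction misses
variance = probs.var(axis=0)       # how unstable the prediction is
difficulty = bias + variance       # higher = harder sample

print("hardest sample indices:", np.argsort(difficulty)[-5:])
```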
Related papers
- Towards Understanding the Feasibility of Machine Unlearning [14.177012256360635]
We present a set of novel metrics for quantifying the difficulty of unlearning.
Specifically, we propose several metrics to assess the conditions necessary for a successful unlearning operation.
We also present a ranking mechanism to identify the most challenging samples to unlearn.
arXiv Detail & Related papers (2024-10-03T23:41:42Z)
- A Unified Generalization Analysis of Re-Weighting and Logit-Adjustment for Imbalanced Learning [129.63326990812234]
We propose a technique named data-dependent contraction to capture how modified losses handle different classes.
On top of this technique, a fine-grained generalization bound is established for imbalanced learning, which helps reveal the mystery of re-weighting and logit-adjustment.
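For context on the second ingredient, logit adjustment in its common form shifts each logit by a multiple of the log class prior before the softmax cross-entropy, giving rare classes a larger effective margin. The sketch below shows this generic technique; the scalar tau and the priors are illustrative, and nothing here reproduces the paper's bound.

```python
# Generic logit-adjusted cross-entropy sketch: the kind of modified
# loss whose generalization behavior the paper analyzes.
import torch
import torch.nn.functional as F

def logit_adjusted_ce(logits, targets, class_priors, tau=1.0):
    """Add tau * log(prior) to each logit so that rare classes
    receive an effectively larger margin."""
    adjusted = logits + tau * torch.log(class_priors)  # broadcast over batch
    return F.cross_entropy(adjusted, targets)

# Toy usage: 3 classes with an imbalanced prior (values assumed).
priors = torch.tensor([0.7, 0.2, 0.1])
logits = torch.randn(8, 3)
targets = torch.randint(0, 3, (8,))
loss = logit_adjusted_ce(logits, targets, priors)
```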
arXiv Detail & Related papers (2023-10-07T09:15:08Z)
- Understanding Difficulty-based Sample Weighting with a Universal Difficulty Measure [2.7413469516930578]
A large number of weighting methods essentially utilize the learning difficulty of training samples to calculate their weights.
The learning difficulties of the samples are determined by multiple factors including noise level, imbalance degree, margin, and uncertainty.
In this study, we theoretically prove that the generalization error of a sample can be used as a universal difficulty measure.
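Under this view, a simple practical proxy for a sample's generalization error is its held-out (out-of-fold) loss under cross-validation. A minimal sketch follows; the base model, the fold count, and the log-loss choice are assumptions made for illustration.

```python
# Sketch: per-sample difficulty as out-of-fold (held-out) loss under
# k-fold cross-validation -- one simple proxy for generalization error.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
oof_loss = np.empty(len(y))

for train_idx, test_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    model = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
    p = model.predict_proba(X[test_idx])[np.arange(len(test_idx)), y[test_idx]]
    oof_loss[test_idx] = -np.log(np.clip(p, 1e-12, None))  # per-sample log loss

# Higher held-out loss -> harder sample under this proxy.
```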
arXiv Detail & Related papers (2023-01-12T07:28:32Z)
- Difficulty-Net: Learning to Predict Difficulty for Long-Tailed Recognition [5.977483447975081]
We propose Difficulty-Net, which learns to predict the difficulty of classes using the model's performance in a meta-learning framework.
We introduce two key concepts, namely the relative difficulty and the driver loss.
Experiments on popular long-tailed datasets demonstrated the effectiveness of the proposed method.
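As a rough illustration of the relative-difficulty idea, one can normalize each class's difficulty by the average difficulty across classes, so the resulting weights express how hard a class is relative to the others. The sketch below (difficulty taken as 1 minus per-class validation accuracy) is a guess at the flavor of the method, not the paper's exact formulation, and the driver loss is not reproduced here.

```python
# Sketch of class-wise relative difficulty: normalize each class's
# difficulty (assumed here to be 1 - validation accuracy) by the mean,
# so weights express difficulty relative to the other classes.
import numpy as np

def relative_difficulty(per_class_accuracy):
    difficulty = 1.0 - np.asarray(per_class_accuracy)
    return difficulty / difficulty.mean()

# Toy usage: head classes are easy, tail classes hard.
acc = [0.95, 0.90, 0.60, 0.30]
print(relative_difficulty(acc))  # tail classes get weights > 1
```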
arXiv Detail & Related papers (2022-09-07T07:04:08Z)
- What Makes Good Contrastive Learning on Small-Scale Wearable-based Tasks? [59.51457877578138]
We study contrastive learning on the wearable-based activity recognition task.
This paper presents an open-source PyTorch library, CL-HAR, which can serve as a practical tool for researchers.
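For readers unfamiliar with the loss family such libraries implement, here is a standard NT-Xent (SimCLR-style) contrastive loss over two augmented views of a batch. This is a generic sketch, not CL-HAR's actual API.

```python
# Generic NT-Xent contrastive loss over two augmented views of a batch;
# standard SimCLR-style formulation, not CL-HAR's own API.
import torch
import torch.nn.functional as F

def nt_xent(z1, z2, temperature=0.5):
    z = F.normalize(torch.cat([z1, z2]), dim=1)  # (2N, d), unit norm
    sim = z @ z.t() / temperature                # scaled cosine similarities
    n = z1.size(0)
    mask = torch.eye(2 * n, dtype=torch.bool)
    sim.masked_fill_(mask, float('-inf'))        # exclude self-pairs
    # The positive for sample i is its other augmented view: i <-> i + n.
    targets = torch.cat([torch.arange(n) + n, torch.arange(n)])
    return F.cross_entropy(sim, targets)
```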
arXiv Detail & Related papers (2022-02-12T06:10:15Z)
- Which Samples Should be Learned First: Easy or Hard? [5.589137389571604]
A weighting scheme for training samples is essential for learning tasks.
Some schemes take the easy-first mode on samples, whereas some others take the hard-first mode.
Factors including prior knowledge and data characteristics determine which samples should be learned first in a learning task.
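The two modes can be made concrete as weighting functions of a per-sample statistic. The sketch below pairs a self-paced-style threshold rule (easy-first) with a focal-style rule (hard-first); both functional forms are illustrative choices, not the paper's definitions.

```python
# Two illustrative weighting modes as functions of per-sample loss:
# easy-first (self-paced style) down-weights hard samples;
# hard-first (focal style) down-weights easy samples.
import numpy as np

def easy_first(loss, lam=1.0):
    # Keep only samples whose loss is below the pace threshold lam.
    return (loss < lam).astype(float)

def hard_first(prob_true_class, gamma=2.0):
    # Focal-style: weight grows as confidence in the true class drops.
    return (1.0 - prob_true_class) ** gamma

loss = np.array([0.1, 0.5, 2.0])
p = np.exp(-loss)  # true-class probability implied by cross-entropy loss
print(easy_first(loss), hard_first(p))
```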
arXiv Detail & Related papers (2021-10-11T03:40:29Z)
- Demystification of Few-shot and One-shot Learning [63.58514532659252]
Few-shot and one-shot learning have been the subject of active and intensive research in recent years.
We show that if the ambient or latent decision space of a learning machine is sufficiently high-dimensional, then a large class of objects in this space can indeed be easily learned from few examples.
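This claim can be sanity-checked numerically: random points in high dimensions become almost surely linearly separable from the rest of the data by a simple hyperplane. The toy demonstration below uses points on the unit sphere and a fixed inner-product threshold, both assumptions made for illustration.

```python
# Toy check of the high-dimensional separability phenomenon: a single
# random point is linearly separated from many others by the hyperplane
# through its own direction -- increasingly reliably as dimension grows.
import numpy as np

rng = np.random.default_rng(0)
for d in (2, 20, 200, 2000):
    X = rng.standard_normal((1000, d))
    X /= np.linalg.norm(X, axis=1, keepdims=True)  # points on the sphere
    target = X[0]
    # Fraction of other points on the target's side of the hyperplane
    # {x : <x, target> = 1/2}; near 0 means the target is separable.
    frac = (X[1:] @ target > 0.5).mean()
    print(f"d={d:5d}  fraction not separated: {frac:.4f}")
```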
arXiv Detail & Related papers (2021-04-25T14:47:05Z)
- Constrained Learning with Non-Convex Losses [119.8736858597118]
Though learning has become a core technology of modern information processing, there is now ample evidence that it can lead to biased, unsafe, and prejudiced solutions.
arXiv Detail & Related papers (2021-03-08T23:10:33Z)
- Nonparametric Estimation of Heterogeneous Treatment Effects: From Theory to Learning Algorithms [91.3755431537592]
We analyze four broad meta-learning strategies which rely on plug-in estimation and pseudo-outcome regression.
We highlight how this theoretical reasoning can be used to guide principled algorithm design and translate our analyses into practice.
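As a concrete instance of the plug-in strategies analyzed, the two-model ("T-learner") estimate fits separate outcome regressions on treated and control units and takes their difference as the conditional average treatment effect. The sketch below is illustrative; the random-forest base learner is an arbitrary choice, and the pseudo-outcome strategies are not shown.

```python
# Minimal T-learner sketch: a plug-in meta-learning strategy that fits
# separate outcome models on treated and control units, then estimates
# the conditional average treatment effect (CATE) as their difference.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def t_learner_cate(X, treatment, y, X_new):
    # treatment is a binary (0/1) array indicating treated units.
    mu1 = RandomForestRegressor(random_state=0).fit(X[treatment == 1], y[treatment == 1])
    mu0 = RandomForestRegressor(random_state=0).fit(X[treatment == 0], y[treatment == 0])
    return mu1.predict(X_new) - mu0.predict(X_new)  # CATE estimate
```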
arXiv Detail & Related papers (2021-01-26T17:11:40Z)
- Probably Approximately Correct Constrained Learning [135.48447120228658]
We develop a generalization theory based on the probably approximately correct (PAC) learning framework.
We show that imposing constraints on a learner does not make a learning problem harder, in the sense that any PAC learnable class is also PAC constrained learnable.
We analyze the properties of this solution and use it to illustrate how constrained learning can address problems in fair and robust classification.
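In symbols, the constrained learning problem studied there takes the following general form; this is a standard formulation paraphrased from the constrained-learning literature rather than quoted from the paper.

```latex
% Constrained statistical learning: minimize the expected loss subject to
% expected-loss constraints. PAC constrained learnability asks for a rule
% whose output is near-optimal and near-feasible with high probability.
\begin{aligned}
\min_{f \in \mathcal{F}} \quad & \mathbb{E}_{(x,y) \sim \mathcal{D}_0}\!\left[\ell_0(f(x), y)\right] \\
\text{s.t.} \quad & \mathbb{E}_{(x,y) \sim \mathcal{D}_i}\!\left[\ell_i(f(x), y)\right] \le c_i, \qquad i = 1, \dots, m.
\end{aligned}
```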
arXiv Detail & Related papers (2020-06-09T19:59:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.