A Mathematical Foundation for Robust Machine Learning based on
Bias-Variance Trade-off
- URL: http://arxiv.org/abs/2106.05522v1
- Date: Thu, 10 Jun 2021 06:21:55 GMT
- Title: A Mathematical Foundation for Robust Machine Learning based on
Bias-Variance Trade-off
- Authors: Ou Wu and Weiyao Zhu and Yingjun Deng and Haixiang Zhang and Qinghu
Hou
- Abstract summary: Some samples are difficult to learn and some samples are noisy. The unequal contributions of samples have a considerable effect on training performance.
Numerous learning algorithms have been proposed but the strategies for dealing with easy/hard/noisy samples differ.
This study attempts to construct a mathematical foundation for robust machine learning (RML) based on the bias-variance trade-off theory.
- Score: 3.3161271977874964
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A common assumption in machine learning is that samples are independently and
identically distributed (i.i.d). However, the contributions of different
samples are not identical in training. Some samples are difficult to learn and
some samples are noisy. The unequal contributions of samples have a considerable
effect on training performance. Studies focusing on unequal sample
contributions (e.g., easy, hard, noisy) in learning usually refer to these
contributions as robust machine learning (RML). Weighting and regularization are
two common techniques in RML. Numerous learning algorithms have been proposed,
but the strategies for dealing with easy/hard/noisy samples differ or even
contradict across learning algorithms. For example, some strategies
take hard samples first, whereas others take easy samples first.
Conducting a clear comparison of existing RML algorithms in dealing with
different samples is difficult due to the lack of a unified theoretical framework
for RML. This study attempts to construct a mathematical foundation for RML
based on the bias-variance trade-off theory. A series of definitions and
properties are presented and proved. Several classical learning algorithms are
also explained and compared. Improvements of existing methods are obtained
based on the comparison. A unified method that combines two classical learning
strategies is proposed.
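The abstract contrasts strategies that prioritize hard samples with strategies that prioritize easy samples. A minimal sketch of that contrast, assuming a loss-based softmax weighting; the function name, strategy names, and `temperature` parameter are illustrative assumptions, not the paper's actual method:

```python
# Illustrative sketch: two contrasting loss-based sample-weighting strategies.
# The function, strategy names, and temperature are assumptions for exposition,
# not the method proposed in the paper.
import numpy as np

def sample_weights(losses, strategy="hard_first", temperature=1.0):
    """Turn per-sample losses into normalized training weights."""
    losses = np.asarray(losses, dtype=float)
    if strategy == "hard_first":      # up-weight high-loss (hard) samples
        logits = losses / temperature
    elif strategy == "easy_first":    # up-weight low-loss (easy) samples
        logits = -losses / temperature
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    logits = logits - logits.max()    # stabilize the softmax
    w = np.exp(logits)
    return w / w.sum()                # weights sum to 1

losses = [0.1, 0.5, 2.0]              # one current loss per training sample
hard = sample_weights(losses, "hard_first")
easy = sample_weights(losses, "easy_first")
# Under "hard_first" the highest-loss sample gets the largest weight;
# under "easy_first" it gets the smallest.
```

Focal-loss-style methods behave like the hard-first case, while self-paced and curriculum learning behave like the easy-first case; a unified bias-variance framework aims to explain when each is appropriate.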
Related papers
- The Cram Method for Efficient Simultaneous Learning and Evaluation [0.9208007322096533]
We introduce the "cram" method, a general and efficient approach to simultaneous learning and evaluation.
Because it utilizes the entire sample for both learning and evaluation, cramming is significantly more data-efficient than sample-splitting.
Our extensive simulation studies show that, when compared to sample-splitting, cramming reduces the evaluation standard error by more than 40%.
arXiv Detail & Related papers (2024-03-11T04:19:05Z) - A Unified Generalization Analysis of Re-Weighting and Logit-Adjustment
for Imbalanced Learning [129.63326990812234]
We propose a technique named data-dependent contraction to capture how modified losses handle different classes.
On top of this technique, a fine-grained generalization bound is established for imbalanced learning, which helps reveal the mystery of re-weighting and logit-adjustment.
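Logit adjustment for imbalanced learning is often implemented post hoc by offsetting each class logit with the log of its empirical class prior. The sketch below shows that common form; the helper name and the `tau` parameter are assumptions for exposition, not the specific method analyzed in the paper:

```python
# Illustrative post-hoc logit adjustment for class imbalance: subtract
# tau * log(class prior) from each logit before taking the argmax.
# The helper name and tau are assumptions, not the paper's analysis.
import numpy as np

def logit_adjust(logits, class_counts, tau=1.0):
    priors = np.asarray(class_counts, dtype=float)
    priors = priors / priors.sum()               # empirical class priors
    return np.asarray(logits, dtype=float) - tau * np.log(priors)

# A classifier biased toward the majority class (900 vs. 100 samples):
logits = np.array([2.0, 1.9])
adjusted = logit_adjust(logits, class_counts=[900, 100])
# Plain argmax predicts class 0; after adjustment the rare class 1 wins,
# because its small prior contributes a large positive offset.
```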
arXiv Detail & Related papers (2023-10-07T09:15:08Z) - Active Learning Principles for In-Context Learning with Large Language
Models [65.09970281795769]
This paper investigates how Active Learning algorithms can serve as effective demonstration selection methods for in-context learning.
We show that in-context example selection through AL prioritizes high-quality examples that exhibit low uncertainty and bear similarity to the test examples.
arXiv Detail & Related papers (2023-05-23T17:16:04Z) - Adaptive Soft Contrastive Learning [19.45520684918576]
This paper proposes an adaptive method that introduces soft inter-sample relations, namely Adaptive Soft Contrastive Learning (ASCL).
As an effective and concise plug-in module for existing self-supervised learning frameworks, ASCL achieves the best performance on several benchmarks.
arXiv Detail & Related papers (2022-07-22T16:01:07Z) - An Empirical Comparison of Instance Attribution Methods for NLP [62.63504976810927]
We evaluate the degree to which different instance attribution methods agree with respect to the importance of training samples.
We find that simple retrieval methods yield training instances that differ from those identified via gradient-based methods.
arXiv Detail & Related papers (2021-04-09T01:03:17Z) - Learning explanations that are hard to vary [75.30552491694066]
We show that averaging across examples can favor memorization and "patchwork" solutions that sew together different strategies.
We then propose and experimentally validate a simple alternative algorithm based on a logical AND.
arXiv Detail & Related papers (2020-09-01T10:17:48Z) - Theory and Algorithms for Shapelet-based Multiple-Instance Learning [5.08418565337126]
We propose a new formulation of Multiple-Instance Learning (MIL), in which a unit of data, called a bag, consists of a set of instances.
The goal is to find a good classifier of bags based on similarity with a "shapelet" (or pattern).
In our formulation, we use all possible, and thus infinitely many shapelets, resulting in a richer class of classifiers.
arXiv Detail & Related papers (2020-05-31T17:10:59Z) - Meta-learning with Stochastic Linear Bandits [120.43000970418939]
We consider a class of bandit algorithms that implement a regularized version of the well-known OFUL algorithm, where the regularization is a squared Euclidean distance to a bias vector.
We show both theoretically and experimentally, that when the number of tasks grows and the variance of the task-distribution is small, our strategies have a significant advantage over learning the tasks in isolation.
arXiv Detail & Related papers (2020-05-18T08:41:39Z) - Learning Gaussian Graphical Models via Multiplicative Weights [54.252053139374205]
We adapt an algorithm of Klivans and Meka based on the method of multiplicative weight updates.
The algorithm enjoys a sample complexity bound that is qualitatively similar to others in the literature.
It has a low runtime $O(mp^2)$ in the case of $m$ samples and $p$ nodes, and can trivially be implemented in an online manner.
arXiv Detail & Related papers (2020-02-20T10:50:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.