A Mathematical Foundation for Robust Machine Learning based on
Bias-Variance Trade-off
- URL: http://arxiv.org/abs/2106.05522v1
- Date: Thu, 10 Jun 2021 06:21:55 GMT
- Title: A Mathematical Foundation for Robust Machine Learning based on
Bias-Variance Trade-off
- Authors: Ou Wu and Weiyao Zhu and Yingjun Deng and Haixiang Zhang and Qinghu
Hou
- Abstract summary: Some samples are difficult to learn and some samples are noisy. The unequal contributions of samples have a considerable effect on training performance.
Numerous learning algorithms have been proposed but the strategies for dealing with easy/hard/noisy samples differ.
This study attempts to construct a mathematical foundation for robust machine learning (RML) based on the bias-variance trade-off theory.
- Score: 3.3161271977874964
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A common assumption in machine learning is that samples are independently and
identically distributed (i.i.d). However, the contributions of different
samples are not identical in training. Some samples are difficult to learn and
some samples are noisy. The unequal contributions of samples have a considerable
effect on training performance. Studies focusing on unequal sample
contributions (e.g., easy, hard, noisy) in learning usually refer to these
contributions as robust machine learning (RML). Weighting and regularization are
two common techniques in RML. Numerous learning algorithms have been proposed,
but the strategies for dealing with easy/hard/noisy samples differ or even
contradict across learning algorithms. For example, some strategies
take hard samples first, whereas others take easy samples first.
Conducting a clear comparison of existing RML algorithms in dealing with
different samples is difficult due to the lack of a unified theoretical framework
for RML. This study attempts to construct a mathematical foundation for RML
based on the bias-variance trade-off theory. A series of definitions and
properties are presented and proved. Several classical learning algorithms are
also explained and compared. Improvements of existing methods are obtained
based on the comparison. A unified method that combines two classical learning
strategies is proposed.
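The abstract contrasts strategies that prioritize hard samples with strategies that prioritize easy samples. A minimal sketch of that contrast, assuming a loss-based softmax weighting; the function name, strategy names, and `temperature` parameter are illustrative assumptions, not the paper's actual method:

```python
# Illustrative sketch: two contrasting loss-based sample-weighting strategies.
# The function, strategy names, and temperature are assumptions for exposition,
# not the method proposed in the paper.
import numpy as np

def sample_weights(losses, strategy="hard_first", temperature=1.0):
    """Turn per-sample losses into normalized training weights."""
    losses = np.asarray(losses, dtype=float)
    if strategy == "hard_first":      # up-weight high-loss (hard) samples
        logits = losses / temperature
    elif strategy == "easy_first":    # up-weight low-loss (easy) samples
        logits = -losses / temperature
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    logits = logits - logits.max()    # stabilize the softmax
    w = np.exp(logits)
    return w / w.sum()                # weights sum to 1

losses = [0.1, 0.5, 2.0]              # one current loss per training sample
hard = sample_weights(losses, "hard_first")
easy = sample_weights(losses, "easy_first")
# Under "hard_first" the highest-loss sample gets the largest weight;
# under "easy_first" it gets the smallest.
```

Focal-loss-style methods behave like the hard-first case, while self-paced and curriculum learning behave like the easy-first case; a unified bias-variance framework aims to explain when each is appropriate.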
Related papers
- The Cram Method for Efficient Simultaneous Learning and Evaluation [0.9208007322096533]
We introduce the "cram" method, a general and efficient approach to simultaneous learning and evaluation.
Because it utilizes the entire sample for both learning and evaluation, cramming is significantly more data-efficient than sample-splitting.
Our extensive simulation studies show that, when compared to sample-splitting, cramming reduces the evaluation standard error by more than 40%.
arXiv Detail & Related papers (2024-03-11T04:19:05Z) - A Unified Generalization Analysis of Re-Weighting and Logit-Adjustment
for Imbalanced Learning [129.63326990812234]
We propose a technique named data-dependent contraction to capture how modified losses handle different classes.
On top of this technique, a fine-grained generalization bound is established for imbalanced learning, which helps reveal the mystery of re-weighting and logit-adjustment.
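Logit adjustment for imbalanced learning is often implemented post hoc by offsetting each class logit with the log of its empirical class prior. The sketch below shows that common form; the helper name and the `tau` parameter are assumptions for exposition, not the specific method analyzed in the paper:

```python
# Illustrative post-hoc logit adjustment for class imbalance: subtract
# tau * log(class prior) from each logit before taking the argmax.
# The helper name and tau are assumptions, not the paper's analysis.
import numpy as np

def logit_adjust(logits, class_counts, tau=1.0):
    priors = np.asarray(class_counts, dtype=float)
    priors = priors / priors.sum()               # empirical class priors
    return np.asarray(logits, dtype=float) - tau * np.log(priors)

# A classifier biased toward the majority class (900 vs. 100 samples):
logits = np.array([2.0, 1.9])
adjusted = logit_adjust(logits, class_counts=[900, 100])
# Plain argmax predicts class 0; after adjustment the rare class 1 wins,
# because its small prior contributes a large positive offset.
```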
arXiv Detail & Related papers (2023-10-07T09:15:08Z) - Active Learning Principles for In-Context Learning with Large Language
Models [65.09970281795769]
This paper investigates how Active Learning algorithms can serve as effective demonstration selection methods for in-context learning.
We show that in-context example selection through AL prioritizes high-quality examples that exhibit low uncertainty and bear similarity to the test examples.
arXiv Detail & Related papers (2023-05-23T17:16:04Z) - Adaptive Soft Contrastive Learning [19.45520684918576]
This paper proposes an adaptive method that introduces soft inter-sample relations, namely Adaptive Soft Contrastive Learning (ASCL).
As an effective and concise plug-in module for existing self-supervised learning frameworks, ASCL achieves the best performance on several benchmarks.
arXiv Detail & Related papers (2022-07-22T16:01:07Z) - An Empirical Comparison of Instance Attribution Methods for NLP [62.63504976810927]
We evaluate the degree to which different instance attribution methods agree with respect to the importance of training samples.
We find that simple retrieval methods yield training instances that differ from those identified via gradient-based methods.
arXiv Detail & Related papers (2021-04-09T01:03:17Z) - Learning explanations that are hard to vary [75.30552491694066]
We show that averaging across examples can favor memorization and "patchwork" solutions that sew together different strategies.
We then propose and experimentally validate a simple alternative algorithm based on a logical AND.
arXiv Detail & Related papers (2020-09-01T10:17:48Z) - Theory and Algorithms for Shapelet-based Multiple-Instance Learning [5.08418565337126]
We propose a new formulation of Multiple-Instance Learning (MIL), in which a unit of data, called a bag, consists of a set of instances.
The goal is to find a good classifier of bags based on similarity with a "shapelet" (or pattern).
In our formulation, we use all possible, and thus infinitely many shapelets, resulting in a richer class of classifiers.
arXiv Detail & Related papers (2020-05-31T17:10:59Z) - Meta-learning with Stochastic Linear Bandits [120.43000970418939]
We consider a class of bandit algorithms that implement a regularized version of the well-known OFUL algorithm, where the regularization is a squared Euclidean distance to a bias vector.
We show both theoretically and experimentally, that when the number of tasks grows and the variance of the task-distribution is small, our strategies have a significant advantage over learning the tasks in isolation.
arXiv Detail & Related papers (2020-05-18T08:41:39Z) - Learning Gaussian Graphical Models via Multiplicative Weights [54.252053139374205]
We adapt an algorithm of Klivans and Meka based on the method of multiplicative weight updates.
The algorithm enjoys a sample complexity bound that is qualitatively similar to others in the literature.
It has a low runtime $O(mp^2)$ in the case of $m$ samples and $p$ nodes, and can trivially be implemented in an online manner.
arXiv Detail & Related papers (2020-02-20T10:50:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.