Provable Benefit of Mixup for Finding Optimal Decision Boundaries
- URL: http://arxiv.org/abs/2306.00267v2
- Date: Tue, 6 Jun 2023 02:19:58 GMT
- Title: Provable Benefit of Mixup for Finding Optimal Decision Boundaries
- Authors: Junsoo Oh, Chulhee Yun
- Abstract summary: We investigate how pair-wise data augmentation techniques like Mixup affect the sample complexity of finding optimal decision boundaries.
We show that Mixup mitigates this problem by significantly reducing the sample complexity.
- Score: 18.668531108219415
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We investigate how pair-wise data augmentation techniques like Mixup affect
the sample complexity of finding optimal decision boundaries in a binary linear
classification problem. For a family of data distributions with a separability
constant $\kappa$, we analyze how well the optimal classifier in terms of
training loss aligns with the optimal one in test accuracy (i.e., Bayes optimal
classifier). For vanilla training without augmentation, we uncover an
interesting phenomenon named the curse of separability. As we increase $\kappa$
to make the data distribution more separable, the sample complexity of vanilla
training increases exponentially in $\kappa$; perhaps surprisingly, the task of
finding optimal decision boundaries becomes harder for more separable
distributions. For Mixup training, we show that Mixup mitigates this problem by
significantly reducing the sample complexity. To this end, we develop new
concentration results applicable to $n^2$ pair-wise augmented data points
constructed from $n$ independent data, by carefully dealing with dependencies
between overlapping pairs. Lastly, we study other masking-based Mixup-style
techniques and show that they can distort the training loss and make its
minimizer converge to a suboptimal classifier in terms of test accuracy.
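To make the setup above concrete, here is a minimal sketch (not the authors' code) of Mixup training for a binary linear classifier: all $n^2$ ordered pairwise combinations of $n$ points are mixed in both input and label, and a logistic-loss linear model is fit to the mixed set. The Beta mixing distribution, the toy two-Gaussian data, and the plain gradient-descent loop are illustrative assumptions.

```python
# Hedged sketch of pairwise Mixup vs. vanilla training for binary linear classification.
import numpy as np

rng = np.random.default_rng(0)

def mixup_pairs(X, y, alpha=1.0):
    """Build the n^2 pairwise-augmented set: for every ordered pair (i, j),
    draw lambda ~ Beta(alpha, alpha) and mix both the inputs and the (soft) labels."""
    n = len(X)
    i, j = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    i, j = i.ravel(), j.ravel()
    lam = rng.beta(alpha, alpha, size=n * n)[:, None]
    X_mix = lam * X[i] + (1.0 - lam) * X[j]
    y_mix = lam[:, 0] * y[i] + (1.0 - lam[:, 0]) * y[j]   # soft labels in [0, 1]
    return X_mix, y_mix

def train_logistic(X, y, lr=0.1, steps=500):
    """Plain gradient descent on the logistic loss with a linear classifier."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w -= lr * X.T @ (p - y) / len(X)
    return w

# Toy 2-D two-Gaussian data; a larger separation loosely mimics a larger kappa.
n, sep = 50, 3.0
X = np.vstack([rng.normal(-sep, 1.0, (n // 2, 2)), rng.normal(sep, 1.0, (n // 2, 2))])
y = np.concatenate([np.zeros(n // 2), np.ones(n // 2)])

w_vanilla = train_logistic(X, y)               # vanilla ERM on n points
w_mixup = train_logistic(*mixup_pairs(X, y))   # Mixup ERM on n^2 mixed points
```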
Related papers
- The Plug-in Approach for Average-Reward and Discounted MDPs: Optimal Sample Complexity Analysis [6.996002801232415]
We study the sample complexity of the plug-in approach for learning $\varepsilon$-optimal policies in average-reward Markov decision processes.
Despite representing arguably the simplest algorithm for this problem, the plug-in approach has never been theoretically analyzed.
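For context, a hedged sketch of the generic plug-in recipe this entry refers to: estimate the transition model from samples, then solve the estimated MDP exactly. For simplicity the sketch uses the discounted criterion and value iteration rather than the paper's average-reward analysis, and all names and parameters below are illustrative.

```python
# Hedged illustration of the plug-in approach (discounted case, not the paper's algorithm).
import numpy as np

def plug_in_policy(sample_next_state, R, S, A, n_samples=100, gamma=0.9, iters=500):
    """sample_next_state(s, a) draws one next state index; R is an (S, A) reward matrix."""
    # 1) Empirical transition model P_hat from n_samples draws per (state, action).
    P_hat = np.zeros((S, A, S))
    for s in range(S):
        for a in range(A):
            for _ in range(n_samples):
                P_hat[s, a, sample_next_state(s, a)] += 1.0
    P_hat /= n_samples
    # 2) Value iteration on the *estimated* MDP.
    V = np.zeros(S)
    for _ in range(iters):
        Q = R + gamma * (P_hat @ V)        # shape (S, A)
        V = Q.max(axis=1)
    return Q.argmax(axis=1)                # greedy policy w.r.t. the plug-in model
```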
arXiv Detail & Related papers (2024-10-10T05:08:14Z) - Inverse Entropic Optimal Transport Solves Semi-supervised Learning via Data Likelihood Maximization [65.8915778873691]
Learning conditional distributions is a central problem in machine learning.
We propose a new learning paradigm that integrates both paired and unpaired data.
Our approach also connects intriguingly with inverse entropic optimal transport (OT).
arXiv Detail & Related papers (2024-10-03T16:12:59Z) - Probabilistic Contrastive Learning for Long-Tailed Visual Recognition [78.70453964041718]
Long-tailed distributions frequently emerge in real-world data, where a large number of minority categories contain a limited number of samples.
Recent investigations have revealed that supervised contrastive learning exhibits promising potential in alleviating the data imbalance.
We propose a novel probabilistic contrastive (ProCo) learning algorithm that estimates the data distribution of the samples from each class in the feature space.
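A simplified, hypothetical illustration of "estimating the data distribution of the samples from each class in the feature space": here each class is summarized by a diagonal Gaussian and extra features can be drawn for rare classes. This is a stand-in for intuition only; the paper's actual ProCo model and contrastive loss are not reproduced.

```python
# Hedged sketch: per-class feature distributions (diagonal Gaussian assumption).
import numpy as np

rng = np.random.default_rng(0)

def fit_class_gaussians(feats, labels):
    """Return {class: (mean, std)} estimated from the features of each class."""
    stats = {}
    for c in np.unique(labels):
        f = feats[labels == c]
        stats[c] = (f.mean(axis=0), f.std(axis=0) + 1e-6)
    return stats

def sample_features(stats, c, k):
    """Draw k pseudo-features for class c from its estimated Gaussian."""
    mean, std = stats[c]
    return rng.normal(mean, std, size=(k, mean.shape[0]))
```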
arXiv Detail & Related papers (2024-03-11T13:44:49Z) - Noisy Correspondence Learning with Self-Reinforcing Errors Mitigation [63.180725016463974]
Cross-modal retrieval relies on well-matched large-scale datasets that are laborious to collect in practice.
We introduce a novel noisy correspondence learning framework, namely Self-Reinforcing Errors Mitigation (SREM).
arXiv Detail & Related papers (2023-12-27T09:03:43Z) - Tailoring Mixup to Data for Calibration [12.050401897136501]
Mixup is a technique for improving calibration and predictive uncertainty.
In this work, we argue that the likelihood of manifold intrusion increases with the distance between the data points being mixed.
We propose to dynamically change the underlying distributions of mixing coefficients depending on the similarity between the samples being mixed.
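A hedged sketch of that idea: make the Mixup coefficient distribution depend on how similar the two samples are, so that distant pairs are mixed only weakly. The concrete mapping from cosine similarity to a Beta concentration parameter is an illustrative assumption, not the paper's rule.

```python
# Hedged sketch: similarity-dependent Mixup coefficient distribution.
import numpy as np

rng = np.random.default_rng(0)

def similarity_aware_lambda(x_i, x_j, alpha_max=1.0):
    """Draw lambda ~ Beta(a, a) where a shrinks as the pair becomes less similar,
    pushing lambda toward 0 or 1 (i.e., weaker mixing) for dissimilar pairs."""
    cos = x_i @ x_j / (np.linalg.norm(x_i) * np.linalg.norm(x_j) + 1e-12)
    sim = 0.5 * (cos + 1.0)                  # map cosine from [-1, 1] to [0, 1]
    a = max(alpha_max * sim, 1e-3)           # low similarity -> small concentration
    return rng.beta(a, a)

def tailored_mix(x_i, y_i, x_j, y_j):
    lam = similarity_aware_lambda(x_i, x_j)
    return lam * x_i + (1 - lam) * x_j, lam * y_i + (1 - lam) * y_j
```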
arXiv Detail & Related papers (2023-11-02T17:48:28Z) - On-Demand Sampling: Learning Optimally from Multiple Distributions [63.20009081099896]
Social and real-world considerations have given rise to multi-distribution learning paradigms.
We establish the optimal sample complexity of these learning paradigms and give algorithms that meet this sample complexity.
Our algorithm design and analysis are enabled by our extensions of online learning techniques for solving zero-sum games.
arXiv Detail & Related papers (2022-10-22T19:07:26Z) - Harnessing Hard Mixed Samples with Decoupled Regularizer [69.98746081734441]
Mixup is an efficient data augmentation approach that improves the generalization of neural networks by smoothing the decision boundary with mixed data.
In this paper, we propose an efficient mixup objective function with a decoupled regularizer, named Decoupled Mixup (DM).
DM can adaptively utilize hard mixed samples to mine discriminative features without losing the original smoothness of mixup.
arXiv Detail & Related papers (2022-03-21T07:12:18Z) - Towards an Understanding of Benign Overfitting in Neural Networks [104.2956323934544]
Modern machine learning models often employ a huge number of parameters and are typically optimized to have zero training loss.
We examine how such benign overfitting phenomena occur in a two-layer neural network setting.
We show that it is possible for the two-layer ReLU network interpolator to achieve a near minimax-optimal learning rate.
arXiv Detail & Related papers (2021-06-06T19:08:53Z) - Co-Mixup: Saliency Guided Joint Mixup with Supermodular Diversity [15.780905917870427]
We propose a new perspective on batch mixup and formulate the optimal construction of a batch of mixup data.
We also propose an iterative submodular computation algorithm, based on an efficient modular approximation, to compute the mixup for each minibatch.
Our experiments show that the proposed method achieves state-of-the-art generalization, calibration, and weakly supervised localization results.
arXiv Detail & Related papers (2021-02-05T09:12:02Z) - Distributionally Robust Weighted $k$-Nearest Neighbors [21.537952410507483]
Learning a robust classifier from a few samples remains a key challenge in machine learning.
In this paper, we study a minimax distributionally robust formulation of weighted $k$-nearest neighbors.
We develop an algorithm, Dr.k-NN, that efficiently solves this functional optimization problem.
arXiv Detail & Related papers (2020-06-07T00:34:33Z)
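For reference, a minimal distance-weighted $k$-NN classifier, i.e., the base method that the entry above makes distributionally robust; the inverse-distance weights below are a common textbook choice, not the Dr.k-NN weights obtained from the paper's minimax formulation.

```python
# Hedged sketch: plain distance-weighted k-nearest-neighbor classification.
import numpy as np

def weighted_knn_predict(X_train, y_train, x, k=5, eps=1e-12):
    """Classify x by a distance-weighted vote among its k nearest training points."""
    d = np.linalg.norm(X_train - x, axis=1)
    idx = np.argsort(d)[:k]
    w = 1.0 / (d[idx] + eps)                 # closer neighbors get larger weight
    classes = np.unique(y_train)
    scores = [w[y_train[idx] == c].sum() for c in classes]
    return classes[int(np.argmax(scores))]
```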