Provable Benefit of Mixup for Finding Optimal Decision Boundaries
- URL: http://arxiv.org/abs/2306.00267v2
- Date: Tue, 6 Jun 2023 02:19:58 GMT
- Title: Provable Benefit of Mixup for Finding Optimal Decision Boundaries
- Authors: Junsoo Oh, Chulhee Yun
- Abstract summary: We investigate how pair-wise data augmentation techniques like Mixup affect the sample complexity of finding optimal decision boundaries.
We show that Mixup mitigates this problem by significantly reducing the sample complexity.
- Score: 18.668531108219415
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We investigate how pair-wise data augmentation techniques like Mixup affect
the sample complexity of finding optimal decision boundaries in a binary linear
classification problem. For a family of data distributions with a separability
constant $\kappa$, we analyze how well the optimal classifier in terms of
training loss aligns with the optimal one in test accuracy (i.e., Bayes optimal
classifier). For vanilla training without augmentation, we uncover an
interesting phenomenon named the curse of separability. As we increase $\kappa$
to make the data distribution more separable, the sample complexity of vanilla
training increases exponentially in $\kappa$; perhaps surprisingly, the task of
finding optimal decision boundaries becomes harder for more separable
distributions. For Mixup training, we show that Mixup mitigates this problem by
significantly reducing the sample complexity. To this end, we develop new
concentration results applicable to $n^2$ pair-wise augmented data points
constructed from $n$ independent data, by carefully dealing with dependencies
between overlapping pairs. Lastly, we study other masking-based Mixup-style
techniques and show that they can distort the training loss and make its
minimizer converge to a suboptimal classifier in terms of test accuracy.
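To make the setup above concrete, here is a minimal sketch (not the authors' code) of Mixup training for a binary linear classifier: all $n^2$ ordered pairwise combinations of $n$ points are mixed in both input and label, and a logistic-loss linear model is fit to the mixed set. The Beta mixing distribution, the toy two-Gaussian data, and the plain gradient-descent loop are illustrative assumptions.

```python
# Hedged sketch of pairwise Mixup vs. vanilla training for binary linear classification.
import numpy as np

rng = np.random.default_rng(0)

def mixup_pairs(X, y, alpha=1.0):
    """Build the n^2 pairwise-augmented set: for every ordered pair (i, j),
    draw lambda ~ Beta(alpha, alpha) and mix both the inputs and the (soft) labels."""
    n = len(X)
    i, j = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    i, j = i.ravel(), j.ravel()
    lam = rng.beta(alpha, alpha, size=n * n)[:, None]
    X_mix = lam * X[i] + (1.0 - lam) * X[j]
    y_mix = lam[:, 0] * y[i] + (1.0 - lam[:, 0]) * y[j]   # soft labels in [0, 1]
    return X_mix, y_mix

def train_logistic(X, y, lr=0.1, steps=500):
    """Plain gradient descent on the logistic loss with a linear classifier."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w -= lr * X.T @ (p - y) / len(X)
    return w

# Toy 2-D two-Gaussian data; a larger separation loosely mimics a larger kappa.
n, sep = 50, 3.0
X = np.vstack([rng.normal(-sep, 1.0, (n // 2, 2)), rng.normal(sep, 1.0, (n // 2, 2))])
y = np.concatenate([np.zeros(n // 2), np.ones(n // 2)])

w_vanilla = train_logistic(X, y)               # vanilla ERM on n points
w_mixup = train_logistic(*mixup_pairs(X, y))   # Mixup ERM on n^2 mixed points
```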
Related papers
- The Plug-in Approach for Average-Reward and Discounted MDPs: Optimal Sample Complexity Analysis [6.996002801232415]
We study the sample complexity of the plug-in approach for learning $\varepsilon$-optimal policies in average-reward Markov decision processes.
Despite representing arguably the simplest algorithm for this problem, the plug-in approach has never been theoretically analyzed.
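For context, a hedged sketch of the generic plug-in recipe this entry refers to: estimate the transition model from samples, then solve the estimated MDP exactly. For simplicity the sketch uses the discounted criterion and value iteration rather than the paper's average-reward analysis, and all names and parameters below are illustrative.

```python
# Hedged illustration of the plug-in approach (discounted case, not the paper's algorithm).
import numpy as np

def plug_in_policy(sample_next_state, R, S, A, n_samples=100, gamma=0.9, iters=500):
    """sample_next_state(s, a) draws one next state index; R is an (S, A) reward matrix."""
    # 1) Empirical transition model P_hat from n_samples draws per (state, action).
    P_hat = np.zeros((S, A, S))
    for s in range(S):
        for a in range(A):
            for _ in range(n_samples):
                P_hat[s, a, sample_next_state(s, a)] += 1.0
    P_hat /= n_samples
    # 2) Value iteration on the *estimated* MDP.
    V = np.zeros(S)
    for _ in range(iters):
        Q = R + gamma * (P_hat @ V)        # shape (S, A)
        V = Q.max(axis=1)
    return Q.argmax(axis=1)                # greedy policy w.r.t. the plug-in model
```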
arXiv Detail & Related papers (2024-10-10T05:08:14Z) - Inverse Entropic Optimal Transport Solves Semi-supervised Learning via Data Likelihood Maximization [65.8915778873691]
Learning conditional distributions is a central problem in machine learning.
We propose a new learning paradigm that integrates both paired and unpaired data.
Our approach also connects intriguingly with inverse entropic optimal transport (OT).
arXiv Detail & Related papers (2024-10-03T16:12:59Z) - Probabilistic Contrastive Learning for Long-Tailed Visual Recognition [78.70453964041718]
Long-tailed distributions frequently emerge in real-world data, where a large number of minority categories contain a limited number of samples.
Recent investigations have revealed that supervised contrastive learning exhibits promising potential in alleviating the data imbalance.
We propose a novel probabilistic contrastive (ProCo) learning algorithm that estimates the data distribution of the samples from each class in the feature space.
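A simplified, hypothetical illustration of "estimating the data distribution of the samples from each class in the feature space": here each class is summarized by a diagonal Gaussian and extra features can be drawn for rare classes. This is a stand-in for intuition only; the paper's actual ProCo model and contrastive loss are not reproduced.

```python
# Hedged sketch: per-class feature distributions (diagonal Gaussian assumption).
import numpy as np

rng = np.random.default_rng(0)

def fit_class_gaussians(feats, labels):
    """Return {class: (mean, std)} estimated from the features of each class."""
    stats = {}
    for c in np.unique(labels):
        f = feats[labels == c]
        stats[c] = (f.mean(axis=0), f.std(axis=0) + 1e-6)
    return stats

def sample_features(stats, c, k):
    """Draw k pseudo-features for class c from its estimated Gaussian."""
    mean, std = stats[c]
    return rng.normal(mean, std, size=(k, mean.shape[0]))
```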
arXiv Detail & Related papers (2024-03-11T13:44:49Z) - Noisy Correspondence Learning with Self-Reinforcing Errors Mitigation [63.180725016463974]
Cross-modal retrieval relies on well-matched large-scale datasets that are laborious to collect in practice.
We introduce a novel noisy correspondence learning framework, namely Self-Reinforcing Errors Mitigation (SREM).
arXiv Detail & Related papers (2023-12-27T09:03:43Z) - Tailoring Mixup to Data for Calibration [12.050401897136501]
Mixup is a technique for improving calibration and predictive uncertainty.
In this work, we argue that the likelihood of manifold intrusion increases with the distance between the data points being mixed.
We propose to dynamically change the underlying distributions of mixing coefficients depending on the similarity between the samples being mixed.
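A hedged sketch of that idea: make the Mixup coefficient distribution depend on how similar the two samples are, so that distant pairs are mixed only weakly. The concrete mapping from cosine similarity to a Beta concentration parameter is an illustrative assumption, not the paper's rule.

```python
# Hedged sketch: similarity-dependent Mixup coefficient distribution.
import numpy as np

rng = np.random.default_rng(0)

def similarity_aware_lambda(x_i, x_j, alpha_max=1.0):
    """Draw lambda ~ Beta(a, a) where a shrinks as the pair becomes less similar,
    pushing lambda toward 0 or 1 (i.e., weaker mixing) for dissimilar pairs."""
    cos = x_i @ x_j / (np.linalg.norm(x_i) * np.linalg.norm(x_j) + 1e-12)
    sim = 0.5 * (cos + 1.0)                  # map cosine from [-1, 1] to [0, 1]
    a = max(alpha_max * sim, 1e-3)           # low similarity -> small concentration
    return rng.beta(a, a)

def tailored_mix(x_i, y_i, x_j, y_j):
    lam = similarity_aware_lambda(x_i, x_j)
    return lam * x_i + (1 - lam) * x_j, lam * y_i + (1 - lam) * y_j
```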
arXiv Detail & Related papers (2023-11-02T17:48:28Z) - On-Demand Sampling: Learning Optimally from Multiple Distributions [63.20009081099896]
Social and real-world considerations have given rise to multi-distribution learning paradigms.
We establish the optimal sample complexity of these learning paradigms and give algorithms that meet this sample complexity.
Our algorithm design and analysis are enabled by our extensions of online learning techniques for solving zero-sum games.
arXiv Detail & Related papers (2022-10-22T19:07:26Z) - Harnessing Hard Mixed Samples with Decoupled Regularizer [69.98746081734441]
Mixup is an efficient data augmentation approach that improves the generalization of neural networks by smoothing the decision boundary with mixed data.
In this paper, we propose an efficient mixup objective function with a decoupled regularizer, named Decoupled Mixup (DM).
DM can adaptively utilize hard mixed samples to mine discriminative features without losing the original smoothness of mixup.
arXiv Detail & Related papers (2022-03-21T07:12:18Z) - Towards an Understanding of Benign Overfitting in Neural Networks [104.2956323934544]
Modern machine learning models often employ a huge number of parameters and are typically optimized to have zero training loss.
We examine how such benign overfitting phenomena occur in a two-layer neural network setting.
We show that it is possible for the two-layer ReLU network interpolator to achieve a near minimax-optimal learning rate.
arXiv Detail & Related papers (2021-06-06T19:08:53Z) - Co-Mixup: Saliency Guided Joint Mixup with Supermodular Diversity [15.780905917870427]
We propose a new perspective on batch mixup and formulate the optimal construction of a batch of mixup data.
We also propose an iterative submodular computation algorithm, based on an efficient modular approximation, to compute the mixup for each minibatch.
Our experiments show that the proposed method achieves state-of-the-art generalization, calibration, and weakly supervised localization results.
arXiv Detail & Related papers (2021-02-05T09:12:02Z) - Distributionally Robust Weighted $k$-Nearest Neighbors [21.537952410507483]
Learning a robust classifier from a few samples remains a key challenge in machine learning.
In this paper, we study a minimax distributionally robust formulation of weighted $k$-nearest neighbors.
We develop an algorithm, Dr.k-NN, that efficiently solves this functional optimization problem.
arXiv Detail & Related papers (2020-06-07T00:34:33Z)
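For reference, a minimal distance-weighted $k$-NN classifier, i.e., the base method that the entry above makes distributionally robust; the inverse-distance weights below are a common textbook choice, not the Dr.k-NN weights obtained from the paper's minimax formulation.

```python
# Hedged sketch: plain distance-weighted k-nearest-neighbor classification.
import numpy as np

def weighted_knn_predict(X_train, y_train, x, k=5, eps=1e-12):
    """Classify x by a distance-weighted vote among its k nearest training points."""
    d = np.linalg.norm(X_train - x, axis=1)
    idx = np.argsort(d)[:k]
    w = 1.0 / (d[idx] + eps)                 # closer neighbors get larger weight
    classes = np.unique(y_train)
    scores = [w[y_train[idx] == c].sum() for c in classes]
    return classes[int(np.argmax(scores))]
```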