Finding Optimally Robust Data Mixtures via Concave Maximization
- URL: http://arxiv.org/abs/2406.01477v1
- Date: Mon, 3 Jun 2024 16:06:12 GMT
- Title: Finding Optimally Robust Data Mixtures via Concave Maximization
- Authors: Anvith Thudi, Chris J. Maddison
- Abstract summary: Group distributionally robust optimization (group DRO) is one popular way to learn mixture weights for training a specific model class, but it struggles with non-linear and non-parametric models.
We propose a method we call MixMax, which selects mixture weights by maximizing a particular concave objective with entropic mirror ascent, and, crucially, we prove that optimally fitting this mixture distribution over the set of bounded predictors returns a group DRO optimal model.
- Score: 18.144960432059634
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Training on mixtures of data distributions is now common in many modern machine learning pipelines, useful for performing well on several downstream tasks. Group distributionally robust optimization (group DRO) is one popular way to learn mixture weights for training a specific model class, but group DRO methods suffer for non-linear models due to non-convex loss functions and when the models are non-parametric. We address these challenges by proposing to solve a more general DRO problem, giving a method we call MixMax. MixMax selects mixture weights by maximizing a particular concave objective with entropic mirror ascent, and, crucially, we prove that optimally fitting this mixture distribution over the set of bounded predictors returns a group DRO optimal model. Experimentally, we tested MixMax on a sequence modeling task with transformers and on a variety of non-parametric learning problems. In all instances MixMax matched or outperformed the standard data mixing and group DRO baselines, and in particular, MixMax improved the performance of XGBoost over the only baseline, data balancing, for variations of the ACSIncome and CelebA annotations datasets.
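The abstract's core mechanism, maximizing a concave objective over mixture weights with entropic mirror ascent, can be illustrated with a generic sketch. On the probability simplex, mirror ascent under the negative-entropy mirror map reduces to multiplicative (exponentiated-gradient) updates. The toy objective, gradient, and step size below are illustrative assumptions, not the paper's actual MixMax objective:

```python
import numpy as np

def entropic_mirror_ascent(grad_f, dim, eta=0.1, steps=500):
    """Maximize a concave objective over the probability simplex.

    With the negative-entropy mirror map, each mirror-ascent step is a
    multiplicative update followed by renormalization, so the iterate
    stays a valid mixture (non-negative, sums to 1) at every step.
    """
    w = np.full(dim, 1.0 / dim)          # start at the uniform mixture
    for _ in range(steps):
        w = w * np.exp(eta * grad_f(w))  # exponentiated-gradient update
        w = w / w.sum()                  # renormalize onto the simplex
    return w

# Toy concave objective (an assumption for illustration):
# f(w) = r.w + H(w), whose maximizer is the softmax of r, so the
# result is easy to check against a closed form.
r = np.array([1.0, 0.5, -0.2])
grad = lambda w: r - np.log(w) - 1.0     # gradient of r.w + entropy
w_star = entropic_mirror_ascent(grad, dim=3)
```

Because the updates are multiplicative, no weight ever hits exactly zero, which is why this family of methods is a natural fit for searching over data-mixture proportions.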
Related papers
- RegMix: Data Mixture as Regression for Language Model Pre-training [40.45464495981735]
We propose RegMix to automatically identify a high-performing data mixture by formulating it as a regression task.
RegMix involves training a set of small models with diverse data mixtures and fitting a regression model to predict their performance.
Our method demonstrates superior performance compared to human selection and achieves results that match or surpass DoReMi.
arXiv Detail & Related papers (2024-07-01T17:31:03Z) - Data Mixing Laws: Optimizing Data Mixtures by Predicting Language Modeling Performance [55.872926690722714]
We study the predictability of model performance as a function of the mixture proportions.
We propose nested use of the scaling laws of training steps, model sizes, and our data mixing law.
Our method effectively optimizes the training mixture of a 1B model trained for 100B tokens in RedPajama.
arXiv Detail & Related papers (2024-03-25T17:14:00Z) - Universal Lower Bounds and Optimal Rates: Achieving Minimax Clustering Error in Sub-Exponential Mixture Models [8.097200145973389]
We first establish a universal lower bound for the error rate in clustering any mixture model.
We then demonstrate that iterative algorithms attain this lower bound in mixture models with sub-exponential tails.
For datasets better modelled by Poisson or Negative Binomial mixtures, we study mixture models whose distributions belong to an exponential family.
In such mixtures, we establish that Bregman hard clustering, a variant of Lloyd's algorithm employing a Bregman divergence, is rate optimal.
arXiv Detail & Related papers (2024-02-23T16:51:17Z) - Efficient Online Data Mixing For Language Model Pre-Training [101.45242332613944]
Existing data selection methods suffer from slow and computationally expensive processes.
Data mixing, on the other hand, reduces the complexity of data selection by grouping data points together.
We develop an efficient algorithm for Online Data Mixing (ODM) that combines elements from both data selection and data mixing.
arXiv Detail & Related papers (2023-12-05T00:42:35Z) - RandoMix: A mixed sample data augmentation method with multiple mixed modes [12.466162659083697]
RandoMix is a mixed-sample data augmentation method designed to address robustness and diversity challenges.
We evaluate the effectiveness of RandoMix on diverse datasets, including CIFAR-10/100, Tiny-ImageNet, ImageNet, and Google Speech Commands.
arXiv Detail & Related papers (2022-05-18T05:31:36Z) - Distributionally Robust Models with Parametric Likelihood Ratios [123.05074253513935]
Three simple ideas allow us to train models with DRO using a broader class of parametric likelihood ratios.
We find that models trained with the resulting parametric adversaries are consistently more robust to subpopulation shifts when compared to other DRO approaches.
arXiv Detail & Related papers (2022-04-13T12:43:12Z) - Harnessing Hard Mixed Samples with Decoupled Regularizer [69.98746081734441]
Mixup is an efficient data augmentation approach that improves the generalization of neural networks by smoothing the decision boundary with mixed data.
In this paper, we propose an efficient mixup objective function with a decoupled regularizer, named Decoupled Mixup (DM).
DM can adaptively utilize hard mixed samples to mine discriminative features without losing the original smoothness of mixup.
arXiv Detail & Related papers (2022-03-21T07:12:18Z) - A Wasserstein Minimax Framework for Mixed Linear Regression [69.40394595795544]
Multi-modal distributions are commonly used to model clustered data in learning tasks.
We propose an optimal transport-based framework for Mixed Linear Regression problems.
arXiv Detail & Related papers (2021-06-14T16:03:51Z) - Modeling the Second Player in Distributionally Robust Optimization [90.25995710696425]
We argue for the use of neural generative models to characterize the worst-case distribution.
This approach poses a number of implementation and optimization challenges.
We find that the proposed approach yields models that are more robust than comparable baselines.
arXiv Detail & Related papers (2021-03-18T14:26:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.