Simple and effective data augmentation for compositional generalization
- URL: http://arxiv.org/abs/2401.09815v1
- Date: Thu, 18 Jan 2024 09:13:59 GMT
- Title: Simple and effective data augmentation for compositional generalization
- Authors: Yuekun Yao and Alexander Koller
- Abstract summary: We show that data augmentation methods that sample MRs and backtranslate them can be effective for compositional generalization.
Remarkably, sampling from a uniform distribution performs almost as well as sampling from the test distribution.
- Score: 64.00420578048855
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Compositional generalization, the ability to predict complex meanings from
training on simpler sentences, poses challenges for powerful pretrained seq2seq
models. In this paper, we show that data augmentation methods that sample
meaning representations (MRs) and backtranslate them can be effective for
compositional generalization, but
only if we sample from the right distribution. Remarkably, sampling from a
uniform distribution performs almost as well as sampling from the test
distribution, and greatly outperforms earlier methods that sampled from the
training distribution. We further conduct experiments to investigate the reason
why this happens and where the benefit of such data augmentation methods come
from.
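To make the approach concrete, below is a minimal, runnable sketch of the sample-and-backtranslate loop: MRs are drawn by expanding a toy grammar with uniformly random rule choices, and a crude template stands in for the learned MR-to-text seq2seq model. The grammar, helper names, and template are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of sample-and-backtranslate augmentation (illustrative only).
# Assumptions: a toy MR grammar and a template "backtranslator" standing in for
# the learned MR-to-text seq2seq model; none of these names come from the paper.
import random

# Hypothetical toy grammar over meaning representations (MRs).
GRAMMAR = {
    "MR":   [["PRED", "(", "ARG", ",", "ARG", ")"]],
    "PRED": [["see"], ["help"], ["find"]],
    "ARG":  [["emma"], ["liam"], ["MR"]],   # recursion yields nested MRs
}

def sample_mr_uniform(symbol="MR", depth=0, max_depth=3):
    """Expand grammar rules chosen uniformly at random (depth-capped)."""
    if symbol not in GRAMMAR:
        return symbol                                    # terminal token
    rules = GRAMMAR[symbol]
    if depth >= max_depth:                               # stop the recursion
        rules = [r for r in rules if "MR" not in r] or rules
    rule = random.choice(rules)                          # uniform over rules
    return "".join(sample_mr_uniform(s, depth + 1, max_depth) for s in rule)

def backtranslate(mr):
    """Crude template stand-in for a trained MR-to-text model."""
    return mr.replace("(", " ").replace(",", " and ").replace(")", "")

def augment(train_pairs, n_new=1000):
    """Append synthetic (sentence, MR) pairs sampled from a uniform MR prior."""
    synthetic = []
    for _ in range(n_new):
        mr = sample_mr_uniform()
        synthetic.append((backtranslate(mr), mr))
    return train_pairs + synthetic

if __name__ == "__main__":
    for sentence, mr in augment([], n_new=3):
        print(f"{sentence!r:35} -> {mr}")
```

In the paper's actual setting, the uniform sampler would draw from the task's MR space and the backtranslator would be a seq2seq model trained on the original (MR, sentence) pairs.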
Related papers
- Changing the Training Data Distribution to Reduce Simplicity Bias Improves In-distribution Generalization [12.472871440252105]
We show that sharpness-aware minimization (SAM) learns different features more uniformly, particularly in early epochs.
We propose a method that (i) clusters examples based on the network output early in training, (ii) identifies a cluster of examples with similar network output, and (iii) upsamples the rest of the examples only once to alleviate the simplicity bias (a rough sketch of these steps appears after this list).
arXiv Detail & Related papers (2024-04-27T03:30:50Z) - Distribution Shift Inversion for Out-of-Distribution Prediction [57.22301285120695]
We propose a portable Distribution Shift Inversion algorithm for Out-of-Distribution (OoD) prediction.
We show that our method provides a general performance gain when plugged into a wide range of commonly used OoD algorithms.
arXiv Detail & Related papers (2023-06-14T08:00:49Z) - Open-Sampling: Exploring Out-of-Distribution data for Re-balancing Long-tailed datasets [24.551465814633325]
Deep neural networks usually perform poorly when the training dataset suffers from extreme class imbalance.
Recent studies have found that directly training with out-of-distribution data in a semi-supervised manner can harm generalization performance.
We propose a novel method called Open-sampling, which utilizes open-set noisy labels to re-balance the class priors of the training dataset.
arXiv Detail & Related papers (2022-06-17T14:29:52Z) - Unified Regularity Measures for Sample-wise Learning and Generalization [18.10522585996242]
We propose a pair of sample regularity measures for both processes with a formulation-consistent representation.
Experiments validated the effectiveness and robustness of the proposed approaches for mini-batch SGD optimization.
arXiv Detail & Related papers (2021-08-09T10:11:14Z) - Optimal Importance Sampling for Federated Learning [57.14673504239551]
Federated learning involves a mixture of centralized and decentralized processing tasks.
The sampling of both agents and data is generally uniform; however, in this work we consider non-uniform sampling.
We derive optimal importance sampling strategies for both agent and data selection and show that non-uniform sampling without replacement improves the performance of the original FedAvg algorithm.
arXiv Detail & Related papers (2020-10-26T14:15:33Z) - Distributional Reinforcement Learning via Moment Matching [54.16108052278444]
We formulate a method that learns a finite set of statistics from each return distribution via neural networks.
Our method can be interpreted as implicitly matching all orders of moments between a return distribution and its Bellman target.
Experiments on the suite of Atari games show that our method outperforms the standard distributional RL baselines.
arXiv Detail & Related papers (2020-07-24T05:18:17Z) - Robust Sampling in Deep Learning [62.997667081978825]
Deep learning requires regularization mechanisms to reduce overfitting and improve generalization.
We address this problem with a new regularization method based on distributionally robust optimization.
During training, samples are selected according to their accuracy, so that the worst-performing samples contribute the most to the optimization (a rough sketch of such reweighting appears after this list).
arXiv Detail & Related papers (2020-06-04T09:46:52Z) - Efficiently Sampling Functions from Gaussian Process Posteriors [76.94808614373609]
We propose an easy-to-use and general-purpose approach for fast posterior sampling.
We demonstrate how decoupled sample paths accurately represent Gaussian process posteriors at a fraction of the usual cost.
arXiv Detail & Related papers (2020-02-21T14:03:16Z)
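Below is a rough sketch of the cluster-then-upsample steps described in "Changing the Training Data Distribution to Reduce Simplicity Bias" above. K-means on early-epoch softmax outputs is an assumed stand-in for the clustering step, and "upsamples the rest only once" is read as duplicating each example outside the dominant cluster once; neither choice is confirmed by the abstract.

```python
# Illustrative sketch of cluster-then-upsample (assumptions, not the authors'
# exact procedure): k-means on early-epoch outputs stands in for the clustering,
# and each example outside the dominant cluster is duplicated exactly once.
import numpy as np
from sklearn.cluster import KMeans

def upsampled_indices(early_outputs, n_clusters=2):
    """early_outputs: (n_examples, n_classes) softmax outputs from an early epoch.

    Returns dataset indices with the non-dominant clusters upsampled once.
    """
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(early_outputs)
    dominant = np.bincount(labels, minlength=n_clusters).argmax()
    rest = np.flatnonzero(labels != dominant)   # examples the network has not fit yet
    return np.concatenate([np.arange(len(labels)), rest])
```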
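As a rough illustration of the loss-weighted selection described in "Robust Sampling in Deep Learning" above, the sketch below reweights a mini-batch so that the worst-performing samples dominate the update; the softmax-over-losses weighting is an assumed surrogate for the distributionally robust inner maximization, not necessarily the authors' exact formulation.

```python
# Illustrative sketch (assumed surrogate, not the paper's exact DRO objective):
# reweight a mini-batch so the worst-performing samples contribute the most.
import torch
import torch.nn.functional as F

def robust_batch_loss(logits, targets, temperature=1.0):
    """Cross-entropy reweighted toward the hardest examples in the batch."""
    per_sample = F.cross_entropy(logits, targets, reduction="none")
    # Larger loss -> larger weight; detach so the weights themselves are not differentiated.
    weights = torch.softmax(per_sample.detach() / temperature, dim=0)
    return (weights * per_sample).sum()

# Usage: loss = robust_batch_loss(model(x), y); loss.backward()
```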
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.