Related papers: Mutual Exclusivity Training and Primitive Augmentation to Induce Compositionality

URL: http://arxiv.org/abs/2211.15578v1
Date: Mon, 28 Nov 2022 17:36:41 GMT
Title: Mutual Exclusivity Training and Primitive Augmentation to Induce Compositionality
Authors: Yichen Jiang, Xiang Zhou, Mohit Bansal
Abstract summary: Recent datasets expose the lack of the systematic generalization ability in standard sequence-to-sequence models. We analyze this behavior of seq2seq models and identify two contributing factors: a lack of mutual exclusivity bias and the tendency to memorize whole examples. We show substantial empirical improvements using standard sequence-to-sequence models on two widely-used compositionality datasets.
Score: 84.94877848357896
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recent datasets expose the lack of the systematic generalization ability in standard sequence-to-sequence models. In this work, we analyze this behavior of seq2seq models and identify two contributing factors: a lack of mutual exclusivity bias (i.e., a source sequence already mapped to a target sequence is less likely to be mapped to other target sequences), and the tendency to memorize whole examples rather than separating structures from contents. We propose two techniques to address these two issues respectively: Mutual Exclusivity Training that prevents the model from producing seen generations when facing novel, unseen examples via an unlikelihood-based loss; and prim2primX data augmentation that automatically diversifies the arguments of every syntactic function to prevent memorizing and provide a compositional inductive bias without exposing test-set data. Combining these two techniques, we show substantial empirical improvements using standard sequence-to-sequence models (LSTMs and Transformers) on two widely-used compositionality datasets: SCAN and COGS. Finally, we provide analysis characterizing the improvements as well as the remaining challenges, and provide detailed ablations of our method. Our code is available at https://github.com/owenzx/met-primaug
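The unlikelihood-based loss mentioned in the abstract can be illustrated with a generic token-level sketch. This is not the authors' implementation (see their repository for that): the abstract does not specify how Mutual Exclusivity Training selects which seen generations to penalize, so the `negatives` argument and all function names here are hypothetical. The sketch only shows the general idea that an unlikelihood term pushes probability mass away from chosen tokens, alongside the usual likelihood term that pulls it toward the target.

```python
import math

def softmax(logits):
    """Convert raw scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def likelihood_loss(logits, target):
    """Standard negative log-likelihood of the target token."""
    probs = softmax(logits)
    return -math.log(probs[target])

def unlikelihood_loss(logits, negatives, eps=1e-12):
    """Generic unlikelihood term: -sum(log(1 - p(c))) over penalized tokens.

    `negatives` is a hypothetical list of token ids to discourage;
    how they are chosen is an assumption left to the caller.
    """
    probs = softmax(logits)
    return -sum(math.log(max(1.0 - probs[c], eps)) for c in negatives)

# Toy 3-token vocabulary: the penalty grows when the model
# puts more probability on a penalized token.
high = unlikelihood_loss([2.0, 0.0, 0.0], negatives=[0])
low = unlikelihood_loss([0.0, 2.0, 0.0], negatives=[0])
print(high > low)
```

In a training loop the two terms would typically be summed with a weighting coefficient; the value of that coefficient is not given in the abstract and would need to be taken from the paper or code.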

Related papers

Long-Sequence Recommendation Models Need Decoupled Embeddings [49.410906935283585]
We identify and characterize a neglected deficiency in existing long-sequence recommendation models. A single set of embeddings struggles with learning both attention and representation, leading to interference between these two processes. We propose the Decoupled Attention and Representation Embeddings (DARE) model, where two distinct embedding tables are learned separately to fully decouple attention and representation.
arXiv Detail & Related papers (2024-10-03T15:45:15Z)
Sub-graph Based Diffusion Model for Link Prediction [43.15741675617231]
Denoising Diffusion Probabilistic Models (DDPMs) represent a contemporary class of generative models with exceptional qualities. We build a novel generative model for link prediction using a dedicated design to decompose the likelihood estimation process via the Bayesian formula. Our proposed method presents numerous advantages: (1) transferability across datasets without retraining, (2) promising generalization on limited training data, and (3) robustness against graph adversarial attacks.
arXiv Detail & Related papers (2024-09-13T02:23:55Z)
A Fixed-Point Approach for Causal Generative Modeling [20.88890689294816]
We propose a novel formalism for describing Structural Causal Models (SCMs) as fixed-point problems on causally ordered variables. We establish the weakest known conditions for their unique recovery given the topological ordering (TO).
arXiv Detail & Related papers (2024-04-10T12:29:05Z)
FeCAM: Exploiting the Heterogeneity of Class Distributions in Exemplar-Free Continual Learning [21.088762527081883]
Exemplar-free class-incremental learning (CIL) poses several challenges since it prohibits the rehearsal of data from previous tasks. Recent approaches to incrementally learning the classifier by freezing the feature extractor after the first task have gained much attention. We explore prototypical networks for CIL, which generate new class prototypes using the frozen feature extractor and classify the features based on the Euclidean distance to the prototypes.
arXiv Detail & Related papers (2023-09-25T11:54:33Z)
SequenceMatch: Imitation Learning for Autoregressive Sequence Modelling with Backtracking [60.109453252858806]
A maximum-likelihood (MLE) objective does not match a downstream use-case of autoregressively generating high-quality sequences. We formulate sequence generation as an imitation learning (IL) problem. This allows us to minimize a variety of divergences between the distribution of sequences generated by an autoregressive model and sequences from a dataset. Our resulting method, SequenceMatch, can be implemented without adversarial training or architectural changes.
arXiv Detail & Related papers (2023-06-08T17:59:58Z)
Compositional Generalization without Trees using Multiset Tagging and Latent Permutations [121.37328648951993]
We phrase semantic parsing as a two-step process: we first tag each input token with a multiset of output tokens. Then we arrange the tokens into an output sequence using a new way of parameterizing and predicting permutations. Our model outperforms pretrained seq2seq models and prior work on realistic semantic parsing tasks.
arXiv Detail & Related papers (2023-05-26T14:09:35Z)
Imputing Missing Observations with Time Sliced Synthetic Minority Oversampling Technique [0.3973560285628012]
We present a simple yet novel time series imputation technique with the goal of constructing an irregular time series that is uniform across every sample in a data set. We fix a grid defined by the midpoints of non-overlapping bins (dubbed "slices") of observation times and ensure that each sample has values for all of the features at that given time. This allows one to both impute fully missing observations to allow uniform time series classification across the entire data and, in special cases, to impute individually missing features.
arXiv Detail & Related papers (2022-01-14T19:23:24Z)
Inducing Transformer's Compositional Generalization Ability via Auxiliary Sequence Prediction Tasks [86.10875837475783]
Systematic compositionality is an essential mechanism in human language, allowing the recombination of known parts to create novel expressions. Existing neural models have been shown to lack this basic ability in learning symbolic structures. We propose two auxiliary sequence prediction tasks that track the progress of function and argument semantics.
arXiv Detail & Related papers (2021-09-30T16:41:19Z)
Contrastive Self-supervised Sequential Recommendation with Robust Augmentation [101.25762166231904]
Sequential Recommendation describes a set of techniques to model dynamic user behavior in order to predict future interactions in sequential user data. Old and new issues remain, including data-sparsity and noisy data. We propose Contrastive Self-Supervised Learning for sequential Recommendation (CoSeRec).
arXiv Detail & Related papers (2021-08-14T07:15:25Z)
Structured Reordering for Modeling Latent Alignments in Sequence Transduction [86.94309120789396]
We present an efficient dynamic programming algorithm performing exact marginal inference of separable permutations. The resulting seq2seq model exhibits better systematic generalization than standard models on synthetic problems and NLP tasks.
arXiv Detail & Related papers (2021-06-06T21:53:54Z)
Adversarial and Contrastive Variational Autoencoder for Sequential Recommendation [25.37244686572865]
We propose a novel method called Adversarial and Contrastive Variational Autoencoder (ACVAE) for sequential recommendation. We first introduce the adversarial training for sequence generation under the Adversarial Variational Bayes framework, which enables our model to generate high-quality latent variables. Besides, when encoding the sequence, we apply a recurrent and convolutional structure to capture global and local relationships in the sequence.
arXiv Detail & Related papers (2021-03-19T09:01:14Z)

This list is automatically generated from the titles and abstracts of the papers on this site.