Conditional set generation using Seq2seq models
- URL: http://arxiv.org/abs/2205.12485v1
- Date: Wed, 25 May 2022 04:17:50 GMT
- Title: Conditional set generation using Seq2seq models
- Authors: Aman Madaan, Dheeraj Rajagopal, Niket Tandon, Yiming Yang, Antoine
Bosselut
- Abstract summary: Conditional set generation learns a mapping from an input sequence of tokens to a set.
Sequence-to-sequence (Seq2seq) models are a popular choice to model set generation.
We propose a novel algorithm for effectively sampling informative orders over the space of label orders.
- Score: 52.516563721766445
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Conditional set generation learns a mapping from an input sequence of tokens
to a set. Several NLP tasks, such as entity typing and dialogue emotion
tagging, are instances of set generation. Sequence-to-sequence (Seq2seq) models
are a popular choice to model set generation, but they treat a set as a
sequence and do not fully leverage its key properties, namely order-invariance
and cardinality. We propose a novel algorithm for effectively sampling
informative orders over the combinatorial space of label orders. Further, we
jointly model the set cardinality and output by adding the set size as the
first element and taking advantage of the autoregressive factorization used by
Seq2seq models. Our method is a model-independent data augmentation approach
that endows any Seq2seq model with the signals of order-invariance and
cardinality. Training a Seq2seq model on this new augmented data (without any
additional annotations) yields an average relative improvement of 20% on four
benchmark datasets, across models spanning BART-base, T5-xxl, and GPT-3.
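The augmentation described in the abstract can be pictured with a short, illustrative sketch. Everything here is an assumption rather than the authors' code: the function name `augment_set_targets`, the `<size_k>` pseudo-token format, and the use of uniform random permutations in place of the paper's informative-order sampling.

```python
import random

def augment_set_targets(labels, num_orders=3, seed=0):
    """Turn one label set into several seq2seq target strings.

    Each target prepends the set cardinality as a pseudo-token and then
    lists the labels in a sampled order, so the model is trained on the
    same set under multiple orderings together with its size.
    """
    rng = random.Random(seed)
    labels = sorted(labels)      # fix a base order for reproducibility
    targets = []
    for _ in range(num_orders):
        order = labels[:]
        rng.shuffle(order)       # stand-in for informative-order sampling
        targets.append(" ".join([f"<size_{len(order)}>"] + order))
    return targets

# One dialogue-emotion example expands into several training targets,
# e.g. ['<size_3> joy surprise anger', '<size_3> anger joy surprise', ...]
print(augment_set_targets({"anger", "joy", "surprise"}))
```

In practice each augmented target would be paired with the unchanged input sequence, so any Seq2seq model (BART, T5, or a GPT-style decoder) can consume the data without architectural changes.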
Related papers
- Seq2seq is All You Need for Coreference Resolution [26.551602768015986]
We finetune a pretrained seq2seq transformer to map an input document to a tagged sequence encoding the coreference annotation.
Our model outperforms or closely matches the best coreference systems in the literature on an array of datasets.
arXiv Detail & Related papers (2023-10-20T19:17:22Z)
- Discrete Graph Auto-Encoder [52.50288418639075]
We introduce a new framework named Discrete Graph Auto-Encoder (DGAE)
We first use a permutation-equivariant auto-encoder to convert graphs into sets of discrete latent node representations.
In the second step, we sort the sets of discrete latent representations and learn their distribution with a specifically designed auto-regressive model.
arXiv Detail & Related papers (2023-06-13T12:40:39Z)
- Mutual Exclusivity Training and Primitive Augmentation to Induce Compositionality [84.94877848357896]
Recent datasets expose the lack of systematic generalization ability in standard sequence-to-sequence models.
We analyze this behavior of seq2seq models and identify two contributing factors: a lack of mutual exclusivity bias and the tendency to memorize whole examples.
We show substantial empirical improvements using standard sequence-to-sequence models on two widely-used compositionality datasets.
arXiv Detail & Related papers (2022-11-28T17:36:41Z)
- Hierarchical Phrase-based Sequence-to-Sequence Learning [94.10257313923478]
We describe a neural transducer that maintains the flexibility of standard sequence-to-sequence (seq2seq) models while incorporating hierarchical phrases as a source of inductive bias during training and as explicit constraints during inference.
Our approach trains two models: a discriminative parser based on a bracketing grammar whose derivation tree hierarchically aligns source and target phrases, and a neural seq2seq model that learns to translate the aligned phrases one-by-one.
arXiv Detail & Related papers (2022-11-15T05:22:40Z)
- OTSeq2Set: An Optimal Transport Enhanced Sequence-to-Set Model for Extreme Multi-label Text Classification [9.990725102725916]
Extreme multi-label text classification (XMTC) is the task of finding the most relevant subset of labels from a large-scale label collection.
We propose an autoregressive sequence-to-set model for XMTC tasks named OTSeq2Set.
Our model generates predictions in a student-forcing scheme and is trained by a loss function based on bipartite matching.
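As a rough illustration of a set loss based on bipartite matching (not OTSeq2Set's exact objective, which additionally uses optimal transport), the sketch below matches gold labels to generated positions with the Hungarian algorithm; the function name and tensor shapes are assumptions.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def bipartite_matching_loss(pred_log_probs, gold_label_ids):
    """Order-invariant loss between generated positions and a gold label set.

    pred_log_probs: (num_positions, vocab_size) log-probabilities.
    gold_label_ids: ids of the gold labels (order does not matter).

    Gold labels are assigned one-to-one to generated positions so that the
    total negative log-probability is minimal, and that total is averaged.
    """
    # cost[i, j] = -log p(gold label j | generated position i)
    cost = -pred_log_probs[:, gold_label_ids]
    rows, cols = linear_sum_assignment(cost)   # optimal one-to-one matching
    return cost[rows, cols].mean()

# Toy example: 3 generated positions, 2 gold labels, 5-label vocabulary.
logits = np.random.randn(3, 5)
log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
print(bipartite_matching_loss(log_probs, [1, 4]))
```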
arXiv Detail & Related papers (2022-10-26T07:25:18Z)
- Sequence-to-Action: Grammatical Error Correction with Action Guided Sequence Generation [21.886973310718457]
We propose a novel Sequence-to-Action (S2A) module for Grammatical Error Correction.
The S2A module jointly takes the source and target sentences as input, and is able to automatically generate a token-level action sequence.
Our model consistently outperforms the seq2seq baselines, while being able to significantly alleviate the over-correction problem.
arXiv Detail & Related papers (2022-05-22T17:47:06Z)
- Structured Reordering for Modeling Latent Alignments in Sequence Transduction [86.94309120789396]
We present an efficient dynamic programming algorithm performing exact marginal inference of separable permutations.
The resulting seq2seq model exhibits better systematic generalization than standard models on synthetic problems and NLP tasks.
arXiv Detail & Related papers (2021-06-06T21:53:54Z)
- Minimize Exposure Bias of Seq2Seq Models in Joint Entity and Relation Extraction [57.22929457171352]
Joint entity and relation extraction aims to extract relation triplets from plain text directly.
We propose a novel Sequence-to-Unordered-Multi-Tree (Seq2UMTree) model to minimize the effects of exposure bias.
arXiv Detail & Related papers (2020-09-16T06:53:34Z)
- Fast Transformers with Clustered Attention [14.448898156256478]
We propose clustered attention, which instead of computing the attention for every query, groups queries into clusters and computes attention just for the centroids.
This results in a model with linear complexity with respect to the sequence length for a fixed number of clusters.
We evaluate our approach on two automatic speech recognition datasets and show that our model consistently outperforms vanilla transformers for a given computational budget.
arXiv Detail & Related papers (2020-07-09T14:17:50Z)
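A simplified sketch of the clustered-attention idea: queries are grouped with a few k-means steps, softmax attention over the keys is computed only for the cluster centroids, and every query reuses its centroid's output. This captures only the basic idea under assumed names and shapes, not the paper's full method (which also proposes an improved variant).

```python
import numpy as np

def clustered_attention(Q, K, V, num_clusters=4, iters=10, seed=0):
    """Attention computed once per query cluster instead of once per query.

    Cost per layer becomes O(num_clusters * seq_len) rather than
    O(seq_len ** 2) for a fixed number of clusters.
    """
    rng = np.random.default_rng(seed)
    n, d = Q.shape
    centroids = Q[rng.choice(n, num_clusters, replace=False)]
    for _ in range(iters):  # plain k-means over the queries
        assign = np.argmin(((Q[:, None, :] - centroids[None]) ** 2).sum(-1), axis=1)
        for c in range(num_clusters):
            members = Q[assign == c]
            if len(members):
                centroids[c] = members.mean(axis=0)
    scores = centroids @ K.T / np.sqrt(d)            # (num_clusters, seq_len)
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)    # softmax over keys
    centroid_out = weights @ V                       # one output per centroid
    return centroid_out[assign]                      # broadcast back to queries

Q, K, V = (np.random.randn(16, 8) for _ in range(3))
print(clustered_attention(Q, K, V).shape)  # (16, 8)
```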
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.