Sequence-to-Set Generative Models
- URL: http://arxiv.org/abs/2209.08801v1
- Date: Mon, 19 Sep 2022 07:13:51 GMT
- Title: Sequence-to-Set Generative Models
- Authors: Longtao Tang, Ying Zhou and Yu Yang
- Abstract summary: We propose a sequence-to-set method to transform any sequence generative model into a set generative model.
We present GRU2Set, which is an instance of our sequence-to-set method and employs the famous GRU model as the sequence generative model.
A direct application of our models is to learn an order/set distribution from a collection of e-commerce orders.
- Score: 9.525560801277903
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose a sequence-to-set method that can transform any
sequence generative model based on maximum likelihood into a set generative model
in which we can evaluate the utility/probability of any set. An efficient
importance sampling algorithm is devised to tackle the computational challenge
of learning our sequence-to-set model. We present GRU2Set, which is an instance
of our sequence-to-set method and employs the famous GRU model as the sequence
generative model. To further obtain permutation invariant representation of
sets, we devise the SetNN model which is also an instance of the
sequence-to-set model. A direct application of our models is to learn an
order/set distribution from a collection of e-commerce orders, which is an
essential step in many important operational decisions such as inventory
arrangement for fast delivery. Based on the intuition that small-sized sets are
usually easier to learn than large sets, we propose a size-bias trick that can
help learn better set distributions with respect to the $\ell_1$-distance
evaluation metric. Two e-commerce order datasets, TMALL and HKTVMALL, are used
to conduct extensive experiments to show the effectiveness of our models. The
experimental results demonstrate that our models can learn better set/order
distributions from order data than the baselines. Moreover, no matter what
model we use, applying the size-bias trick can always improve the quality of
the set distribution learned from data.
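A minimal sketch (not the authors' code) of the core idea as described above: a sequence generative model induces a set distribution by summing the probabilities of all orderings of a set, and an importance-sampling estimate replaces the factorial-size sum when exact enumeration is too expensive. The `seq_logprob` and `next_logits` interfaces, and the proposal that builds an ordering by renormalizing next-item scores over the remaining elements, are assumptions for illustration; end-of-sequence handling and the size-bias trick are omitted.

```python
import math
import random
from itertools import permutations

def set_logprob_exact(seq_logprob, items):
    """log P(set) = log sum over all orderings of P(sequence); only feasible for small sets."""
    logps = [seq_logprob(list(order)) for order in permutations(items)]
    m = max(logps)
    return m + math.log(sum(math.exp(lp - m) for lp in logps))

def set_logprob_importance(seq_logprob, next_logits, items, n_samples=64):
    """Importance-sampling estimate of log P(set).

    The proposal q draws an ordering by repeatedly sampling the next element
    from the remaining items, renormalizing the sequence model's next-item
    scores over that remaining set."""
    estimates = []
    for _ in range(n_samples):
        prefix, remaining, log_q = [], list(items), 0.0
        while remaining:
            scores = next_logits(prefix, remaining)          # unnormalized scores, one per remaining item
            m = max(scores)
            probs = [math.exp(s - m) for s in scores]
            z = sum(probs)
            probs = [p / z for p in probs]
            idx = random.choices(range(len(remaining)), weights=probs)[0]
            log_q += math.log(probs[idx])
            prefix.append(remaining.pop(idx))
        # importance weight for this sampled ordering: p(ordering) / q(ordering)
        estimates.append(seq_logprob(prefix) - log_q)
    m = max(estimates)
    return m + math.log(sum(math.exp(e - m) for e in estimates) / n_samples)
```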
Related papers
- GateLoop: Fully Data-Controlled Linear Recurrence for Sequence Modeling [0.0]
We develop GateLoop, a sequence model that generalizes linear recurrent models such as S4, S5, LRU and RetNet.
GateLoop empirically outperforms existing models for auto-regressive language modeling.
We prove that our approach can be interpreted as providing data-controlled relative-positional information to Attention.
arXiv Detail & Related papers (2023-11-03T14:08:39Z)
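As a rough illustration of the mechanism summarized in the GateLoop entry above, here is a sketch of a data-controlled linear recurrence in plain NumPy: the forget gate, key, value and query are all computed from the current input, so the state transition itself is data-controlled (input-independent transitions roughly recover S4/S5-style models). The sequential loop, element-wise state and specific projections are simplifications assumed for illustration; the actual model relies on an efficient parallel form.

```python
import numpy as np

def data_controlled_linear_recurrence(x, w_a, w_k, w_v, w_q):
    """Sequential form of a data-controlled linear recurrence:
        h_t = a_t * h_{t-1} + k_t * v_t,   y_t = q_t * h_t
    where a_t, k_t, v_t, q_t are all functions of the input x_t.
    x: (T, d_in); each w_* projects x_t to the state dimension d."""
    T = x.shape[0]
    d = w_v.shape[1]
    h = np.zeros(d)
    ys = np.zeros((T, d))
    for t in range(T):
        a = 1.0 / (1.0 + np.exp(-(x[t] @ w_a)))   # data-dependent forget gate in (0, 1)
        k = x[t] @ w_k
        v = x[t] @ w_v
        q = x[t] @ w_q
        h = a * h + k * v                          # data-controlled state transition
        ys[t] = q * h                              # data-controlled readout
    return ys
```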
- SequenceMatch: Imitation Learning for Autoregressive Sequence Modelling with Backtracking [60.109453252858806]
A maximum-likelihood (MLE) objective does not match a downstream use-case of autoregressively generating high-quality sequences.
We formulate sequence generation as an imitation learning (IL) problem.
This allows us to minimize a variety of divergences between the distribution of sequences generated by an autoregressive model and sequences from a dataset.
Our resulting method, SequenceMatch, can be implemented without adversarial training or architectural changes.
arXiv Detail & Related papers (2023-06-08T17:59:58Z)
- Enhancing Few-shot NER with Prompt Ordering based Data Augmentation [59.69108119752584]
We propose a Prompt Ordering based Data Augmentation (PODA) method to improve the training of unified autoregressive generation frameworks.
Experimental results on three public NER datasets and further analyses demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2023-05-19T16:25:43Z)
- MILO: Model-Agnostic Subset Selection Framework for Efficient Model Training and Tuning [68.12870241637636]
We propose MILO, a model-agnostic subset selection framework that decouples the subset selection from model training.
Our empirical results indicate that MILO can train models $3\times - 10\times$ faster and tune hyperparameters $20\times - 75\times$ faster than full-dataset training or tuning, without compromising performance.
arXiv Detail & Related papers (2023-01-30T20:59:30Z)
- Model ensemble instead of prompt fusion: a sample-specific knowledge transfer method for few-shot prompt tuning [85.55727213502402]
We focus on improving the few-shot performance of prompt tuning by transferring knowledge from soft prompts of source tasks.
We propose Sample-specific Ensemble of Source Models (SESoM).
SESoM learns to adjust the contribution of each source model for each target sample separately when ensembling source model outputs.
arXiv Detail & Related papers (2022-10-23T01:33:16Z)
- Conditional set generation using Seq2seq models [52.516563721766445]
Conditional set generation learns a mapping from an input sequence of tokens to a set.
Sequence-to-sequence (Seq2seq) models are a popular choice to model set generation.
We propose a novel algorithm for effectively sampling informative orders over the space of label orders.
arXiv Detail & Related papers (2022-05-25T04:17:50Z)
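For context on the entry above, the sketch below shows the generic set-as-sequence training setup that a Seq2seq set generator builds on: each target set is serialized into label sequences under several sampled orders so the model is not tied to one arbitrary ordering. The `order_score` function is a hypothetical placeholder for an informativeness criterion; the paper's actual algorithm for sampling informative label orders is more involved.

```python
import random

def set_to_training_sequences(input_tokens, label_set, order_score, n_orders=4):
    """Turn one (input, label-set) pair into several (input, label-sequence)
    training examples by sampling orderings of the label set.

    order_score: placeholder scoring function for how "informative" an order is."""
    examples = []
    labels = list(label_set)
    for _ in range(n_orders):
        # draw a handful of candidate orders and keep the highest-scoring one
        candidates = [random.sample(labels, len(labels)) for _ in range(16)]
        order = max(candidates, key=order_score)
        target = " ".join(order) + " <eos>"
        examples.append((input_tokens, target))
    return examples
```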
- Hierarchical Few-Shot Generative Models [18.216729811514718]
We study a latent-variable approach that extends the Neural Statistician to a fully hierarchical model with attention-based point-to-set-level aggregation.
Our results show that the hierarchical formulation better captures the intrinsic variability within the sets in the small data regime.
arXiv Detail & Related papers (2021-10-23T19:19:39Z)
- Effect of depth order on iterative nested named entity recognition models [1.619995421534183]
We study how the order in which mention depths are processed affects nested named entity recognition (NER) models.
We design an order-agnostic iterative model and a procedure to choose a custom order during training and prediction.
We show that the smallest to largest order gives the best results.
arXiv Detail & Related papers (2021-04-02T13:18:52Z)
- SetVAE: Learning Hierarchical Composition for Generative Modeling of Set-Structured Data [27.274328701618]
We propose SetVAE, a hierarchical variational autoencoder for sets.
Motivated by recent progress in set encoding, we build SetVAE upon attentive modules that first partition the set and project the partition back to the original cardinality.
We demonstrate that our model generalizes to unseen set sizes and learns interesting subset relations without supervision.
arXiv Detail & Related papers (2021-03-29T14:01:18Z)
- Predicting Sequences of Traversed Nodes in Graphs using Network Models with Multiple Higher Orders [1.0499611180329802]
We develop a technique to fit such multi-order models to empirical sequential data and to select the optimal maximum order.
We evaluate our model based on six empirical data sets containing sequences from website navigation as well as public transport systems.
We further demonstrate the accuracy of our method during out-of-sample sequence prediction and validate that our method can scale to data sets with millions of sequences.
arXiv Detail & Related papers (2020-07-13T20:08:14Z)
- Document Ranking with a Pretrained Sequence-to-Sequence Model [56.44269917346376]
We show how a sequence-to-sequence model can be trained to generate relevance labels as "target words".
Our approach significantly outperforms an encoder-only model in a data-poor regime.
arXiv Detail & Related papers (2020-03-14T22:29:50Z)
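As a rough illustration of the last entry, the sketch below scores a query-document pair by the probability a pretrained sequence-to-sequence model assigns to a relevance "target word", in the common monoT5-style setup; the prompt template, the "true"/"false" target words and the `t5-base` checkpoint are assumptions for illustration rather than necessarily the paper's exact configuration.

```python
import torch
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")
model.eval()

def relevance_score(query: str, document: str) -> float:
    """Score a query-document pair by the decoder's probability of the word 'true'."""
    prompt = f"Query: {query} Document: {document} Relevant:"
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True)
    decoder_start = torch.full((1, 1), model.config.decoder_start_token_id)
    with torch.no_grad():
        # logits for the first decoding step over the whole vocabulary
        logits = model(**inputs, decoder_input_ids=decoder_start).logits[0, -1]
    true_id = tokenizer("true", add_special_tokens=False).input_ids[0]
    false_id = tokenizer("false", add_special_tokens=False).input_ids[0]
    pair = torch.softmax(logits[[true_id, false_id]], dim=0)
    return pair[0].item()   # probability mass on "true", renormalized over the two target words
```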
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.