Sequence-to-Set Generative Models
- URL: http://arxiv.org/abs/2209.08801v1
- Date: Mon, 19 Sep 2022 07:13:51 GMT
- Title: Sequence-to-Set Generative Models
- Authors: Longtao Tang, Ying Zhou and Yu Yang
- Abstract summary: We propose a sequence-to-set method to transform any sequence generative model into a set generative model.
We present GRU2Set, which is an instance of our sequence-to-set method and employs the famous GRU model as the sequence generative model.
A direct application of our models is to learn an order/set distribution from a collection of e-commerce orders.
- Score: 9.525560801277903
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose a sequence-to-set method that can transform any
sequence generative model based on maximum likelihood into a set generative model
in which we can evaluate the utility/probability of any set. An efficient
importance sampling algorithm is devised to tackle the computational challenge
of learning our sequence-to-set model. We present GRU2Set, which is an instance
of our sequence-to-set method and employs the famous GRU model as the sequence
generative model. To further obtain permutation invariant representation of
sets, we devise the SetNN model which is also an instance of the
sequence-to-set model. A direct application of our models is to learn an
order/set distribution from a collection of e-commerce orders, which is an
essential step in many important operational decisions such as inventory
arrangement for fast delivery. Based on the intuition that small-sized sets are
usually easier to learn than large sets, we propose a size-bias trick that can
help learn better set distributions with respect to the $\ell_1$-distance
evaluation metric. Two e-commerce order datasets, TMALL and HKTVMALL, are used
to conduct extensive experiments to show the effectiveness of our models. The
experimental results demonstrate that our models can learn better set/order
distributions from order data than the baselines. Moreover, no matter what
model we use, applying the size-bias trick can always improve the quality of
the set distribution learned from data.
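A minimal sketch (not the authors' code) of the core idea as described above: a sequence generative model induces a set distribution by summing the probabilities of all orderings of a set, and an importance-sampling estimate replaces the factorial-size sum when exact enumeration is too expensive. The `seq_logprob` and `next_logits` interfaces, and the proposal that builds an ordering by renormalizing next-item scores over the remaining elements, are assumptions for illustration; end-of-sequence handling and the size-bias trick are omitted.

```python
import math
import random
from itertools import permutations

def set_logprob_exact(seq_logprob, items):
    """log P(set) = log sum over all orderings of P(sequence); only feasible for small sets."""
    logps = [seq_logprob(list(order)) for order in permutations(items)]
    m = max(logps)
    return m + math.log(sum(math.exp(lp - m) for lp in logps))

def set_logprob_importance(seq_logprob, next_logits, items, n_samples=64):
    """Importance-sampling estimate of log P(set).

    The proposal q draws an ordering by repeatedly sampling the next element
    from the remaining items, renormalizing the sequence model's next-item
    scores over that remaining set."""
    estimates = []
    for _ in range(n_samples):
        prefix, remaining, log_q = [], list(items), 0.0
        while remaining:
            scores = next_logits(prefix, remaining)          # unnormalized scores, one per remaining item
            m = max(scores)
            probs = [math.exp(s - m) for s in scores]
            z = sum(probs)
            probs = [p / z for p in probs]
            idx = random.choices(range(len(remaining)), weights=probs)[0]
            log_q += math.log(probs[idx])
            prefix.append(remaining.pop(idx))
        # importance weight for this sampled ordering: p(ordering) / q(ordering)
        estimates.append(seq_logprob(prefix) - log_q)
    m = max(estimates)
    return m + math.log(sum(math.exp(e - m) for e in estimates) / n_samples)
```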
Related papers
- GateLoop: Fully Data-Controlled Linear Recurrence for Sequence Modeling [0.0]
We develop GateLoop, a sequence model that generalizes linear recurrent models such as S4, S5, LRU and RetNet.
GateLoop empirically outperforms existing models for auto-regressive language modeling.
We prove that our approach can be interpreted as providing data-controlled relative-positional information to Attention.
arXiv Detail & Related papers (2023-11-03T14:08:39Z)
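As a rough illustration of the mechanism summarized in the GateLoop entry above, here is a sketch of a data-controlled linear recurrence in plain NumPy: the forget gate, key, value and query are all computed from the current input, so the state transition itself is data-controlled (input-independent transitions roughly recover S4/S5-style models). The sequential loop, element-wise state and specific projections are simplifications assumed for illustration; the actual model relies on an efficient parallel form.

```python
import numpy as np

def data_controlled_linear_recurrence(x, w_a, w_k, w_v, w_q):
    """Sequential form of a data-controlled linear recurrence:
        h_t = a_t * h_{t-1} + k_t * v_t,   y_t = q_t * h_t
    where a_t, k_t, v_t, q_t are all functions of the input x_t.
    x: (T, d_in); each w_* projects x_t to the state dimension d."""
    T = x.shape[0]
    d = w_v.shape[1]
    h = np.zeros(d)
    ys = np.zeros((T, d))
    for t in range(T):
        a = 1.0 / (1.0 + np.exp(-(x[t] @ w_a)))   # data-dependent forget gate in (0, 1)
        k = x[t] @ w_k
        v = x[t] @ w_v
        q = x[t] @ w_q
        h = a * h + k * v                          # data-controlled state transition
        ys[t] = q * h                              # data-controlled readout
    return ys
```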
- SequenceMatch: Imitation Learning for Autoregressive Sequence Modelling with Backtracking [60.109453252858806]
A maximum-likelihood (MLE) objective does not match a downstream use-case of autoregressively generating high-quality sequences.
We formulate sequence generation as an imitation learning (IL) problem.
This allows us to minimize a variety of divergences between the distribution of sequences generated by an autoregressive model and sequences from a dataset.
Our resulting method, SequenceMatch, can be implemented without adversarial training or architectural changes.
arXiv Detail & Related papers (2023-06-08T17:59:58Z)
- Enhancing Few-shot NER with Prompt Ordering based Data Augmentation [59.69108119752584]
We propose a Prompt Ordering based Data Augmentation (PODA) method to improve the training of unified autoregressive generation frameworks.
Experimental results on three public NER datasets and further analyses demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2023-05-19T16:25:43Z)
- MILO: Model-Agnostic Subset Selection Framework for Efficient Model Training and Tuning [68.12870241637636]
We propose MILO, a model-agnostic subset selection framework that decouples the subset selection from model training.
Our empirical results indicate that MILO can train models $3\times - 10\times$ faster and tune hyperparameters $20\times - 75\times$ faster than full-dataset training or tuning, without compromising performance.
arXiv Detail & Related papers (2023-01-30T20:59:30Z)
- Model ensemble instead of prompt fusion: a sample-specific knowledge transfer method for few-shot prompt tuning [85.55727213502402]
We focus on improving the few-shot performance of prompt tuning by transferring knowledge from soft prompts of source tasks.
We propose Sample-specific Ensemble of Source Models (SESoM).
SESoM learns to adjust the contribution of each source model for each target sample separately when ensembling source model outputs.
arXiv Detail & Related papers (2022-10-23T01:33:16Z)
- Conditional set generation using Seq2seq models [52.516563721766445]
Conditional set generation learns a mapping from an input sequence of tokens to a set.
Sequence-to-sequence (Seq2seq) models are a popular choice to model set generation.
We propose a novel algorithm for effectively sampling informative orders over the space of label orders.
arXiv Detail & Related papers (2022-05-25T04:17:50Z)
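For context on the entry above, the sketch below shows the generic set-as-sequence training setup that a Seq2seq set generator builds on: each target set is serialized into label sequences under several sampled orders so the model is not tied to one arbitrary ordering. The `order_score` function is a hypothetical placeholder for an informativeness criterion; the paper's actual algorithm for sampling informative label orders is more involved.

```python
import random

def set_to_training_sequences(input_tokens, label_set, order_score, n_orders=4):
    """Turn one (input, label-set) pair into several (input, label-sequence)
    training examples by sampling orderings of the label set.

    order_score: placeholder scoring function for how "informative" an order is."""
    examples = []
    labels = list(label_set)
    for _ in range(n_orders):
        # draw a handful of candidate orders and keep the highest-scoring one
        candidates = [random.sample(labels, len(labels)) for _ in range(16)]
        order = max(candidates, key=order_score)
        target = " ".join(order) + " <eos>"
        examples.append((input_tokens, target))
    return examples
```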
- Hierarchical Few-Shot Generative Models [18.216729811514718]
We study a latent-variable approach that extends the Neural Statistician to a fully hierarchical model with attention-based point-to-set-level aggregation.
Our results show that the hierarchical formulation better captures the intrinsic variability within the sets in the small data regime.
arXiv Detail & Related papers (2021-10-23T19:19:39Z)
- Effect of depth order on iterative nested named entity recognition models [1.619995421534183]
We study how the order in which mention depths are processed affects nested named entity recognition (NER) models.
We design an order-agnostic iterative model and a procedure to choose a custom order during training and prediction.
We show that the smallest to largest order gives the best results.
arXiv Detail & Related papers (2021-04-02T13:18:52Z)
- SetVAE: Learning Hierarchical Composition for Generative Modeling of Set-Structured Data [27.274328701618]
We propose SetVAE, a hierarchical variational autoencoder for sets.
Motivated by recent progress in set encoding, we build SetVAE upon attentive modules that first partition the set and project the partition back to the original cardinality.
We demonstrate that our model generalizes to unseen set sizes and learns interesting subset relations without supervision.
arXiv Detail & Related papers (2021-03-29T14:01:18Z)
- Predicting Sequences of Traversed Nodes in Graphs using Network Models with Multiple Higher Orders [1.0499611180329802]
We develop a technique to fit such multi-order models to empirical sequential data and to select the optimal maximum order.
We evaluate our model based on six empirical data sets containing sequences from website navigation as well as public transport systems.
We further demonstrate the accuracy of our method during out-of-sample sequence prediction and validate that our method can scale to data sets with millions of sequences.
arXiv Detail & Related papers (2020-07-13T20:08:14Z)
- Document Ranking with a Pretrained Sequence-to-Sequence Model [56.44269917346376]
We show how a sequence-to-sequence model can be trained to generate relevance labels as "target words".
Our approach significantly outperforms an encoder-only model in a data-poor regime.
arXiv Detail & Related papers (2020-03-14T22:29:50Z)
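As a rough illustration of the last entry, the sketch below scores a query-document pair by the probability a pretrained sequence-to-sequence model assigns to a relevance "target word", in the common monoT5-style setup; the prompt template, the "true"/"false" target words and the `t5-base` checkpoint are assumptions for illustration rather than necessarily the paper's exact configuration.

```python
import torch
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")
model.eval()

def relevance_score(query: str, document: str) -> float:
    """Score a query-document pair by the decoder's probability of the word 'true'."""
    prompt = f"Query: {query} Document: {document} Relevant:"
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True)
    decoder_start = torch.full((1, 1), model.config.decoder_start_token_id)
    with torch.no_grad():
        # logits for the first decoding step over the whole vocabulary
        logits = model(**inputs, decoder_input_ids=decoder_start).logits[0, -1]
    true_id = tokenizer("true", add_special_tokens=False).input_ids[0]
    false_id = tokenizer("false", add_special_tokens=False).input_ids[0]
    pair = torch.softmax(logits[[true_id, false_id]], dim=0)
    return pair[0].item()   # probability mass on "true", renormalized over the two target words
```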
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.