σ-GPTs: A New Approach to Autoregressive Models
- URL: http://arxiv.org/abs/2404.09562v2
- Date: Mon, 1 Jul 2024 06:46:36 GMT
- Title: σ-GPTs: A New Approach to Autoregressive Models
- Authors: Arnaud Pannatier, Evann Courdier, François Fleuret
- Abstract summary: We show that by simply adding a positional encoding for the output, this order can be modulated on the fly, per sample.
We evaluate our method across various domains, including language modeling, path-solving, and aircraft vertical rate prediction.
- Score: 19.84252724050016
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Autoregressive models, such as the GPT family, use a fixed order, usually left-to-right, to generate sequences. However, this is not a necessity. In this paper, we challenge this assumption and show that by simply adding a positional encoding for the output, the generation order can be modulated on the fly, per sample, which offers key advantages. It allows sampling of, and conditioning on, arbitrary subsets of tokens, and it also allows dynamically sampling multiple tokens in one shot according to a rejection strategy, leading to a sub-linear number of model evaluations. We evaluate our method across various domains, including language modeling, path-solving, and aircraft vertical rate prediction, decreasing the number of steps required for generation by an order of magnitude.
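To make the mechanism concrete, below is a minimal PyTorch sketch of the double positional encoding idea from the abstract: each input embedding carries both its own position and the position of the token to be predicted next under a per-sample shuffled order. All class and function names are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ShuffledOrderLM(nn.Module):
    """Decoder-only LM with a second ('output') positional encoding."""
    def __init__(self, vocab_size=256, d_model=128, n_heads=4, n_layers=2, max_len=512):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, d_model)
        self.pos_in = nn.Embedding(max_len, d_model)   # where the consumed token sits
        self.pos_out = nn.Embedding(max_len, d_model)  # where the predicted token must go
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, tokens, in_pos, out_pos):
        L = tokens.size(1)
        causal = torch.triu(torch.ones(L, L, dtype=torch.bool, device=tokens.device), 1)
        h = self.tok(tokens) + self.pos_in(in_pos) + self.pos_out(out_pos)
        return self.head(self.backbone(h, mask=causal))

def shuffled_order_loss(model, seq):
    """Train on a fresh random generation order for every sample."""
    B, L = seq.shape
    perm = torch.stack([torch.randperm(L) for _ in range(B)])  # one order per sample
    shuffled = torch.gather(seq, 1, perm)
    logits = model(shuffled[:, :-1], perm[:, :-1], perm[:, 1:])
    return F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                           shuffled[:, 1:].reshape(-1))
```

In this sketch, fixing `perm` to the identity recovers ordinary left-to-right training, while conditioning on a known subset of tokens amounts to placing them first in the permutation.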
Related papers
- Conformal Generative Modeling with Improved Sample Efficiency through Sequential Greedy Filtering [55.15192437680943]
Generative models lack rigorous statistical guarantees for their outputs.
We propose a sequential conformal prediction method producing prediction sets that satisfy a rigorous statistical guarantee.
This guarantee states that with high probability, the prediction sets contain at least one admissible (or valid) example.
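For context, the statistical machinery behind this kind of guarantee resembles split conformal prediction. The sketch below is the plain split-conformal recipe, not the paper's sequential greedy filtering, and the score function and names are assumptions.

```python
import numpy as np

def conformal_threshold(cal_scores, alpha=0.1):
    # Split-conformal quantile: the ceil((n+1)(1-alpha))-th smallest calibration
    # nonconformity score. A fresh example's score falls at or below it with
    # probability at least 1 - alpha.
    n = len(cal_scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))
    assert k <= n, "need more calibration points for this alpha"
    return np.sort(cal_scores)[k - 1]

def prediction_set(candidates, score_fn, threshold):
    # Keep every generated candidate whose nonconformity is within the threshold.
    return [c for c in candidates if score_fn(c) <= threshold]
```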
arXiv Detail & Related papers (2024-10-02T15:26:52Z)
- Graph-Structured Speculative Decoding [52.94367724136063]
Speculative decoding has emerged as a promising technique to accelerate the inference of Large Language Models.
We introduce an innovative approach utilizing a directed acyclic graph (DAG) to manage the drafted hypotheses.
We observe a remarkable speedup of 1.73× to 1.96×, significantly surpassing standard speculative decoding.
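For reference, the accept/reject rule that any speculative decoder builds on looks like the following. This is the vanilla single-chain rule, not the DAG-structured variant, and all names are illustrative.

```python
import numpy as np

def verify_draft(draft, q_probs, p_probs, rng=None):
    # Vanilla speculative-decoding verification: accept draft token x with
    # probability min(1, p(x)/q(x)), where q is the draft model and p the
    # target model; on the first rejection, resample from the residual
    # distribution max(p - q, 0) and discard the rest of the draft.
    rng = rng or np.random.default_rng()
    out = []
    for t, x in enumerate(draft):
        if rng.random() < min(1.0, p_probs[t][x] / q_probs[t][x]):
            out.append(x)
        else:
            residual = np.maximum(p_probs[t] - q_probs[t], 0.0)
            out.append(rng.choice(len(residual), p=residual / residual.sum()))
            break
    return out
```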
arXiv Detail & Related papers (2024-07-23T06:21:24Z) - Approximate Message Passing for the Matrix Tensor Product Model [8.206394018475708]
We propose and analyze an approximate message passing (AMP) algorithm for the matrix tensor product model.
Building upon a convergence theorem for non-separable functions, we prove a state evolution for non-separable functions.
We leverage this state evolution result to provide necessary and sufficient conditions for recovery of the signal of interest.
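As rough orientation, a generic AMP iteration and its state evolution take the following form; this is the textbook template, not the paper's exact recursion for the matrix tensor product model, and the notation is assumed.

```latex
% Generic AMP iteration with Onsager correction (template, not the paper's):
x^{t+1} = A\,f_t(x^t) - b_t\,f_{t-1}(x^{t-1}),
\qquad b_t = \frac{1}{n}\sum_{i=1}^{n} f_t'(x_i^t).
% State evolution: in the high-dimensional limit the iterates behave like a
% scaled signal plus Gaussian noise,
x^t \approx \mu_t\,x^{\star} + \sigma_t\,g, \qquad g \sim \mathcal{N}(0, I_n),
% with (\mu_t, \sigma_t) following a deterministic recursion.
```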
arXiv Detail & Related papers (2023-06-27T16:03:56Z)
- SequenceMatch: Imitation Learning for Autoregressive Sequence Modelling with Backtracking [60.109453252858806]
A maximum-likelihood estimation (MLE) objective does not match the downstream use case of autoregressively generating high-quality sequences.
We formulate sequence generation as an imitation learning (IL) problem.
This allows us to minimize a variety of divergences between the distribution of sequences generated by an autoregressive model and sequences from a dataset.
Our resulting method, SequenceMatch, can be implemented without adversarial training or architectural changes.
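The gap being targeted can be stated compactly: MLE minimizes one particular divergence, while the imitation-learning view opens up others. The notation below is generic, not the paper's.

```latex
% MLE is forward KL between data and model (generic notation):
\theta^{\mathrm{MLE}} = \arg\min_\theta\; \mathrm{KL}\!\left(p_{\mathrm{data}} \,\|\, p_\theta\right)
% The imitation-learning formulation permits other divergences D, including
% ones that penalize the model for mass on out-of-distribution sequences:
\theta^{\mathrm{IL}} = \arg\min_\theta\; D\!\left(p_\theta,\; p_{\mathrm{data}}\right)
```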
arXiv Detail & Related papers (2023-06-08T17:59:58Z)
- Partial sequence labeling with structured Gaussian Processes [8.239028141030621]
We propose structured Gaussian Processes for partial sequence labeling.
It encodes uncertainty in the prediction and requires no extra effort for model selection or hyperparameter learning.
It is evaluated on several sequence labeling tasks and the experimental results show the effectiveness of the proposed model.
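For orientation, the uncertainty-carrying object here is the GP posterior. The sketch below is plain GP regression, far simpler than the structured sequence-labeling model in the paper, with all names assumed.

```python
import numpy as np

def gp_posterior(X, y, Xs, kernel, noise=1e-2):
    # Plain GP regression posterior: predictive mean and covariance at the
    # test inputs Xs. The posterior covariance is the uncertainty estimate.
    K = kernel(X, X) + noise * np.eye(len(X))
    Ks, Kss = kernel(X, Xs), kernel(Xs, Xs)
    mean = Ks.T @ np.linalg.solve(K, y)
    cov = Kss - Ks.T @ np.linalg.solve(K, Ks)
    return mean, cov
```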
arXiv Detail & Related papers (2022-09-20T00:56:49Z)
- Calibrating Over-Parametrized Simulation Models: A Framework via Eligibility Set [3.862247454265944]
We develop a framework for constructing calibration schemes that satisfy rigorous frequentist statistical guarantees.
We demonstrate our methodology on several numerical examples, including an application to calibration of a limit order book market simulator.
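Reading the title literally, an eligibility set is the set of parameters the data cannot rule out. Below is a hypothetical sketch of that idea; the acceptance rule and all names are assumptions, not the paper's procedure.

```python
def eligibility_set(param_grid, simulate, observed_stat, tolerance):
    # Hypothetical sketch: a parameter is 'eligible' when its simulated summary
    # statistic is within a calibrated tolerance of the observed one.
    return [p for p in param_grid
            if abs(simulate(p) - observed_stat) <= tolerance]
```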
arXiv Detail & Related papers (2021-05-27T00:59:29Z)
- Parallel and Flexible Sampling from Autoregressive Models via Langevin Dynamics [13.097161185372151]
We propose a sampling procedure that initializes a sequence with white noise and follows a Markov chain defined by Langevin dynamics on the global log-likelihood of the sequence.
We apply these techniques to autoregressive models in the visual and audio domains, with competitive results for audio source separation, super-resolution, and inpainting.
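The underlying update is unadjusted Langevin dynamics on the sequence-level log-likelihood; a minimal sketch, with function names assumed:

```python
import torch

def langevin_sample(log_prob, x, steps=200, step=1e-2):
    # Unadjusted Langevin dynamics: start from white noise and repeatedly move
    # along the score of the global log-likelihood plus Gaussian noise.
    for _ in range(steps):
        x = x.detach().requires_grad_(True)
        grad, = torch.autograd.grad(log_prob(x).sum(), x)
        x = x + step * grad + (2 * step) ** 0.5 * torch.randn_like(x)
    return x.detach()

# x0 = torch.randn(1, 16000)  # e.g. a white-noise audio initialization
```

Note that every position of the sequence is updated in parallel at each step, which is where the parallelism over standard left-to-right sampling comes from.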
arXiv Detail & Related papers (2021-05-17T21:07:02Z)
- Goal-directed Generation of Discrete Structures with Conditional Generative Models [85.51463588099556]
We introduce a novel approach to directly optimize a reinforcement learning objective, maximizing an expected reward.
We test our methodology on two tasks: generating molecules with user-defined properties and identifying short Python expressions which evaluate to a given target value.
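An expected-reward objective of this kind is typically optimized with a score-function (REINFORCE-style) gradient; a minimal sketch, where the baseline choice is an assumption:

```python
import torch

def reinforce_loss(seq_log_probs, rewards):
    # Score-function estimator: grad E[R] = E[R * grad log p(sequence)].
    # Subtracting a baseline (here the batch mean) reduces variance without bias.
    advantage = rewards - rewards.mean()
    return -(advantage.detach() * seq_log_probs).mean()
```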
arXiv Detail & Related papers (2020-10-05T20:03:13Z)
- Feature Transformation Ensemble Model with Batch Spectral Regularization for Cross-Domain Few-Shot Classification [66.91839845347604]
We propose an ensemble prediction model by performing diverse feature transformations after a feature extraction network.
We use a batch spectral regularization term to suppress the singular values of the feature matrix during pre-training to improve the generalization ability of the model.
The proposed model can then be fine-tuned in the target domain to address few-shot classification.
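The regularizer itself is compact: penalize the leading singular values of the batch feature matrix during pre-training. A minimal sketch, where the number of penalized values k is an assumption:

```python
import torch

def batch_spectral_penalty(features, k=1):
    # features: (batch, dim) activations from the feature extractor.
    # Penalizing the top-k singular values suppresses the dominant directions
    # of the batch feature matrix, aiding cross-domain generalization.
    s = torch.linalg.svdvals(features)  # singular values, descending order
    return (s[:k] ** 2).sum()
```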
arXiv Detail & Related papers (2020-05-18T05:31:04Z)
- Document Ranking with a Pretrained Sequence-to-Sequence Model [56.44269917346376]
We show how a sequence-to-sequence model can be trained to generate relevance labels as "target words".
Our approach significantly outperforms an encoder-only model in a data-poor regime.
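Concretely, that recipe scores a query-document pair by the probability the model assigns to a relevance target word. Below is a sketch in the spirit of that approach; the prompt template and token choices are assumptions.

```python
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

tok = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base").eval()

def relevance_score(query, doc):
    # Score = P("true") / (P("true") + P("false")) at the first decoder step.
    ids = tok(f"Query: {query} Document: {doc} Relevant:",
              return_tensors="pt").input_ids
    start = torch.tensor([[model.config.decoder_start_token_id]])
    with torch.no_grad():
        logits = model(input_ids=ids, decoder_input_ids=start).logits[0, -1]
    true_id = tok.convert_tokens_to_ids("▁true")
    false_id = tok.convert_tokens_to_ids("▁false")
    return torch.softmax(logits[[true_id, false_id]], dim=0)[0].item()
```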
arXiv Detail & Related papers (2020-03-14T22:29:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.