FutureFill: Fast Generation from Convolutional Sequence Models
- URL: http://arxiv.org/abs/2410.03766v2
- Date: Fri, 25 Oct 2024 19:45:33 GMT
- Title: FutureFill: Fast Generation from Convolutional Sequence Models
- Authors: Naman Agarwal, Xinyi Chen, Evan Dogariu, Vlad Feinberg, Daniel Suo, Peter Bartlett, Elad Hazan
- Abstract summary: FutureFill is a method for fast generation that applies to any sequence prediction algorithm based on convolutional operators.
Our approach reduces the generation time requirement from quadratic to quasilinear relative to the context length.
We validate our theoretical findings with experimental evidence demonstrating correctness and efficiency gains in a synthetic generation task.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We address the challenge of efficient auto-regressive generation in sequence prediction models by introducing FutureFill - a method for fast generation that applies to any sequence prediction algorithm based on convolutional operators. Our approach reduces the generation time requirement from quadratic to quasilinear relative to the context length. Additionally, FutureFill requires a prefill cache sized only by the number of tokens generated, which is smaller than the cache requirements for standard convolutional and attention-based models. We validate our theoretical findings with experimental evidence demonstrating correctness and efficiency gains in a synthetic generation task.
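As a rough illustration of the chunked-convolution idea behind quasilinear generation (a sketch only, not the authors' implementation: the filter `k`, the fixed chunk size, and the toy tanh "sampling" rule are assumptions introduced for this example), the snippet below contrasts naive per-step decoding, which recomputes a full causal convolution at every position, with a FutureFill-style variant that folds each finished chunk's contribution to all future outputs into a precomputed cache via FFT convolution.
```python
# Sketch only: contrasts naive decoding with a chunked "future fill" cache.
# Assumes a long convolution filter k with len(k) >= prompt length + generated tokens.
import numpy as np
from scipy.signal import fftconvolve


def naive_generate(k, prompt, num_new):
    """Naive decoding: position t needs a fresh sum over all past tokens,
    so generating L tokens costs O(L^2) filter multiplications."""
    u = list(prompt)
    for _ in range(num_new):
        t = len(u)
        y_t = sum(k[t - s] * u[s] for s in range(t))
        u.append(float(np.tanh(y_t)))  # toy stand-in for sampling the next token
    return np.array(u)


def futurefill_generate(k, prompt, num_new, chunk=64):
    """FutureFill-style sketch: once a chunk of tokens is finished, one FFT
    convolution adds its contribution to every future position (the "future
    fill" cache), so the inner decode loop only touches the current chunk."""
    u = list(prompt)
    L = len(prompt) + num_new
    cache = np.zeros(L)
    # Prefill: contribution of the prompt to all future positions, done once.
    pre = fftconvolve(k[:L], np.asarray(prompt, dtype=float))[:L]
    cache[: len(pre)] += pre
    while len(u) < L:
        start = len(u)
        for _ in range(min(chunk, L - len(u))):
            t = len(u)
            recent = sum(k[t - s] * u[s] for s in range(start, t))  # current chunk only
            u.append(float(np.tanh(cache[t] + recent)))
        # Fold the finished chunk into the cache with one FFT convolution.
        contrib = fftconvolve(k, np.asarray(u[start:], dtype=float))
        hi = min(L, start + len(contrib))
        cache[start:hi] += contrib[: hi - start]
    return np.array(u)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    L = 512
    k = rng.normal(size=L) / L            # toy long convolution filter
    prompt = rng.normal(size=32).tolist()
    out_naive = naive_generate(k, prompt, L - 32)
    out_ff = futurefill_generate(k, prompt, L - 32)
    print(np.max(np.abs(out_naive - out_ff)))  # should be ~0: both compute the same outputs
```
This toy version keeps a fixed chunk size and a full-length cache for readability; the paper's actual algorithm is what achieves the quasilinear runtime and the prefill cache sized only by the number of generated tokens described in the abstract.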
Related papers
- Timer-XL: Long-Context Transformers for Unified Time Series Forecasting [67.83502953961505]
We present Timer-XL, a generative Transformer for unified time series forecasting.
Timer-XL achieves state-of-the-art performance across challenging forecasting benchmarks through a unified approach.
arXiv Detail & Related papers (2024-10-07T07:27:39Z)
- Loop Neural Networks for Parameter Sharing [1.1049608786515839]
We introduce a novel Loop Neural Network, which achieves better performance by utilizing longer computational time without increasing the model size.
Our approach revisits the input multiple times, refining the prediction by iteratively looping over a subset of the model with residual connections.
We demonstrate the effectiveness of this method through experiments comparing versions of GPT-2 with our loop models, showing improved performance in language modeling tasks while maintaining similar parameter counts.
arXiv Detail & Related papers (2024-09-21T17:07:42Z)
- TX-Gen: Multi-Objective Optimization for Sparse Counterfactual Explanations for Time-Series Classification [0.42105583610914427]
We introduce TX-Gen, a novel algorithm for generating counterfactual explanations based on the Non-dominated Sorting Genetic Algorithm II (NSGA-II).
By incorporating a flexible reference-guided mechanism, our method improves the plausibility and interpretability of the counterfactuals without relying on predefined assumptions.
arXiv Detail & Related papers (2024-09-14T15:13:28Z)
- Promises and Pitfalls of Generative Masked Language Modeling: Theoretical Framework and Practical Guidelines [74.42485647685272]
We focus on Generative Masked Language Models (GMLMs): we train a model to fit conditional probabilities of the data distribution via masking, which are subsequently used as inputs to a Markov Chain to draw samples from the model.
We adapt the T5 model for iteratively-refined parallel decoding, achieving 2-3x speedup in machine translation with minimal sacrifice in quality.
arXiv Detail & Related papers (2024-07-22T18:00:00Z)
- Non-autoregressive Sequence-to-Sequence Vision-Language Models [63.77614880533488]
We propose a parallel decoding sequence-to-sequence vision-language model that marginalizes over multiple inference paths in the decoder.
The model achieves performance on-par with its state-of-the-art autoregressive counterpart, but is faster at inference time.
arXiv Detail & Related papers (2024-03-04T17:34:59Z)
- Explainable Parallel RCNN with Novel Feature Representation for Time Series Forecasting [0.0]
Time series forecasting is a fundamental challenge in data science.
We develop a parallel deep learning framework composed of RNN and CNN.
Extensive experiments on three datasets reveal the effectiveness of our method.
arXiv Detail & Related papers (2023-05-08T17:20:13Z)
- Conditional Denoising Diffusion for Sequential Recommendation [62.127862728308045]
Two prominent generative models, Generative Adversarial Networks (GANs) and Variational AutoEncoders (VAEs), have notable drawbacks: GANs suffer from unstable optimization, while VAEs are prone to posterior collapse and over-smoothed generations.
We present a conditional denoising diffusion model, which includes a sequence encoder, a cross-attentive denoising decoder, and a step-wise diffuser.
arXiv Detail & Related papers (2023-04-22T15:32:59Z)
- Deep Latent State Space Models for Time-Series Generation [68.45746489575032]
We propose LS4, a generative model for sequences with latent variables evolving according to a state space ODE.
Inspired by recent deep state space models (S4), we achieve speedups by leveraging a convolutional representation of LS4.
We show that LS4 significantly outperforms previous continuous-time generative models in terms of marginal distribution, classification, and prediction scores on real-world datasets.
arXiv Detail & Related papers (2022-12-24T15:17:42Z)
- An EM Approach to Non-autoregressive Conditional Sequence Generation [49.11858479436565]
Autoregressive (AR) models have been the dominating approach to conditional sequence generation.
Non-autoregressive (NAR) models have been recently proposed to reduce the latency by generating all output tokens in parallel.
This paper proposes a new approach that jointly optimizes both AR and NAR models in a unified Expectation-Maximization framework.
arXiv Detail & Related papers (2020-06-29T20:58:57Z)