FutureFill: Fast Generation from Convolutional Sequence Models
- URL: http://arxiv.org/abs/2410.03766v2
- Date: Fri, 25 Oct 2024 19:45:33 GMT
- Title: FutureFill: Fast Generation from Convolutional Sequence Models
- Authors: Naman Agarwal, Xinyi Chen, Evan Dogariu, Vlad Feinberg, Daniel Suo, Peter Bartlett, Elad Hazan,
- Abstract summary: FutureFill is a method for fast generation that applies to any sequence prediction algorithm based on convolutional operators.
Our approach reduces the generation time requirement from quadratic to quasilinear relative to the context length.
We validate our theoretical findings with experimental evidence demonstrating correctness and efficiency gains in a synthetic generation task.
- Score: 22.410028211490424
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We address the challenge of efficient auto-regressive generation in sequence prediction models by introducing FutureFill - a method for fast generation that applies to any sequence prediction algorithm based on convolutional operators. Our approach reduces the generation time requirement from quadratic to quasilinear relative to the context length. Additionally, FutureFill requires a prefill cache sized only by the number of tokens generated, which is smaller than the cache requirements for standard convolutional and attention-based models. We validate our theoretical findings with experimental evidence demonstrating correctness and efficiency gains in a synthetic generation task.
Related papers
- Sundial: A Family of Highly Capable Time Series Foundation Models [64.6322079384575]
We introduce Sundial, a family of native, flexible, and scalable time series foundation models.
Our model is pre-trained without specifying any prior distribution and can generate multiple probable predictions.
By mitigating mode collapse through TimeFlow Loss, we pre-train a family of Sundial models on TimeBench, which exhibit unprecedented model capacity and generalization performance.
arXiv Detail & Related papers (2025-02-02T14:52:50Z) - Efficient Generative Modeling with Residual Vector Quantization-Based Tokens [5.949779668853557]
ResGen is an efficient RVQ-based discrete diffusion model that generates high-fidelity samples without compromising sampling speed.
We validate the efficacy and generalizability of the proposed method on two challenging tasks: conditional image generation on ImageNet 256x256 and zero-shot text-to-speech synthesis.
As we scale the depth of RVQ, our generative models exhibit enhanced generation fidelity or faster sampling speeds compared to similarly sized baseline models.
arXiv Detail & Related papers (2024-12-13T15:31:17Z) - Enhancing Foundation Models for Time Series Forecasting via Wavelet-based Tokenization [74.3339999119713]
We develop a wavelet-based tokenizer that allows models to learn complex representations directly in the space of time-localized frequencies.
Our method first scales and decomposes the input time series, then thresholds and quantizes the wavelet coefficients, and finally pre-trains an autoregressive model to forecast coefficients for the forecast horizon.
arXiv Detail & Related papers (2024-12-06T18:22:59Z) - Timer-XL: Long-Context Transformers for Unified Time Series Forecasting [67.83502953961505]
We present Timer-XL, a generative Transformer for unified time series forecasting.
Timer-XL achieves state-of-the-art performance across challenging forecasting benchmarks through a unified approach.
arXiv Detail & Related papers (2024-10-07T07:27:39Z) - Loop Neural Networks for Parameter Sharing [1.1049608786515839]
We introduce a novel Loop Neural Network, which achieves better performance by utilizing longer computational time without increasing the model size.
Our approach revisits the input multiple times, refining the prediction by iteratively looping over a subset of the model with residual connections.
We demonstrate the effectiveness of this method through experiments comparing versions of GPT-2 with our loop models, showing improved performance in language modeling tasks while maintaining similar parameter counts.
arXiv Detail & Related papers (2024-09-21T17:07:42Z) - TX-Gen: Multi-Objective Optimization for Sparse Counterfactual Explanations for Time-Series Classification [0.42105583610914427]
We introduce TX-Gen, a novel algorithm for generating counterfactual explanations based on the Non-dominated Sorting Genetic Algorithm II (NSGA-II)
By incorporating a flexible reference-guided mechanism, our method improves the plausibility and interpretability of the counterfactuals without relying on predefined assumptions.
arXiv Detail & Related papers (2024-09-14T15:13:28Z) - Promises and Pitfalls of Generative Masked Language Modeling: Theoretical Framework and Practical Guidelines [74.42485647685272]
We focus on Generative Masked Language Models (GMLMs)
We train a model to fit conditional probabilities of the data distribution via masking, which are subsequently used as inputs to a Markov Chain to draw samples from the model.
We adapt the T5 model for iteratively-refined parallel decoding, achieving 2-3x speedup in machine translation with minimal sacrifice in quality.
arXiv Detail & Related papers (2024-07-22T18:00:00Z) - Reinforced Decoder: Towards Training Recurrent Neural Networks for Time Series Forecasting [1.5213268724320657]
Recurrent neural network-based sequence-to-sequence models have been extensively applied for multi-step-ahead time series forecasting.
These models typically involve a decoder trained using either its previous forecasts or the actual observed values as the decoder inputs.
This study proposes a novel training approach called reinforced decoder, which introduces auxiliary models to generate alternative decoder inputs.
arXiv Detail & Related papers (2024-06-14T00:24:29Z) - Non-autoregressive Sequence-to-Sequence Vision-Language Models [63.77614880533488]
We propose a parallel decoding sequence-to-sequence vision-language model that marginalizes over multiple inference paths in the decoder.
The model achieves performance on-par with its state-of-the-art autoregressive counterpart, but is faster at inference time.
arXiv Detail & Related papers (2024-03-04T17:34:59Z) - RecycleGPT: An Autoregressive Language Model with Recyclable Module [13.243551482623623]
We present RecycleGPT, a generative language model with fast decoding speed.
Our approach relies on the observation that adjacent tokens in a sequence usually have strong correlations.
Experiments and analysis demonstrate the effectiveness of our approach in lowering inference latency, achieving up to 1.4x speedup.
arXiv Detail & Related papers (2023-08-07T09:14:33Z) - Explainable Parallel RCNN with Novel Feature Representation for Time
Series Forecasting [0.0]
Time series forecasting is a fundamental challenge in data science.
We develop a parallel deep learning framework composed of RNN and CNN.
Extensive experiments on three datasets reveal the effectiveness of our method.
arXiv Detail & Related papers (2023-05-08T17:20:13Z) - Conditional Denoising Diffusion for Sequential Recommendation [62.127862728308045]
Two prominent generative models, Generative Adversarial Networks (GANs) and Variational AutoEncoders (VAEs)
GANs suffer from unstable optimization, while VAEs are prone to posterior collapse and over-smoothed generations.
We present a conditional denoising diffusion model, which includes a sequence encoder, a cross-attentive denoising decoder, and a step-wise diffuser.
arXiv Detail & Related papers (2023-04-22T15:32:59Z) - Deep Latent State Space Models for Time-Series Generation [68.45746489575032]
We propose LS4, a generative model for sequences with latent variables evolving according to a state space ODE.
Inspired by recent deep state space models (S4), we achieve speedups by leveraging a convolutional representation of LS4.
We show that LS4 significantly outperforms previous continuous-time generative models in terms of marginal distribution, classification, and prediction scores on real-world datasets.
arXiv Detail & Related papers (2022-12-24T15:17:42Z) - Deep Probabilistic Time Series Forecasting using Augmented Recurrent
Input for Dynamic Systems [12.319812075685956]
We combine the advances in both deep generative models and state space model (SSM) to come up with a novel, data-driven deep probabilistic sequence model.
Specially, we follow the popular encoder-decoder generative structure to build the recurrent neural networks (RNN) assisted variational sequence model.
In order to alleviate the issue of inconsistency between training and predicting, we (i) propose using a hybrid output as input at next time step, which brings training and predicting into alignment.
arXiv Detail & Related papers (2021-06-03T23:41:11Z) - Synergetic Learning of Heterogeneous Temporal Sequences for
Multi-Horizon Probabilistic Forecasting [48.8617204809538]
We propose Variational Synergetic Multi-Horizon Network (VSMHN), a novel deep conditional generative model.
To learn complex correlations across heterogeneous sequences, a tailored encoder is devised to combine the advances in deep point processes models and variational recurrent neural networks.
Our model can be trained effectively using variational inference and generates predictions with Monte-Carlo simulation.
arXiv Detail & Related papers (2021-01-31T11:00:55Z) - An EM Approach to Non-autoregressive Conditional Sequence Generation [49.11858479436565]
Autoregressive (AR) models have been the dominating approach to conditional sequence generation.
Non-autoregressive (NAR) models have been recently proposed to reduce the latency by generating all output tokens in parallel.
This paper proposes a new approach that jointly optimize both AR and NAR models in a unified Expectation-Maximization framework.
arXiv Detail & Related papers (2020-06-29T20:58:57Z) - Spike-Triggered Non-Autoregressive Transformer for End-to-End Speech
Recognition [66.47000813920617]
We propose a spike-triggered non-autoregressive transformer model for end-to-end speech recognition.
The proposed model can accurately predict the length of the target sequence and achieve a competitive performance.
The model even achieves a real-time factor of 0.0056, which exceeds all mainstream speech recognition models.
arXiv Detail & Related papers (2020-05-16T08:27:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.