Hierarchical Attention Encoder Decoder
- URL: http://arxiv.org/abs/2306.01070v1
- Date: Thu, 1 Jun 2023 18:17:23 GMT
- Title: Hierarchical Attention Encoder Decoder
- Authors: Asier Mujika
- Abstract summary: Autoregressive modeling can generate complex and novel sequences that have many real-world applications.
These models must generate outputs autoregressively, which becomes time-consuming when dealing with long sequences.
We propose a model based on the Hierarchical Recurrent Encoder Decoder (HRED) architecture.
- Score: 2.4366811507669115
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in large language models have shown that autoregressive
modeling can generate complex and novel sequences that have many real-world
applications. However, these models must generate outputs autoregressively,
which becomes time-consuming when dealing with long sequences. Hierarchical
autoregressive approaches that compress data have been proposed as a solution,
but these methods still generate outputs at the original data frequency,
resulting in slow and memory-intensive models. In this paper, we propose a
model based on the Hierarchical Recurrent Encoder Decoder (HRED) architecture.
This model independently encodes input sub-sequences without global context,
processes these sequences using a lower-frequency model, and decodes outputs at
the original data frequency. By interpreting the encoder as an implicitly
defined embedding matrix and using sampled softmax estimation, we develop a
training algorithm that can train the entire model without a high-frequency
decoder, which is the most memory and compute-intensive part of hierarchical
approaches. In a final, brief phase, we train the decoder to generate data at
the original granularity. Our algorithm significantly reduces memory
requirements for training autoregressive models and also improves the total
training wall-clock time.
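To make the training scheme above concrete, here is a minimal sketch, assuming PyTorch, GRU modules, byte-level tokens, a fixed chunk length, and in-batch negatives for the sampled-softmax estimate; all module names, sizes, and the choice of negatives are illustrative assumptions rather than the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

CHUNK = 16    # high-frequency steps per low-frequency step (assumed)
VOCAB = 256   # e.g. byte-level tokens (assumed)
DIM = 512

class ChunkEncoder(nn.Module):
    """Encodes every sub-sequence independently, with no global context."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, DIM)
        self.rnn = nn.GRU(DIM, DIM, batch_first=True)

    def forward(self, chunks):                  # chunks: (B, N, CHUNK) token ids
        B, N, C = chunks.shape
        x = self.embed(chunks.reshape(B * N, C))
        _, h = self.rnn(x)                      # final hidden state per chunk
        return h.squeeze(0).reshape(B, N, DIM)  # one embedding per chunk

class LowFreqModel(nn.Module):
    """Autoregressive model that runs once per chunk instead of once per token."""
    def __init__(self):
        super().__init__()
        self.rnn = nn.GRU(DIM, DIM, batch_first=True)
        self.proj = nn.Linear(DIM, DIM)

    def forward(self, chunk_embs):              # (B, N, DIM)
        out, _ = self.rnn(chunk_embs)
        return self.proj(out)                   # prediction of the next chunk embedding

def sampled_softmax_loss(pred, chunk_embs):
    """The encoder acts as an implicitly defined embedding matrix: the true
    next-chunk embedding is the positive class, and the other chunk embeddings
    in the batch serve as sampled negatives."""
    B, N, D = pred.shape
    queries = pred[:, :-1].reshape(-1, D)       # predictions made from chunks 0..N-2
    targets = chunk_embs[:, 1:].reshape(-1, D)  # ground-truth chunks 1..N-1
    logits = queries @ targets.t()              # similarity against all candidates
    labels = torch.arange(logits.size(0), device=logits.device)
    return F.cross_entropy(logits, labels)

# One training step of the decoder-free phase; the high-frequency decoder is
# only trained in the short final phase, which is not shown here.
encoder, lowfreq = ChunkEncoder(), LowFreqModel()
opt = torch.optim.Adam(list(encoder.parameters()) + list(lowfreq.parameters()), lr=3e-4)

tokens = torch.randint(0, VOCAB, (8, 32, CHUNK))   # (batch, chunks per sample, chunk length)
chunk_embs = encoder(tokens)
loss = sampled_softmax_loss(lowfreq(chunk_embs), chunk_embs)
opt.zero_grad()
loss.backward()
opt.step()
```

Because this phase only materializes one embedding per chunk and a modest logit matrix, no token-level softmax is computed, which is consistent with the memory savings the abstract describes.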
Related papers
- $ε$-VAE: Denoising as Visual Decoding [61.29255979767292]
In generative modeling, tokenization simplifies complex data into compact, structured representations, creating a more efficient, learnable space.
Current visual tokenization methods rely on a traditional autoencoder framework, where the encoder compresses data into latent representations, and the decoder reconstructs the original input.
We propose denoising as decoding, shifting from single-step reconstruction to iterative refinement. Specifically, we replace the decoder with a diffusion process that iteratively refines noise to recover the original image, guided by the latents provided by the encoder.
We evaluate our approach by assessing both reconstruction (rFID) and generation quality.
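As a rough illustration of the denoising-as-decoding idea in this entry, the sketch below trains a conditional denoiser to remove noise from an image given the encoder's latent; at generation time the denoiser would be applied iteratively starting from pure noise. The toy MLPs, 32x32 image size, and linear noise schedule are assumptions for illustration, not the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

T = 1000                                         # number of diffusion steps (assumed)
betas = torch.linspace(1e-4, 0.02, T)            # placeholder linear noise schedule
alphas_bar = torch.cumprod(1.0 - betas, dim=0)

encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 256))   # image -> latent z
denoiser = nn.Sequential(                                            # predicts the added noise
    nn.Linear(3 * 32 * 32 + 256 + 1, 1024), nn.ReLU(),
    nn.Linear(1024, 3 * 32 * 32))

def training_step(x):                            # x: (B, 3, 32, 32) images in [-1, 1]
    B = x.size(0)
    z = encoder(x)                               # the encoder and its latent are kept
    t = torch.randint(0, T, (B,))
    a = alphas_bar[t].view(B, 1)
    x_flat = x.reshape(B, -1)
    noise = torch.randn_like(x_flat)
    x_noisy = a.sqrt() * x_flat + (1 - a).sqrt() * noise   # forward diffusion
    inp = torch.cat([x_noisy, z, t.float().view(B, 1) / T], dim=1)
    return F.mse_loss(denoiser(inp), noise)      # standard epsilon-prediction objective

loss = training_step(torch.rand(4, 3, 32, 32) * 2 - 1)
loss.backward()
```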
arXiv Detail & Related papers (2024-10-05T08:27:53Z)
- Are We Using Autoencoders in a Wrong Way? [3.110260251019273]
Autoencoders are used for dimensionality reduction, anomaly detection and feature extraction.
We revisited the standard training of the undercomplete Autoencoder by modifying the shape of the latent space.
We also explored the behaviour of the latent space in the case of reconstruction of a random sample from the whole dataset.
arXiv Detail & Related papers (2023-09-04T11:22:43Z)
- Decoder-Only or Encoder-Decoder? Interpreting Language Model as a Regularized Encoder-Decoder [75.03283861464365]
The seq2seq task aims at generating the target sequence based on the given input source sequence.
Traditionally, the seq2seq task is addressed with an encoder that encodes the source sequence and a decoder that generates the target text.
Recently, a number of new approaches have emerged that apply decoder-only language models directly to the seq2seq task.
arXiv Detail & Related papers (2023-04-08T15:44:29Z)
- ED2LM: Encoder-Decoder to Language Model for Faster Document Re-ranking Inference [70.36083572306839]
This paper proposes a new training and inference paradigm for re-ranking.
We finetune a pretrained encoder-decoder model in the form of document-to-query generation.
We show that this encoder-decoder architecture can be decomposed into a decoder-only language model during inference.
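A hedged sketch of the inference-time decomposition this entry describes, assuming a T5-style model from Hugging Face transformers and the log-likelihood of document-to-query generation as the re-ranking score; the paper's exact caching and scoring scheme may differ.

```python
# Assumes: pip install torch transformers sentencepiece
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

tok = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small").eval()

# Offline: run the encoder once per document and cache its states.
doc_ids = tok("a passage about hierarchical sequence models", return_tensors="pt").input_ids
with torch.no_grad():
    cached_enc = model.get_encoder()(input_ids=doc_ids)

# Online: score a query with the decoder only, reusing the cached encoder states,
# so the model behaves like a conditional decoder-only LM at inference time.
query_ids = tok("what is a hierarchical encoder decoder", return_tensors="pt").input_ids
with torch.no_grad():
    out = model(encoder_outputs=cached_enc, labels=query_ids)
score = -out.loss.item() * query_ids.size(1)   # approx. sum log-likelihood of query given document
print(score)
```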
arXiv Detail & Related papers (2022-04-25T06:26:29Z)
- Generative time series models using Neural ODE in Variational Autoencoders [0.0]
We implement Neural Ordinary Differential Equations in a Variational Autoencoder setting for generative time series modeling.
An object-oriented approach to the code was taken to allow for easier development and research.
arXiv Detail & Related papers (2022-01-12T14:38:11Z)
- Towards Generating Real-World Time Series Data [52.51620668470388]
We propose a novel generative framework for time series data generation - RTSGAN.
RTSGAN learns an encoder-decoder module which provides a mapping between a time series instance and a fixed-dimension latent vector.
To generate time series with missing values, we further equip RTSGAN with an observation embedding layer and a decide-and-generate decoder.
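Purely as an illustration of what a "decide-and-generate" step could look like, the sketch below has the decoder first decide which features are observed at each step and only then generate values for them; the mask embedding, GRU cell, and Bernoulli decision are assumptions, not RTSGAN's actual architecture.

```python
import torch
import torch.nn as nn

FEATS, HID = 8, 64

class DecideAndGenerateDecoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.cell = nn.GRUCell(2 * FEATS, HID)     # input: previous values + embedded mask
        self.obs_embed = nn.Linear(FEATS, FEATS)   # "observation embedding" of the mask (assumed form)
        self.decide = nn.Linear(HID, FEATS)        # logits: is each feature observed?
        self.generate = nn.Linear(HID, FEATS)      # values for the observed features

    def forward(self, z, steps):                   # z: (B, HID) latent from the encoder
        B = z.size(0)
        h = z
        prev_vals = torch.zeros(B, FEATS)
        prev_mask = torch.ones(B, FEATS)
        outputs = []
        for _ in range(steps):
            h = self.cell(torch.cat([prev_vals, self.obs_embed(prev_mask)], dim=1), h)
            mask = torch.bernoulli(torch.sigmoid(self.decide(h)))   # decide: observed or missing
            vals = self.generate(h) * mask                          # generate only observed values
            outputs.append(torch.stack([vals, mask], dim=-1))
            prev_vals, prev_mask = vals, mask
        return torch.stack(outputs, dim=1)         # (B, steps, FEATS, 2): value and mask per step

decoder = DecideAndGenerateDecoder()
series = decoder(torch.randn(4, HID), steps=24)    # synthetic series with explicit missingness
```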
arXiv Detail & Related papers (2021-11-16T11:31:37Z)
- Anytime Sampling for Autoregressive Models via Ordered Autoencoding [88.01906682843618]
Autoregressive models are widely used for tasks such as image and audio generation.
The sampling process of these models does not allow interruptions and cannot adapt to real-time computational resources.
We propose a new family of autoregressive models that enables anytime sampling.
arXiv Detail & Related papers (2021-02-23T05:13:16Z)
- End-to-end Sinkhorn Autoencoder with Noise Generator [10.008055997630304]
We propose a novel end-to-end Sinkhorn autoencoder with a noise generator for efficient data collection simulation.
Our method outperforms competing approaches on a challenging dataset of simulation data from the Zero Degree Calorimeters of the ALICE experiment at the LHC.
arXiv Detail & Related papers (2020-06-11T18:04:10Z)
- Cascaded Text Generation with Markov Transformers [122.76100449018061]
Two dominant approaches to neural text generation are fully autoregressive models, using serial beam search decoding, and non-autoregressive models, using parallel decoding with no output dependencies.
This work proposes an autoregressive model with sub-linear parallel time generation. Noting that conditional random fields with bounded context can be decoded in parallel, we propose an efficient cascaded decoding approach for generating high-quality output.
This approach requires only a small modification from standard autoregressive training, while showing competitive accuracy/speed tradeoff compared to existing methods on five machine translation datasets.
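The observation that bounded-context models admit parallel decoding can be illustrated with a small sketch: for a first-order chain, the exact maximum path score reduces to a log-depth tree of max-plus matrix products. The toy vocabulary, random scores, and the omission of backpointers (needed to recover the actual argmax sequence) are simplifications, not the paper's cascaded procedure.

```python
import torch

def maxplus(A, B):
    """Max-plus matrix product: C[i, j] = max_k A[i, k] + B[k, j]."""
    return (A.unsqueeze(2) + B.unsqueeze(0)).amax(dim=1)

def best_path_score(unary, trans):
    """Exact max score of a first-order chain model.
    unary: (n, V) per-position scores, trans: (V, V) transition scores.
    The while-loop is a log-depth tree reduction, so each level can run in parallel."""
    n, V = unary.shape
    mats = [trans + unary[t].unsqueeze(0) for t in range(1, n)]   # M_t[y_prev, y]
    while len(mats) > 1:
        paired = [maxplus(mats[i], mats[i + 1]) for i in range(0, len(mats) - 1, 2)]
        if len(mats) % 2 == 1:
            paired.append(mats[-1])
        mats = paired
    return (unary[0].unsqueeze(1) + mats[0]).amax()               # max over first and last labels

# Sanity check against sequential Viterbi on random scores.
torch.manual_seed(0)
unary, trans = torch.randn(12, 5), torch.randn(5, 5)
score = unary[0]
for t in range(1, 12):
    score = (score.unsqueeze(1) + trans + unary[t].unsqueeze(0)).amax(dim=0)
assert torch.allclose(best_path_score(unary, trans), score.max())
```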
arXiv Detail & Related papers (2020-06-01T17:52:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.