NASH: A Simple Unified Framework of Structured Pruning for Accelerating
Encoder-Decoder Language Models
- URL: http://arxiv.org/abs/2310.10054v1
- Date: Mon, 16 Oct 2023 04:27:36 GMT
- Title: NASH: A Simple Unified Framework of Structured Pruning for Accelerating
Encoder-Decoder Language Models
- Authors: Jongwoo Ko, Seungjoon Park, Yujin Kim, Sumyeong Ahn, Du-Seong Chang,
Euijai Ahn, Se-Young Yun
- Abstract summary: We propose a simple and effective framework, NASH, that narrows the encoder and shortens the decoder networks of encoder-decoder models.
Our findings highlight two insights: (1) the number of decoder layers is the dominant factor in inference speed, and (2) low sparsity in the pruned encoder network enhances generation quality.
- Score: 29.468888611690346
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Structured pruning methods have proven effective in reducing the model size
and accelerating inference speed in various network architectures such as
Transformers. Despite the versatility of encoder-decoder models across numerous NLP
tasks, structured pruning of such models remains relatively underexplored
compared to encoder-only models. In this study, we investigate the behavior of
structured pruning for encoder-decoder models from a decoupled perspective,
pruning the encoder and decoder components separately. Our findings highlight
two insights: (1) the number of decoder layers is the dominant factor in
inference speed, and (2) low sparsity in the pruned encoder network enhances
generation quality. Motivated by these
findings, we propose a simple and effective framework, NASH, that narrows the
encoder and shortens the decoder networks of encoder-decoder models. Extensive
experiments on diverse generation and inference tasks validate the
effectiveness of our method in both speedup and output quality.
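To make the abstract's two operations concrete, the following is a minimal sketch of what "narrowing the encoder" and "shortening the decoder" can look like on a HuggingFace T5 checkpoint. It is not the NASH implementation: the kept decoder layers and dropped attention heads below are arbitrary illustrative choices, whereas an actual pruning run would select them by importance and then retrain the model.

    # Illustrative sketch only -- not the authors' code. Assumes the HuggingFace
    # `transformers` library and the public "t5-base" checkpoint.
    import torch
    from transformers import T5ForConditionalGeneration

    model = T5ForConditionalGeneration.from_pretrained("t5-base")  # 12 encoder / 12 decoder layers

    # (1) "Shorten the decoder": keep only a subset of decoder layers, since decoder
    #     depth is the dominant factor in autoregressive inference latency.
    keep = [0, 3, 7, 11]  # hypothetical selection of 4 of the 12 decoder layers
    model.decoder.block = torch.nn.ModuleList([model.decoder.block[i] for i in keep])
    model.config.num_decoder_layers = len(keep)

    # (2) "Narrow the encoder" at low sparsity: remove a small fraction of attention
    #     heads per encoder layer (here head 0 of every layer; a real criterion would
    #     rank heads by importance instead).
    for block in model.encoder.block:
        block.layer[0].SelfAttention.prune_heads([0])

    print(f"decoder layers kept: {len(model.decoder.block)}")

The snippet only shows the structural effect of the two operations; a model pruned this way would still need training or fine-tuning to recover generation quality.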
Related papers
- Efficient Encoder-Decoder Transformer Decoding for Decomposable Tasks [53.550782959908524]
We introduce a new configuration for encoder-decoder models that improves efficiency on structured output and decomposable tasks.
Our method, prompt-in-decoder (PiD), encodes the input once and decodes the output in parallel, boosting both training and inference efficiency.
arXiv Detail & Related papers (2024-03-19T19:27:23Z) - Extreme Encoder Output Frame Rate Reduction: Improving Computational
Latencies of Large End-to-End Models [59.57732929473519]
We apply multiple frame reduction layers in the encoder to compress encoder outputs into a small number of output frames.
We demonstrate that we can generate one encoder output frame for every 2.56 sec of input speech, without significantly affecting word error rate on a large-scale voice search task.
arXiv Detail & Related papers (2024-02-27T03:40:44Z) - Stateful Conformer with Cache-based Inference for Streaming Automatic Speech Recognition [20.052245837954175]
We propose an efficient and accurate streaming speech recognition model based on the FastConformer architecture.
We introduce an activation caching mechanism to enable the non-autoregressive encoder to operate autoregressively during inference.
A hybrid CTC/RNNT architecture uses a shared encoder with both a CTC and an RNNT decoder to boost accuracy and save computation.
arXiv Detail & Related papers (2023-12-27T21:04:26Z) - DEED: Dynamic Early Exit on Decoder for Accelerating Encoder-Decoder
Transformer Models [22.276574156358084]
We build a multi-exit encoder-decoder transformer model that is trained with deep supervision, so that each of its decoder layers is capable of generating plausible predictions (see the toy sketch after this list).
We show our approach can reduce overall inference latency by 30%-60% with comparable or even higher accuracy compared to baselines.
arXiv Detail & Related papers (2023-11-15T01:01:02Z) - Complexity Matters: Rethinking the Latent Space for Generative Modeling [65.64763873078114]
In generative modeling, numerous successful approaches leverage a low-dimensional latent space, e.g., Stable Diffusion.
In this study, we aim to shed light on this under-explored topic by rethinking the latent space from the perspective of model complexity.
arXiv Detail & Related papers (2023-07-17T07:12:29Z) - Decoder-Only or Encoder-Decoder? Interpreting Language Model as a
Regularized Encoder-Decoder [75.03283861464365]
The seq2seq task aims at generating the target sequence based on the given input source sequence.
Traditionally, most seq2seq tasks are addressed with an encoder that encodes the source sequence and a decoder that generates the target text.
Recently, a number of new approaches have emerged that apply decoder-only language models directly to the seq2seq task.
arXiv Detail & Related papers (2023-04-08T15:44:29Z) - String-based Molecule Generation via Multi-decoder VAE [56.465033997245776]
We investigate the problem of string-based molecular generation via variational autoencoders (VAEs).
We propose a simple yet effective idea to improve the performance of VAEs for this task.
In our experiments, the proposed VAE model performs particularly well at generating samples from out-of-domain distributions.
arXiv Detail & Related papers (2022-08-23T03:56:30Z) - CarNet: A Lightweight and Efficient Encoder-Decoder Architecture for
High-quality Road Crack Detection [21.468229247797627]
We present a lightweight encoder-decoder architecture, CarNet, for efficient and high-quality crack detection.
In particular, we propose that the ideal encoder should exhibit an olive-shaped distribution in the number of convolutional layers across its stages.
In the decoder, we introduce a lightweight up-sampling feature pyramid module to learn rich hierarchical features for crack detection.
arXiv Detail & Related papers (2021-09-13T05:01:34Z) - Jointly Optimizing State Operation Prediction and Value Generation for
Dialogue State Tracking [23.828348485513043]
We investigate the problem of multi-domain Dialogue State Tracking (DST) with open vocabulary.
Existing approaches exploit a BERT encoder and a copy-based RNN decoder, where the encoder predicts the state operation and the decoder generates new slot values.
We propose a purely Transformer-based framework, where a single BERT works as both the encoder and the decoder.
arXiv Detail & Related papers (2020-10-24T04:54:52Z) - Beyond Single Stage Encoder-Decoder Networks: Deep Decoders for Semantic
Image Segmentation [56.44853893149365]
Single encoder-decoder methodologies for semantic segmentation are reaching their peak in terms of segmentation quality and efficiency per number of layers.
We propose a new architecture based on a decoder that uses a set of shallow networks to capture more information content.
To further improve the architecture, we introduce a weighting function that re-balances classes, increasing the networks' attention to under-represented objects.
arXiv Detail & Related papers (2020-07-19T18:44:34Z)
This list is automatically generated from the titles and abstracts of the papers on this site.