Incremental Processing in the Age of Non-Incremental Encoders: An Empirical Assessment of Bidirectional Models for Incremental NLU
- URL: http://arxiv.org/abs/2010.05330v2
- Date: Thu, 28 Mar 2024 11:26:58 GMT
- Title: Incremental Processing in the Age of Non-Incremental Encoders: An Empirical Assessment of Bidirectional Models for Incremental NLU
- Authors: Brielen Madureira, David Schlangen
- Abstract summary: Bidirectional LSTMs and Transformers assume that the sequence to be encoded is available in full.
We investigate how they behave under incremental interfaces, when partial output must be provided.
Results support the possibility of using bidirectional encoders in incremental mode while retaining most of their non-incremental quality.
- Score: 19.812562421377706
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While humans process language incrementally, the best language encoders currently used in NLP do not. Both bidirectional LSTMs and Transformers assume that the sequence that is to be encoded is available in full, to be processed either forwards and backwards (BiLSTMs) or as a whole (Transformers). We investigate how they behave under incremental interfaces, when partial output must be provided based on partial input seen up to a certain time step, which may happen in interactive systems. We test five models on various NLU datasets and compare their performance using three incremental evaluation metrics. The results support the possibility of using bidirectional encoders in incremental mode while retaining most of their non-incremental quality. The "omni-directional" BERT model, which achieves better non-incremental performance, is impacted more by the incremental access. This can be alleviated by adapting the training regime (truncated training), or the testing procedure, by delaying the output until some right context is available or by incorporating hypothetical right contexts generated by a language model like GPT-2.
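To make the setup concrete, here is a minimal sketch of a restart-incremental interface with delayed output: a non-incremental tagger is re-run on every growing prefix, and the labels of the newest tokens are withheld until some right context is available. The tagger below (`tag_prefix`) and the helper names are toy stand-ins, not the models or code used in the paper.

```python
# Minimal sketch of a restart-incremental interface with delayed output.
# `tag_prefix` is a hypothetical placeholder for any non-incremental
# (bidirectional) tagger that labels a full prefix at once.
from typing import Callable, List

def tag_prefix(tokens: List[str]) -> List[str]:
    # Toy stand-in tagger: tokens ending in "ing" become VERB, all others OTHER.
    return ["VERB" if t.endswith("ing") else "OTHER" for t in tokens]

def incremental_tagging(tokens: List[str],
                        tagger: Callable[[List[str]], List[str]],
                        delay: int = 1) -> List[List[str]]:
    """Re-run a non-incremental tagger on every growing prefix, committing a
    token's label only once `delay` tokens of right context have been seen."""
    history: List[List[str]] = []
    for t in range(1, len(tokens) + 1):
        partial = tagger(tokens[:t])          # restart-incremental pass over the prefix
        commit_up_to = max(0, t - delay)      # withhold the newest `delay` labels
        history.append(partial[:commit_up_to])
    history.append(tagger(tokens))            # at sequence end, commit the full output
    return history

for step, labels in enumerate(incremental_tagging(
        "the dog was running in the park".split(), tag_prefix, delay=1), start=1):
    print(step, labels)
```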
Related papers
- Nearest Neighbor Speculative Decoding for LLM Generation and Attribution [87.3259169631789]
Nearest Neighbor Speculative Decoding (NEST) is capable of incorporating real-world text spans of arbitrary length into the LM generations and providing attribution to their sources.
NEST significantly enhances the generation quality and attribution rate of the base LM across a variety of knowledge-intensive tasks.
In addition, NEST substantially improves the generation speed, achieving a 1.8x speedup in inference time when applied to Llama-2-Chat 70B.
arXiv Detail & Related papers (2024-05-29T17:55:03Z)
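The draft-and-verify idea behind such speculative decoding can be illustrated with a toy loop that accepts the longest prefix of a retrieved draft span agreeing with the base model's own greedy choices. This is a generic sketch, not NEST's actual algorithm; `greedy_next` is a hypothetical stand-in for the base LM.

```python
# Toy draft-and-verify loop: accept the longest prefix of a retrieved draft span
# that agrees with the base model's greedy choice, then fall back to the model.
# `greedy_next` is a hypothetical stand-in for a real language model.
from typing import Callable, List

def greedy_next(context: List[str]) -> str:
    # Toy LM: deterministic bigram table with a default fallback token.
    table = {"the": "cat", "cat": "sat", "sat": "on", "on": "the"}
    return table.get(context[-1], "<eos>")

def verify_draft(context: List[str], draft: List[str],
                 lm: Callable[[List[str]], str]) -> List[str]:
    accepted: List[str] = []
    for token in draft:
        if lm(context + accepted) == token:   # draft token matches the model's own choice
            accepted.append(token)
        else:
            break
    # Always emit one model-generated token after the accepted block,
    # so decoding makes progress even when the draft is rejected immediately.
    accepted.append(lm(context + accepted))
    return accepted

print(verify_draft(["the"], ["cat", "sat", "on", "a"], greedy_next))
# ['cat', 'sat', 'on', 'the'] -- three draft tokens accepted in a single "step"
```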
- Parallel Decoding via Hidden Transfer for Lossless Large Language Model Acceleration [54.897493351694195]
We propose a novel parallel decoding approach, namely "hidden transfer", which decodes multiple successive tokens simultaneously in a single forward pass.
In terms of acceleration metrics, we outperform all the single-model acceleration techniques, including Medusa and Self-Speculative decoding.
arXiv Detail & Related papers (2024-04-18T09:17:06Z)
- TAPIR: Learning Adaptive Revision for Incremental Natural Language Understanding with a Two-Pass Model [14.846377138993645]
Recent neural network-based approaches for incremental processing mainly use RNNs or Transformers.
A restart-incremental interface that repeatedly passes longer input prefixes can be used to obtain partial outputs, while providing the ability to revise.
We propose the Two-pass model for AdaPtIve Revision (TAPIR) and introduce a method to obtain an incremental supervision signal for learning an adaptive revision policy.
arXiv Detail & Related papers (2023-05-18T09:58:19Z)
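Comparing successive restart-incremental outputs directly exposes where earlier labels get revised, which is the kind of signal an adaptive revision policy can learn from. The sketch below uses hypothetical helper names and a toy tagger, not TAPIR's implementation.

```python
# Sketch: derive a revision (edit) signal from restart-incremental outputs.
# Hypothetical helpers; TAPIR's actual supervision extraction is more involved.
from typing import Callable, List

def toy_tagger(tokens: List[str]) -> List[str]:
    # Toy stand-in: the label of "bank" flips once right context disambiguates it.
    labels = ["O"] * len(tokens)
    for i, tok in enumerate(tokens):
        if tok == "bank":
            labels[i] = "FINANCE" if "money" in tokens[i + 1:] else "RIVER"
    return labels

def revision_signal(tokens: List[str],
                    tagger: Callable[[List[str]], List[str]]) -> List[List[int]]:
    """For each time step, mark which already-output labels were revised
    relative to the previous step (1 = revised, 0 = kept)."""
    signal: List[List[int]] = []
    prev: List[str] = []
    for t in range(1, len(tokens) + 1):
        curr = tagger(tokens[:t])
        signal.append([int(curr[i] != prev[i]) for i in range(len(prev))])
        prev = curr
    return signal

print(revision_signal("she sat by the bank counting her money".split(), toy_tagger))
```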
- Decoder Tuning: Efficient Language Understanding as Decoding [84.68266271483022]
We present Decoder Tuning (DecT), which, in contrast, optimizes task-specific decoder networks on the output side.
By gradient-based optimization, DecT can be trained within several seconds and requires only one PLM query per sample.
We conduct extensive natural language understanding experiments and show that DecT significantly outperforms state-of-the-art algorithms with a 200× speed-up.
arXiv Detail & Related papers (2022-12-16T11:15:39Z)
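The output-side idea can be pictured as querying a frozen encoder once per sample, caching the features, and fitting only a small decoder head on the cache. A hedged PyTorch sketch with a toy frozen "encoder" (a random projection), not the DecT implementation:

```python
# Sketch: query a frozen encoder once per sample, cache the features,
# then fit only a lightweight decoder head on the cached outputs.
# The "encoder" here is a frozen random projection standing in for a PLM.
import torch
import torch.nn as nn

torch.manual_seed(0)
n_samples, feat_dim, hidden, n_classes = 64, 32, 128, 3

frozen_encoder = nn.Linear(feat_dim, hidden)
for p in frozen_encoder.parameters():
    p.requires_grad = False                  # the expensive model is never updated

inputs = torch.randn(n_samples, feat_dim)
labels = torch.randint(0, n_classes, (n_samples,))

with torch.no_grad():
    cached = frozen_encoder(inputs)          # one encoder query per sample, done once

decoder = nn.Linear(hidden, n_classes)       # only this small head is trained
optim = torch.optim.Adam(decoder.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(50):                      # trains in well under a second
    optim.zero_grad()
    loss = loss_fn(decoder(cached), labels)
    loss.backward()
    optim.step()
print(f"final training loss: {loss.item():.3f}")
```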
- UnitY: Two-pass Direct Speech-to-speech Translation with Discrete Units [64.61596752343837]
We present a novel two-pass direct S2ST architecture, UnitY, which first generates textual representations and subsequently predicts discrete acoustic units.
We enhance model performance with subword prediction in the first-pass decoder.
We show that the proposed methods boost performance even when predicting a spectrogram in the second pass.
arXiv Detail & Related papers (2022-12-15T18:58:28Z)
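A rough skeleton of the two-pass idea: a first-pass decoder produces text tokens, and a second-pass decoder predicts discrete units conditioned on the first-pass states. All module sizes and the conditioning scheme below are illustrative assumptions, not UnitY's architecture.

```python
# Minimal two-pass decoder skeleton (hypothetical, inspired by the UnitY idea):
# first pass decodes text tokens, second pass decodes discrete acoustic units
# conditioned on the first-pass decoder states.
import torch
import torch.nn as nn

class TwoPassS2ST(nn.Module):
    def __init__(self, d=256, text_vocab=1000, unit_vocab=500):
        super().__init__()
        self.encoder = nn.GRU(d, d, batch_first=True)
        self.text_emb = nn.Embedding(text_vocab, d)
        self.text_dec = nn.GRU(d, d, batch_first=True)
        self.text_out = nn.Linear(d, text_vocab)
        self.unit_emb = nn.Embedding(unit_vocab, d)
        self.unit_dec = nn.GRU(2 * d, d, batch_first=True)
        self.unit_out = nn.Linear(d, unit_vocab)

    def forward(self, speech_feats, text_in, unit_in):
        _, h = self.encoder(speech_feats)                         # encode source speech features
        text_states, _ = self.text_dec(self.text_emb(text_in), h)  # first pass: text
        text_logits = self.text_out(text_states)
        # second pass: condition every unit step on a pooled first-pass state
        ctx = text_states.mean(dim=1, keepdim=True).expand(-1, unit_in.size(1), -1)
        unit_states, _ = self.unit_dec(torch.cat([self.unit_emb(unit_in), ctx], dim=-1), h)
        return text_logits, self.unit_out(unit_states)

model = TwoPassS2ST()
text_logits, unit_logits = model(torch.randn(2, 50, 256),
                                 torch.randint(0, 1000, (2, 12)),
                                 torch.randint(0, 500, (2, 40)))
print(text_logits.shape, unit_logits.shape)   # (2, 12, 1000) and (2, 40, 500)
```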
- E2S2: Encoding-Enhanced Sequence-to-Sequence Pretraining for Language Understanding and Generation [95.49128988683191]
Sequence-to-sequence (seq2seq) learning is a popular paradigm for pretraining large-scale language models.
We propose an encoding-enhanced seq2seq pretraining strategy, namely E2S2.
E2S2 improves seq2seq models by integrating more efficient self-supervised information into the encoders.
arXiv Detail & Related papers (2022-05-30T08:25:36Z)
- Latency Adjustable Transformer Encoder for Language Understanding [0.8287206589886879]
This paper proposes an efficient Transformer architecture that adjusts the inference computational cost adaptively with a desired inference latency speedup.
The proposed method detects less important hidden sequence elements (word-vectors) and eliminates them in each encoder layer using a proposed Attention Context Contribution (ACC) metric.
The proposed method is shown, both analytically and experimentally, to improve the inference latency of BERT_base and GPT-2 by up to 4.8x and 3.72x, respectively, with less than a 0.75% average accuracy drop and acceptable perplexity.
arXiv Detail & Related papers (2022-01-10T13:04:39Z)
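The underlying idea can be sketched as scoring each position by how much attention it receives and dropping the lowest-scoring word-vectors before the next encoder layer. The column-summed attention used below is a simplified proxy, not the paper's exact ACC metric.

```python
# Sketch: prune low-contribution token vectors between encoder layers.
# The importance score (column-summed attention) is a simplified proxy for
# the paper's Attention Context Contribution (ACC) metric.
import torch

def prune_tokens(hidden: torch.Tensor, attn: torch.Tensor, keep_ratio: float = 0.5):
    """hidden: (batch, seq, dim); attn: (batch, heads, seq, seq) attention weights.
    Returns the kept hidden states and the indices that were retained."""
    # How much total attention each position receives from all queries and heads.
    importance = attn.sum(dim=(1, 2))                                  # (batch, seq)
    k = max(1, int(hidden.size(1) * keep_ratio))
    keep_idx = importance.topk(k, dim=-1).indices.sort(dim=-1).values  # keep original order
    batch_idx = torch.arange(hidden.size(0)).unsqueeze(-1)
    return hidden[batch_idx, keep_idx], keep_idx

hidden = torch.randn(2, 10, 16)
attn = torch.softmax(torch.randn(2, 4, 10, 10), dim=-1)
pruned, kept = prune_tokens(hidden, attn, keep_ratio=0.4)
print(pruned.shape, kept.shape)   # torch.Size([2, 4, 16]) torch.Size([2, 4])
```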
- Towards More Efficient Insertion Transformer with Fractional Positional Encoding [44.45401243989363]
Auto-regressive neural sequence models have been shown to be effective across text generation tasks.
Their left-to-right decoding order prevents generation from being parallelized.
Insertion Transformer is an attractive alternative that allows outputting multiple tokens in a single generation step.
arXiv Detail & Related papers (2021-12-12T18:38:27Z)
- s2s-ft: Fine-Tuning Pretrained Transformer Encoders for Sequence-to-Sequence Learning [47.30689555136054]
We present a sequence-to-sequence fine-tuning toolkit s2s-ft, which adopts pretrained Transformers for conditional generation tasks.
s2s-ft achieves strong performance on several benchmarks of abstractive summarization and question generation.
arXiv Detail & Related papers (2021-10-26T12:45:34Z)
- Towards Incremental Transformers: An Empirical Analysis of Transformer Models for Incremental NLU [19.103130032967663]
Incremental processing allows interactive systems to respond based on partial inputs.
Recent work attempts to apply Transformers incrementally via restart-incrementality.
This approach is computationally costly and does not scale efficiently for long sequences.
arXiv Detail & Related papers (2021-09-15T15:20:29Z)
- POINTER: Constrained Progressive Text Generation via Insertion-based Generative Pre-training [93.79766670391618]
We present POINTER, a novel insertion-based approach for hard-constrained text generation.
The proposed method operates by progressively inserting new tokens between existing tokens in a parallel manner.
The resulting coarse-to-fine hierarchy makes the generation process intuitive and interpretable.
arXiv Detail & Related papers (2020-05-01T18:11:54Z)
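The progressive insertion scheme can be mimicked by a toy generator that, at each stage, proposes tokens for every gap between existing tokens in parallel and stops when no gap is filled. The `propose` rule table is a hypothetical stand-in for POINTER's trained insertion model.

```python
# Toy coarse-to-fine insertion loop: at each stage, propose tokens for every
# gap between existing tokens in parallel; an empty proposal means "no insertion".
# `propose` is a hypothetical stand-in for a trained insertion model.
from typing import List, Optional

def propose(left: str, right: str) -> Optional[str]:
    # Toy rule table keyed by the (left, right) context of each gap.
    rules = {("prefer", "sources"): "renewable", ("sources", "wind"): "such as"}
    return rules.get((left, right))

def progressive_insertion(keywords: List[str], max_stages: int = 5) -> List[str]:
    tokens = list(keywords)                  # hard lexical constraints stay in place
    for _ in range(max_stages):
        inserted = False
        out: List[str] = []
        for i, tok in enumerate(tokens):
            out.append(tok)
            if i + 1 < len(tokens):
                new = propose(tok, tokens[i + 1])
                if new is not None:
                    out.extend(new.split())  # fill the gap with the proposed tokens
                    inserted = True
        tokens = out
        if not inserted:                     # stop once no gap wants an insertion
            break
    return tokens

print(" ".join(progressive_insertion(["prefer", "sources", "wind"])))
# prefer renewable sources such as wind
```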