Pseudo-Bidirectional Decoding for Local Sequence Transduction
- URL: http://arxiv.org/abs/2001.11694v3
- Date: Sun, 1 Nov 2020 16:01:31 GMT
- Title: Pseudo-Bidirectional Decoding for Local Sequence Transduction
- Authors: Wangchunshu Zhou, Tao Ge, Ke Xu
- Abstract summary: We propose a simple but versatile approach named Pseudo-Bidirectional Decoding (PBD) for LST tasks.
The proposed PBD approach provides right side context information for the decoder and models the inductive bias of LST tasks.
Experimental results on several benchmark datasets show that our approach consistently improves the performance of standard seq2seq models on LST tasks.
- Score: 31.05704333618685
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Local sequence transduction (LST) tasks are sequence transduction tasks where
there exists massive overlapping between the source and target sequences, such
as Grammatical Error Correction (GEC) and spell or OCR correction. Previous
work generally tackles LST tasks with standard sequence-to-sequence (seq2seq)
models that generate output tokens from left to right and suffer from the issue
of unbalanced outputs. Motivated by the characteristic of LST tasks, in this
paper, we propose a simple but versatile approach named Pseudo-Bidirectional
Decoding (PBD) for LST tasks. PBD copies the corresponding representations of
source tokens to the decoder as pseudo future context, enabling the decoder to
attend to its bidirectional context. In addition, the bidirectional decoding
scheme and the characteristic of LST tasks motivate us to share the encoder and
the decoder of seq2seq models. The proposed PBD approach provides right-side
context information for the decoder and models the inductive bias of LST tasks,
reducing the number of parameters by half and providing good regularization
effects. Experimental results on several benchmark datasets show that our
approach consistently improves the performance of standard seq2seq models on
LST tasks.
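To make the mechanism concrete, below is a minimal sketch (not the authors' implementation) of the pseudo-future-context idea: at decoding step t, the encoder states of the remaining source positions stand in for the not-yet-generated right-side context. The tensor names, shapes, and the function itself are illustrative assumptions.

```python
# Minimal sketch of pseudo-bidirectional context, assuming pre-computed
# decoder prefix states and encoder states; not the paper's actual code.
import torch

def pseudo_bidirectional_context(prefix_states: torch.Tensor,
                                 encoder_states: torch.Tensor,
                                 step: int) -> torch.Tensor:
    """Context the decoder attends to at decoding step `step`.

    prefix_states:  (step, d_model)    states of already generated target tokens
    encoder_states: (src_len, d_model) encoder outputs for the source sentence

    Because source and target overlap heavily in LST tasks, the encoder
    representation of source token i serves as a "pseudo" stand-in for the
    not-yet-generated target token i, supplying right-side context.
    """
    pseudo_future = encoder_states[step:]          # source positions >= step
    return torch.cat([prefix_states, pseudo_future], dim=0)
```

Under this reading, the paper's second ingredient, sharing the encoder and the decoder, would amount to instantiating a single Transformer stack and reusing its weights for both roles, which is where the halved parameter count comes from.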
Related papers
- Expediting and Elevating Large Language Model Reasoning via Hidden Chain-of-Thought Decoding [14.175444025026508]
Large language models (LLMs) have demonstrated remarkable capabilities in tasks requiring chain-of-thought (CoT) prompting.
However, generating the full CoT process results in significantly longer output sequences, leading to increased computational costs and latency during inference.
We propose a novel approach to compress the CoT process through semantic alignment, enabling more efficient decoding while preserving the benefits of CoT reasoning.
arXiv Detail & Related papers (2024-09-13T06:29:20Z)
- FADE: A Task-Agnostic Upsampling Operator for Encoder-Decoder Architectures [18.17019371324024]
FADE is a novel, plug-and-play, lightweight, and task-agnostic upsampling operator.
We show that FADE is task-agnostic with consistent performance improvement on a number of dense prediction tasks.
We demonstrate, for the first time, robust feature upsampling on both region- and detail-sensitive tasks.
arXiv Detail & Related papers (2024-07-18T13:32:36Z)
- Instruction Position Matters in Sequence Generation with Large Language Models [67.87516654892343]
Large language models (LLMs) are capable of performing conditional sequence generation tasks, such as translation or summarization.
We propose enhancing the instruction-following capability of LLMs by shifting the position of task instructions after the input sentences.
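As a toy illustration of that idea (the prompt wording and function below are assumptions, not the paper's setup), the instruction can simply be placed after the input rather than before it:

```python
# Hypothetical prompt builder illustrating instruction placement; the exact
# template used in the paper may differ.
def build_prompt(source_text: str, instruction: str, instruction_last: bool = True) -> str:
    if instruction_last:
        # proposed order: input first, task instruction afterwards
        return f"{source_text}\n{instruction}"
    # conventional order: instruction first, then the input
    return f"{instruction}\n{source_text}"

print(build_prompt("Der Apfel ist rot.", "Translate the sentence above into English."))
```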
arXiv Detail & Related papers (2023-08-23T12:36:57Z)
- A Framework for Bidirectional Decoding: Case Study in Morphological Inflection [4.602447284133507]
We propose a framework for decoding sequences from the "outside-in".
At each step, the model chooses to generate a token on the left, on the right, or join the left and right sequences.
Our model sets state-of-the-art (SOTA) on the 2022 and 2023 shared tasks, beating the next best systems by over 4.7 and 2.7 points in average accuracy respectively.
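A minimal sketch of such an outside-in loop is shown below, assuming a hypothetical `score_actions` callback that stands in for the trained model and returns action probabilities plus candidate tokens for each side:

```python
# Illustrative outside-in decoding loop; `score_actions` is hypothetical and
# would be backed by the actual model in practice.
from typing import Callable, Dict, List, Tuple

def outside_in_decode(score_actions: Callable[[List[str], List[str]],
                                              Tuple[Dict[str, float], str, str]],
                      max_len: int = 50) -> List[str]:
    left: List[str] = []    # grows rightwards from the left end
    right: List[str] = []   # grows leftwards from the right end (stored in order)
    for _ in range(max_len):
        probs, left_tok, right_tok = score_actions(left, right)
        action = max(probs, key=probs.get)
        if action == "JOIN":              # the two halves meet
            break
        if action == "LEFT":
            left.append(left_tok)
        else:                             # "RIGHT"
            right.insert(0, right_tok)
    return left + right
```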
arXiv Detail & Related papers (2023-05-21T22:08:31Z)
- Reducing Sequence Length by Predicting Edit Operations with Large Language Models [50.66922361766939]
This paper proposes predicting edit spans for the source text for local sequence transduction tasks.
We apply instruction tuning for Large Language Models on the supervision data of edit spans.
Experiments show that the proposed method achieves comparable performance to the baseline in four tasks.
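For illustration, here is a sketch of applying predicted edit spans to a source sentence, under the assumption that each edit is a (start, end, replacement) triple over source token indices; the paper's exact span encoding may differ:

```python
# Apply non-overlapping edit spans to the source tokens; span format assumed.
from typing import List, Tuple

def apply_edit_spans(source_tokens: List[str],
                     edits: List[Tuple[int, int, List[str]]]) -> List[str]:
    out: List[str] = []
    cursor = 0
    for start, end, replacement in sorted(edits):
        out.extend(source_tokens[cursor:start])   # copy the unchanged span
        out.extend(replacement)                   # splice in the edited text
        cursor = end
    out.extend(source_tokens[cursor:])
    return out

# e.g. correcting "He go to school" with a single predicted edit
print(apply_edit_spans("He go to school".split(), [(1, 2, ["goes"])]))
```

Predicting only such spans keeps the output far shorter than regenerating the whole corrected sentence, which is the point of the approach.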
arXiv Detail & Related papers (2023-05-19T17:51:05Z)
- Decoder Tuning: Efficient Language Understanding as Decoding [84.68266271483022]
We present Decoder Tuning (DecT), which in contrast optimizes task-specific decoder networks on the output side.
With gradient-based optimization, DecT can be trained within several seconds and requires only one PTM query per sample.
We conduct extensive natural language understanding experiments and show that DecT significantly outperforms state-of-the-art algorithms with a $200\times$ speed-up.
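A rough sketch of output-side tuning, assuming the frozen model's output features are queried once per example and cached, after which a lightweight head is fit on them (the head and training loop below are assumptions, not DecT's exact design):

```python
# Fit a small head on cached output-side features; no further PTM queries.
import torch
import torch.nn as nn

def train_output_side_head(cached_features: torch.Tensor,  # (n_samples, d)
                           labels: torch.Tensor,           # (n_samples,) class ids
                           n_classes: int,
                           steps: int = 200) -> nn.Module:
    head = nn.Linear(cached_features.size(-1), n_classes)
    optimizer = torch.optim.Adam(head.parameters(), lr=1e-2)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):          # a few seconds of training on cached features
        optimizer.zero_grad()
        loss = loss_fn(head(cached_features), labels)
        loss.backward()
        optimizer.step()
    return head
```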
arXiv Detail & Related papers (2022-12-16T11:15:39Z)
- Don't Take It Literally: An Edit-Invariant Sequence Loss for Text Generation [109.46348908829697]
We propose a novel Edit-Invariant Sequence Loss (EISL), which computes the matching loss of a target n-gram with all n-grams in the generated sequence.
We conduct experiments on three tasks: machine translation with noisy target sequences, unsupervised text style transfer, and non-autoregressive machine translation.
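As a count-based illustration of the matching intuition only (the actual EISL is a differentiable loss computed from model probabilities, not from discrete counts):

```python
# Position-tolerant n-gram matching score; a toy stand-in for the EISL idea.
from typing import List

def ngram_match_score(target: List[str], hypothesis: List[str], n: int = 2) -> float:
    target_ngrams = [tuple(target[i:i + n]) for i in range(len(target) - n + 1)]
    hyp_ngrams = {tuple(hypothesis[i:i + n]) for i in range(len(hypothesis) - n + 1)}
    if not target_ngrams:
        return 0.0
    hits = sum(1 for g in target_ngrams if g in hyp_ngrams)
    return hits / len(target_ngrams)

# a shifted hypothesis still gets partial credit for the shared n-gram
print(ngram_match_score("the cat sat".split(), "sat the cat".split()))
```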
arXiv Detail & Related papers (2021-06-29T03:59:21Z)
- Fast Interleaved Bidirectional Sequence Generation [90.58793284654692]
We introduce a decoder that generates target words from the left-to-right and right-to-left directions simultaneously.
We show that we can easily convert a standard architecture for unidirectional decoding into a bidirectional decoder.
Our interleaved bidirectional decoder (IBDecoder) retains the model simplicity and training efficiency of the standard Transformer.
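A sketch of the interleaving trick is given below, assuming tokens from the two ends of the target simply alternate in the reordered sequence (the special symbols and termination handling used by IBDecoder are omitted):

```python
# Reorder a target so one left-to-right pass emits both directions at once.
from typing import List

def interleave(target: List[str]) -> List[str]:
    out, i, j = [], 0, len(target) - 1
    while i <= j:
        out.append(target[i])           # next token from the left end
        if i != j:
            out.append(target[j])       # next token from the right end
        i, j = i + 1, j - 1
    return out

def deinterleave(seq: List[str]) -> List[str]:
    left, right = seq[0::2], seq[1::2]
    return left + right[::-1]

assert deinterleave(interleave(list("abcdef"))) == list("abcdef")
```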
arXiv Detail & Related papers (2020-10-27T17:38:51Z)
- On Sparsifying Encoder Outputs in Sequence-to-Sequence Models [90.58793284654692]
We take Transformer as the testbed and introduce a layer of gates in-between the encoder and the decoder.
The gates are regularized using the expected value of the sparsity-inducing L0 penalty.
We investigate the effects of this sparsification on two machine translation and two summarization tasks.
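A simplified sketch of such a gate layer follows, using a plain sigmoid gate whose mean serves as a stand-in for the expected L0 penalty (the paper relies on hard-concrete gates, which are not reproduced here):

```python
# Soft gating of encoder outputs with a sparsity surrogate; illustrative only.
import torch
import torch.nn as nn

class EncoderOutputGate(nn.Module):
    def __init__(self, d_model: int):
        super().__init__()
        self.scorer = nn.Linear(d_model, 1)   # one gate logit per source position

    def forward(self, encoder_states: torch.Tensor):
        # encoder_states: (batch, src_len, d_model)
        gate = torch.sigmoid(self.scorer(encoder_states))   # values in (0, 1)
        pruned = gate * encoder_states                       # softly prune positions
        sparsity_penalty = gate.mean()                       # surrogate for E[L0]
        return pruned, sparsity_penalty
```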
arXiv Detail & Related papers (2020-04-24T16:57:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.