Pseudo-Bidirectional Decoding for Local Sequence Transduction
- URL: http://arxiv.org/abs/2001.11694v3
- Date: Sun, 1 Nov 2020 16:01:31 GMT
- Title: Pseudo-Bidirectional Decoding for Local Sequence Transduction
- Authors: Wangchunshu Zhou, Tao Ge, Ke Xu
- Abstract summary: We propose a simple but versatile approach named Pseudo-Bidirectional Decoding (PBD) for LST tasks.
The proposed PBD approach provides right side context information for the decoder and models the inductive bias of LST tasks.
Experimental results on several benchmark datasets show that our approach consistently improves the performance of standard seq2seq models on LST tasks.
- Score: 31.05704333618685
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Local sequence transduction (LST) tasks are sequence transduction tasks where
there exists massive overlapping between the source and target sequences, such
as Grammatical Error Correction (GEC) and spell or OCR correction. Previous
work generally tackles LST tasks with standard sequence-to-sequence (seq2seq)
models that generate output tokens from left to right and suffer from the issue
of unbalanced outputs. Motivated by the characteristic of LST tasks, in this
paper, we propose a simple but versatile approach named Pseudo-Bidirectional
Decoding (PBD) for LST tasks. PBD copies the corresponding representations of
source tokens to the decoder as pseudo future context, enabling the decoder to
attend to its bidirectional context. In addition, the bidirectional decoding
scheme and the characteristic of LST tasks motivate us to share the encoder and
the decoder of seq2seq models. The proposed PBD approach provides right-side
context information for the decoder and models the inductive bias of LST tasks,
reducing the number of parameters by half and providing good regularization
effects. Experimental results on several benchmark datasets show that our
approach consistently improves the performance of standard seq2seq models on
LST tasks.
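To make the mechanism concrete, below is a minimal sketch (not the authors' implementation) of the pseudo-future-context idea: at decoding step t, the encoder states of the remaining source positions stand in for the not-yet-generated right-side context. The tensor names, shapes, and the function itself are illustrative assumptions.

```python
# Minimal sketch of pseudo-bidirectional context, assuming pre-computed
# decoder prefix states and encoder states; not the paper's actual code.
import torch

def pseudo_bidirectional_context(prefix_states: torch.Tensor,
                                 encoder_states: torch.Tensor,
                                 step: int) -> torch.Tensor:
    """Context the decoder attends to at decoding step `step`.

    prefix_states:  (step, d_model)    states of already generated target tokens
    encoder_states: (src_len, d_model) encoder outputs for the source sentence

    Because source and target overlap heavily in LST tasks, the encoder
    representation of source token i serves as a "pseudo" stand-in for the
    not-yet-generated target token i, supplying right-side context.
    """
    pseudo_future = encoder_states[step:]          # source positions >= step
    return torch.cat([prefix_states, pseudo_future], dim=0)
```

Under this reading, the paper's second ingredient, sharing the encoder and the decoder, would amount to instantiating a single Transformer stack and reusing its weights for both roles, which is where the halved parameter count comes from.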
Related papers
- Expediting and Elevating Large Language Model Reasoning via Hidden Chain-of-Thought Decoding [14.175444025026508]
Large language models (LLMs) have demonstrated remarkable capabilities in tasks requiring chain-of-thought (CoT) prompting.
However, generating the full CoT process results in significantly longer output sequences, leading to increased computational costs and latency during inference.
We propose a novel approach to compress the CoT process through semantic alignment, enabling more efficient decoding while preserving the benefits of CoT reasoning.
arXiv Detail & Related papers (2024-09-13T06:29:20Z)
- FADE: A Task-Agnostic Upsampling Operator for Encoder-Decoder Architectures [18.17019371324024]
FADE is a novel, plug-and-play, lightweight, and task-agnostic upsampling operator.
We show that FADE is task-agnostic with consistent performance improvement on a number of dense prediction tasks.
We demonstrate, for the first time, robust feature upsampling on both region- and detail-sensitive tasks.
arXiv Detail & Related papers (2024-07-18T13:32:36Z)
- Instruction Position Matters in Sequence Generation with Large Language Models [67.87516654892343]
Large language models (LLMs) are capable of performing conditional sequence generation tasks, such as translation or summarization.
We propose enhancing the instruction-following capability of LLMs by shifting the position of task instructions after the input sentences.
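As a toy illustration of that idea (the prompt wording and function below are assumptions, not the paper's setup), the instruction can simply be placed after the input rather than before it:

```python
# Hypothetical prompt builder illustrating instruction placement; the exact
# template used in the paper may differ.
def build_prompt(source_text: str, instruction: str, instruction_last: bool = True) -> str:
    if instruction_last:
        # proposed order: input first, task instruction afterwards
        return f"{source_text}\n{instruction}"
    # conventional order: instruction first, then the input
    return f"{instruction}\n{source_text}"

print(build_prompt("Der Apfel ist rot.", "Translate the sentence above into English."))
```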
arXiv Detail & Related papers (2023-08-23T12:36:57Z)
- A Framework for Bidirectional Decoding: Case Study in Morphological Inflection [4.602447284133507]
We propose a framework for decoding sequences from the "outside-in".
At each step, the model chooses to generate a token on the left, on the right, or join the left and right sequences.
Our model sets state-of-the-art (SOTA) on the 2022 and 2023 shared tasks, beating the next best systems by over 4.7 and 2.7 points in average accuracy respectively.
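A minimal sketch of such an outside-in loop is shown below, assuming a hypothetical `score_actions` callback that stands in for the trained model and returns action probabilities plus candidate tokens for each side:

```python
# Illustrative outside-in decoding loop; `score_actions` is hypothetical and
# would be backed by the actual model in practice.
from typing import Callable, Dict, List, Tuple

def outside_in_decode(score_actions: Callable[[List[str], List[str]],
                                              Tuple[Dict[str, float], str, str]],
                      max_len: int = 50) -> List[str]:
    left: List[str] = []    # grows rightwards from the left end
    right: List[str] = []   # grows leftwards from the right end (stored in order)
    for _ in range(max_len):
        probs, left_tok, right_tok = score_actions(left, right)
        action = max(probs, key=probs.get)
        if action == "JOIN":              # the two halves meet
            break
        if action == "LEFT":
            left.append(left_tok)
        else:                             # "RIGHT"
            right.insert(0, right_tok)
    return left + right
```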
arXiv Detail & Related papers (2023-05-21T22:08:31Z)
- Reducing Sequence Length by Predicting Edit Operations with Large Language Models [50.66922361766939]
This paper proposes predicting edit spans for the source text for local sequence transduction tasks.
We apply instruction tuning for Large Language Models on the supervision data of edit spans.
Experiments show that the proposed method achieves comparable performance to the baseline in four tasks.
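For illustration, here is a sketch of applying predicted edit spans to a source sentence, under the assumption that each edit is a (start, end, replacement) triple over source token indices; the paper's exact span encoding may differ:

```python
# Apply non-overlapping edit spans to the source tokens; span format assumed.
from typing import List, Tuple

def apply_edit_spans(source_tokens: List[str],
                     edits: List[Tuple[int, int, List[str]]]) -> List[str]:
    out: List[str] = []
    cursor = 0
    for start, end, replacement in sorted(edits):
        out.extend(source_tokens[cursor:start])   # copy the unchanged span
        out.extend(replacement)                   # splice in the edited text
        cursor = end
    out.extend(source_tokens[cursor:])
    return out

# e.g. correcting "He go to school" with a single predicted edit
print(apply_edit_spans("He go to school".split(), [(1, 2, ["goes"])]))
```

Predicting only such spans keeps the output far shorter than regenerating the whole corrected sentence, which is the point of the approach.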
arXiv Detail & Related papers (2023-05-19T17:51:05Z)
- Decoder Tuning: Efficient Language Understanding as Decoding [84.68266271483022]
We present Decoder Tuning (DecT), which in contrast optimizes task-specific decoder networks on the output side.
With gradient-based optimization, DecT can be trained within several seconds and requires only one PTM query per sample.
We conduct extensive natural language understanding experiments and show that DecT significantly outperforms state-of-the-art algorithms with a $200\times$ speed-up.
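A rough sketch of output-side tuning, assuming the frozen model's output features are queried once per example and cached, after which a lightweight head is fit on them (the head and training loop below are assumptions, not DecT's exact design):

```python
# Fit a small head on cached output-side features; no further PTM queries.
import torch
import torch.nn as nn

def train_output_side_head(cached_features: torch.Tensor,  # (n_samples, d)
                           labels: torch.Tensor,           # (n_samples,) class ids
                           n_classes: int,
                           steps: int = 200) -> nn.Module:
    head = nn.Linear(cached_features.size(-1), n_classes)
    optimizer = torch.optim.Adam(head.parameters(), lr=1e-2)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):          # a few seconds of training on cached features
        optimizer.zero_grad()
        loss = loss_fn(head(cached_features), labels)
        loss.backward()
        optimizer.step()
    return head
```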
arXiv Detail & Related papers (2022-12-16T11:15:39Z)
- Don't Take It Literally: An Edit-Invariant Sequence Loss for Text Generation [109.46348908829697]
We propose a novel Edit-Invariant Sequence Loss (EISL), which computes the matching loss of a target n-gram with all n-grams in the generated sequence.
We conduct experiments on three tasks: machine translation with noisy target sequences, unsupervised text style transfer, and non-autoregressive machine translation.
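As a count-based illustration of the matching intuition only (the actual EISL is a differentiable loss computed from model probabilities, not from discrete counts):

```python
# Position-tolerant n-gram matching score; a toy stand-in for the EISL idea.
from typing import List

def ngram_match_score(target: List[str], hypothesis: List[str], n: int = 2) -> float:
    target_ngrams = [tuple(target[i:i + n]) for i in range(len(target) - n + 1)]
    hyp_ngrams = {tuple(hypothesis[i:i + n]) for i in range(len(hypothesis) - n + 1)}
    if not target_ngrams:
        return 0.0
    hits = sum(1 for g in target_ngrams if g in hyp_ngrams)
    return hits / len(target_ngrams)

# a shifted hypothesis still gets partial credit for the shared n-gram
print(ngram_match_score("the cat sat".split(), "sat the cat".split()))
```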
arXiv Detail & Related papers (2021-06-29T03:59:21Z)
- Fast Interleaved Bidirectional Sequence Generation [90.58793284654692]
We introduce a decoder that generates target words from the left-to-right and right-to-left directions simultaneously.
We show that we can easily convert a standard architecture for unidirectional decoding into a bidirectional decoder.
Our interleaved bidirectional decoder (IBDecoder) retains the model simplicity and training efficiency of the standard Transformer.
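A sketch of the interleaving trick is given below, assuming tokens from the two ends of the target simply alternate in the reordered sequence (the special symbols and termination handling used by IBDecoder are omitted):

```python
# Reorder a target so one left-to-right pass emits both directions at once.
from typing import List

def interleave(target: List[str]) -> List[str]:
    out, i, j = [], 0, len(target) - 1
    while i <= j:
        out.append(target[i])           # next token from the left end
        if i != j:
            out.append(target[j])       # next token from the right end
        i, j = i + 1, j - 1
    return out

def deinterleave(seq: List[str]) -> List[str]:
    left, right = seq[0::2], seq[1::2]
    return left + right[::-1]

assert deinterleave(interleave(list("abcdef"))) == list("abcdef")
```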
arXiv Detail & Related papers (2020-10-27T17:38:51Z)
- On Sparsifying Encoder Outputs in Sequence-to-Sequence Models [90.58793284654692]
We take Transformer as the testbed and introduce a layer of gates in-between the encoder and the decoder.
The gates are regularized using the expected value of the sparsity-inducing L0 penalty.
We investigate the effects of this sparsification on two machine translation and two summarization tasks.
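A simplified sketch of such a gate layer follows, using a plain sigmoid gate whose mean serves as a stand-in for the expected L0 penalty (the paper relies on hard-concrete gates, which are not reproduced here):

```python
# Soft gating of encoder outputs with a sparsity surrogate; illustrative only.
import torch
import torch.nn as nn

class EncoderOutputGate(nn.Module):
    def __init__(self, d_model: int):
        super().__init__()
        self.scorer = nn.Linear(d_model, 1)   # one gate logit per source position

    def forward(self, encoder_states: torch.Tensor):
        # encoder_states: (batch, src_len, d_model)
        gate = torch.sigmoid(self.scorer(encoder_states))   # values in (0, 1)
        pruned = gate * encoder_states                       # softly prune positions
        sparsity_penalty = gate.mean()                       # surrogate for E[L0]
        return pruned, sparsity_penalty
```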
arXiv Detail & Related papers (2020-04-24T16:57:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.