Related papers: Overcoming Non-monotonicity in Transducer-based Streaming Generation

URL: http://arxiv.org/abs/2411.17170v2
Date: Wed, 28 May 2025 11:20:49 GMT
Title: Overcoming Non-monotonicity in Transducer-based Streaming Generation
Authors: Zhengrui Ma, Yang Feng, Min Zhang
Abstract summary: This research integrates Transducer's decoding with the history of input stream via a learnable monotonic attention. Our approach leverages the forward-backward algorithm to infer the posterior probability of alignments between the predictor states and input timestamps. Experiments show that our MonoAttn-Transducer effectively handles non-monotonic alignments in streaming scenarios.
Score: 26.24357071901915
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Streaming generation models are utilized across fields, with the Transducer architecture being popular in industrial applications. However, its input-synchronous decoding mechanism presents challenges in tasks requiring non-monotonic alignments, such as simultaneous translation. In this research, we address this issue by integrating Transducer's decoding with the history of input stream via a learnable monotonic attention. Our approach leverages the forward-backward algorithm to infer the posterior probability of alignments between the predictor states and input timestamps, which is then used to estimate the monotonic context representations, thereby avoiding the need to enumerate the exponentially large alignment space during training. Extensive experiments show that our MonoAttn-Transducer effectively handles non-monotonic alignments in streaming scenarios, offering a robust solution for complex generation tasks.
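The forward-backward step mentioned in the abstract can be illustrated with a minimal sketch on a standard Transducer lattice. Everything named here is an assumption for illustration and is not taken from the paper: the functions `emission_posteriors` and `expected_context`, the input layout (`log_blank[t][u]`, `log_emit[t][u]`), and in particular the uniform prefix average, which stands in for the learnable monotonic attention. The sketch only shows how alignment posteriors let one form an expected context without enumerating alignments.

```python
import math

NEG_INF = float("-inf")


def logaddexp(a, b):
    """Numerically stable log(exp(a) + exp(b))."""
    if a == NEG_INF:
        return b
    if b == NEG_INF:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def emission_posteriors(log_blank, log_emit):
    """Posterior of emitting each target token at each input frame.

    log_blank[t][u]: log-prob of blank at frame t after u tokens, shape T x (U+1).
    log_emit[t][u]:  log-prob of emitting target token u at frame t, shape T x U.
    Returns post[u][t] = P(token u is emitted at frame t | input, target).
    (Hypothetical layout, assumed for this sketch.)
    """
    T, U = len(log_blank), len(log_emit[0])

    # Forward pass: alpha[t][u] = log-mass of partial paths reaching node (t, u).
    alpha = [[NEG_INF] * (U + 1) for _ in range(T)]
    alpha[0][0] = 0.0
    for t in range(T):
        for u in range(U + 1):
            if t > 0:
                alpha[t][u] = logaddexp(alpha[t][u], alpha[t - 1][u] + log_blank[t - 1][u])
            if u > 0:
                alpha[t][u] = logaddexp(alpha[t][u], alpha[t][u - 1] + log_emit[t][u - 1])

    # Backward pass: beta[t][u] = log-mass of path suffixes from (t, u) to the end.
    beta = [[NEG_INF] * (U + 1) for _ in range(T)]
    beta[T - 1][U] = log_blank[T - 1][U]
    for t in range(T - 1, -1, -1):
        for u in range(U, -1, -1):
            if t < T - 1:
                beta[t][u] = logaddexp(beta[t][u], beta[t + 1][u] + log_blank[t][u])
            if u < U:
                beta[t][u] = logaddexp(beta[t][u], beta[t][u + 1] + log_emit[t][u])

    log_z = beta[0][0]  # total log-likelihood over all alignments
    return [
        [math.exp(alpha[t][u] + log_emit[t][u] + beta[t][u + 1] - log_z) for t in range(T)]
        for u in range(U)
    ]


def expected_context(post_u, encoder_states):
    """Expected context for one token: sum_t post_u[t] * mean(encoder_states[:t+1]).

    The prefix mean is a placeholder for a learned attention over the input history.
    """
    dim = len(encoder_states[0])
    context = [0.0] * dim
    prefix_sum = [0.0] * dim
    for t, h in enumerate(encoder_states):
        prefix_sum = [s + x for s, x in zip(prefix_sum, h)]
        for d in range(dim):
            context[d] += post_u[t] * prefix_sum[d] / (t + 1)
    return context
```

Because every complete alignment emits each target token exactly once, each row of the returned posterior sums to 1, and the cost is O(T x U) rather than exponential in the sequence lengths.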

Related papers

Sequence Complementor: Complementing Transformers For Time Series Forecasting with Learnable Sequences [5.244482076690776]
We find that the expressive capability of sequence representations is a key factor influencing Transformer performance in time series forecasting. We propose a novel attention mechanism with Sequence Complementors and prove it feasible from an information theory perspective.
arXiv Detail & Related papers (2025-01-06T03:08:39Z)
LinFormer: A Linear-based Lightweight Transformer Architecture For Time-Aware MIMO Channel Prediction [39.12741712294741]
6th generation (6G) mobile networks bring new challenges in supporting high-mobility communications. We present LinFormer, an innovative channel prediction framework based on a scalable, all-linear, encoder-only Transformer model. Our approach achieves a substantial reduction in computational complexity while maintaining high prediction accuracy, making it more suitable for deployment in cost-effective base stations (BS).
arXiv Detail & Related papers (2024-10-28T13:04:23Z)
PRformer: Pyramidal Recurrent Transformer for Multivariate Time Series Forecasting [82.03373838627606]
Self-attention mechanism in Transformer architecture requires positional embeddings to encode temporal order in time series prediction. We argue that this reliance on positional embeddings restricts the Transformer's ability to effectively represent temporal sequences. We present a model integrating PRE with a standard Transformer encoder, demonstrating state-of-the-art performance on various real-world datasets.
arXiv Detail & Related papers (2024-08-20T01:56:07Z)
Efficient Diffusion Transformer with Step-wise Dynamic Attention Mediators [83.48423407316713]
We present a novel diffusion transformer framework incorporating an additional set of mediator tokens to engage with queries and keys separately. Our model initiates the denoising process with a precise, non-ambiguous stage and gradually transitions to a phase enriched with detail. Our method achieves a state-of-the-art FID score of 2.01 when integrated with the recent work SiT.
arXiv Detail & Related papers (2024-08-11T07:01:39Z)
Rough Transformers: Lightweight Continuous-Time Sequence Modelling with Path Signatures [46.58170057001437]
We introduce the Rough Transformer, a variation of the Transformer model that operates on continuous-time representations of input sequences. We find that, on a variety of time-series-related tasks, Rough Transformers consistently outperform their vanilla attention counterparts.
arXiv Detail & Related papers (2024-05-31T14:00:44Z)
Parallel Decoding via Hidden Transfer for Lossless Large Language Model Acceleration [54.897493351694195]
We propose a novel parallel decoding approach, namely hidden transfer, which decodes multiple successive tokens simultaneously in a single forward pass. In terms of acceleration metrics, we outperform all the single-model acceleration techniques, including Medusa and Self-Speculative decoding.
arXiv Detail & Related papers (2024-04-18T09:17:06Z)
ALERT-Transformer: Bridging Asynchronous and Synchronous Machine Learning for Real-Time Event-based Spatio-Temporal Data [8.660721666999718]
We propose a hybrid pipeline composed of asynchronous sensing and synchronous processing. We achieve state-of-the-art performance with lower latency than competitors.
arXiv Detail & Related papers (2024-02-02T13:17:19Z)
Learning Stationary Markov Processes with Contrastive Adjustment [2.76240219662896]
We introduce a new optimization algorithm, termed contrastive adjustment, for learning Markov transition kernels. Contrastive adjustment is not restricted to a particular family of transition distributions and can be used to model data in both continuous and discrete state spaces. We show that contrastive adjustment is highly valuable in human-computer design processes.
arXiv Detail & Related papers (2023-03-09T18:50:15Z)
Towards Long-Term Time-Series Forecasting: Feature, Pattern, and Distribution [57.71199089609161]
Long-term time-series forecasting (LTTF) has become a pressing demand in many applications, such as wind power supply planning. Transformer models have been adopted to deliver high prediction capacity because of the high computational self-attention mechanism. We propose an efficient Transformer-based model, named Conformer, which differentiates itself from existing methods for LTTF in three aspects.
arXiv Detail & Related papers (2023-01-05T13:59:29Z)
Error Correction Code Transformer [92.10654749898927]
We propose to extend for the first time the Transformer architecture to the soft decoding of linear codes at arbitrary block lengths. We encode each channel's output dimension to high dimension for better representation of the bits information to be processed separately. The proposed approach demonstrates the extreme power and flexibility of Transformers and outperforms existing state-of-the-art neural decoders by large margins at a fraction of their time complexity.
arXiv Detail & Related papers (2022-03-27T15:25:58Z)
CCVS: Context-aware Controllable Video Synthesis [95.22008742695772]
This presentation introduces a self-supervised learning approach to the synthesis of new video clips from old ones. It conditions the synthesis process on contextual information for temporal continuity and ancillary information for fine control.
arXiv Detail & Related papers (2021-07-16T17:57:44Z)
Transformers Solve the Limited Receptive Field for Monocular Depth Prediction [82.90445525977904]
We propose TransDepth, an architecture which benefits from both convolutional neural networks and transformers. This is the first paper which applies transformers to pixel-wise prediction problems involving continuous labels.
arXiv Detail & Related papers (2021-03-22T18:00:13Z)
Streaming Simultaneous Speech Translation with Augmented Memory Transformer [29.248366441276662]
Transformer-based models have achieved state-of-the-art performance on speech translation tasks. We propose an end-to-end transformer-based sequence-to-sequence model, equipped with an augmented memory transformer encoder.
arXiv Detail & Related papers (2020-10-30T18:28:42Z)
Cascaded Text Generation with Markov Transformers [122.76100449018061]
Two dominant approaches to neural text generation are fully autoregressive models, using serial beam search decoding, and non-autoregressive models, using parallel decoding with no output dependencies. This work proposes an autoregressive model with sub-linear parallel time generation. Noting that conditional random fields with bounded context can be decoded in parallel, we propose an efficient cascaded decoding approach for generating high-quality output. This approach requires only a small modification from standard autoregressive training, while showing competitive accuracy/speed tradeoff compared to existing methods on five machine translation datasets.
arXiv Detail & Related papers (2020-06-01T17:52:15Z)
A Probabilistic Formulation of Unsupervised Text Style Transfer [128.80213211598752]
We present a deep generative model for unsupervised text style transfer that unifies previously proposed non-generative techniques. By hypothesizing a parallel latent sequence that generates each observed sequence, our model learns to transform sequences from one domain to another in a completely unsupervised fashion.
arXiv Detail & Related papers (2020-02-10T16:20:49Z)

This list is automatically generated from the titles and abstracts of the papers on this site.