Learning Monotonic Attention in Transducer for Streaming Generation
- URL: http://arxiv.org/abs/2411.17170v1
- Date: Tue, 26 Nov 2024 07:19:26 GMT
- Title: Learning Monotonic Attention in Transducer for Streaming Generation
- Authors: Zhengrui Ma, Yang Feng, Min Zhang,
- Abstract summary: We propose a learnable monotonic attention mechanism to handle non-monotonic alignments in Transducer-based streaming generation models.
Our approach allows Transducer models to adaptively adjust the scope of attention based on their predictions, avoiding the need to enumerate the exponentially large alignment space.
- Score: 26.24357071901915
- License:
- Abstract: Streaming generation models are increasingly utilized across various fields, with the Transducer architecture being particularly popular in industrial applications. However, its input-synchronous decoding mechanism presents challenges in tasks requiring non-monotonic alignments, such as simultaneous translation, leading to suboptimal performance in these contexts. In this research, we address this issue by tightly integrating Transducer's decoding with the history of input stream via a learnable monotonic attention mechanism. Our approach leverages the forward-backward algorithm to infer the posterior probability of alignments between the predictor states and input timestamps, which is then used to estimate the context representations of monotonic attention in training. This allows Transducer models to adaptively adjust the scope of attention based on their predictions, avoiding the need to enumerate the exponentially large alignment space. Extensive experiments demonstrate that our MonoAttn-Transducer significantly enhances the handling of non-monotonic alignments in streaming generation, offering a robust solution for Transducer-based frameworks to tackle more complex streaming generation tasks.
Related papers
- Sequence Complementor: Complementing Transformers For Time Series Forecasting with Learnable Sequences [5.244482076690776]
We find that expressive capability of sequence representation is a key factor influencing Transformer performance in time forecasting.
We propose a novel attention mechanism with Sequence Complementors and prove feasible from an information theory perspective.
arXiv Detail & Related papers (2025-01-06T03:08:39Z) - PRformer: Pyramidal Recurrent Transformer for Multivariate Time Series Forecasting [82.03373838627606]
Self-attention mechanism in Transformer architecture requires positional embeddings to encode temporal order in time series prediction.
We argue that this reliance on positional embeddings restricts the Transformer's ability to effectively represent temporal sequences.
We present a model integrating PRE with a standard Transformer encoder, demonstrating state-of-the-art performance on various real-world datasets.
arXiv Detail & Related papers (2024-08-20T01:56:07Z) - Efficient Diffusion Transformer with Step-wise Dynamic Attention Mediators [83.48423407316713]
We present a novel diffusion transformer framework incorporating an additional set of mediator tokens to engage with queries and keys separately.
Our model initiates the denoising process with a precise, non-ambiguous stage and gradually transitions to a phase enriched with detail.
Our method achieves a state-of-the-art FID score of 2.01 when integrated with the recent work SiT.
arXiv Detail & Related papers (2024-08-11T07:01:39Z) - Rough Transformers: Lightweight and Continuous Time Series Modelling through Signature Patching [46.58170057001437]
We introduce the Rough Transformer, a variation of the Transformer model that operates on continuous-time representations of input sequences.
We find that, on a variety of time-series-related tasks, Rough Transformers consistently outperform their vanilla attention counterparts.
arXiv Detail & Related papers (2024-05-31T14:00:44Z) - Towards Long-Term Time-Series Forecasting: Feature, Pattern, and
Distribution [57.71199089609161]
Long-term time-series forecasting (LTTF) has become a pressing demand in many applications, such as wind power supply planning.
Transformer models have been adopted to deliver high prediction capacity because of the high computational self-attention mechanism.
We propose an efficient Transformerbased model, named Conformer, which differentiates itself from existing methods for LTTF in three aspects.
arXiv Detail & Related papers (2023-01-05T13:59:29Z) - Error Correction Code Transformer [92.10654749898927]
We propose to extend for the first time the Transformer architecture to the soft decoding of linear codes at arbitrary block lengths.
We encode each channel's output dimension to high dimension for better representation of the bits information to be processed separately.
The proposed approach demonstrates the extreme power and flexibility of Transformers and outperforms existing state-of-the-art neural decoders by large margins at a fraction of their time complexity.
arXiv Detail & Related papers (2022-03-27T15:25:58Z) - CCVS: Context-aware Controllable Video Synthesis [95.22008742695772]
presentation introduces a self-supervised learning approach to the synthesis of new video clips from old ones.
It conditions the synthesis process on contextual information for temporal continuity and ancillary information for fine control.
arXiv Detail & Related papers (2021-07-16T17:57:44Z) - Transformers Solve the Limited Receptive Field for Monocular Depth
Prediction [82.90445525977904]
We propose TransDepth, an architecture which benefits from both convolutional neural networks and transformers.
This is the first paper which applies transformers into pixel-wise prediction problems involving continuous labels.
arXiv Detail & Related papers (2021-03-22T18:00:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.