A study of latent monotonic attention variants
- URL: http://arxiv.org/abs/2103.16710v1
- Date: Tue, 30 Mar 2021 22:35:56 GMT
- Title: A study of latent monotonic attention variants
- Authors: Albert Zeyer, Ralf Schlüter, Hermann Ney
- Abstract summary: End-to-end models reach state-of-the-art performance for speech recognition, but global soft attention is not monotonic.
We present a mathematically clean solution to introduce monotonicity, by introducing a new latent variable.
We show that our monotonic models perform as well as the global soft attention model.
- Score: 65.73442960456013
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: End-to-end models reach state-of-the-art performance for speech recognition,
but global soft attention is not monotonic, which can lead to convergence
problems, instability, and poor generalisation; it also cannot be used for online
streaming and is inefficient to compute. Monotonicity can potentially
fix all of this. There are several ad-hoc solutions or heuristics to introduce
monotonicity, but a principled introduction is rarely found in the literature so
far. In this paper, we present a mathematically clean solution to introduce
monotonicity, by introducing a new latent variable which represents the audio
position or segment boundaries. We compare several monotonic latent models to
our global soft attention baseline such as a hard attention model, a local
windowed soft attention model, and a segmental soft attention model. We can
show that our monotonic models perform as well as the global soft attention
model. We perform our experiments on Switchboard 300h. We carefully outline the
details of our training and release our code and configs.
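The contrast between global soft attention and a monotonic, locally windowed variant can be illustrated with a minimal sketch. This is not the paper's implementation; the window half-width `W` and the latent position `t_hat` are illustrative assumptions standing in for the paper's latent audio-position variable.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over one attention-score vector.
    e = np.exp(x - x.max())
    return e / e.sum()

def global_soft_attention(scores):
    """Attend over the whole encoder sequence (non-monotonic):
    attention mass can land anywhere, at every decoder step."""
    return softmax(scores)

def windowed_soft_attention(scores, t_hat, W=3):
    """Restrict attention to a window of width 2*W+1 around a latent
    position t_hat; monotonicity follows if t_hat only moves forward
    across decoder steps."""
    masked = np.full_like(scores, -np.inf)   # -inf scores get zero weight
    lo, hi = max(0, t_hat - W), min(len(scores), t_hat + W + 1)
    masked[lo:hi] = scores[lo:hi]
    return softmax(masked)

scores = np.array([0.1, 2.0, 0.3, 1.5, 0.2, 0.0])
g = global_soft_attention(scores)                   # mass spread anywhere
w = windowed_soft_attention(scores, t_hat=1, W=1)   # mass only near t_hat
```

In the windowed case, all attention weight outside the window is exactly zero, so only a constant number of encoder frames is touched per decoder step, which is what makes such variants usable for online streaming.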
Related papers
- Towards a Generalist and Blind RGB-X Tracker [91.36268768952755]
We develop a single model tracker that can remain blind to any modality X during inference time.
Our training process is extremely simple, integrating multi-label classification loss with a routing function.
Our generalist and blind tracker can achieve competitive performance compared to well-established modal-specific models.
arXiv Detail & Related papers (2024-05-28T03:00:58Z) - How to address monotonicity for model risk management? [1.0878040851638]
This paper studies transparent neural networks in the presence of three types of monotonicity: individual monotonicity, weak pairwise monotonicity, and strong pairwise monotonicity.
As a means of achieving monotonicity while maintaining transparency, we propose the monotonic groves of neural additive models.
arXiv Detail & Related papers (2023-04-28T04:21:02Z) - Monotonic segmental attention for automatic speech recognition [45.036436385637295]
We introduce a novel segmental-attention model for automatic speech recognition.
We compare global-attention and different segmental-attention modeling variants.
We observe that the segmental model generalizes much better to long sequences of up to several minutes.
arXiv Detail & Related papers (2022-10-26T14:21:23Z) - Attention Bottlenecks for Multimodal Fusion [90.75885715478054]
Machine perception models are typically modality-specific and optimised for unimodal benchmarks.
We introduce a novel transformer-based architecture that uses 'fusion bottlenecks' for modality fusion at multiple layers.
We conduct thorough ablation studies, and achieve state-of-the-art results on multiple audio-visual classification benchmarks.
arXiv Detail & Related papers (2021-06-30T22:44:12Z) - Certified Monotonic Neural Networks [15.537695725617576]
We propose to certify the monotonicity of the general piece-wise linear neural networks by solving a mixed integer linear programming problem.
Our approach does not require human-designed constraints on the weight space and also yields more accurate approximation.
arXiv Detail & Related papers (2020-11-20T04:58:13Z) - Bayesian Attention Modules [65.52970388117923]
We propose a scalable version of attention that is easy to implement and optimize.
Our experiments show the proposed method brings consistent improvements over the corresponding baselines.
arXiv Detail & Related papers (2020-10-20T20:30:55Z) - AvgOut: A Simple Output-Probability Measure to Eliminate Dull Responses [97.50616524350123]
We build dialogue models that are dynamically aware of what utterances or tokens are dull without any feature-engineering.
The first model, MinAvgOut, directly maximizes the diversity score through the output distributions of each batch.
The second model, Label Fine-Tuning (LFT), prepends to the source sequence a label continuously scaled by the diversity score to control the diversity level.
The third model, RL, adopts Reinforcement Learning and treats the diversity score as a reward signal.
arXiv Detail & Related papers (2020-01-15T18:32:06Z) - Exact Hard Monotonic Attention for Character-Level Transduction [76.66797368985453]
We show that neural sequence-to-sequence models that use non-monotonic soft attention often outperform popular monotonic models.
We develop a hard attention sequence-to-sequence model that enforces strict monotonicity and learns a latent alignment jointly while learning to transduce.
arXiv Detail & Related papers (2019-05-15T17:51:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.