Exact Hard Monotonic Attention for Character-Level Transduction
- URL: http://arxiv.org/abs/1905.06319v3
- Date: Tue, 20 Feb 2024 15:41:14 GMT
- Title: Exact Hard Monotonic Attention for Character-Level Transduction
- Authors: Shijie Wu and Ryan Cotterell
- Abstract summary: We show that neural sequence-to-sequence models that use non-monotonic soft attention often outperform popular monotonic models.
We develop a hard attention sequence-to-sequence model that enforces strict monotonicity and learns a latent alignment jointly while learning to transduce.
- Score: 76.66797368985453
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many common character-level, string-to-string transduction tasks, e.g.,
grapheme-to-phoneme conversion and morphological inflection, consist almost
exclusively of monotonic transductions. However, neural sequence-to-sequence
models that use non-monotonic soft attention often outperform popular monotonic
models. In this work, we ask the following question: Is monotonicity really a
helpful inductive bias for these tasks? We develop a hard attention
sequence-to-sequence model that enforces strict monotonicity and learns a
latent alignment jointly while learning to transduce. With the help of dynamic
programming, we are able to compute the exact marginalization over all
monotonic alignments. Our models achieve state-of-the-art performance on
morphological inflection. Furthermore, we find strong performance on two other
character-level transduction tasks. Code is available at
https://github.com/shijie-wu/neural-transducer.
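As a concrete illustration of the dynamic program mentioned in the abstract, below is a minimal sketch in plain Python of exact marginalization over monotone alignments. It is not the authors' implementation (that lives in the repository above); the emission and transition tables are placeholder arguments that the paper's encoder-decoder network would produce, and monotonicity is enforced by giving backward transitions probability zero.

```python
import math

def logsumexp(xs):
    m = max(xs)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(x - m) for x in xs))

def monotonic_log_marginal(log_emit, log_trans):
    """log p(y) = log of the sum over all monotone alignments a_1 <= ... <= a_m.

    log_emit[j][i]  : log p(y_j | x_i, y_<j), an m x n table
    log_trans[k][i] : log p(a_j = i | a_{j-1} = k), an n x n table with
                      -inf wherever i < k, which enforces monotonicity
    """
    m, n = len(log_emit), len(log_emit[0])
    # alpha[i] = log p(y_1 .. y_j, a_j = i); a uniform prior over the first
    # alignment position is an assumption made for this sketch
    alpha = [log_emit[0][i] - math.log(n) for i in range(n)]
    for j in range(1, m):
        alpha = [
            log_emit[j][i]
            + logsumexp([alpha[k] + log_trans[k][i] for k in range(n)])
            for i in range(n)
        ]
    return logsumexp(alpha)
```

As written the inner sum makes each step O(n^2); for simpler transition families the sum over previous positions collapses to a running prefix sum, giving O(mn) overall.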
Related papers
- Compositional Generalization without Trees using Multiset Tagging and
Latent Permutations [121.37328648951993]
We phrase semantic parsing as a two-step process: we first tag each input token with a multiset of output tokens.
Then we arrange the tokens into an output sequence using a new way of parameterizing and predicting permutations.
Our model outperforms pretrained seq2seq models and prior work on realistic semantic parsing tasks.
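To make the two-step pipeline concrete, here is a toy sketch of the data flow only. The tagger and permuter below are hand-written stubs standing in for the paper's learned components, and the example query is invented for illustration.

```python
def transduce(tokens, tagger, permuter):
    # Step 1: each input token is tagged with a multiset of output tokens.
    tagged = [out for tok in tokens for out in tagger(tok)]
    # Step 2: a predicted permutation arranges them into the output sequence.
    order = permuter(tagged)
    return [tagged[i] for i in order]

# Hand-written stand-ins for the learned tagger and permutation model.
tagger = {"what": [], "is": ["("], "the": [], "capital": ["capital"],
          "of": [")"], "france": ["france"]}.get
permuter = lambda toks: [0, 1, 3, 2]  # fixed output just for this example

print(transduce(["what", "is", "the", "capital", "of", "france"],
                tagger, permuter))  # ['(', 'capital', 'france', ')']
```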
arXiv Detail & Related papers (2023-05-26T14:09:35Z)
- Mutual Exclusivity Training and Primitive Augmentation to Induce
Compositionality [84.94877848357896]
Recent datasets expose the lack of systematic generalization ability in standard sequence-to-sequence models.
We analyze this behavior of seq2seq models and identify two contributing factors: a lack of mutual exclusivity bias and the tendency to memorize whole examples.
We show substantial empirical improvements using standard sequence-to-sequence models on two widely-used compositionality datasets.
arXiv Detail & Related papers (2022-11-28T17:36:41Z)
- Constrained Monotonic Neural Networks [0.685316573653194]
Wider adoption of neural networks in many critical domains such as finance and healthcare is being hindered by the need to explain their predictions.
Monotonicity constraint is one of the most requested properties in real-world scenarios.
We show it can approximate any continuous monotone function on a compact subset of $\mathbb{R}^n$.
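For intuition, here is a minimal sketch of one standard recipe for a monotone-by-construction network: non-negative weights (obtained by exponentiating unconstrained parameters) composed with non-decreasing activations. This illustrates the constraint itself, not necessarily the construction proposed in the paper above.

```python
import numpy as np

rng = np.random.default_rng(0)

class MonotoneLayer:
    def __init__(self, d_in, d_out):
        self.w = rng.normal(size=(d_in, d_out))  # unconstrained parameters
        self.b = np.zeros(d_out)

    def __call__(self, x):
        # exp(w) >= 0 and tanh is non-decreasing, so the layer is
        # non-decreasing in every coordinate of x.
        return np.tanh(x @ np.exp(self.w) + self.b)

f = MonotoneLayer(3, 8)
g = MonotoneLayer(8, 1)
x = np.array([0.1, 0.2, 0.3])
y1 = g(f(x))
y2 = g(f(x + np.array([0.5, 0.0, 0.0])))  # increase one input feature
assert y2 >= y1  # the output cannot decrease
```

A known weakness of this simple recipe is that saturating activations restrict which monotone functions are reachable; removing that kind of limitation is what dedicated constructions aim for.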
arXiv Detail & Related papers (2022-05-24T04:26:10Z)
- Structured Reordering for Modeling Latent Alignments in Sequence
Transduction [86.94309120789396]
We present an efficient dynamic programming algorithm performing exact marginal inference of separable permutations.
The resulting seq2seq model exhibits better systematic generalization than standard models on synthetic problems and NLP tasks.
arXiv Detail & Related papers (2021-06-06T21:53:54Z)
- On Biasing Transformer Attention Towards Monotonicity [20.205388243570003]
We introduce a monotonicity loss function that is compatible with standard attention mechanisms and test it on several sequence-to-sequence tasks.
Experiments show that we can achieve largely monotonic behavior.
General monotonicity does not benefit transformer multi-head attention; however, we see isolated improvements when only a subset of heads is biased towards monotonic behavior.
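One plausible instantiation of such a loss (a sketch under our own assumptions, not necessarily the paper's exact formulation) penalizes any decrease in the expected source position attended to across consecutive target steps:

```python
import numpy as np

def monotonicity_loss(attn):
    """attn: (target_len, source_len) attention matrix, rows sum to 1."""
    positions = np.arange(attn.shape[1])
    expected = attn @ positions              # expected source index per step
    backward = expected[:-1] - expected[1:]  # > 0 where attention moves back
    return np.maximum(backward, 0.0).sum()

attn = np.array([[0.9, 0.1, 0.0],
                 [0.1, 0.8, 0.1],
                 [0.7, 0.2, 0.1]])  # the last step jumps backwards
print(monotonicity_loss(attn))      # only the backward jump is penalized
```

In training, a term like this would be added to the usual cross-entropy with some weight; biasing only a subset of heads amounts to applying it to only some of the attention matrices.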
arXiv Detail & Related papers (2021-04-08T17:42:05Z)
- A study of latent monotonic attention variants [65.73442960456013]
End-to-end models reach state-of-the-art performance for speech recognition, but global soft attention is not monotonic.
We present a mathematically clean way of introducing monotonicity via a new latent variable.
We show that our monotonic models perform as well as the global soft attention model.
arXiv Detail & Related papers (2021-03-30T22:35:56Z)
- Counterexample-Guided Learning of Monotonic Neural Networks [32.73558242733049]
We focus on monotonicity constraints, which are common and require that the function's output increases with increasing values of specific input features.
We develop a counterexample-guided technique to provably enforce monotonicity constraints at prediction time.
We also propose a technique to use monotonicity as an inductive bias for deep learning.
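As a simplified stand-in for the idea of enforcing monotonicity at prediction time (the paper's actual technique is counterexample-guided and uses a solver to search for violations), one can take the upper envelope of the model along the monotone feature, which is non-decreasing in that feature by construction:

```python
import numpy as np

def monotone_wrap(f, x, feature, lo, steps=50):
    """Upper envelope of f along one feature, making it non-decreasing."""
    grid = np.linspace(lo, x[feature], steps)
    candidates = []
    for v in grid:
        x2 = x.copy()
        x2[feature] = v          # evaluate f at all smaller feature values
        candidates.append(f(x2))
    return max(candidates)       # the max can only grow as x[feature] grows

# A deliberately non-monotone "model" in feature 0:
f = lambda x: np.sin(3 * x[0]) + 0.1 * x[1]
x = np.array([1.5, 0.0])
print(f(x), monotone_wrap(f, x, feature=0, lo=0.0))
```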
arXiv Detail & Related papers (2020-06-16T01:04:26Z)
- Hard Non-Monotonic Attention for Character-Level Transduction [65.17388794270694]
We introduce an exact, polynomial-time algorithm for marginalizing over the exponential number of non-monotonic alignments between two strings.
We compare soft and hard non-monotonic attention experimentally and find that the exact algorithm significantly improves performance over the approximation and outperforms soft attention.
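To see why this marginalization is tractable despite the exponential number of alignments, here is a minimal sketch: when each output position's alignment is conditionally independent given the decoder state, the sum over all alignments factorizes into per-position sums. The tables below are placeholders for network outputs.

```python
import math

def log_marginal(log_align, log_emit):
    """log p(y|x) = sum_j log sum_i p(a_j = i) p(y_j | x_i).

    log_align[j][i] : log p(a_j = i | y_<j, x), an m x n table
    log_emit[j][i]  : log p(y_j | x_i, y_<j),  an m x n table
    """
    total = 0.0
    for la, le in zip(log_align, log_emit):
        scores = [a + e for a, e in zip(la, le)]  # joint score per position i
        m = max(scores)
        total += m + math.log(sum(math.exp(s - m) for s in scores))
    return total
```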
arXiv Detail & Related papers (2018-08-29T20:00:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.