Leveraging Neural Machine Translation for Word Alignment
- URL: http://arxiv.org/abs/2103.17250v1
- Date: Wed, 31 Mar 2021 17:51:35 GMT
- Title: Leveraging Neural Machine Translation for Word Alignment
- Authors: Vilém Zouhar and Daria Pylypenko
- Abstract summary: A machine translation (MT) system is able to produce word-alignments using the trained attention heads.
This is convenient because word-alignment is theoretically a viable byproduct of any attention-based NMT.
We summarize different approaches to extracting word alignments from alignment scores and then explore ways in which scores can be extracted from NMT.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The most common tools for word-alignment rely on a large number of parallel
sentences, which are then usually processed according to one of the IBM model
algorithms. The training data is, however, the same as for machine translation
(MT) systems, especially for neural MT (NMT), which itself is able to produce
word-alignments using the trained attention heads. This is convenient because
word-alignment is theoretically a viable byproduct of any attention-based NMT,
which is also able to provide decoder scores for a translated sentence pair.
We summarize different approaches to extracting word alignments from
alignment scores and then explore ways in which scores can be extracted from
NMT, focusing on inferring the word-alignment scores based on output sentence
and token probabilities. We compare this to the extraction of alignment scores
from attention. We conclude by aggregating all of the sources of alignment
scores in a simple feed-forward network, which achieves the best results when
combined alignment extractors are used.
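As a rough sketch of the attention-based extraction idea described in the abstract (not the paper's exact procedure; the threshold value and function name are illustrative), alignment links can be read off an attention matrix by keeping, for each target token, every source position whose attention weight clears a threshold, falling back to the argmax so every target token gets at least one link:

```python
import numpy as np

def alignment_from_attention(attn, threshold=0.3):
    """Extract word-alignment links from an attention matrix.

    attn: (target_len, source_len) array of attention weights, each row
    summing to 1. Target token j is linked to every source token i with
    weight >= threshold; if no weight clears it, fall back to the argmax.
    Returns a sorted list of (source_index, target_index) pairs.
    """
    links = set()
    for j, row in enumerate(attn):
        above = np.flatnonzero(row >= threshold)
        if above.size == 0:
            above = [int(np.argmax(row))]
        for i in above:
            links.add((int(i), j))
    return sorted(links)

# Toy 3-target x 3-source attention matrix (rows sum to 1).
attn = np.array([
    [0.8, 0.1, 0.1],   # target 0 attends mostly to source 0
    [0.1, 0.7, 0.2],   # target 1 -> source 1
    [0.2, 0.3, 0.5],   # target 2 -> sources 1 and 2 (both >= 0.3)
])
print(alignment_from_attention(attn))  # [(0, 0), (1, 1), (1, 2), (2, 2)]
```

Averaging attention over heads and layers before thresholding is one common variant; the paper compares such attention-derived scores against scores inferred from output probabilities.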
Related papers
- Bilingual Corpus Mining and Multistage Fine-Tuning for Improving Machine Translation of Lecture Transcripts [50.00305136008848]
We propose a framework for parallel corpus mining, which provides a quick and effective way to mine a parallel corpus from publicly available lectures on Coursera.
For both English--Japanese and English--Chinese lecture translations, we extracted parallel corpora of approximately 50,000 lines and created development and test sets.
This study also suggests guidelines for gathering and cleaning corpora, mining parallel sentences, cleaning noise in the mined data, and creating high-quality evaluation splits.
arXiv Detail & Related papers (2023-11-07T03:50:25Z)
- Iterative pseudo-forced alignment by acoustic CTC loss for self-supervised ASR domain adaptation [80.12316877964558]
High-quality data labeling from specific domains is costly and human time-consuming.
We propose a self-supervised domain adaptation method, based upon an iterative pseudo-forced alignment algorithm.
arXiv Detail & Related papers (2022-10-27T07:23:08Z)
- Graph Neural Networks for Multiparallel Word Alignment [0.27998963147546146]
We compute high-quality word alignments between multiple language pairs by considering all language pairs together.
We use graph neural networks (GNNs) to exploit the graph structure.
Our method outperforms previous work on three word-alignment datasets and on a downstream task.
arXiv Detail & Related papers (2022-03-16T14:41:35Z)
- Graph Algorithms for Multiparallel Word Alignment [2.5200727733264663]
In this work, we exploit the multiparallelity of corpora by representing an initial set of bilingual alignments as a graph.
We present two graph algorithms for edge prediction: one inspired by recommender systems and one based on network link prediction.
arXiv Detail & Related papers (2021-09-13T19:40:29Z)
- Exploring Unsupervised Pretraining Objectives for Machine Translation [99.5441395624651]
Unsupervised cross-lingual pretraining has achieved strong results in neural machine translation (NMT)
Most approaches adapt masked-language modeling (MLM) to sequence-to-sequence architectures, by masking parts of the input and reconstructing them in the decoder.
We compare masking with alternative objectives that produce inputs resembling real (full) sentences, by reordering and replacing words based on their context.
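A minimal sketch of the masking objective this summary describes (adapting MLM to sequence-to-sequence training by corrupting the input and reconstructing the original in the decoder). The function name, mask rate, and mask symbol are illustrative, not from the paper:

```python
import random

def mask_for_seq2seq(tokens, mask_prob=0.35, mask_token="<mask>", seed=0):
    """Build one denoising training pair: each input token is replaced
    by the mask symbol with probability mask_prob; the decoder target
    is the uncorrupted original sentence."""
    rng = random.Random(seed)
    source = [mask_token if rng.random() < mask_prob else t for t in tokens]
    target = list(tokens)
    return source, target

src, tgt = mask_for_seq2seq("the cat sat on the mat".split())
```

The alternative objectives the paper explores would instead reorder or replace words in context, producing inputs that look like real (full) sentences rather than masked ones.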
arXiv Detail & Related papers (2021-06-10T10:18:23Z)
- Adaptive Nearest Neighbor Machine Translation [60.97183408140499]
kNN-MT combines pre-trained neural machine translation with token-level k-nearest-neighbor retrieval.
The traditional kNN algorithm retrieves the same number of nearest neighbors for each target token.
We propose Adaptive kNN-MT, which dynamically determines the value of k for each target token.
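A rough sketch of the token-level kNN retrieval underlying kNN-MT, with a fixed k (the adaptive variant would predict k per token); all names and values here are illustrative:

```python
import numpy as np

def knn_mt_distribution(query, keys, values, vocab_size, k=2, temperature=10.0):
    """Token-level kNN retrieval: keys are cached decoder hidden states,
    values the gold next-token ids stored with them. Distances from the
    query state to the k nearest keys are softmax-normalized into a
    distribution over the vocabulary."""
    dists = np.linalg.norm(keys - query, axis=1)
    nearest = np.argsort(dists)[:k]
    logits = -dists[nearest] / temperature
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    p_knn = np.zeros(vocab_size)
    for idx, w in zip(nearest, weights):
        p_knn[values[idx]] += w
    return p_knn

# Tiny datastore: three cached states and their stored target tokens.
keys = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
values = np.array([2, 2, 0])
p_knn = knn_mt_distribution(np.array([0.1, 0.0]), keys, values, vocab_size=3)

# Final prediction interpolates with the NMT model's own distribution.
p_nmt = np.array([0.2, 0.5, 0.3])
lam = 0.5
p = lam * p_knn + (1 - lam) * p_nmt
```

In the full method the interpolation weight, and in the adaptive variant k itself, would be chosen by a small learned network rather than fixed by hand.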
arXiv Detail & Related papers (2021-05-27T09:27:42Z)
- Word Alignment by Fine-tuning Embeddings on Parallel Corpora [96.28608163701055]
Word alignment over parallel corpora has a wide variety of applications, including learning translation lexicons, cross-lingual transfer of language processing tools, and automatic evaluation or analysis of translation outputs.
Recently, other work has demonstrated that pre-trained contextualized word embeddings derived from multilingually trained language models (LMs) are an attractive alternative, achieving competitive results on the word alignment task even in the absence of explicit training on parallel data.
In this paper, we examine methods to marry the two approaches: leveraging pre-trained LMs and fine-tuning them on parallel text with objectives designed to improve alignment quality.
arXiv Detail & Related papers (2021-01-20T17:54:47Z)
- Rationalizing Text Matching: Learning Sparse Alignments via Optimal Transport [14.86310501896212]
In this work, we extend this selective rationalization approach to text matching.
The goal is to jointly select and align text pieces, such as tokens or sentences, as a justification for the downstream prediction.
Our approach employs optimal transport (OT) to find a minimal cost alignment between the inputs.
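A toy sketch of OT-based alignment using entropy-regularized Sinkhorn iterations; the paper's actual formulation (with its sparsity constraints) differs, and this only illustrates how a transport plan over a token-pair cost matrix indicates aligned pairs:

```python
import numpy as np

def sinkhorn_alignment(cost, n_iters=50, epsilon=0.1):
    """Entropy-regularized optimal transport (Sinkhorn) over a token-pair
    cost matrix, with uniform marginals on both sides. Large entries of
    the returned transport plan indicate aligned pairs."""
    n, m = cost.shape
    K = np.exp(-cost / epsilon)          # Gibbs kernel
    a, b = np.ones(n) / n, np.ones(m) / m  # uniform marginals
    u, v = a.copy(), b.copy()
    for _ in range(n_iters):
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]   # transport plan diag(u) K diag(v)

# Low cost on the diagonal, so mass concentrates there.
cost = np.array([[0.0, 1.0, 1.0],
                 [1.0, 0.0, 1.0],
                 [1.0, 1.0, 0.0]])
plan = sinkhorn_alignment(cost)
```

Thresholding or taking the row-wise argmax of the plan then yields discrete alignment links.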
arXiv Detail & Related papers (2020-05-27T01:20:49Z)
- Accurate Word Alignment Induction from Neural Machine Translation [33.21196289328584]
We propose two novel word alignment induction methods Shift-Att and Shift-AET.
The main idea is to induce alignments at the step when the to-be-aligned target token is the decoder input.
Experiments on three publicly available datasets demonstrate that both methods perform better than their corresponding neural baselines.
arXiv Detail & Related papers (2020-04-30T14:47:05Z)
- SimAlign: High Quality Word Alignments without Parallel Training Data using Static and Contextualized Embeddings [3.8424737607413153]
We propose word alignment methods that require no parallel data.
The key idea is to leverage multilingual word embeddings, both static and contextualized, for word alignment.
We find that alignments created from embeddings are superior for two language pairs compared to those produced by traditional statistical methods.
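A minimal sketch of the embedding-similarity idea using a mutual-argmax ("intersection") heuristic, one of the matching strategies such methods use; the real method operates on multilingual LM embeddings rather than these toy vectors:

```python
import numpy as np

def argmax_alignment(src_emb, tgt_emb):
    """Align tokens by mutual best match: build a cosine-similarity
    matrix between source and target embeddings and keep (i, j) only
    when j is the argmax of row i AND i is the argmax of column j."""
    src = src_emb / np.linalg.norm(src_emb, axis=1, keepdims=True)
    tgt = tgt_emb / np.linalg.norm(tgt_emb, axis=1, keepdims=True)
    sim = src @ tgt.T                     # (src_len, tgt_len) cosine sims
    fwd = sim.argmax(axis=1)              # best target for each source
    bwd = sim.argmax(axis=0)              # best source for each target
    return [(i, int(fwd[i])) for i in range(len(fwd)) if bwd[fwd[i]] == i]

# Toy 2-d "embeddings" where token 0 matches token 0, token 1 matches 1.
src_emb = np.array([[1.0, 0.0], [0.0, 1.0]])
tgt_emb = np.array([[0.9, 0.1], [0.2, 0.8]])
print(argmax_alignment(src_emb, tgt_emb))  # [(0, 0), (1, 1)]
```

Because both directions must agree, this intersection heuristic favors precision over recall.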
arXiv Detail & Related papers (2020-04-18T23:10:36Z)
- Incorporating BERT into Neural Machine Translation [251.54280200353674]
We propose a new algorithm named BERT-fused model, in which we first use BERT to extract representations for an input sequence.
We conduct experiments on supervised (including sentence-level and document-level translations), semi-supervised and unsupervised machine translation, and achieve state-of-the-art results on seven benchmark datasets.
arXiv Detail & Related papers (2020-02-17T08:13:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.