Mask-Align: Self-Supervised Neural Word Alignment
- URL: http://arxiv.org/abs/2012.07162v1
- Date: Sun, 13 Dec 2020 21:44:29 GMT
- Title: Mask-Align: Self-Supervised Neural Word Alignment
- Authors: Chi Chen, Maosong Sun, and Yang Liu
- Abstract summary: Mask-Align is a self-supervised model specifically designed for the word alignment task.
Our model masks and predicts each target token in parallel, and extracts high-quality alignments without any supervised loss.
- Score: 47.016975106231875
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural word alignment methods have received increasing attention recently.
These methods usually extract word alignment from a machine translation model.
However, there is a gap between translation and alignment tasks, since the
target future context is available in the latter. In this paper, we propose
Mask-Align, a self-supervised model specifically designed for the word
alignment task. Our model masks and predicts each target token in parallel, and
extracts high-quality alignments without any supervised loss. In addition, we
introduce leaky attention to alleviate the problem of unexpected high attention
weights on special tokens. Experiments on four language pairs show that our
model significantly outperforms all existing unsupervised neural baselines and
obtains new state-of-the-art results.
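The extraction step can be pictured with a short sketch: mask and re-predict every target token in parallel, then link each target position to the source position receiving its highest cross-attention weight, keeping an extra "leak" slot out of the argmax so that special-token-like mass cannot win. The code below is a minimal illustration under these assumptions; the shapes, names, and the simplified leaky-attention handling are not the authors' implementation.

```python
# Minimal sketch of attention-based alignment extraction in the spirit of
# Mask-Align. Not the authors' code: the shapes, names, and the simplified
# "leak" column standing in for leaky attention are illustrative assumptions.
import torch

def extract_alignments(cross_attn, threshold=0.0):
    """For each (masked-and-repredicted) target position, link it to the
    source position with the highest cross-attention weight.

    cross_attn: [tgt_len, src_len + 1] attention weights, where column 0 is
                an extra "leak" slot that absorbs the probability mass that
                would otherwise pile up on special tokens.
    """
    content_attn = cross_attn[:, 1:]                  # drop the leak slot
    best_weight, best_src = content_attn.max(dim=-1)  # argmax per target token
    return [(int(i), j)                               # (source idx, target idx)
            for j, (w, i) in enumerate(zip(best_weight.tolist(), best_src))
            if w > threshold]

if __name__ == "__main__":
    torch.manual_seed(0)
    src = ["wir", "sehen", "das", "Haus"]   # hypothetical source tokens
    tgt = ["we", "see", "the", "house"]     # hypothetical target tokens
    # Stand-in for the attention a model would produce after masking and
    # re-predicting every target token in parallel (row j = target position j).
    attn = torch.softmax(torch.randn(len(tgt), len(src) + 1), dim=-1)
    print(extract_alignments(attn))
```

In the actual model, the attention would come from a decoder that sees the full target context on both sides of each masked position; the random tensor here only exercises the extraction logic.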
Related papers
- Beyond Image-Text Matching: Verb Understanding in Multimodal Transformers Using Guided Masking [0.4543820534430524]
This work introduces an alternative probing strategy called guided masking.
The proposed approach ablates different modalities using masking and assesses the model's ability to predict the masked word with high accuracy.
We show that guided masking on ViLBERT, LXMERT, UNITER, and VisualBERT can predict the correct verb with high accuracy.
arXiv Detail & Related papers (2024-01-29T21:22:23Z)
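As a rough, text-only analogue of the guided-masking probe summarized above, one can mask a caption's verb and check whether a masked language model ranks the original verb among its top predictions; the paper itself probes multimodal transformers (ViLBERT, LXMERT, UNITER, VisualBERT), where image regions can also be ablated. The model choice and the top-k criterion below are assumptions for illustration.

```python
# Text-only analogue of a guided-masking probe: mask the verb in a caption
# and test whether the model recovers it among its top-k predictions.
# bert-base-uncased and k=5 are assumptions; the paper applies the idea to
# multimodal transformers with image features available (and ablatable).
from transformers import pipeline

def verb_recovered(caption: str, verb: str, k: int = 5) -> bool:
    unmasker = pipeline("fill-mask", model="bert-base-uncased")
    masked = caption.replace(verb, unmasker.tokenizer.mask_token, 1)
    predictions = unmasker(masked, top_k=k)
    return any(p["token_str"].strip() == verb for p in predictions)

if __name__ == "__main__":
    print(verb_recovered("A man kicks a ball on the field.", "kicks"))
```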
- Single-Stream Multi-Level Alignment for Vision-Language Pretraining [103.09776737512078]
We propose a single stream model that aligns the modalities at multiple levels.
We achieve this using two novel tasks: symmetric cross-modality reconstruction and a pseudo-labeled key word prediction.
We demonstrate top performance on a set of Vision-Language downstream tasks such as zero-shot/fine-tuned image/text retrieval, referring expression, and VQA.
arXiv Detail & Related papers (2022-03-27T21:16:10Z)
- Improving Pretrained Cross-Lingual Language Models via Self-Labeled Word Alignment [49.45399359826453]
Cross-lingual language models are typically pretrained with language modeling on multilingual text or parallel sentences.
We introduce denoising word alignment as a new cross-lingual pre-training task.
Experimental results show that our method improves cross-lingual transferability on various datasets.
arXiv Detail & Related papers (2021-06-11T13:36:01Z)
- Exploring Unsupervised Pretraining Objectives for Machine Translation [99.5441395624651]
Unsupervised cross-lingual pretraining has achieved strong results in neural machine translation (NMT).
Most approaches adapt masked-language modeling (MLM) to sequence-to-sequence architectures, by masking parts of the input and reconstructing them in the decoder.
We compare masking with alternative objectives that produce inputs resembling real (full) sentences, by reordering and replacing words based on their context.
arXiv Detail & Related papers (2021-06-10T10:18:23Z)
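A toy sketch of the corruption types compared above: MLM-style masking versus noise that keeps the input looking like a full sentence (local reordering, word replacement). The rates are arbitrary, and the random replacement below is a simplification of the paper's context-based replacement.

```python
# Toy noising functions of the kind compared in the paper above. Rates and
# the replacement vocabulary are illustrative assumptions.
import random

def mask_words(tokens, rate=0.35, mask="<mask>"):
    # MLM-style corruption: hide a fraction of the words behind a mask symbol.
    return [mask if random.random() < rate else t for t in tokens]

def shuffle_words(tokens, window=3):
    # Reorder words inside small local windows so the output still looks
    # like a (noisy) full sentence rather than a masked one.
    out = []
    for i in range(0, len(tokens), window):
        chunk = tokens[i:i + window]
        random.shuffle(chunk)
        out.extend(chunk)
    return out

def replace_words(tokens, vocab, rate=0.15):
    # Swap some words for other real words instead of a mask symbol
    # (the paper replaces based on context; random choice is a simplification).
    return [random.choice(vocab) if random.random() < rate else t for t in tokens]

if __name__ == "__main__":
    random.seed(0)
    sent = "the quick brown fox jumps over the lazy dog".split()
    print(mask_words(sent))
    print(shuffle_words(sent))
    print(replace_words(sent, vocab=["cat", "slow", "red", "walks", "under"]))
```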
- SLUA: A Super Lightweight Unsupervised Word Alignment Model via Cross-Lingual Contrastive Learning [79.91678610678885]
We propose a super lightweight unsupervised word alignment model (SLUA).
Experimental results on several public benchmarks demonstrate that our model achieves competitive, if not better, performance.
Notably, we regard our model as a pioneering attempt to unify bilingual word embedding and word alignment.
arXiv Detail & Related papers (2021-02-08T05:54:11Z)
- Word Alignment by Fine-tuning Embeddings on Parallel Corpora [96.28608163701055]
Word alignment over parallel corpora has a wide variety of applications, including learning translation lexicons, cross-lingual transfer of language processing tools, and automatic evaluation or analysis of translation outputs.
Recently, other work has demonstrated that pre-trained contextualized word embeddings derived from multilingually trained language models (LMs) prove an attractive alternative, achieving competitive results on the word alignment task even in the absence of explicit training on parallel data.
In this paper, we examine methods to marry the two approaches: leveraging pre-trained LMs but fine-tuning them on parallel text with objectives designed to improve alignment quality.
arXiv Detail & Related papers (2021-01-20T17:54:47Z)
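A hedged sketch of the embedding-based alignment idea discussed above: encode both sentences with a multilingual LM, build a token-level cosine-similarity matrix, and keep mutual argmaxes as links. The model name, the layer choice, and the mutual-argmax heuristic are illustrative assumptions rather than the paper's exact extraction procedure, which additionally fine-tunes the LM on parallel text.

```python
# Hedged sketch of similarity-based word alignment from a multilingual
# encoder. Model name, layer, and the mutual-argmax rule are assumptions.
import torch
from transformers import AutoModel, AutoTokenizer

def embed(model, tokenizer, sentence):
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]        # [seq_len, dim]
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
    return tokens, torch.nn.functional.normalize(hidden, dim=-1)

def align(src_sent, tgt_sent, model_name="bert-base-multilingual-cased"):
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    src_tok, src_vec = embed(model, tokenizer, src_sent)
    tgt_tok, tgt_vec = embed(model, tokenizer, tgt_sent)
    sim = src_vec @ tgt_vec.T                                 # [src_len, tgt_len]
    links = []
    for i in range(1, len(src_tok) - 1):                      # skip [CLS]/[SEP]
        j = 1 + int(sim[i, 1:-1].argmax())                    # best target token
        if 1 + int(sim[1:-1, j].argmax()) == i:               # mutual argmax check
            links.append((src_tok[i], tgt_tok[j]))            # subword-level link
    return links

if __name__ == "__main__":
    print(align("Das Haus ist klein.", "The house is small."))
```

The links are at the subword level; mapping them back to words would additionally require the tokenizer's character offsets.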
- Generative Pre-training for Paraphrase Generation by Representing and Predicting Spans in Exemplars [0.8411385346896411]
This paper presents a novel approach to paraphrasing sentences, extended from the GPT-2 model.
We develop a template masking technique, named first-order masking, to mask out irrelevant words in exemplars utilizing POS taggers.
Our proposed approach outperforms competitive baselines, especially in the semantic preservation aspect.
arXiv Detail & Related papers (2020-11-29T11:36:13Z)
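The first-order masking step described above, which masks exemplar words that do not carry its syntactic template, can be approximated as below. A real implementation would use an actual POS tagger, as the abstract states; the tiny function-word list and the mask symbol here are stand-ins so the sketch runs without extra downloads.

```python
# Hedged sketch of POS-guided template masking in the spirit of the
# "first-order masking" described above: keep structural words, mask the
# rest. The toy function-word list replaces a real POS tagger and the mask
# symbol is an assumption, not the paper's configuration.
FUNCTION_WORDS = {"the", "a", "an", "of", "to", "in", "on", "and", "or",
                  "is", "was", "across", "with", "for"}

def first_order_mask(exemplar, mask_token="[X]"):
    out = []
    for tok in exemplar.lower().rstrip(".").split():
        out.append(tok if tok in FUNCTION_WORDS else mask_token)
    return " ".join(out)

if __name__ == "__main__":
    print(first_order_mask("The cat chased a small mouse across the garden."))
    # -> "the [X] [X] a [X] [X] across the [X]"
```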
This list is automatically generated from the titles and abstracts of the papers in this site.