SLUA: A Super Lightweight Unsupervised Word Alignment Model via
Cross-Lingual Contrastive Learning
- URL: http://arxiv.org/abs/2102.04009v1
- Date: Mon, 8 Feb 2021 05:54:11 GMT
- Title: SLUA: A Super Lightweight Unsupervised Word Alignment Model via
Cross-Lingual Contrastive Learning
- Authors: Di Wu, Liang Ding, Shuo Yang, Dacheng Tao
- Abstract summary: We propose a super lightweight unsupervised word alignment model (SLUA).
Experimental results on several public benchmarks demonstrate that our model achieves competitive, if not better, performance.
Notably, we recognize our model as a pioneering attempt to unify bilingual word embeddings and word alignment.
- Score: 79.91678610678885
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Word alignment is essential for downstream cross-lingual language
understanding and generation tasks. Recently, the performance of neural word
alignment models has exceeded that of statistical models. However, they
heavily rely on sophisticated translation models. In this study, we propose a
super lightweight unsupervised word alignment (SLUA) model, in which
bidirectional symmetric attention trained with a contrastive learning objective
is introduced, and an agreement loss is employed to bind the attention maps,
such that the alignments follow the mirror-like symmetry hypothesis. Experimental
results on several public benchmarks demonstrate that our model achieves
competitive, if not better, performance compared to the state of the art in
word alignment while significantly reducing the training and decoding time on
average. Further ablation analysis and case studies show the superiority of our
proposed SLUA. Notably, we recognize our model as a pioneering attempt to unify
bilingual word embeddings and word alignment. Encouragingly, our approach
achieves a 16.4x speedup over GIZA++ and 50x parameter compression compared
with Transformer-based alignment methods. We will release our code to the
community.
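The abstract names three ingredients: bidirectional attention between the two sentences of a translation pair, a contrastive learning objective, and an agreement loss that binds the two attention maps into mirror-like symmetry. The following is a minimal sketch of how these pieces could fit together, assuming a PyTorch implementation; the pooling, loss weighting, and hyperparameters are illustrative guesses, not the authors' released code.

```python
# Hypothetical sketch of the SLUA ingredients named in the abstract:
# bilingual word-embedding tables, bidirectional dot-product attention,
# an InfoNCE-style contrastive loss over a batch of translation pairs,
# and an agreement loss enforcing mirror-like symmetry of the two maps.
import torch
import torch.nn.functional as F

class SLUASketch(torch.nn.Module):
    def __init__(self, src_vocab, tgt_vocab, dim=256, tau=0.1):
        super().__init__()
        self.src_emb = torch.nn.Embedding(src_vocab, dim)  # bilingual word embeddings
        self.tgt_emb = torch.nn.Embedding(tgt_vocab, dim)
        self.tau = tau  # contrastive temperature (assumed value)

    def forward(self, src_ids, tgt_ids):
        # src_ids: (B, m), tgt_ids: (B, n) -- a batch of translation pairs
        S, T = self.src_emb(src_ids), self.tgt_emb(tgt_ids)   # (B,m,d), (B,n,d)
        scores = S @ T.transpose(1, 2) / S.size(-1) ** 0.5    # (B,m,n)
        a_st = scores.softmax(dim=-1)                  # source-to-target attention
        a_ts = scores.transpose(1, 2).softmax(dim=-1)  # target-to-source attention

        # Agreement loss: bind the two maps so alignments are mirror-symmetric.
        agree = F.mse_loss(a_st, a_ts.transpose(1, 2))

        # Contrastive loss: mean-pool each sentence, score all B x B pairings,
        # and treat the diagonal (true translation pairs) as positives.
        s_vec = F.normalize(S.mean(dim=1), dim=-1)            # (B,d)
        t_vec = F.normalize(T.mean(dim=1), dim=-1)            # (B,d)
        logits = s_vec @ t_vec.t() / self.tau                 # (B,B)
        labels = torch.arange(logits.size(0), device=logits.device)
        contrast = F.cross_entropy(logits, labels)

        return contrast + agree, a_st

model = SLUASketch(src_vocab=10000, tgt_vocab=10000)
loss, attn = model(torch.randint(0, 10000, (4, 7)), torch.randint(0, 10000, (4, 9)))
loss.backward()  # train end-to-end; alignments come from the attention map
```

Under this reading, alignments would be extracted from the learned attention map, e.g. by mutual argmax or thresholding; the abstract does not spell out the exact extraction procedure.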
Related papers
- Revisiting the Superficial Alignment Hypothesis [0.9831489366502302]
The Superficial Alignment Hypothesis posits that almost all of a language model's abilities and knowledge are learned during pre-training.
We re-examine these claims by studying the scaling behavior of post-training with increasing finetuning examples.
arXiv Detail & Related papers (2024-09-27T22:14:10Z)
- Bit Cipher -- A Simple yet Powerful Word Representation System that
Integrates Efficiently with Language Models [4.807347156077897]
Bit-cipher is a word representation system that eliminates the need for backpropagation and hyper-efficient dimensionality reduction techniques.
We perform probing experiments on part-of-speech (POS) tagging and named entity recognition (NER) to assess bit-cipher's competitiveness with classic embeddings.
By replacing embedding layers with cipher embeddings, our experiments illustrate the notable efficiency of cipher in accelerating the training process and attaining better optima.
arXiv Detail & Related papers (2023-11-18T08:47:35Z)
- Dual-Alignment Pre-training for Cross-lingual Sentence Embedding [79.98111074307657]
We propose a dual-alignment pre-training (DAP) framework for cross-lingual sentence embedding.
We introduce a novel representation translation learning (RTL) task, where the model learns to use one-sided contextualized token representations to reconstruct their translation counterparts.
Our approach significantly improves the quality of cross-lingual sentence embeddings.
arXiv Detail & Related papers (2023-05-16T03:53:30Z)
- VECO 2.0: Cross-lingual Language Model Pre-training with
Multi-granularity Contrastive Learning [56.47303426167584]
We propose VECO 2.0, a cross-lingual pre-trained model based on contrastive learning with multi-granularity alignments.
Specifically, sequence-to-sequence alignment is induced to maximize the similarity of parallel pairs and minimize that of non-parallel pairs.
Token-to-token alignment is integrated to bridge the gap between synonymous tokens, excavated via a thesaurus dictionary, and the other unpaired tokens in a bilingual instance.
arXiv Detail & Related papers (2023-04-17T12:23:41Z)
- Improve Transformer Pre-Training with Decoupled Directional Relative
Position Encoding and Representation Differentiations [23.2969212998404]
We revisit the Transformer-based pre-trained language models and identify two problems that may limit the expressiveness of the model.
Existing relative position encoding models conflate two heterogeneous kinds of information: relative distance and direction.
We propose two novel techniques to improve pre-trained language models.
arXiv Detail & Related papers (2022-10-09T12:35:04Z)
- Keywords and Instances: A Hierarchical Contrastive Learning Framework
Unifying Hybrid Granularities for Text Generation [59.01297461453444]
We propose a hierarchical contrastive learning mechanism that unifies semantic meaning across hybrid granularities in the input text.
Experiments demonstrate that our model outperforms competitive baselines on paraphrasing, dialogue generation, and storytelling tasks.
arXiv Detail & Related papers (2022-05-26T13:26:03Z)
- Utilizing Language-Image Pretraining for Efficient and Robust Bilingual
Word Alignment [27.405171616881322]
We develop a novel unsupervised word translation (UWT) method dubbed Word Alignment using Language-Image Pretraining (WALIP).
WALIP uses visual observations via the shared embedding space of images and texts provided by CLIP models.
Our experiments show that WALIP improves upon the state-of-the-art performance of bilingual word alignment for a few language pairs.
arXiv Detail & Related papers (2022-05-23T20:29:26Z)
- Word Alignment by Fine-tuning Embeddings on Parallel Corpora [96.28608163701055]
Word alignment over parallel corpora has a wide variety of applications, including learning translation lexicons, cross-lingual transfer of language processing tools, and automatic evaluation or analysis of translation outputs.
Recently, other work has demonstrated that pre-trained contextualized word embeddings derived from multilingually trained language models (LMs) prove an attractive alternative, achieving competitive results on the word alignment task even in the absence of explicit training on parallel data.
In this paper, we examine methods to marry the two approaches: leveraging pre-trained LMs but fine-tuning them on parallel text with objectives designed to improve alignment quality, and proposing methods to effectively extract alignments from these fine-tuned models (see the illustrative sketch after this list).
arXiv Detail & Related papers (2021-01-20T17:54:47Z)
- Learning Contextualised Cross-lingual Word Embeddings and Alignments for
Extremely Low-Resource Languages Using Parallel Corpora [63.5286019659504]
We propose a new approach for learning contextualised cross-lingual word embeddings based on a small parallel corpus.
Our method obtains word embeddings via an LSTM encoder-decoder model that simultaneously translates and reconstructs an input sentence.
arXiv Detail & Related papers (2020-10-27T22:24:01Z)
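A recurring recipe in the entries above, most explicitly in the fine-tuning and WALIP papers, is to reduce word alignment to similarity search over cross-lingual token embeddings. Below is a minimal, hypothetical sketch of that shared recipe, not any single paper's exact method: the model choice, the mutual-argmax symmetrization, and the omission of subword-to-word mapping are all simplifying assumptions.

```python
# Illustrative sketch: align two sentences by embedding them with a
# multilingual LM, building a cosine-similarity matrix over tokens, and
# keeping only links on which both greedy directions agree.
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

name = "bert-base-multilingual-cased"  # assumed model choice
tok = AutoTokenizer.from_pretrained(name)
lm = AutoModel.from_pretrained(name).eval()

def align(src: str, tgt: str):
    with torch.no_grad():
        # Drop [CLS]/[SEP]; note these are subword tokens, so a real system
        # would also map subwords back to words.
        h_s = lm(**tok(src, return_tensors="pt")).last_hidden_state[0, 1:-1]
        h_t = lm(**tok(tgt, return_tensors="pt")).last_hidden_state[0, 1:-1]
    sim = F.normalize(h_s, dim=-1) @ F.normalize(h_t, dim=-1).t()  # (m, n)
    fwd = sim.argmax(dim=1)  # best target token for each source token
    bwd = sim.argmax(dim=0)  # best source token for each target token
    # Keep (i, j) only when i and j are each other's best match.
    return [(i, j.item()) for i, j in enumerate(fwd) if bwd[j].item() == i]

print(align("Das Haus ist klein .", "The house is small ."))
```

Fine-tuning on parallel text, as proposed in the papers above, would then amount to adding training objectives on top of these representations rather than changing this extraction step.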