Mixed Attention Transformer for Leveraging Word-Level Knowledge to Neural
Cross-Lingual Information Retrieval
- URL: http://arxiv.org/abs/2109.02789v1
- Date: Tue, 7 Sep 2021 00:33:14 GMT
- Title: Mixed Attention Transformer for Leveraging Word-Level Knowledge to Neural
Cross-Lingual Information Retrieval
- Authors: Zhiqi Huang, Hamed Bonab, Sheikh Muhammad Sarwar, Razieh Rahimi, and
James Allan
- Abstract summary: We propose a novel Mixed Attention Transformer (MAT) that incorporates external word-level knowledge, such as a dictionary or translation table.
By encoding the translation knowledge into an attention matrix, the model with MAT is able to focus on the mutually translated words in the input sequence.
- Score: 15.902630454568811
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Pretrained contextualized representations have brought great success to many
downstream tasks, including document ranking. The multilingual versions of such
pretrained representations make it possible to learn many languages jointly with the
same model. Although large gains are expected from such joint training, in cross-lingual
information retrieval (CLIR) the models trained in a multilingual setting do not reach
the same level of performance as those trained in a monolingual setting. We hypothesize
that the performance drop is due to the translation gap between queries and documents.
In the monolingual retrieval task, because queries and documents share the same lexical
inputs, it is easier for the model to identify the query terms that occur in documents.
However, in multilingual pretrained models, where words from different languages are
projected into the same hyperspace, the model tends to translate query terms into
related terms, i.e., terms that appear in similar contexts, in addition to or sometimes
rather than synonyms in the target language. This property makes it difficult for the
model to connect terms that co-occur in both query and document. To address this issue,
we propose a novel Mixed Attention Transformer (MAT) that incorporates external
word-level knowledge, such as a dictionary or translation table. We design a
sandwich-like architecture to embed MAT into recent transformer-based deep neural
models. By encoding the translation knowledge into an attention matrix, the model with
MAT can focus on mutually translated words in the input sequence. Experimental results
demonstrate the effectiveness of the external knowledge and the significant improvement
of the MAT-embedded neural reranking model on the CLIR task.
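The abstract describes the core mechanism only at a high level: word-level translation knowledge (a dictionary or translation table) is encoded into an attention matrix that is combined with the model's learned attention. The following is a minimal sketch of one way such a mixed attention layer could look, not the authors' implementation; the module name MixedAttention, the translation_matrix input, and the fixed blend weight mix_weight are assumptions introduced here for illustration.

```python
# Minimal sketch (not the paper's code) of mixing an external, dictionary-derived
# attention matrix with learned self-attention, as described in the abstract above.
import torch
import torch.nn as nn


class MixedAttention(nn.Module):
    """Blends learned self-attention with an external attention matrix that marks
    mutually translated query/document tokens (e.g. from a translation table)."""

    def __init__(self, hidden_size: int, num_heads: int, mix_weight: float = 0.5):
        super().__init__()
        assert hidden_size % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = hidden_size // num_heads
        self.qkv = nn.Linear(hidden_size, 3 * hidden_size)
        self.out = nn.Linear(hidden_size, hidden_size)
        self.mix_weight = mix_weight  # hypothetical fixed blend; could also be learned

    def forward(self, x: torch.Tensor, translation_matrix: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, hidden)
        # translation_matrix: (batch, seq_len, seq_len), entry (i, j) > 0 when
        # token i and token j are translations of each other in the dictionary.
        b, n, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q = q.view(b, n, self.num_heads, self.head_dim).transpose(1, 2)
        k = k.view(b, n, self.num_heads, self.head_dim).transpose(1, 2)
        v = v.view(b, n, self.num_heads, self.head_dim).transpose(1, 2)

        # Standard scaled dot-product attention weights.
        learned = torch.softmax(q @ k.transpose(-2, -1) / self.head_dim ** 0.5, dim=-1)

        # Row-normalise the external matrix into an attention-like distribution,
        # broadcast it over heads, and blend it with the learned attention.
        external = translation_matrix / translation_matrix.sum(-1, keepdim=True).clamp(min=1e-6)
        mixed = (1 - self.mix_weight) * learned + self.mix_weight * external.unsqueeze(1)

        out = (mixed @ v).transpose(1, 2).reshape(b, n, -1)
        return self.out(out)
```

In the sandwich-like design mentioned in the abstract, a layer of this kind would sit between the pretrained transformer blocks of the reranker; how the translation table is converted into the matrix and whether the blend weight is fixed or learned are details the abstract does not specify.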
Related papers
- Contextual Code Switching for Machine Translation using Language Models [1.4866655830571935]
Large language models (LLMs) have exerted a considerable impact on diverse language-related tasks in recent years.
We present an extensive study of the code-switching task, specifically for machine translation, comparing multiple LLMs.
Our results indicate that although LLMs show promising results on certain tasks, models of relatively lower complexity outperform multilingual large language models on the machine translation task.
arXiv Detail & Related papers (2023-12-20T16:40:33Z) - Assessing Linguistic Generalisation in Language Models: A Dataset for
Brazilian Portuguese [4.941630596191806]
We propose a set of intrinsic evaluation tasks that inspect the linguistic information encoded in models developed for Brazilian Portuguese.
These tasks are designed to evaluate how different language models generalise information related to grammatical structures and multiword expressions.
arXiv Detail & Related papers (2023-05-23T13:49:14Z) - Unified Model Learning for Various Neural Machine Translation [63.320005222549646]
Existing neural machine translation (NMT) studies mainly focus on developing dataset-specific models.
We propose a "versatile" model, i.e., Unified Model Learning for NMT (UMLNMT), that works with data from different tasks.
UMLNMT results in substantial improvements over dataset-specific models with significantly reduced model deployment costs.
arXiv Detail & Related papers (2023-05-04T12:21:52Z) - Modeling Sequential Sentence Relation to Improve Cross-lingual Dense
Retrieval [87.11836738011007]
We propose a multilingual language model called the masked sentence model (MSM).
MSM consists of a sentence encoder to generate the sentence representations, and a document encoder applied to a sequence of sentence vectors from a document.
To train the model, we propose a masked sentence prediction task, which masks and predicts the sentence vector via a hierarchical contrastive loss with sampled negatives.
arXiv Detail & Related papers (2023-02-03T09:54:27Z) - Beyond Contrastive Learning: A Variational Generative Model for
Multilingual Retrieval [109.62363167257664]
We propose a generative model for learning multilingual text embeddings.
Our model operates on parallel data in $N$ languages.
We evaluate this method on a suite of tasks including semantic similarity, bitext mining, and cross-lingual question retrieval.
arXiv Detail & Related papers (2022-12-21T02:41:40Z) - VECO: Variable and Flexible Cross-lingual Pre-training for Language
Understanding and Generation [77.82373082024934]
We plug a cross-attention module into the Transformer encoder to explicitly build the interdependence between languages.
It effectively avoids the degeneration in which masked words are predicted conditioned only on the context in their own language.
The proposed cross-lingual model delivers new state-of-the-art results on various cross-lingual understanding tasks of the XTREME benchmark.
arXiv Detail & Related papers (2020-10-30T03:41:38Z) - Learning Contextualised Cross-lingual Word Embeddings and Alignments for
Extremely Low-Resource Languages Using Parallel Corpora [63.5286019659504]
We propose a new approach for learning contextualised cross-lingual word embeddings based on a small parallel corpus.
Our method obtains word embeddings via an LSTM encoder-decoder model that simultaneously translates and reconstructs an input sentence.
arXiv Detail & Related papers (2020-10-27T22:24:01Z) - FILTER: An Enhanced Fusion Method for Cross-lingual Language
Understanding [85.29270319872597]
We propose an enhanced fusion method that takes cross-lingual data as input for XLM finetuning.
During inference, the model makes predictions based on the text input in the target language and its translation in the source language.
To tackle this issue, we propose an additional KL-divergence self-teaching loss for model training, based on auto-generated soft pseudo-labels for translated text in the target language.
arXiv Detail & Related papers (2020-09-10T22:42:15Z)
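The FILTER entry above mentions a KL-divergence self-teaching loss computed against auto-generated soft pseudo-labels for translated text. The snippet below is a hedged sketch of what such a loss could look like, not FILTER's actual code; the function name, tensor shapes, and temperature parameter are assumptions made here.

```python
# Hedged sketch of a KL-divergence self-teaching loss of the kind described in
# the FILTER entry above; names, shapes, and the temperature are assumptions.
import torch
import torch.nn.functional as F


def self_teaching_kl_loss(student_logits: torch.Tensor,
                          soft_pseudo_labels: torch.Tensor,
                          temperature: float = 1.0) -> torch.Tensor:
    # student_logits, soft_pseudo_labels: (batch, num_classes)
    # Computes KL(pseudo-label distribution || student distribution), batch-averaged.
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    p_teacher = F.softmax(soft_pseudo_labels / temperature, dim=-1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean")
```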
This list is automatically generated from the titles and abstracts of the papers listed on this site.