Exploiting Neural Query Translation into Cross Lingual Information
Retrieval
- URL: http://arxiv.org/abs/2010.13659v1
- Date: Mon, 26 Oct 2020 15:28:19 GMT
- Title: Exploiting Neural Query Translation into Cross Lingual Information
Retrieval
- Authors: Liang Yao and Baosong Yang and Haibo Zhang and Weihua Luo and Boxing
Chen
- Abstract summary: Existing CLIR systems mainly exploit statistical-based machine translation (SMT) rather than the advanced neural machine translation (NMT)
We propose a novel data augmentation method that extracts query translation pairs according to user clickthrough data.
Experimental results reveal that the proposed approach yields better retrieval quality than strong baselines.
- Score: 49.167049709403166
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As a crucial role in cross-language information retrieval (CLIR), query
translation has three main challenges: 1) the adequacy of translation; 2) the
lack of in-domain parallel training data; and 3) the requisite of low latency.
To this end, existing CLIR systems mainly exploit statistical-based machine
translation (SMT) rather than the advanced neural machine translation (NMT),
limiting the further improvements on both translation and retrieval quality. In
this paper, we investigate how to exploit neural query translation model into
CLIR system. Specifically, we propose a novel data augmentation method that
extracts query translation pairs according to user clickthrough data, thus to
alleviate the problem of domain-adaptation in NMT. Then, we introduce an
asynchronous strategy which is able to leverage the advantages of the real-time
in SMT and the veracity in NMT. Experimental results reveal that the proposed
approach yields better retrieval quality than strong baselines and can be well
applied into a real-world CLIR system, i.e. Aliexpress e-Commerce search
engine. Readers can examine and test their cases on our website:
https://aliexpress.com .
Related papers
- LANDeRMT: Detecting and Routing Language-Aware Neurons for Selectively Finetuning LLMs to Machine Translation [43.26446958873554]
Large language models (LLMs) have shown promising results in multilingual translation even with limited bilingual supervision.
Recent advancements in large language models (LLMs) have shown promising results in multilingual translation even with limited bilingual supervision.
LandeRMT is a framework that selectively finetunes LLMs to textbfMachine textbfTranslation with diverse translation training data.
arXiv Detail & Related papers (2024-09-29T02:39:42Z) - LexMatcher: Dictionary-centric Data Collection for LLM-based Machine Translation [67.24113079928668]
We present LexMatcher, a method for data curation driven by the coverage of senses found in bilingual dictionaries.
Our approach outperforms the established baselines on the WMT2022 test sets.
arXiv Detail & Related papers (2024-06-03T15:30:36Z) - Textual Augmentation Techniques Applied to Low Resource Machine
Translation: Case of Swahili [1.9686054517684888]
In machine translation, majority of the language pairs around the world are considered low resource because they have little parallel data available.
We study and apply three simple data augmentation techniques popularly used in text classification tasks.
We see that there is potential to use these methods in neural machine translation when more extensive experiments are done with diverse datasets.
arXiv Detail & Related papers (2023-06-12T20:43:24Z) - Exploiting Language Relatedness in Machine Translation Through Domain
Adaptation Techniques [3.257358540764261]
We present a novel approach of using a scaled similarity score of sentences, especially for related languages based on a 5-gram KenLM language model.
Our approach succeeds in increasing 2 BLEU point on multi-domain approach, 3 BLEU point on fine-tuning for NMT and 2 BLEU point on iterative back-translation approach.
arXiv Detail & Related papers (2023-03-03T09:07:30Z) - A Survey on Low-Resource Neural Machine Translation [106.51056217748388]
We classify related works into three categories according to the auxiliary data they used.
We hope that our survey can help researchers to better understand this field and inspire them to design better algorithms.
arXiv Detail & Related papers (2021-07-09T06:26:38Z) - Constraint Translation Candidates: A Bridge between Neural Query
Translation and Cross-lingual Information Retrieval [45.88734029123836]
We propose a novel approach to alleviate problems by limiting the open target vocabulary search space of QT to a set of important words mined from search index database.
The proposed methods are exploited and examined in a real-word CLIR system--Aliexpress e-Commerce search engine.
arXiv Detail & Related papers (2020-10-26T15:27:51Z) - It's Easier to Translate out of English than into it: Measuring Neural
Translation Difficulty by Cross-Mutual Information [90.35685796083563]
Cross-mutual information (XMI) is an asymmetric information-theoretic metric of machine translation difficulty.
XMI exploits the probabilistic nature of most neural machine translation models.
We present the first systematic and controlled study of cross-lingual translation difficulties using modern neural translation systems.
arXiv Detail & Related papers (2020-05-05T17:38:48Z) - Explicit Reordering for Neural Machine Translation [50.70683739103066]
In Transformer-based neural machine translation (NMT), the positional encoding mechanism helps the self-attention networks to learn the source representation with order dependency.
We propose a novel reordering method to explicitly model this reordering information for the Transformer-based NMT.
The empirical results on the WMT14 English-to-German, WAT ASPEC Japanese-to-English, and WMT17 Chinese-to-English translation tasks show the effectiveness of the proposed approach.
arXiv Detail & Related papers (2020-04-08T05:28:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.