Constraint Translation Candidates: A Bridge between Neural Query
Translation and Cross-lingual Information Retrieval
- URL: http://arxiv.org/abs/2010.13658v1
- Date: Mon, 26 Oct 2020 15:27:51 GMT
- Title: Constraint Translation Candidates: A Bridge between Neural Query
Translation and Cross-lingual Information Retrieval
- Authors: Tianchi Bi and Liang Yao and Baosong Yang and Haibo Zhang and Weihua
Luo and Boxing Chen
- Abstract summary: We propose a novel approach to alleviate these problems by limiting the open target vocabulary search space of QT to a set of important words mined from the search index database.
The proposed methods are exploited and examined in a real-world CLIR system, the Aliexpress e-Commerce search engine.
- Score: 45.88734029123836
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Query translation (QT) is a key component of cross-lingual information
retrieval (CLIR) systems. With the help of deep learning, neural machine
translation (NMT) has shown promising results on various tasks. However, NMT is
generally trained with large-scale out-of-domain data rather than in-domain
query translation pairs. Moreover, the translation model lacks a mechanism at
inference time to guarantee that the generated words match the search index.
These two shortcomings of QT result in texts that are readable for humans but
inadequate as candidates for the downstream retrieval task. In this paper, we
propose a novel approach that alleviates these problems by limiting the open
target-vocabulary search space of QT to a set of important words mined from the
search index database. The constraint translation candidates are employed at
both training and inference time, guiding the translation model to learn and
generate well-performing target queries. The proposed methods are exploited and
examined in a real-world CLIR system, the Aliexpress e-Commerce search engine.
Experimental results demonstrate that our approach yields better performance in
both translation quality and retrieval accuracy than a strong NMT baseline.
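As a rough illustration of the mechanism described in the abstract, restricting the decoder's output vocabulary to candidates mined from the search index, the following minimal sketch masks NMT logits to an allowed candidate set. This is not the authors' implementation; the mining heuristic and all names here (mine_candidates, constrained_logits, the bilingual dictionary input) are assumptions made for illustration.

    import torch

    def mine_candidates(index_terms, src_tokens, bilingual_dict, vocab):
        """Collect target-vocabulary ids plausible for this query.

        Hypothetical heuristic: keep target words that are dictionary
        translations of the source query tokens and also occur in the
        search index (index_terms is a set of indexed target words).
        """
        allowed = set()
        for tok in src_tokens:
            for cand in bilingual_dict.get(tok, []):
                if cand in index_terms and cand in vocab:
                    allowed.add(vocab[cand])
        return sorted(allowed)

    def constrained_logits(logits, allowed_ids):
        """Mask decoder logits so that only the constraint translation
        candidates can be generated at this decoding step."""
        mask = torch.full_like(logits, float("-inf"))
        mask[..., allowed_ids] = 0.0
        return logits + mask

At inference, such a mask would be applied at every decoding step before the softmax; at training time the same restricted candidate set can define the label space for the cross-entropy loss, which mirrors the paper's idea of applying the constraint at both stages.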
Related papers
- A Data Selection Approach for Enhancing Low Resource Machine Translation Using Cross-Lingual Sentence Representations [0.4499833362998489]
This study focuses on the English-Marathi language pair, for which existing datasets are notably noisy.
To mitigate the impact of data quality issues, we propose a data filtering approach based on cross-lingual sentence representations.
Results demonstrate a significant improvement in translation quality over the baseline after filtering with IndicSBERT.
arXiv Detail & Related papers (2024-09-04T13:49:45Z)
- Cross-lingual Contextualized Phrase Retrieval [63.80154430930898]
We propose a new task formulation of dense retrieval, cross-lingual contextualized phrase retrieval.
We train our Cross-lingual Contextualized Phrase Retriever (CCPR) using contrastive learning.
On the phrase retrieval task, CCPR surpasses baselines by a significant margin, achieving a top-1 accuracy that is at least 13 points higher.
arXiv Detail & Related papers (2024-03-25T14:46:51Z)
- An approach for mistranslation removal from popular dataset for Indic MT Task [5.4755933832880865]
We propose an algorithm to remove mistranslations from the training corpus and evaluate its performance and efficiency.
Two Indic languages (ILs), namely Hindi (HIN) and Odia (ODI), are chosen for the experiment.
The quality of the translations in the experiment is evaluated using standard metrics such as BLEU, METEOR, and RIBES.
arXiv Detail & Related papers (2024-01-12T06:37:19Z)
- Towards Faster k-Nearest-Neighbor Machine Translation [56.66038663128903]
k-nearest-neighbor machine translation approaches suffer from heavy retrieval overhead on the entire datastore when decoding each token.
We propose a simple yet effective multi-layer perceptron (MLP) network to predict whether a token should be translated jointly by the neural machine translation model and the probabilities produced by the kNN, or by the NMT model alone.
arXiv Detail & Related papers (2023-12-12T16:41:29Z)
- Bridging the Domain Gaps in Context Representations for k-Nearest Neighbor Neural Machine Translation [57.49095610777317]
$k$-Nearest neighbor machine translation ($k$NN-MT) has attracted increasing attention due to its ability to non-parametrically adapt to new translation domains.
We propose a novel approach to boost the datastore retrieval of $k$NN-MT by reconstructing the original datastore.
Our method can effectively boost the datastore retrieval and translation quality of $k$NN-MT.
arXiv Detail & Related papers (2023-05-26T03:04:42Z)
- Exploiting Curriculum Learning in Unsupervised Neural Machine Translation [28.75229367700697]
We propose a curriculum learning method that gradually utilizes pseudo bi-texts based on their quality, assessed at multiple granularities.
Experimental results on WMT 14 En-Fr, WMT 16 En-De, WMT 16 En-Ro, and LDC En-Zh translation tasks demonstrate that the proposed method achieves consistent improvements with faster convergence speed.
arXiv Detail & Related papers (2021-09-23T07:18:06Z)
- Exploiting Neural Query Translation into Cross Lingual Information Retrieval [49.167049709403166]
Existing CLIR systems mainly exploit statistical machine translation (SMT) rather than the more advanced neural machine translation (NMT).
We propose a novel data augmentation method that extracts query translation pairs according to user clickthrough data.
Experimental results reveal that the proposed approach yields better retrieval quality than strong baselines.
arXiv Detail & Related papers (2020-10-26T15:28:19Z)
- It's Easier to Translate out of English than into it: Measuring Neural Translation Difficulty by Cross-Mutual Information [90.35685796083563]
Cross-mutual information (XMI) is an asymmetric information-theoretic metric of machine translation difficulty.
XMI exploits the probabilistic nature of most neural machine translation models; a hedged sketch of its definition appears after this list.
We present the first systematic and controlled study of cross-lingual translation difficulties using modern neural translation systems.
arXiv Detail & Related papers (2020-05-05T17:38:48Z)
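For the cross-mutual information (XMI) entry above, a hedged sketch of the definition, paraphrased from the cited paper rather than quoted: XMI contrasts the cross-entropy a target-side language model q_LM assigns to the reference translations with the cross-entropy the translation model q_MT assigns given the source,

    \mathrm{XMI}(X \rightarrow Y) = H_{q_{\mathrm{LM}}}(Y) - H_{q_{\mathrm{MT}}}(Y \mid X)

so a larger XMI indicates that the source supplies more usable information about the target, i.e. translation into that language is easier for the model. The measure is asymmetric because swapping X and Y changes both models involved.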