Cross-Lingual Phrase Retrieval
- URL: http://arxiv.org/abs/2204.08887v1
- Date: Tue, 19 Apr 2022 13:35:50 GMT
- Title: Cross-Lingual Phrase Retrieval
- Authors: Heqi Zheng, Xiao Zhang, Zewen Chi, Heyan Huang, Tan Yan, Tian Lan, Wei
Wei, Xian-Ling Mao
- Abstract summary: Cross-lingual retrieval aims to retrieve relevant text across languages.
Current methods typically achieve cross-lingual retrieval by learning language-agnostic text representations in word or sentence level.
We propose XPR, a cross-lingual phrase retriever that extracts phrase representations from unlabeled example sentences.
- Score: 49.919180978902915
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Cross-lingual retrieval aims to retrieve relevant text across languages.
Current methods typically achieve cross-lingual retrieval by learning
language-agnostic text representations in word or sentence level. However, how
to learn phrase representations for cross-lingual phrase retrieval is still an
open problem. In this paper, we propose XPR, a cross-lingual phrase retriever
that extracts phrase representations from unlabeled example sentences.
Moreover, we create a large-scale cross-lingual phrase retrieval dataset, which
contains 65K bilingual phrase pairs and 4.2M example sentences in 8
English-centric language pairs. Experimental results show that XPR outperforms
state-of-the-art baselines which utilize word-level or sentence-level
representations. XPR also shows impressive zero-shot transferability that
enables the model to perform retrieval in an unseen language pair during
training. Our dataset, code, and trained models are publicly available at
www.github.com/cwszz/XPR/.
Related papers
- XAMPLER: Learning to Retrieve Cross-Lingual In-Context Examples [64.79218405438871]
We introduce XAMPLER: Cross-Lingual Example Retrieval, a method tailored to tackle the challenge of cross-lingual in-context learning.
XAMPLER first trains a retriever based on Glot500, a multilingual small language model, using positive and negative English examples.
It can directly retrieve English examples as few-shot examples for in-context learning of target languages.
arXiv Detail & Related papers (2024-05-08T15:13:33Z) - Cross-lingual Contextualized Phrase Retrieval [63.80154430930898]
We propose a new task formulation of dense retrieval, cross-lingual contextualized phrase retrieval.
We train our Cross-lingual Contextualized Phrase Retriever (CCPR) using contrastive learning.
On the phrase retrieval task, CCPR surpasses baselines by a significant margin, achieving a top-1 accuracy that is at least 13 points higher.
arXiv Detail & Related papers (2024-03-25T14:46:51Z) - Modeling Sequential Sentence Relation to Improve Cross-lingual Dense
Retrieval [87.11836738011007]
We propose a multilingual multilingual language model called masked sentence model (MSM)
MSM consists of a sentence encoder to generate the sentence representations, and a document encoder applied to a sequence of sentence vectors from a document.
To train the model, we propose a masked sentence prediction task, which masks and predicts the sentence vector via a hierarchical contrastive loss with sampled negatives.
arXiv Detail & Related papers (2023-02-03T09:54:27Z) - XRICL: Cross-lingual Retrieval-Augmented In-Context Learning for
Cross-lingual Text-to-SQL Semantic Parsing [70.40401197026925]
In-context learning using large language models has recently shown surprising results for semantic parsing tasks.
This work introduces the XRICL framework, which learns to retrieve relevant English exemplars for a given query.
We also include global translation exemplars for a target language to facilitate the translation process for large language models.
arXiv Detail & Related papers (2022-10-25T01:33:49Z) - Multilingual Representation Distillation with Contrastive Learning [20.715534360712425]
We integrate contrastive learning into multilingual representation distillation and use it for quality estimation of parallel sentences.
We validate our approach with multilingual similarity search and corpus filtering tasks.
arXiv Detail & Related papers (2022-10-10T22:27:04Z) - Retrieval-Augmented Multilingual Keyphrase Generation with
Retriever-Generator Iterative Training [66.64843711515341]
Keyphrase generation is the task of automatically predicting keyphrases given a piece of long text.
We call attention to a new setting named multilingual keyphrase generation.
We propose a retrieval-augmented method for multilingual keyphrase generation to mitigate the data shortage problem in non-English languages.
arXiv Detail & Related papers (2022-05-21T00:45:21Z) - Backretrieval: An Image-Pivoted Evaluation Metric for Cross-Lingual Text
Representations Without Parallel Corpora [19.02834713111249]
Backretrieval is shown to correlate with ground truth metrics on annotated datasets.
Our experiments conclude with a case study on a recipe dataset without parallel cross-lingual data.
arXiv Detail & Related papers (2021-05-11T12:14:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.