Multistage BiCross Encoder: Team GATE Entry for MLIA Multilingual
Semantic Search Task 2
- URL: http://arxiv.org/abs/2101.03013v2
- Date: Fri, 15 Jan 2021 20:38:23 GMT
- Title: Multistage BiCross Encoder: Team GATE Entry for MLIA Multilingual
Semantic Search Task 2
- Authors: Iknoor Singh, Carolina Scarton, Kalina Bontcheva
- Abstract summary: We present a search system called Multistage BiCross, developed by team GATE for the MLIA task 2 Multilingual Semantic Search.
The results of round 1 show that our models achieve state-of-the-art performance for all ranking metrics for both monolingual and bilingual runs.
- Score: 6.229830820553111
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Coronavirus (COVID-19) pandemic has led to a rapidly growing 'infodemic'
online. Thus, the accurate retrieval of reliable relevant data from millions of
documents about COVID-19 has become urgently needed for the general public as
well as for other stakeholders. The COVID-19 Multilingual Information Access
(MLIA) initiative is a joint effort to improve the exchange of COVID-19 related
information by developing applications and services through research and
community participation. In this work, we present a search system called
Multistage BiCross Encoder, developed by team GATE for the MLIA task 2
Multilingual Semantic Search. Multistage BiCross Encoder is a sequential
three-stage pipeline which uses the Okapi BM25 algorithm and a transformer-based
bi-encoder and cross-encoder to effectively rank the documents with respect to
the query. The results of round 1 show that our models achieve state-of-the-art
performance for all ranking metrics for both monolingual and bilingual runs.
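The abstract describes a cascade in which a cheap lexical stage (Okapi BM25) shortlists candidates and two progressively more expensive neural stages rerank them. The following pure-Python sketch illustrates that general pattern only; it is not the authors' implementation. The `bi_score` and `cross_score` callables are hypothetical placeholders standing in for a transformer bi-encoder and cross-encoder, and the cutoff sizes `k_bm25` and `k_bi` are illustrative.

```python
import math
from collections import Counter

def bm25_scores(query_tokens, docs_tokens, k1=1.5, b=0.75):
    """Okapi BM25 scores for one tokenized query over a tokenized corpus."""
    N = len(docs_tokens)
    avgdl = sum(len(d) for d in docs_tokens) / N
    # document frequency of each query term
    df = {t: sum(1 for d in docs_tokens if t in d) for t in set(query_tokens)}
    scores = []
    for d in docs_tokens:
        tf = Counter(d)
        s = 0.0
        for t in query_tokens:
            if tf[t] == 0:
                continue
            idf = math.log(1 + (N - df[t] + 0.5) / (df[t] + 0.5))
            s += idf * tf[t] * (k1 + 1) / (
                tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores

def multistage_rank(query, docs, bi_score, cross_score, k_bm25=100, k_bi=20):
    """Stage 1: BM25 shortlist of k_bm25 docs.
    Stage 2: bi-encoder rerank, keeping the top k_bi.
    Stage 3: cross-encoder rerank of the survivors.
    Returns document indices, best first."""
    q_toks = query.lower().split()
    d_toks = [d.lower().split() for d in docs]
    s1 = bm25_scores(q_toks, d_toks)
    stage1 = sorted(range(len(docs)), key=lambda i: -s1[i])[:k_bm25]
    stage2 = sorted(stage1, key=lambda i: -bi_score(query, docs[i]))[:k_bi]
    return sorted(stage2, key=lambda i: -cross_score(query, docs[i]))
```

In a real system the two scoring callables would be model inference calls, which is why each stage sees only the shortlist produced by the cheaper stage before it.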
Related papers
- Improved Cross-Lingual Transfer Learning For Automatic Speech
Translation [18.97234151624098]
We show that by initializing the encoder of the encoder-decoder sequence-to-sequence translation model with SAMU-XLS-R, we achieve significantly better cross-lingual task knowledge transfer.
We demonstrate the effectiveness of our approach on two popular datasets, namely, CoVoST-2 and Europarl.
arXiv Detail & Related papers (2023-06-01T15:19:06Z)
- Simple Yet Effective Neural Ranking and Reranking Baselines for
Cross-Lingual Information Retrieval [50.882816288076725]
Cross-lingual information retrieval is the task of searching documents in one language with queries in another.
We provide a conceptual framework for organizing different approaches to cross-lingual retrieval using multi-stage architectures for mono-lingual retrieval as a scaffold.
We implement simple yet effective reproducible baselines in the Anserini and Pyserini IR toolkits for test collections from the TREC 2022 NeuCLIR Track, in Persian, Russian, and Chinese.
arXiv Detail & Related papers (2023-04-03T14:17:00Z)
- MIA 2022 Shared Task: Evaluating Cross-lingual Open-Retrieval Question
Answering for 16 Diverse Languages [54.002969723086075]
We evaluate cross-lingual open-retrieval question answering systems in 16 typologically diverse languages.
The best system leveraging iteratively mined diverse negative examples achieves 32.2 F1, outperforming our baseline by 4.5 points.
The second best system uses entity-aware contextualized representations for document retrieval, and achieves significant improvements in Tamil (20.8 F1), whereas most of the other systems yield nearly zero scores.
arXiv Detail & Related papers (2022-07-02T06:54:10Z)
- Multilingual Event Linking to Wikidata [5.726712522440283]
We propose two variants of the event linking task: 1) multilingual, where event descriptions are from the same language as the mention, and 2) crosslingual, where all event descriptions are in English.
We automatically compile a large-scale dataset for this task, comprising 1.8M mentions across 44 languages referring to over 10.9K events from Wikidata.
arXiv Detail & Related papers (2022-04-13T17:28:23Z)
- ESPnet-ST IWSLT 2021 Offline Speech Translation System [56.83606198051871]
This paper describes the ESPnet-ST group's IWSLT 2021 submission in the offline speech translation track.
This year we made various efforts on training data, architecture, and audio segmentation.
Our best E2E system combined all the techniques with model ensembling and achieved 31.4 BLEU.
arXiv Detail & Related papers (2021-07-01T17:49:43Z)
- MCL@IITK at SemEval-2021 Task 2: Multilingual and Cross-lingual
Word-in-Context Disambiguation using Augmented Data, Signals, and
Transformers [1.869621561196521]
We present our approach for solving SemEval-2021 Task 2: Multilingual and Cross-lingual Word-in-Context Disambiguation (MCL-WiC).
The goal is to detect whether a given word common to both the sentences evokes the same meaning.
We submit systems for both settings: Multilingual and Cross-Lingual.
arXiv Detail & Related papers (2021-04-04T08:49:28Z)
- Evaluating Multilingual Text Encoders for Unsupervised Cross-Lingual
Retrieval [51.60862829942932]
We present a systematic empirical study focused on the suitability of the state-of-the-art multilingual encoders for cross-lingual document and sentence retrieval tasks.
For sentence-level CLIR, we demonstrate that state-of-the-art performance can be achieved.
However, peak performance is not achieved using general-purpose multilingual text encoders off-the-shelf, but rather by relying on their variants that have been further specialized for sentence understanding tasks.
arXiv Detail & Related papers (2021-01-21T00:15:38Z)
- Explicit Alignment Objectives for Multilingual Bidirectional Encoders [111.65322283420805]
We present a new method for learning multilingual encoders, AMBER (Aligned Multilingual Bi-directional EncodeR).
AMBER is trained on additional parallel data using two explicit alignment objectives that align the multilingual representations at different granularities.
Experimental results show that AMBER obtains gains of up to 1.1 average F1 score on sequence tagging and up to 27.3 average accuracy on retrieval over the XLMR-large model.
arXiv Detail & Related papers (2020-10-15T18:34:13Z)
- CO-Search: COVID-19 Information Retrieval with Semantic Search, Question
Answering, and Abstractive Summarization [53.67205506042232]
CO-Search is a retriever-ranker semantic search engine designed to handle complex queries over the COVID-19 literature.
To account for the domain-specific and relatively limited dataset, we generate a bipartite graph of document paragraphs and citations.
We evaluate our system on the data of the TREC-COVID information retrieval challenge.
arXiv Detail & Related papers (2020-06-17T01:32:48Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.