Multilingual Event Linking to Wikidata
- URL: http://arxiv.org/abs/2204.06535v1
- Date: Wed, 13 Apr 2022 17:28:23 GMT
- Title: Multilingual Event Linking to Wikidata
- Authors: Adithya Pratapa, Rishubh Gupta, Teruko Mitamura
- Abstract summary: We propose two variants of the event linking task: 1) multilingual, where event descriptions are from the same language as the mention, and 2) crosslingual, where all event descriptions are in English.
We automatically compile a large-scale dataset for this task, comprising 1.8M mentions across 44 languages referring to over 10.9K events from Wikidata.
- Score: 5.726712522440283
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a task of multilingual linking of events to a knowledge base. We
automatically compile a large-scale dataset for this task, comprising 1.8M
mentions across 44 languages referring to over 10.9K events from Wikidata. We
propose two variants of the event linking task: 1) multilingual, where event
descriptions are from the same language as the mention, and 2) crosslingual,
where all event descriptions are in English. On the two proposed tasks, we
compare multiple event linking systems including BM25+ (Lv and Zhai, 2011) and
multilingual adaptations of the biencoder and crossencoder architectures from
BLINK (Wu et al., 2020). In our experiments on the two task variants, we find
both biencoder and crossencoder models significantly outperform the BM25+
baseline. Our results also indicate that the crosslingual task is in general
more challenging than the multilingual task. To test the out-of-domain
generalization of the proposed linking systems, we additionally create a
Wikinews-based evaluation set. We present qualitative analysis highlighting
various aspects captured by the proposed dataset, including the need for
temporal reasoning over context and tackling diverse event descriptions across
languages.
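To ground the comparison, below is a minimal sketch of the two system families the abstract discusses: a BM25+ lexical baseline and a BLINK-style biencoder that embeds mentions and Wikidata event descriptions independently. The checkpoint, the toy candidate list, and the `embed` helper are illustrative assumptions (built on `rank_bm25` and Hugging Face `transformers`), not the authors' released code.

```python
# A hedged sketch of the two system families compared in the abstract: a BM25+
# lexical baseline and a BLINK-style biencoder that embeds mentions and
# Wikidata event descriptions independently. The checkpoint, toy candidates,
# and the `embed` helper are illustrative, not the authors' released code.
import torch
from rank_bm25 import BM25Plus  # BM25+ variant of Lv and Zhai (2011)
from transformers import AutoModel, AutoTokenizer

# A German mention in context with English event descriptions: this mirrors
# the *crosslingual* task variant described above.
mention = "Beim Erdbeben in Haiti kamen 2010 mehr als 100.000 Menschen ums Leben."
events = [
    "2010 Haiti earthquake: catastrophic magnitude 7.0 earthquake near Port-au-Prince",
    "2010 Chile earthquake: magnitude 8.8 earthquake off the coast of Maule",
]

# BM25+ baseline: whitespace tokens, purely lexical overlap.
bm25 = BM25Plus([e.lower().split() for e in events])
lexical_scores = bm25.get_scores(mention.lower().split())

# Biencoder: multilingual encoder, masked mean pooling, cosine ranking.
name = "bert-base-multilingual-cased"  # assumption: any multilingual encoder
tok = AutoTokenizer.from_pretrained(name)
enc = AutoModel.from_pretrained(name)

def embed(texts):
    batch = tok(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = enc(**batch).last_hidden_state      # (B, T, H)
    mask = batch["attention_mask"].unsqueeze(-1)     # (B, T, 1)
    pooled = (hidden * mask).sum(1) / mask.sum(1)    # mean over real tokens
    return torch.nn.functional.normalize(pooled, dim=-1)

dense_scores = (embed([mention]) @ embed(events).T).squeeze(0)
print("BM25+ pick:     ", events[int(lexical_scores.argmax())])
print("biencoder pick: ", events[int(dense_scores.argmax())])
```

In the paper's setup the biencoder is trained (with in-batch negatives, per BLINK) and a crossencoder reranks its top candidates; the off-the-shelf encoder here only stands in for that trained model.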
Related papers
- Event Extraction in Basque: Typologically motivated Cross-Lingual Transfer-Learning Analysis [18.25948580496853]
Cross-lingual transfer-learning is widely used in Event Extraction for low-resource languages.
This paper studies whether the typological similarity between source and target languages impacts the performance of cross-lingual transfer.
arXiv Detail & Related papers (2024-04-09T15:35:41Z)
- CUNI Submission to MRL 2023 Shared Task on Multi-lingual Multi-task Information Retrieval [5.97515243922116]
We present the Charles University system for the MRL2023 Shared Task on Multi-lingual Multi-task Information Retrieval.
The goal of the shared task was to develop systems for named entity recognition and question answering in several under-represented languages.
Our solutions to both subtasks rely on the translate-test approach.
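As a sketch of that recipe for the question-answering subtask: translate the inputs into English and run an English-only model. The checkpoints below are public stand-ins, not the CUNI system's actual components.

```python
# Translate-test sketch, assuming Turkish inputs and public Hugging Face
# checkpoints (both model names are illustrative stand-ins).
from transformers import pipeline

translate = pipeline("translation", model="Helsinki-NLP/opus-mt-tr-en")
answer = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")

question_tr = "Mustafa Kemal Atatürk nerede doğdu?"
context_tr = "Mustafa Kemal Atatürk 1881 yılında Selanik'te doğdu."

# Step 1: translate both question and context into English.
question_en = translate(question_tr)[0]["translation_text"]
context_en = translate(context_tr)[0]["translation_text"]

# Step 2: run the English-only QA model on the translated inputs.
print(answer(question=question_en, context=context_en)["answer"])
```

For span-labeling subtasks such as NER, the prediction must additionally be projected back onto the original-language text, which is where most of the difficulty of translate-test lies.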
arXiv Detail & Related papers (2023-10-25T10:22:49Z)
- Multi2WOZ: A Robust Multilingual Dataset and Conversational Pretraining for Task-Oriented Dialog [67.20796950016735]
Multi2WOZ dataset spans four typologically diverse languages: Chinese, German, Arabic, and Russian.
We introduce a new framework for multilingual conversational specialization of pretrained language models (PrLMs) that aims to facilitate cross-lingual transfer for arbitrary downstream TOD tasks.
Our experiments show that, in most setups, the best performance entails the combination of (i) conversational specialization in the target language and (ii) few-shot transfer for the concrete TOD task.
arXiv Detail & Related papers (2022-05-20T18:35:38Z)
- Bridging Cross-Lingual Gaps During Leveraging the Multilingual Sequence-to-Sequence Pretraining for Text Generation [80.16548523140025]
We extend the vanilla pretrain-finetune pipeline with an extra code-switching restore task to bridge the gap between the pretrain and finetune stages.
Our approach could narrow the cross-lingual sentence representation distance and improve low-frequency word translation with trivial computational cost.
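As a toy illustration of such a restore task, source words can be swapped with lexicon translations and the seq2seq model trained to recover the original sentence; the lexicon and swap rate below are assumptions, not the paper's exact recipe.

```python
# Illustrative construction of a code-switching "restore" training pair:
# some source words are swapped with translations from a bilingual lexicon,
# and the model is trained to restore the original sentence.
import random

lexicon = {"house": "Haus", "green": "grün", "the": "das"}  # toy EN->DE lexicon

def code_switch(sentence, lexicon, rate=0.3, seed=0):
    rng = random.Random(seed)
    tokens = sentence.split()
    switched = [
        lexicon[t.lower()] if t.lower() in lexicon and rng.random() < rate else t
        for t in tokens
    ]
    return " ".join(switched)

source = "the green house is on the hill"
noisy = code_switch(source, lexicon)
print(noisy)  # e.g. "das green Haus is on the hill"
# Training pair for the restore task: (noisy, source) — the seq2seq model
# learns to map the code-switched input back to the original sentence.
```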
arXiv Detail & Related papers (2022-04-16T16:08:38Z)
- Models and Datasets for Cross-Lingual Summarisation [78.56238251185214]
We present a cross-lingual summarisation corpus with long documents in a source language associated with multi-sentence summaries in a target language.
The corpus covers twelve language pairs and directions for four European languages, namely Czech, English, French and German.
We derive cross-lingual document-summary instances from Wikipedia by combining lead paragraphs and article bodies from language-aligned Wikipedia titles.
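A rough sketch of this mining step follows, using the public MediaWiki APIs to follow a language link and fetch a lead paragraph. The endpoints are real, but using the lead extract as a stand-in for the full article body (and omitting error handling) simplifies the actual corpus construction.

```python
# Sketch: follow a language link from an English article to its German
# counterpart, then pair English text with the German lead paragraph.
import requests

def langlink(title, source="en", target="de"):
    """Return the target-language title aligned with `title`, if any."""
    r = requests.get(
        f"https://{source}.wikipedia.org/w/api.php",
        params={"action": "query", "prop": "langlinks", "titles": title,
                "lllang": target, "format": "json"},
    ).json()
    page = next(iter(r["query"]["pages"].values()))
    return page.get("langlinks", [{}])[0].get("*")

def lead_paragraph(title, lang):
    """Plain-text lead section via the Wikipedia REST summary endpoint."""
    url = f"https://{lang}.wikipedia.org/api/rest_v1/page/summary/{title}"
    return requests.get(url).json().get("extract")

en_title = "2010 Haiti earthquake"
de_title = langlink(en_title)              # aligned German title
pair = (lead_paragraph(en_title, "en"),    # stand-in for the full body
        lead_paragraph(de_title, "de"))    # target-language "summary"
print(pair[1][:200])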
arXiv Detail & Related papers (2022-02-19T11:55:40Z)
- Handshakes AI Research at CASE 2021 Task 1: Exploring different approaches for multilingual tasks [0.22940141855172036]
The aim of the CASE 2021 Shared Task 1 was to detect and classify socio-political and crisis event information in a multilingual setting.
Our submission contained entries in all of the subtasks, and the scores obtained validated our research findings.
arXiv Detail & Related papers (2021-10-29T07:58:49Z)
- Cross-lingual Intermediate Fine-tuning improves Dialogue State Tracking [84.50302759362698]
We enhance the transfer learning process by intermediate fine-tuning of pretrained multilingual models.
We use parallel and conversational movie subtitles datasets to design cross-lingual intermediate tasks.
We achieve impressive improvements (> 20% on goal accuracy) on the parallel MultiWoZ dataset and Multilingual WoZ dataset.
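The two-stage recipe can be sketched schematically as below; the model, data, and losses are placeholders, not the paper's actual components.

```python
# Schematic two-stage recipe: intermediate fine-tuning on a cross-lingual
# task built from parallel subtitles, then target fine-tuning for DST.
import torch
from torch import nn

model = nn.Linear(8, 2)  # placeholder for a pretrained multilingual model
opt = torch.optim.AdamW(model.parameters(), lr=1e-5)

def fit(batches, loss_fn):
    for x, y in batches:
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()

# Stage 1: intermediate task, e.g. classifying whether two subtitle lines
# are translations of each other (labels from a parallel corpus).
intermediate = [(torch.randn(4, 8), torch.randint(2, (4,))) for _ in range(3)]
fit(intermediate, nn.CrossEntropyLoss())

# Stage 2: target task (dialogue state tracking), starting from the
# intermediate checkpoint rather than the raw pretrained weights.
target = [(torch.randn(4, 8), torch.randint(2, (4,))) for _ in range(3)]
fit(target, nn.CrossEntropyLoss())
```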
arXiv Detail & Related papers (2021-09-28T11:22:38Z)
- MCL@IITK at SemEval-2021 Task 2: Multilingual and Cross-lingual Word-in-Context Disambiguation using Augmented Data, Signals, and Transformers [1.869621561196521]
We present our approach to SemEval 2021 Task 2: Multilingual and Cross-lingual Word-in-Context Disambiguation (MCL-WiC).
The goal is to detect whether a given word common to both sentences evokes the same meaning.
We submit systems for both settings, multilingual and cross-lingual.
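One plausible framing, sketched below, casts MCL-WiC as sentence-pair classification with the target word marked in both sentences; the marker scheme and backbone are assumptions, not necessarily the submitted system.

```python
# Illustrative framing of MCL-WiC as sentence-pair classification: the target
# word is marked in both sentences and a cross-encoder scores whether the two
# occurrences share a sense.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

name = "xlm-roberta-base"  # assumption: any multilingual cross-encoder backbone
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name, num_labels=2)

def mark(sentence, word):
    """Wrap the target word in lightweight markers (an augmentation signal)."""
    return sentence.replace(word, f"** {word} **", 1)

s1 = mark("He sat on the bank of the river.", "bank")
s2 = mark("She deposited cash at the bank.", "bank")
batch = tok(s1, s2, return_tensors="pt")
with torch.no_grad():
    logits = model(**batch).logits  # untrained head: scores are random here
print(logits.softmax(-1))
```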
arXiv Detail & Related papers (2021-04-04T08:49:28Z)
- VECO: Variable and Flexible Cross-lingual Pre-training for Language Understanding and Generation [77.82373082024934]
We plug a cross-attention module into the Transformer encoder to explicitly build the interdependence between languages.
This helps the model avoid degenerating into predicting masked words conditioned only on context from the same language.
The proposed cross-lingual model delivers new state-of-the-art results on various cross-lingual understanding tasks of the XTREME benchmark.
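A minimal PyTorch sketch of such a plug-in cross-attention sublayer follows, letting tokens in one language attend to a parallel sentence in the other; the dimensions and wiring are illustrative, not VECO's exact architecture.

```python
# Sketch of a cross-attention sublayer plugged into a Transformer encoder
# layer so tokens in language A can attend to the parallel sentence in
# language B. The feed-forward sublayer is omitted for brevity.
import torch
import torch.nn as nn

class CrossAttentionLayer(nn.Module):
    def __init__(self, d_model=768, n_heads=12):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x, y):
        # Standard self-attention over the current language's tokens.
        x = self.norm1(x + self.self_attn(x, x, x)[0])
        # Cross-attention: queries from language A, keys/values from language
        # B, so masked words can be predicted from the other language's context.
        x = self.norm2(x + self.cross_attn(x, y, y)[0])
        return x

layer = CrossAttentionLayer()
src = torch.randn(2, 10, 768)  # batch of source-language token states
tgt = torch.randn(2, 12, 768)  # aligned target-language token states
out = layer(src, tgt)          # (2, 10, 768)
```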
arXiv Detail & Related papers (2020-10-30T03:41:38Z)
- XGLUE: A New Benchmark Dataset for Cross-lingual Pre-training, Understanding and Generation [100.09099800591822]
XGLUE is a new benchmark dataset that can be used to train large-scale cross-lingual pre-trained models.
XGLUE provides 11 diversified tasks that cover both natural language understanding and generation scenarios.
arXiv Detail & Related papers (2020-04-03T07:03:12Z)
This list is automatically generated from the titles and abstracts of the papers on this site.