Investigating Multilingual Coreference Resolution by Universal Annotations
- URL: http://arxiv.org/abs/2310.17734v1
- Date: Thu, 26 Oct 2023 18:50:04 GMT
- Title: Investigating Multilingual Coreference Resolution by Universal Annotations
- Authors: Haixia Chai and Michael Strube
- Abstract summary: We study coreference by examining the ground truth data at different linguistic levels.
We perform an error analysis of the most challenging cases that the SotA system fails to resolve.
We extract features from universal morphosyntactic annotations and integrate these features into a baseline system to assess their potential benefits.
- Score: 11.035051211351213
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Multilingual coreference resolution (MCR) has been a long-standing and
challenging task. With the newly proposed multilingual coreference dataset,
CorefUD (Nedoluzhko et al., 2022), we conduct an investigation into the task by
using its harmonized universal morphosyntactic and coreference annotations.
First, we study coreference by examining the ground truth data at different
linguistic levels, namely mention, entity and document levels, and across
different genres, to gain insights into the characteristics of coreference
across multiple languages. Second, we perform an error analysis of the most
challenging cases that the SotA system fails to resolve in the CRAC 2022 shared
task using the universal annotations. Last, based on this analysis, we extract
features from universal morphosyntactic annotations and integrate these
features into a baseline system to assess their potential benefits for the MCR
task. Our results show that our best configuration of features improves the
baseline by 0.9% F1 score.
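As a rough illustration (not the authors' code), below is a minimal sketch of how universal morphosyntactic features might be read from a CorefUD CoNLL-U file. The file path, function name, and the choice of features (UPOS, DEPREL, Number, Gender) are assumptions for illustration; CorefUD encodes coreference mentions via the Entity attribute in the MISC column and morphosyntax in the UPOS and FEATS columns.

```python
# Sketch: collect universal morphosyntactic annotations (UPOS, FEATS, DEPREL)
# for tokens that open a coreference mention (Entity=... with "(" in MISC)
# in a CorefUD CoNLL-U file. Feature selection here is illustrative only.

def read_mention_features(conllu_path):
    features = []  # one dict per mention-opening token
    with open(conllu_path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line or line.startswith("#"):
                continue  # skip blank lines and sentence-level metadata
            cols = line.split("\t")
            if len(cols) != 10 or "-" in cols[0] or "." in cols[0]:
                continue  # skip multiword-token ranges and empty nodes
            token_id, form, _, upos, _, feats, _, deprel, _, misc = cols
            if "Entity=" in misc and "(" in misc.split("Entity=")[1]:
                # token opens at least one coreference mention
                feat_dict = dict(
                    kv.split("=", 1) for kv in feats.split("|") if "=" in kv
                )
                features.append({
                    "form": form,
                    "upos": upos,
                    "deprel": deprel,
                    # grammatical agreement features, if annotated
                    "Number": feat_dict.get("Number", "_"),
                    "Gender": feat_dict.get("Gender", "_"),
                })
    return features

# Hypothetical usage with a CorefUD treebank file:
# mention_feats = read_mention_features("cs_pdt-corefud-train.conllu")
# print(mention_feats[:3])
```

Such per-mention features could then be concatenated to the mention representations of a baseline coreference system, which is the general idea the abstract describes.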
Related papers
- Exploring Multiple Strategies to Improve Multilingual Coreference Resolution in CorefUD [0.0]
This paper presents our end-to-end neural coreference resolution system.
We first establish strong baseline models, including monolingual and cross-lingual variations.
We propose several extensions to enhance performance across diverse linguistic contexts.
arXiv Detail & Related papers (2024-08-29T20:27:05Z)
- Cross-lingual Contextualized Phrase Retrieval [63.80154430930898]
We propose a new task formulation of dense retrieval, cross-lingual contextualized phrase retrieval.
We train our Cross-lingual Contextualized Phrase Retriever (CCPR) using contrastive learning.
On the phrase retrieval task, CCPR surpasses baselines by a significant margin, achieving a top-1 accuracy that is at least 13 points higher.
arXiv Detail & Related papers (2024-03-25T14:46:51Z)
- DAMO-NLP at SemEval-2023 Task 2: A Unified Retrieval-augmented System for Multilingual Named Entity Recognition [94.90258603217008]
The MultiCoNER II shared task aims to tackle multilingual named entity recognition (NER) in fine-grained and noisy scenarios.
Previous top systems in MultiCoNER I incorporated either knowledge bases or gazetteers.
We propose a unified retrieval-augmented system (U-RaNER) for fine-grained multilingual NER.
arXiv Detail & Related papers (2023-05-05T16:59:26Z)
- Few-Shot Cross-lingual Transfer for Coarse-grained De-identification of Code-Mixed Clinical Texts [56.72488923420374]
Pre-trained language models (LMs) have shown great potential for cross-lingual transfer in low-resource settings.
We show the few-shot cross-lingual transfer property of LMs for named entity recognition (NER) and apply it to a low-resource, real-world challenge: de-identification of code-mixed (Spanish-Catalan) clinical notes in the stroke domain.
arXiv Detail & Related papers (2022-04-10T21:46:52Z)
- On Cross-Lingual Retrieval with Multilingual Text Encoders [51.60862829942932]
We study the suitability of state-of-the-art multilingual encoders for cross-lingual document and sentence retrieval tasks.
We benchmark their performance in unsupervised ad-hoc sentence- and document-level CLIR experiments.
We evaluate multilingual encoders fine-tuned in a supervised fashion (i.e., we learn to rank) on English relevance data in a series of zero-shot language and domain transfer CLIR experiments.
arXiv Detail & Related papers (2021-12-21T08:10:27Z)
- Evaluating Multilingual Text Encoders for Unsupervised Cross-Lingual Retrieval [51.60862829942932]
We present a systematic empirical study focused on the suitability of the state-of-the-art multilingual encoders for cross-lingual document and sentence retrieval tasks.
For sentence-level CLIR, we demonstrate that state-of-the-art performance can be achieved.
However, peak performance is not achieved with general-purpose multilingual text encoders used off-the-shelf, but rather with their variants that have been further specialized for sentence understanding tasks.
arXiv Detail & Related papers (2021-01-21T00:15:38Z)
- Entity Linking in 100 Languages [3.2099113524828513]
We propose a new formulation for multilingual entity linking, where language-specific mentions resolve to a language-agnostic Knowledge Base.
We train a dual encoder in this new setting, building on prior work with improved feature representation, negative mining, and an auxiliary entity-pairing task.
The model outperforms state-of-the-art results from a far more limited cross-lingual linking task.
arXiv Detail & Related papers (2020-11-05T07:28:35Z)
- NEMO: Frequentist Inference Approach to Constrained Linguistic Typology Feature Prediction in SIGTYP 2020 Shared Task [83.43738174234053]
We employ frequentist inference to represent correlations between typological features and use this representation to train simple multi-class estimators that predict individual features.
Our best configuration achieved a micro-averaged accuracy of 0.66 on 149 test languages.
arXiv Detail & Related papers (2020-10-12T19:25:43Z)