Investigating Multilingual Coreference Resolution by Universal Annotations
- URL: http://arxiv.org/abs/2310.17734v1
- Date: Thu, 26 Oct 2023 18:50:04 GMT
- Title: Investigating Multilingual Coreference Resolution by Universal Annotations
- Authors: Haixia Chai and Michael Strube
- Abstract summary: We study coreference by examining the ground truth data at different linguistic levels.
We perform an error analysis of the most challenging cases that the SotA system fails to resolve.
We extract features from universal morphosyntactic annotations and integrate these features into a baseline system to assess their potential benefits.
- Score: 11.035051211351213
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Multilingual coreference resolution (MCR) has been a long-standing and
challenging task. With the newly proposed multilingual coreference dataset,
CorefUD (Nedoluzhko et al., 2022), we conduct an investigation into the task by
using its harmonized universal morphosyntactic and coreference annotations.
First, we study coreference by examining the ground truth data at different
linguistic levels, namely mention, entity and document levels, and across
different genres, to gain insights into the characteristics of coreference
across multiple languages. Second, we perform an error analysis of the most
challenging cases that the SotA system fails to resolve in the CRAC 2022 shared
task using the universal annotations. Last, based on this analysis, we extract
features from universal morphosyntactic annotations and integrate these
features into a baseline system to assess their potential benefits for the MCR
task. Our results show that our best configuration of features improves the
baseline by 0.9% F1 score.
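As a rough illustration of what extracting features from universal morphosyntactic annotations can look like, the sketch below reads a small CoNLL-U fragment and collects a few attributes of a mention's head token. The sample sentence, the helper names `parse_conllu` and `head_features`, and the chosen feature set (UPOS, dependency relation, Number, Gender) are assumptions made for illustration; the abstract does not say which features the authors use or how they feed them to the baseline.

```python
# Hypothetical sketch: the feature set and helper names are illustrative
# assumptions, not the configuration reported in the paper.

# A tiny CoNLL-U fragment (10 tab-separated columns per token line).
SAMPLE = """\
# text = Mary saw her
1\tMary\tMary\tPROPN\t_\tNumber=Sing\t2\tnsubj\t_\t_
2\tsaw\tsee\tVERB\t_\tTense=Past\t0\troot\t_\t_
3\ther\tshe\tPRON\t_\tGender=Fem|Number=Sing\t2\tobj\t_\t_
"""


def parse_conllu(text):
    """Parse CoNLL-U token lines into a list of dictionaries."""
    tokens = []
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        if not cols[0].isdigit():
            continue  # skip multiword-token ranges (1-2) and empty nodes (1.1)
        feats = {}
        if cols[5] != "_":
            feats = dict(pair.split("=", 1) for pair in cols[5].split("|"))
        tokens.append({
            "id": int(cols[0]),
            "form": cols[1],
            "upos": cols[3],
            "feats": feats,
            "head": int(cols[6]) if cols[6].isdigit() else None,
            "deprel": cols[7],
        })
    return tokens


def head_features(tokens, token_id):
    """Return a small morphosyntactic feature dict for one token."""
    tok = next(t for t in tokens if t["id"] == token_id)
    return {
        "upos": tok["upos"],
        "deprel": tok["deprel"],
        "number": tok["feats"].get("Number"),
        "gender": tok["feats"].get("Gender"),
    }


tokens = parse_conllu(SAMPLE)
print(head_features(tokens, 3))
```

A feature dictionary of this kind could, for example, be embedded and concatenated to a mention representation, though that integration step is likewise an assumption here.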
Related papers
- USTCCTSU at SemEval-2024 Task 1: Reducing Anisotropy for Cross-lingual Semantic Textual Relatedness Task [17.905282052666333]
The cross-lingual semantic textual relatedness task is an important research task that addresses challenges in cross-lingual communication and text understanding.
It helps establish semantic connections between different languages, crucial for downstream tasks like machine translation, multilingual information retrieval, and cross-lingual text understanding.
With our approach, we rank 2nd in Spanish and 3rd in Indonesian, and place multiple entries in the top ten in the competition's track C.
arXiv Detail & Related papers (2024-11-28T08:40:14Z)
- Exploring Multiple Strategies to Improve Multilingual Coreference Resolution in CorefUD [0.8602553195689511]
We present a novel end-to-end neural coreference resolution system utilizing the CorefUD 1.1 dataset.
The proposed model is based on the standard end-to-end neural coreference resolution system.
We propose several extensions to enhance performance across diverse linguistic contexts.
arXiv Detail & Related papers (2024-08-29T20:27:05Z)
- Cross-lingual Contextualized Phrase Retrieval [63.80154430930898]
We propose a new task formulation of dense retrieval, cross-lingual contextualized phrase retrieval.
We train our Cross-lingual Contextualized Phrase Retriever (CCPR) using contrastive learning.
On the phrase retrieval task, CCPR surpasses baselines by a significant margin, achieving a top-1 accuracy that is at least 13 points higher.
arXiv Detail & Related papers (2024-03-25T14:46:51Z)
- DAMO-NLP at SemEval-2023 Task 2: A Unified Retrieval-augmented System for Multilingual Named Entity Recognition [94.90258603217008]
The MultiCoNER II shared task aims to tackle multilingual named entity recognition (NER) in fine-grained and noisy scenarios.
Previous top systems in MultiCoNER I incorporate either knowledge bases or gazetteers.
We propose a unified retrieval-augmented system (U-RaNER) for fine-grained multilingual NER.
arXiv Detail & Related papers (2023-05-05T16:59:26Z)
- Few-Shot Cross-lingual Transfer for Coarse-grained De-identification of Code-Mixed Clinical Texts [56.72488923420374]
Pre-trained language models (LMs) have shown great potential for cross-lingual transfer in low-resource settings.
We show the few-shot cross-lingual transfer property of LMs for named entity recognition (NER) and apply it to a low-resource, real-world challenge: de-identification of code-mixed (Spanish-Catalan) clinical notes in the stroke domain.
arXiv Detail & Related papers (2022-04-10T21:46:52Z)
- On Cross-Lingual Retrieval with Multilingual Text Encoders [51.60862829942932]
We study the suitability of state-of-the-art multilingual encoders for cross-lingual document and sentence retrieval tasks.
We benchmark their performance in unsupervised ad-hoc sentence- and document-level CLIR experiments.
We evaluate multilingual encoders fine-tuned in a supervised fashion (i.e., we learn to rank) on English relevance data in a series of zero-shot language and domain transfer CLIR experiments.
arXiv Detail & Related papers (2021-12-21T08:10:27Z)
- Evaluating Multilingual Text Encoders for Unsupervised Cross-Lingual Retrieval [51.60862829942932]
We present a systematic empirical study focused on the suitability of the state-of-the-art multilingual encoders for cross-lingual document and sentence retrieval tasks.
For sentence-level CLIR, we demonstrate that state-of-the-art performance can be achieved.
However, the peak performance is not reached by using the general-purpose multilingual text encoders off the shelf, but rather by relying on variants that have been further specialized for sentence understanding tasks.
arXiv Detail & Related papers (2021-01-21T00:15:38Z)
- NEMO: Frequentist Inference Approach to Constrained Linguistic Typology Feature Prediction in SIGTYP 2020 Shared Task [83.43738174234053]
We employ frequentist inference to represent correlations between typological features and use this representation to train simple multi-class estimators that predict individual features.
Our best configuration achieved a micro-averaged accuracy score of 0.66 on 149 test languages.
arXiv Detail & Related papers (2020-10-12T19:25:43Z)