Findings of the Shared Task on Multilingual Coreference Resolution
- URL: http://arxiv.org/abs/2209.07841v1
- Date: Fri, 16 Sep 2022 10:17:06 GMT
- Title: Findings of the Shared Task on Multilingual Coreference Resolution
- Authors: Zdeněk Žabokrtský, Miloslav Konopík, Anna Nedoluzhko, Michal Novák, Maciej Ogrodniczuk, Martin Popel, Ondřej Pražák, Jakub Sido, Daniel Zeman and Yilun Zhu
- Abstract summary: This paper presents an overview of the shared task on multilingual coreference resolution associated with the CRAC 2022 workshop.
Shared task participants were supposed to develop trainable systems capable of identifying mentions and clustering them according to identity coreference.
The winner system outperformed the baseline by 12 percentage points.
- Score: 2.000107169037543
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents an overview of the shared task on multilingual
coreference resolution associated with the CRAC 2022 workshop. Shared task
participants were supposed to develop trainable systems capable of identifying
mentions and clustering them according to identity coreference. The public
edition of CorefUD 1.0, which contains 13 datasets for 10 languages, was used
as the source of training and evaluation data. The CoNLL score used in previous
coreference-oriented shared tasks was used as the main evaluation metric. There
were 8 coreference prediction systems submitted by 5 participating teams; in
addition, there was a competitive Transformer-based baseline system provided by
the organizers at the beginning of the shared task. The winner system
outperformed the baseline by 12 percentage points (in terms of the CoNLL scores
averaged across all datasets for individual languages).
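The CoNLL score referenced above is conventionally computed as the unweighted average of the F1 scores of three coreference metrics: MUC, B-cubed (B³), and entity-based CEAF (CEAF_e). A minimal sketch of that averaging, with made-up placeholder numbers rather than any system's actual results:

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; defined as 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def conll_score(muc_f1: float, b3_f1: float, ceafe_f1: float) -> float:
    """CoNLL score: the arithmetic mean of the MUC, B-cubed, and CEAF_e F1s."""
    return (muc_f1 + b3_f1 + ceafe_f1) / 3

# Illustrative (fabricated) per-metric F1 values:
score = conll_score(muc_f1=0.72, b3_f1=0.66, ceafe_f1=0.63)
print(f"CoNLL score: {score:.2%}")
```

The shared task's headline numbers are CoNLL scores of this form, further averaged across all datasets per language.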
Related papers
- Findings of the Third Shared Task on Multilingual Coreference Resolution [1.4191726968556324]
The paper presents an overview of the third edition of the shared task on multilingual coreference resolution, held as part of the CRAC 2024 workshop.
Similar to the previous two editions, the participants were challenged to develop systems capable of identifying mentions and clustering them based on identity coreference.
This year's edition took another step towards real-world application by not providing participants with gold slots for zero anaphora, increasing the task's complexity and realism.
arXiv Detail & Related papers (2024-10-21T12:30:44Z)
- Overview of the PromptCBLUE Shared Task in CHIP2023 [26.56584015791646]
This paper presents an overview of the PromptCBLUE shared task held at the CHIP2023 conference.
It provides a good testbed for Chinese open-domain or medical-domain large language models (LLMs) in general medical natural language processing.
This paper describes the tasks, the datasets, evaluation metrics, and the top systems for both tasks.
arXiv Detail & Related papers (2023-12-29T09:05:00Z)
- On Using Distribution-Based Compositionality Assessment to Evaluate Compositional Generalisation in Machine Translation [10.840893953881652]
It is important to develop benchmarks that assess compositional generalisation in real-world natural language tasks.
The paper does this by splitting the Europarl translation corpus into a training and a test set such that the test set requires compositional generalisation capacity.
The procedure is fully automated, making it simple and inexpensive to create natural language compositionality benchmarks for other datasets and languages.
arXiv Detail & Related papers (2023-11-14T15:37:19Z)
- Personalized Federated Learning with Feature Alignment and Classifier Collaboration [13.320381377599245]
Data heterogeneity is one of the most challenging issues in federated learning.
One approach to addressing it in deep-neural-network-based tasks is to learn a shared feature representation together with a customized classifier head for each client.
In this work, we conduct explicit local-global feature alignment by leveraging global semantic knowledge for learning a better representation.
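The shared-representation idea summarized above can be sketched in a few lines. This is a deliberately simplified illustration, not the paper's actual architecture: the "encoder" is a trivial linear map, the per-client head is a threshold classifier, and all names and numbers are invented for the example.

```python
class SharedEncoder:
    """Globally shared feature extractor; its parameters are what the server averages."""
    def __init__(self, weight=1.0, bias=0.0):
        self.weight, self.bias = weight, bias

    def features(self, x):
        return self.weight * x + self.bias

class ClientHead:
    """Per-client classifier head; never averaged, so it stays personalized."""
    def __init__(self, threshold=0.5):
        self.threshold = threshold

    def predict(self, feature):
        return 1 if feature > self.threshold else 0

def federated_average(encoders):
    """FedAvg applied to the shared encoder only: average parameters across clients."""
    n = len(encoders)
    w = sum(e.weight for e in encoders) / n
    b = sum(e.bias for e in encoders) / n
    return SharedEncoder(w, b)

# Two clients with locally updated encoders but their own personal heads:
enc_a, enc_b = SharedEncoder(0.8, 0.1), SharedEncoder(1.2, -0.1)
global_enc = federated_average([enc_a, enc_b])  # averaged: weight=1.0, bias=0.0
head_a = ClientHead(threshold=0.3)
print(head_a.predict(global_enc.features(0.5)))
```

The design point is that only the encoder participates in server-side averaging, while each head is trained and kept locally, which is one way to cope with heterogeneous client label distributions.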
arXiv Detail & Related papers (2023-06-20T19:58:58Z)
- SEAHORSE: A Multilingual, Multifaceted Dataset for Summarization Evaluation [52.186343500576214]
We introduce SEAHORSE, a dataset for multilingual, multifaceted summarization evaluation.
SEAHORSE consists of 96K summaries with human ratings along 6 dimensions of text quality.
We show that metrics trained with SEAHORSE achieve strong performance on the out-of-domain meta-evaluation benchmarks TRUE and mFACE.
arXiv Detail & Related papers (2023-05-22T16:25:07Z)
- Alibaba-Translate China's Submission for WMT 2022 Quality Estimation Shared Task [80.22825549235556]
We present UniTE, our submission to the sentence-level MQM benchmark of the WMT 2022 Quality Estimation Shared Task.
Specifically, our systems employ the UniTE framework, which combines three types of input formats during training with a pre-trained language model.
Results show that our models reach 1st overall ranking in the Multilingual and English-Russian settings, and 2nd overall ranking in English-German and Chinese-English settings.
arXiv Detail & Related papers (2022-10-18T08:55:27Z)
- On Cross-Lingual Retrieval with Multilingual Text Encoders [51.60862829942932]
We study the suitability of state-of-the-art multilingual encoders for cross-lingual document and sentence retrieval tasks.
We benchmark their performance in unsupervised ad-hoc sentence- and document-level CLIR experiments.
We evaluate multilingual encoders fine-tuned in a supervised fashion (i.e., we learn to rank) on English relevance data in a series of zero-shot language and domain transfer CLIR experiments.
arXiv Detail & Related papers (2021-12-21T08:10:27Z) - DisCoDisCo at the DISRPT2021 Shared Task: A System for Discourse
Segmentation, Classification, and Connective Detection [4.371388370559826]
Our system, called DisCoDisCo, enhances contextualized word embeddings with hand-crafted features.
Results on relation classification suggest strong performance on the new 2021 benchmark.
A partial evaluation of multiple pre-trained Transformer-based language models indicates that models pre-trained on the Next Sentence Prediction task are optimal for relation classification.
arXiv Detail & Related papers (2021-09-20T18:11:05Z) - XL-WiC: A Multilingual Benchmark for Evaluating Semantic
Contextualization [98.61159823343036]
We present the Word-in-Context dataset (WiC) for assessing the ability to correctly model distinct meanings of a word.
We put forward a large multilingual benchmark, XL-WiC, featuring gold standards in 12 new languages.
Experimental results show that even when no tagged instances are available for a target language, models trained solely on the English data can attain competitive performance.
arXiv Detail & Related papers (2020-10-13T15:32:00Z) - XGLUE: A New Benchmark Dataset for Cross-lingual Pre-training,
Understanding and Generation [100.09099800591822]
XGLUE is a new benchmark dataset that can be used to train large-scale cross-lingual pre-trained models.
XGLUE provides 11 diversified tasks that cover both natural language understanding and generation scenarios.
arXiv Detail & Related papers (2020-04-03T07:03:12Z) - Overview of the TREC 2019 Fair Ranking Track [65.15263872493799]
The goal of the TREC Fair Ranking track was to develop a benchmark for evaluating retrieval systems in terms of fairness to different content providers.
This paper presents an overview of the track, including the task definition, descriptions of the data and the annotation process.
arXiv Detail & Related papers (2020-03-25T21:34:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.