ezCoref: Towards Unifying Annotation Guidelines for Coreference Resolution
- URL: http://arxiv.org/abs/2210.07188v1
- Date: Thu, 13 Oct 2022 17:09:59 GMT
- Title: ezCoref: Towards Unifying Annotation Guidelines for Coreference Resolution
- Authors: Ankita Gupta, Marzena Karpinska, Wenlong Zhao, Kalpesh Krishna, Jack Merullo, Luke Yeh, Mohit Iyyer, Brendan O'Connor
- Score: 28.878540389202367
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large-scale, high-quality corpora are critical for advancing research in
coreference resolution. However, existing datasets vary in their definition of
coreferences and have been collected via complex and lengthy guidelines that
are curated for linguistic experts. These concerns have sparked a growing
interest among researchers to curate a unified set of guidelines suitable for
annotators with various backgrounds. In this work, we develop a
crowdsourcing-friendly coreference annotation methodology, ezCoref, consisting
of an annotation tool and an interactive tutorial. We use ezCoref to
re-annotate 240 passages from seven existing English coreference datasets
(spanning fiction, news, and multiple other domains) while teaching annotators
only cases that are treated similarly across these datasets. Surprisingly, we
find that reasonable quality annotations were already achievable (>90%
agreement between the crowd and expert annotations) even without extensive
training. On carefully analyzing the remaining disagreements, we identify the
presence of linguistic cases that our annotators unanimously agree upon but
lack unified treatments (e.g., generic pronouns, appositives) in existing
datasets. We propose the research community should revisit these phenomena when
curating future unified annotation guidelines.
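The crowd-expert agreement reported above can be made concrete with a link-based comparison of two coreference annotations. The sketch below is illustrative only: the mention spans are made up, and link F1 is a stand-in metric, not necessarily the agreement measure ezCoref actually uses.

```python
# Minimal sketch of link-based agreement between two coreference
# annotations; mentions are (start, end) token offsets and each
# annotation is a list of clusters (lists of mentions).
from itertools import combinations


def links(clusters):
    """All coreference links (unordered mention pairs) implied by a clustering."""
    pairs = set()
    for cluster in clusters:
        for a, b in combinations(sorted(cluster), 2):
            pairs.add((a, b))
    return pairs


def link_f1(crowd, expert):
    """Harmonic mean of link precision and recall between two annotations."""
    c, e = links(crowd), links(expert)
    if not c and not e:
        return 1.0  # both annotators found no coreference at all
    tp = len(c & e)
    p = tp / len(c) if c else 0.0
    r = tp / len(e) if e else 0.0
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Hypothetical annotations of the same passage: the annotators agree on
# one link but cluster the mention (9, 9) differently.
crowd = [[(0, 1), (5, 5), (9, 9)], [(2, 3)]]
expert = [[(0, 1), (5, 5)], [(2, 3), (9, 9)]]
print(round(link_f1(crowd, expert), 3))
```

Cluster-level metrics such as MUC or B-cubed refine this idea, but the pairwise-link view is the simplest way to see where crowd and expert annotations diverge.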
Related papers
- Persian Homograph Disambiguation: Leveraging ParsBERT for Enhanced Sentence Understanding with a Novel Word Disambiguation Dataset [0.0]
We introduce a novel dataset tailored for Persian homograph disambiguation.
Our work encompasses a thorough exploration of various embeddings, evaluated through the cosine similarity method.
We scrutinize the models' performance in terms of Accuracy, Recall, and F1 Score.
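The cosine-similarity evaluation of embeddings mentioned above can be sketched as follows; the embedding vectors here are made-up toy values, not from the paper's dataset.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical contextual embeddings for two readings of a homograph:
# a high cosine score suggests the model treats the readings as similar.
reading_a = [0.9, 0.1, 0.3]
reading_b = [0.8, 0.2, 0.4]
print(round(cosine_similarity(reading_a, reading_b), 3))
```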
arXiv Detail & Related papers (2024-05-24T14:56:36Z)
- Understanding Cross-Lingual Alignment -- A Survey [52.572071017877704]
Cross-lingual alignment is the meaningful similarity of representations across languages in multilingual language models.
We survey the literature of techniques to improve cross-lingual alignment, providing a taxonomy of methods and summarising insights from throughout the field.
arXiv Detail & Related papers (2024-04-09T11:39:53Z)
- Different Tastes of Entities: Investigating Human Label Variation in Named Entity Annotations [23.059491714512077]
This paper studies disagreements in expert-annotated named entity datasets for three languages: English, Danish, and Bavarian.
We show that text ambiguity and artificial guideline changes are dominant factors for diverse annotations among high-quality revisions.
arXiv Detail & Related papers (2024-02-02T14:08:34Z)
- When a Language Question Is at Stake. A Revisited Approach to Label Sensitive Content [0.0]
This article revisits an approach to pseudo-labeling sensitive data, using the example of Ukrainian tweets covering the Russian-Ukrainian war.
We provide a fundamental statistical analysis of the obtained data, an evaluation of the models used for pseudo-labelling, and further guidelines on how scientists can leverage the corpus.
arXiv Detail & Related papers (2023-11-17T13:35:10Z)
- Extending an Event-type Ontology: Adding Verbs and Classes Using Fine-tuned LLMs Suggestions [0.0]
We have investigated the use of advanced machine learning methods for pre-annotating data for a lexical extension task.
We have examined the correlation of the automatic scores with the human annotation.
While the correlation turned out to be strong, its influence on the annotation proper is modest due to its near linearity.
arXiv Detail & Related papers (2023-06-03T14:57:47Z)
- Entity Disambiguation with Entity Definitions [50.01142092276296]
Local models have recently attained astounding performances in Entity Disambiguation (ED).
Previous works limited their studies to using, as the textual representation of each candidate, only its Wikipedia title.
In this paper, we address this limitation and investigate to what extent more expressive textual representations can mitigate it.
We report a new state of the art on 2 out of 6 benchmarks we consider and strongly improve the generalization capability over unseen patterns.
arXiv Detail & Related papers (2022-10-11T17:46:28Z)
- Monolingual alignment of word senses and definitions in lexicographical resources [0.0]
The focus of this thesis is broadly on the alignment of lexicographical data, particularly dictionaries.
The first task aims to find an optimal alignment given the sense definitions of a headword in two different monolingual dictionaries.
This benchmark can be used for evaluation purposes of word-sense alignment systems.
arXiv Detail & Related papers (2022-09-06T13:09:52Z)
- Annotation Curricula to Implicitly Train Non-Expert Annotators [56.67768938052715]
Voluntary studies often require annotators to familiarize themselves with the task, its annotation scheme, and the data domain.
This can be overwhelming in the beginning, mentally taxing, and induce errors into the resulting annotations.
We propose annotation curricula, a novel approach to implicitly train annotators.
arXiv Detail & Related papers (2021-06-04T09:48:28Z)
- Weakly-Supervised Aspect-Based Sentiment Analysis via Joint Aspect-Sentiment Topic Embedding [71.2260967797055]
We propose a weakly-supervised approach for aspect-based sentiment analysis.
We learn <sentiment, aspect> joint topic embeddings in the word embedding space.
We then use neural models to generalize the word-level discriminative information.
arXiv Detail & Related papers (2020-10-13T21:33:24Z)
- Active Learning for Coreference Resolution using Discrete Annotation [76.36423696634584]
We improve upon pairwise annotation for active learning in coreference resolution.
We ask annotators to identify mention antecedents if a presented mention pair is deemed not coreferent.
In experiments with existing benchmark coreference datasets, we show that the signal from this additional question leads to significant performance gains per human-annotation hour.
arXiv Detail & Related papers (2020-04-28T17:17:11Z)
- CASE: Context-Aware Semantic Expansion [68.30244980290742]
This paper defines and studies a new task called Context-Aware Semantic Expansion (CASE).
Given a seed term in a sentential context, we aim to suggest other terms that well fit the context as the seed.
We show that annotations for this task can be harvested at scale from existing corpora, in a fully automatic manner.
arXiv Detail & Related papers (2019-12-31T06:38:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.