Annotation Guidelines for Corpus Novelties: Part 2 -- Alias Resolution Version 1.0
- URL: http://arxiv.org/abs/2410.00522v1
- Date: Tue, 1 Oct 2024 09:06:52 GMT
- Title: Annotation Guidelines for Corpus Novelties: Part 2 -- Alias Resolution Version 1.0
- Authors: Arthur Amalvy, Vincent Labatut
- Abstract summary: The Novelties corpus is a collection of novels (and parts of novels) annotated for Alias Resolution.
This document describes the guidelines applied during the annotation process.
- Score: 3.4955349700835034
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Novelties corpus is a collection of novels (and parts of novels) annotated for Alias Resolution, among other tasks. This document describes the guidelines applied during the annotation process. It contains the instructions used by the annotators, as well as a number of examples retrieved from the annotated novels that illustrate how canonical names should be defined and which names should be considered as referring to the same entity.
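Alias resolution, as summarized above, groups the surface names used in a text under one canonical name per entity. The following is a minimal sketch under stated assumptions. The dictionary layout, the functions `build_alias_index` and `resolve`, and the character names are all hypothetical. They are not the Novelties annotation format and are not taken from the corpus. The sketch also shows why a lookup returns a set rather than a single name: one alias, here a bare surname, can be shared by several entities.

```python
from collections import defaultdict

# Hypothetical annotation: each entity has one canonical name plus the
# surface forms (aliases) judged to refer to it.
# Illustrative only; NOT the Novelties file format, names are invented.
ENTITIES: dict[str, set[str]] = {
    "Anne Marlow": {"Anne", "Miss Marlow", "Marlow"},
    "Henry Marlow": {"Henry", "Mr. Marlow", "Marlow"},
}


def build_alias_index(entities: dict[str, set[str]]) -> dict[str, set[str]]:
    """Map every surface form (including the canonical name itself) to the
    set of canonical names it may refer to. A set is needed because one
    alias can be shared by several entities."""
    index: defaultdict[str, set[str]] = defaultdict(set)
    for canonical, aliases in entities.items():
        index[canonical].add(canonical)
        for alias in aliases:
            index[alias].add(canonical)
    return dict(index)


def resolve(mention: str, index: dict[str, set[str]]) -> set[str]:
    """Return the candidate canonical names for a mention (empty if unknown)."""
    return index.get(mention, set())


index = build_alias_index(ENTITIES)
print(sorted(resolve("Anne", index)))          # ['Anne Marlow']
print(sorted(resolve("Marlow", index)))        # ['Anne Marlow', 'Henry Marlow']
print(sorted(resolve("Captain Reed", index)))  # []
```

A lookup like this only narrows the candidates. Deciding which entity an ambiguous mention such as "Marlow" refers to still depends on the surrounding text.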
Related papers
- Annotation Guidelines for Corpus Novelties: Part 1 -- Named Entity Recognition [3.4955349700835034]
This document describes the guidelines applied during the annotation of the Novelties corpus.
It contains the instructions used by the annotators, as well as a number of examples retrieved from the annotated novels.
Date: 2024-10-03T08:03:40Z
- Unsupervised Mapping of Arguments of Deverbal Nouns to Their Corresponding Verbal Labels [52.940886615390106]
Deverbal nouns are nominal forms of verbs commonly used in written English texts to describe events or actions, as well as their arguments.
The solutions that do exist for handling arguments of nominalized constructions are based on semantic annotation.
We propose to adopt a more syntactic approach, which maps the arguments of deverbal nouns to the corresponding verbal construction.
Date: 2023-06-24T10:07:01Z
- Aggregating Crowdsourced and Automatic Judgments to Scale Up a Corpus of Anaphoric Reference for Fiction and Wikipedia Texts [16.42217979543271]
This paper introduces a new release of a corpus for anaphoric reference labelled via a game-with-a-purpose.
It is comparable in size to the largest existing corpora for anaphoric reference due in part to substantial activity by the players.
The proposed method could be adopted to greatly speed up annotation time in other projects involving games-with-a-purpose.
Date: 2022-10-11T16:13:57Z
- LongtoNotes: OntoNotes with Longer Coreference Chains [111.73115731999793]
We build a corpus of coreference-annotated documents significantly longer than those currently available.
The resulting corpus, which we call LongtoNotes, contains documents in multiple genres of the English language with varying lengths.
We evaluate state-of-the-art neural coreference systems on this new corpus.
Date: 2022-10-07T15:58:41Z
- The Fellowship of the Authors: Disambiguating Names from Social Network Context [2.3605348648054454]
In many domains, authority lists with extensive textual descriptions for each entity are lacking, and named entities are often ambiguous.
We combine BERT-based mention representations with a variety of graph induction strategies and experiment with supervised and unsupervised cluster inference methods.
We find that in-domain language model pretraining can significantly improve mention representations, especially for larger corpora.
Date: 2022-08-31T21:51:55Z
- Annotation Guidelines for the Turku Paraphrase Corpus [0.6538951857199963]
This document describes the annotation guidelines used to construct the Turku Paraphrase Corpus.
Our paraphrase annotation scheme uses a base scale of 1-4, where labels 1 and 2 are used for negative candidates (not paraphrases) and labels 3 and 4 for positive candidates (paraphrases).
In addition to the base labels, the scheme includes subcategories (flags) that categorize different types of paraphrases within the two positive labels.
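The label logic summarized above can be expressed as a small validation rule. This is a minimal sketch that assumes labels are stored as the integers 1-4 and flags as free-form strings. It also assumes, as the summary suggests, that flags apply only to the two positive labels. The `Judgment` class and the flag name used in the example are hypothetical, because the actual flag inventory is not listed here.

```python
from dataclasses import dataclass, field

NEGATIVE_LABELS = frozenset({1, 2})  # not paraphrases
POSITIVE_LABELS = frozenset({3, 4})  # paraphrases


@dataclass
class Judgment:
    """One annotated candidate pair (hypothetical representation)."""
    label: int
    # Hypothetical free-form flag names; the real inventory is not given here.
    flags: set[str] = field(default_factory=set)

    def __post_init__(self) -> None:
        # The base scale only allows labels 1-4.
        if self.label not in NEGATIVE_LABELS | POSITIVE_LABELS:
            raise ValueError(f"label must be in 1-4, got {self.label}")
        # Assumption: flags subcategorize paraphrase types, so they are
        # only valid on positive labels.
        if self.flags and self.label not in POSITIVE_LABELS:
            raise ValueError("flags are only allowed with labels 3 or 4")

    @property
    def is_paraphrase(self) -> bool:
        return self.label in POSITIVE_LABELS


print(Judgment(4, {"hypothetical-flag"}).is_paraphrase)  # True
print(Judgment(2).is_paraphrase)                         # False
```

Validating at construction time means an inconsistent annotation, such as a flag on a negative label, is rejected when it is loaded instead of surfacing later in analysis.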
Date: 2021-08-17T08:32:55Z
- LexSubCon: Integrating Knowledge from Lexical Resources into Contextual Embeddings for Lexical Substitution [76.615287796753]
We introduce LexSubCon, an end-to-end lexical substitution framework based on contextual embedding models.
This is achieved by combining contextual information with knowledge from structured lexical resources.
Our experiments show that LexSubCon outperforms previous state-of-the-art methods on the LS07 and CoInCo benchmark datasets.
Date: 2021-07-11T21:25:56Z
- Annotation Curricula to Implicitly Train Non-Expert Annotators [56.67768938052715]
Voluntary studies often require annotators to familiarize themselves with the task, its annotation scheme, and the data domain.
This can be overwhelming at first, mentally taxing, and can introduce errors into the resulting annotations.
We propose annotation curricula, a novel approach to implicitly train annotators.
Date: 2021-06-04T09:48:28Z
- Named Tensor Notation [117.30373263410507]
We propose a notation for tensors with named axes.
It relieves the author, reader, and future implementers from the burden of keeping track of the order of axes.
It also makes it easy to extend operations on low-order tensors to higher order ones.
Date: 2021-02-25T22:21:30Z
- A Corpus Study and Annotation Schema for Named Entity Recognition and Relation Extraction of Business Products [68.26059718611914]
We present a corpus study, an annotation schema and associated guidelines, for the annotation of product entity and company-product relation mentions.
We find that although product mentions are often realized as noun phrases, defining their exact extent is difficult due to high boundary ambiguity.
We present a preliminary corpus of English web and social media documents annotated according to the proposed guidelines.
Date: 2020-04-07T11:45:22Z