Annotation Guidelines for Corpus Novelties: Part 1 -- Named Entity Recognition
- URL: http://arxiv.org/abs/2410.02281v2
- Date: Fri, 4 Oct 2024 09:16:02 GMT
- Title: Annotation Guidelines for Corpus Novelties: Part 1 -- Named Entity Recognition
- Authors: Arthur Amalvy, Vincent Labatut
- Abstract summary: This document describes the guidelines applied during the annotation of the Novelties corpus. It contains the instructions used by the annotators, as well as a number of examples retrieved from the annotated novels.
- Score: 3.4955349700835034
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Novelties corpus is a collection of novels (and parts of novels) annotated for Named Entity Recognition (NER) among other tasks. This document describes the guidelines applied during its annotation. It contains the instructions used by the annotators, as well as a number of examples retrieved from the annotated novels, and illustrating expressions that should be marked as entities as well as expressions that should not.
Related papers
- Annotation Guidelines for Corpus Novelties: Part 2 -- Alias Resolution Version 1.0 [3.4955349700835034]
The Novelties corpus is a collection of novels (and parts of novels) annotated for Alias Resolution.
This document describes the guidelines applied during the annotation process.
(2024-10-01T09:06:52Z)
- Knowledge-Enhanced Dual-stream Zero-shot Composed Image Retrieval [53.89454443114146]
We study the zero-shot Composed Image Retrieval (ZS-CIR) task, which is to retrieve the target image given a reference image and a description, without training on triplet datasets.
Previous works generate pseudo-word tokens by projecting the reference image features to the text embedding space.
We propose a Knowledge-Enhanced Dual-stream zero-shot composed image retrieval framework (KEDs).
KEDs implicitly models the attributes of the reference images by incorporating a database.
(2024-03-24T04:23:56Z)
- Unsupervised Mapping of Arguments of Deverbal Nouns to Their Corresponding Verbal Labels [52.940886615390106]
Deverbal nouns are nominal forms of verbs, commonly used in written English texts to describe events or actions, as well as their arguments.
Existing solutions for handling the arguments of nominalized constructions are based on semantic annotation.
We propose to adopt a more syntactic approach, which maps the arguments of deverbal nouns to the corresponding verbal construction.
(2023-06-24T10:07:01Z)
- Quotations, Coreference Resolution, and Sentiment Annotations in Croatian News Articles: An Exploratory Study [0.0]
The paper focuses on annotating quotations, coreference, and sentiment in the Croatian SETimes news corpus.
The resulting corpus, with its quotation annotations, can be used for multiple tasks in the field of Natural Language Processing.
(2022-12-14T11:54:12Z)
- DALLE-2 is Seeing Double: Flaws in Word-to-Concept Mapping in Text2Image Models [53.29993651680099]
We show that DALLE-2 does not follow the constraint that each word has a single role in the interpretation.
We show that, for nouns with multiple senses, DALLE-2 depicts both senses at once.
(2022-10-19T14:52:40Z)
- LongtoNotes: OntoNotes with Longer Coreference Chains [111.73115731999793]
We build a corpus of coreference-annotated documents that are significantly longer than those currently available.
The resulting corpus, which we call LongtoNotes, contains documents in multiple genres of the English language with varying lengths.
We evaluate state-of-the-art neural coreference systems on this new corpus.
(2022-10-07T15:58:41Z)
- The Fellowship of the Authors: Disambiguating Names from Social Network Context [2.3605348648054454]
In many domains, authority lists with extensive textual descriptions of each entity are lacking, and ambiguous named entities occur mostly in the context of other named entities.
We combine BERT-based mention representations with a variety of graph induction strategies and experiment with supervised and unsupervised cluster inference methods.
We find that in-domain language model pretraining can significantly improve mention representations, especially for larger corpora.
(2022-08-31T21:51:55Z)
- Annotation Guidelines for the Turku Paraphrase Corpus [0.6538951857199963]
This document describes the annotation guidelines used to construct the Turku Paraphrase Corpus.
Our paraphrase annotation scheme uses the base scale 1-4, where labels 1 and 2 are used for negative candidates (not paraphrases) and labels 3 and 4 for positive ones (paraphrases).
In addition to base labeling, the scheme is enriched with additional subcategories (flags) for categorizing different types of paraphrases inside the two positive labels.
(2021-08-17T08:32:55Z)
- A Corpus Study and Annotation Schema for Named Entity Recognition and Relation Extraction of Business Products [68.26059718611914]
We present a corpus study, an annotation schema and associated guidelines, for the annotation of product entity and company-product relation mentions.
We find that although product mentions are often realized as noun phrases, defining their exact extent is difficult due to high boundary ambiguity.
We present a preliminary corpus of English web and social media documents annotated according to the proposed guidelines.
(2020-04-07T11:45:22Z)
- CASE: Context-Aware Semantic Expansion [68.30244980290742]
This paper defines and studies a new task called Context-Aware Semantic Expansion (CASE).
Given a seed term in a sentential context, we aim to suggest other terms that fit the context as well as the seed does.
We show that annotations for this task can be harvested at scale from existing corpora, in a fully automatic manner.
(2019-12-31T06:38:57Z)