Related papers: Annotation Guidelines for Corpus Novelties: Part 1 -- Named Entity Recognition

URL: http://arxiv.org/abs/2410.02281v2
Date: Fri, 4 Oct 2024 09:16:02 GMT
Title: Annotation Guidelines for Corpus Novelties: Part 1 -- Named Entity Recognition
Authors: Arthur Amalvy, Vincent Labatut
Abstract summary: This document describes the guidelines applied during the annotation of the Novelties corpus. It contains the instructions used by the annotators, as well as a number of examples retrieved from the annotated novels.
Score: 3.4955349700835034
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The Novelties corpus is a collection of novels (and parts of novels) annotated for Named Entity Recognition (NER) among other tasks. This document describes the guidelines applied during its annotation. It contains the instructions used by the annotators, as well as a number of examples retrieved from the annotated novels, illustrating expressions that should be marked as entities as well as expressions that should not.

Related papers

Annotation Guidelines for Corpus Novelties: Part 2 -- Alias Resolution Version 1.0 [3.4955349700835034]
The Novelties corpus is a collection of novels (and parts of novels) annotated for Alias Resolution. This document describes the guidelines applied during the annotation process.
arXiv Detail & Related papers (2024-10-01T09:06:52Z)
Knowledge-Enhanced Dual-stream Zero-shot Composed Image Retrieval [53.89454443114146]
We study the zero-shot Composed Image Retrieval (ZS-CIR) task, which is to retrieve the target image given a reference image and a description without training on the triplet datasets. Previous works generate pseudo-word tokens by projecting the reference image features to the text embedding space. We propose a Knowledge-Enhanced Dual-stream zero-shot composed image retrieval framework (KEDs). KEDs implicitly models the attributes of the reference images by incorporating a database.
arXiv Detail & Related papers (2024-03-24T04:23:56Z)
Unsupervised Mapping of Arguments of Deverbal Nouns to Their Corresponding Verbal Labels [52.940886615390106]
Deverbal nouns are nominal forms of verbs commonly used in written English texts to describe events or actions, as well as their arguments. The solutions that do exist for handling arguments of nominalized constructions are based on semantic annotation. We propose to adopt a more syntactic approach, which maps the arguments of deverbal nouns to the corresponding verbal construction.
arXiv Detail & Related papers (2023-06-24T10:07:01Z)
Quotations, Coreference Resolution, and Sentiment Annotations in Croatian News Articles: An Exploratory Study [0.0]
The paper focuses on the annotation of quotations, coreference resolution, and sentiment in the SETimes news corpus in Croatian. The generated corpus with quotation feature annotations can be used for multiple tasks in the field of Natural Language Processing.
arXiv Detail & Related papers (2022-12-14T11:54:12Z)
DALLE-2 is Seeing Double: Flaws in Word-to-Concept Mapping in Text2Image Models [53.29993651680099]
We show that DALLE-2 does not follow the constraint that each word has a single role in the interpretation. We show that DALLE-2 depicts both senses of nouns with multiple senses at once.
arXiv Detail & Related papers (2022-10-19T14:52:40Z)
LongtoNotes: OntoNotes with Longer Coreference Chains [111.73115731999793]
We build a corpus of coreference-annotated documents of significantly longer length than what is currently available. The resulting corpus, which we call LongtoNotes, contains documents in multiple genres of the English language with varying lengths. We evaluate state-of-the-art neural coreference systems on this new corpus.
arXiv Detail & Related papers (2022-10-07T15:58:41Z)
The Fellowship of the Authors: Disambiguating Names from Social Network Context [2.3605348648054454]
Authority lists with extensive textual descriptions for each entity are lacking, and named entities are ambiguous. We combine BERT-based mention representations with a variety of graph induction strategies and experiment with supervised and unsupervised cluster inference methods. We find that in-domain language model pretraining can significantly improve mention representations, especially for larger corpora.
arXiv Detail & Related papers (2022-08-31T21:51:55Z)
Annotation Guidelines for the Turku Paraphrase Corpus [0.6538951857199963]
This document describes the annotation guidelines used to construct the Turku Paraphrase Corpus. Our paraphrase annotation scheme uses the base scale 1-4, where labels 1 and 2 are used for negative candidates (not paraphrases). In addition to base labeling, the scheme is enriched with additional subcategories (flags) for categorizing different types of paraphrases inside the two positive labels.
arXiv Detail & Related papers (2021-08-17T08:32:55Z)
A Corpus Study and Annotation Schema for Named Entity Recognition and Relation Extraction of Business Products [68.26059718611914]
We present a corpus study, an annotation schema and associated guidelines, for the annotation of product entity and company-product relation mentions. We find that although product mentions are often realized as noun phrases, defining their exact extent is difficult due to high boundary ambiguity. We present a preliminary corpus of English web and social media documents annotated according to the proposed guidelines.
arXiv Detail & Related papers (2020-04-07T11:45:22Z)
CASE: Context-Aware Semantic Expansion [68.30244980290742]
This paper defines and studies a new task called Context-Aware Semantic Expansion (CASE). Given a seed term in a sentential context, we aim to suggest other terms that well fit the context as the seed. We show that annotations for this task can be harvested at scale from existing corpora, in a fully automatic manner.
arXiv Detail & Related papers (2019-12-31T06:38:57Z)

This list is automatically generated from the titles and abstracts of the papers on this site.