A Corpus Study and Annotation Schema for Named Entity Recognition and
Relation Extraction of Business Products
- URL: http://arxiv.org/abs/2004.03287v1
- Date: Tue, 7 Apr 2020 11:45:22 GMT
- Title: A Corpus Study and Annotation Schema for Named Entity Recognition and
Relation Extraction of Business Products
- Authors: Saskia Schön, Veselina Mironova, Aleksandra Gabryszak, Leonhard Hennig
- Abstract summary: We present a corpus study, an annotation schema and associated guidelines, for the annotation of product entity and company-product relation mentions.
We find that although product mentions are often realized as noun phrases, defining their exact extent is difficult due to high boundary ambiguity.
We present a preliminary corpus of English web and social media documents annotated according to the proposed guidelines.
- Score: 68.26059718611914
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recognizing non-standard entity types and relations, such as B2B products,
product classes and their producers, in news and forum texts is important in
application areas such as supply chain monitoring and market research. However,
there is a decided lack of annotated corpora and annotation guidelines in this
domain. In this work, we present a corpus study, an annotation schema and
associated guidelines, for the annotation of product entity and company-product
relation mentions. We find that although product mentions are often realized as
noun phrases, defining their exact extent is difficult due to high boundary
ambiguity and the broad syntactic and semantic variety of their surface
realizations. We also describe our ongoing annotation effort, and present a
preliminary corpus of English web and social media documents annotated
according to the proposed guidelines.
Related papers
- FASSILA: A Corpus for Algerian Dialect Fake News Detection and Sentiment Analysis [0.0]
The Algerian dialect (AD) faces challenges due to the absence of annotated corpora.
This study outlines the development process of a specialized corpus for Fake News (FN) detection and sentiment analysis (SA) in AD called FASSILA.
arXiv Detail & Related papers (2024-11-07T10:39:10Z)
- Annotation Guidelines for Corpus Novelties: Part 1 -- Named Entity Recognition [3.4955349700835034]
This document describes the guidelines applied during its annotation.
It contains the instructions used by the annotators, as well as a number of examples retrieved from the annotated novels.
arXiv Detail & Related papers (2024-10-03T08:03:40Z)
- Different Tastes of Entities: Investigating Human Label Variation in Named Entity Annotations [23.059491714512077]
This paper studies disagreements in expert-annotated named entity datasets for three languages: English, Danish, and Bavarian.
We show that text ambiguity and artificial guideline changes are dominant factors for diverse annotations among high-quality revisions.
arXiv Detail & Related papers (2024-02-02T14:08:34Z)
- Disambiguation of Company names via Deep Recurrent Networks [101.90357454833845]
We propose a Siamese LSTM Network approach to extract -- via supervised learning -- an embedding of company name strings.
We analyse how an Active Learning approach to prioritise the samples to be labelled leads to a more efficient overall learning pipeline.
arXiv Detail & Related papers (2023-03-07T15:07:57Z)
- Semantic Role Labeling Meets Definition Modeling: Using Natural Language to Describe Predicate-Argument Structures [104.32063681736349]
We present an approach to describe predicate-argument structures using natural language definitions instead of discrete labels.
Our experiments and analyses on PropBank-style and FrameNet-style, dependency-based and span-based SRL also demonstrate that a flexible model with an interpretable output does not necessarily come at the expense of performance.
arXiv Detail & Related papers (2022-12-02T11:19:16Z)
- Multilingual Extraction and Categorization of Lexical Collocations with Graph-aware Transformers [86.64972552583941]
We put forward a sequence tagging BERT-based model enhanced with a graph-aware transformer architecture, which we evaluate on the task of collocation recognition in context.
Our results suggest that explicitly encoding syntactic dependencies in the model architecture is helpful, and provide insights on differences in collocation typification in English, Spanish and French.
arXiv Detail & Related papers (2022-05-23T16:47:37Z)
- Knowledge-Rich Self-Supervised Entity Linking [58.838404666183656]
Knowledge-RIch Self-Supervision (KRISSBERT) is a universal entity linker for four million UMLS entities.
Our approach subsumes zero-shot and few-shot methods, and can easily incorporate entity descriptions and gold mention labels if available.
Without using any labeled information, our method produces KRISSBERT, a universal entity linker for four million UMLS entities.
arXiv Detail & Related papers (2021-12-15T05:05:12Z)
- RuREBus: a Case Study of Joint Named Entity Recognition and Relation Extraction from e-Government Domain [7.6462329126769815]
We showcase an application of information extraction methods, such as named entity recognition (NER) and relation extraction (RE), to a novel corpus consisting of documents issued by a state agency.
The main challenges of this corpus are: 1) the annotation scheme differs greatly from the one used for the general domain corpora, and 2) the documents are written in a language other than English.
arXiv Detail & Related papers (2020-10-29T20:56:15Z)
- A German Corpus for Fine-Grained Named Entity Recognition and Relation Extraction of Traffic and Industry Events [63.08899104652265]
This work describes a corpus of German-language documents which has been annotated with fine-grained geo-entities.
It has also been annotated with a set of 15 traffic- and industry-related n-ary relations and events.
The corpus consists of newswire texts, Twitter messages, and traffic reports from radio stations, police and railway companies.
arXiv Detail & Related papers (2020-04-07T11:39:50Z)
- Mining customer product reviews for product development: A summarization process [0.7742297876120561]
This research set out to identify and structure from online reviews the words and expressions related to customers' likes and dislikes to guide product development.
The authors propose a summarization model containing multiple aspects of user preference, such as product affordances, emotions, and usage conditions.
A case study demonstrates that with the proposed model and the annotation guidelines, human annotators can structure the online reviews with high inter-agreement.
arXiv Detail & Related papers (2020-01-13T13:01:14Z)