Observations on Annotations
- URL: http://arxiv.org/abs/2004.10283v1
- Date: Tue, 21 Apr 2020 20:29:50 GMT
- Title: Observations on Annotations
- Authors: Georg Rehm
- Abstract summary: It approaches the topic from several angles, including Hypertext, Computational Linguistics and Language Technology, Artificial Intelligence, and Open Science.
In terms of complexity, annotations can range from trivial to highly sophisticated; in terms of maturity, from experimental to standardised.
Primary research data, such as text documents, can be annotated on several independent layers concurrently, which can be exploited using multi-layer querying.
- Score: 0.5175994976508882
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The annotation of textual information is a fundamental activity in
Linguistics and Computational Linguistics. This article presents various
observations on annotations. It approaches the topic from several angles
including Hypertext, Computational Linguistics and Language Technology,
Artificial Intelligence and Open Science. Annotations can be examined along
different dimensions. In terms of complexity, they can range from trivial to
highly sophisticated; in terms of maturity, from experimental to standardised.
Annotations can be annotated themselves using more abstract annotations.
Primary research data, such as text documents, can be annotated on several
independent layers concurrently, which can be exploited using multi-layer
querying. Standards guarantee interoperability and reusability of
data sets. The chapter concludes with four final observations, formulated as
research questions or rather provocative remarks on the current state of
annotation research.
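The multi-layer annotation idea from the abstract can be made concrete with a small sketch. The data model below (stand-off character spans, a `query` helper, and the example layers) is hypothetical and not from the paper; it only illustrates how independent annotation layers over the same primary text can be combined in a multi-layer query.

```python
# Minimal sketch (hypothetical data model, not from the paper) of stand-off,
# multi-layer annotation: each layer annotates character spans of the same
# primary text independently, and a query can combine layers.

from dataclasses import dataclass

@dataclass(frozen=True)
class Span:
    start: int   # character offset (inclusive)
    end: int     # character offset (exclusive)
    label: str

TEXT = "Ada Lovelace wrote the first program."

# Two independent annotation layers over the same primary data.
LAYERS = {
    "pos": [Span(0, 3, "PROPN"), Span(4, 12, "PROPN"), Span(13, 18, "VERB"),
            Span(19, 22, "DET"), Span(23, 28, "ADJ"), Span(29, 36, "NOUN")],
    "ner": [Span(0, 12, "PERSON")],
}

def query(inner_layer, inner_label, outer_layer, outer_label):
    """Return the text of inner-layer spans carrying inner_label that fall
    inside an outer-layer span carrying outer_label (a multi-layer query)."""
    outers = [s for s in LAYERS[outer_layer] if s.label == outer_label]
    return [TEXT[s.start:s.end]
            for s in LAYERS[inner_layer]
            if s.label == inner_label
            and any(o.start <= s.start and s.end <= o.end for o in outers)]

# Proper nouns that are part of a PERSON entity:
print(query("pos", "PROPN", "ner", "PERSON"))  # ['Ada', 'Lovelace']
```

Because each layer only stores offsets into the primary text, layers can be created, versioned, and queried independently, which is the property the abstract attributes to multi-layer annotation.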
Related papers
- Variationist: Exploring Multifaceted Variation and Bias in Written Language Data [3.666781404469562]
Exploring and understanding language data is a fundamental stage in all areas dealing with human language.
Yet, there is currently a lack of a unified, customizable tool to seamlessly inspect and visualize language variation and bias.
In this paper, we introduce Variationist, a highly-modular, descriptive, and task-agnostic tool that fills this gap.
arXiv Detail & Related papers (2024-06-25T15:41:07Z)
- SPACE-IDEAS: A Dataset for Salient Information Detection in Space Innovation [0.3017070810884304]
We introduce SPACE-IDEAS, a dataset for salient information detection from innovation ideas related to the Space domain.
The text in SPACE-IDEAS varies greatly and includes informal, technical, academic and business-oriented writing styles.
In addition to a manually annotated dataset we release an extended version that is annotated using a large generative language model.
arXiv Detail & Related papers (2024-03-25T17:04:02Z)
- Putting Context in Context: the Impact of Discussion Structure on Text Classification [13.15873889847739]
We propose a series of experiments on a large dataset for stance detection in English.
We evaluate the contribution of different types of contextual information.
We show that structural information can be highly beneficial to text classification but only under certain circumstances.
arXiv Detail & Related papers (2024-02-05T12:56:22Z)
- Studying Socially Unacceptable Discourse Classification (SUD) through different eyes: "Are we on the same page ?" [4.87717454493713]
We first build and present a novel corpus that contains a large variety of manually annotated texts from different online sources.
This global context allows us to test the generalization ability of SUD classifiers.
From this perspective, we can analyze how (possibly) different annotation modalities influence SUD learning.
arXiv Detail & Related papers (2023-08-08T10:42:33Z)
- Towards Open Vocabulary Learning: A Survey [146.90188069113213]
Deep neural networks have made impressive advancements in various core tasks like segmentation, tracking, and detection.
Recently, open vocabulary settings were proposed due to the rapid progress of vision language pre-training.
This paper provides a thorough review of open vocabulary learning, summarizing and analyzing recent developments in the field.
arXiv Detail & Related papers (2023-06-28T02:33:06Z)
- Syntax and Semantics Meet in the "Middle": Probing the Syntax-Semantics Interface of LMs Through Agentivity [68.8204255655161]
We present the semantic notion of agentivity as a case study for probing such interactions.
This suggests LMs may potentially serve as more useful tools for linguistic annotation, theory testing, and discovery.
arXiv Detail & Related papers (2023-05-29T16:24:01Z)
- Annotation Curricula to Implicitly Train Non-Expert Annotators [56.67768938052715]
Voluntary studies often require annotators to familiarize themselves with the task, its annotation scheme, and the data domain.
This can be overwhelming at the beginning, mentally taxing, and can induce errors into the resulting annotations.
We propose annotation curricula, a novel approach to implicitly train annotators.
arXiv Detail & Related papers (2021-06-04T09:48:28Z)
- Weakly-Supervised Aspect-Based Sentiment Analysis via Joint Aspect-Sentiment Topic Embedding [71.2260967797055]
We propose a weakly-supervised approach for aspect-based sentiment analysis.
We learn ⟨sentiment, aspect⟩ joint topic embeddings in the word embedding space.
We then use neural models to generalize the word-level discriminative information.
arXiv Detail & Related papers (2020-10-13T21:33:24Z)
- Abstractive Summarization of Spoken and Written Instructions with BERT [66.14755043607776]
We present the first application of the BERTSum model to conversational language.
We generate abstractive summaries of narrated instructional videos across a wide variety of topics.
We envision this integrated as a feature in intelligent virtual assistants, enabling them to summarize both written and spoken instructional content upon request.
arXiv Detail & Related papers (2020-08-21T20:59:34Z)
- Provenance for Linguistic Corpora Through Nanopublications [0.22940141855172028]
Research in Computational Linguistics is dependent on text corpora for training and testing new tools and methodologies.
While there exists a plethora of annotated linguistic information, these corpora are often not interoperable without significant manual work.
This paper addresses this issue with a case study on event annotated corpora and by creating a new, more interoperable representation of this data in the form of nanopublications.
arXiv Detail & Related papers (2020-06-11T11:30:30Z)
- ORB: An Open Reading Benchmark for Comprehensive Evaluation of Machine Reading Comprehension [53.037401638264235]
We present an evaluation server, ORB, that reports performance on seven diverse reading comprehension datasets.
The evaluation server places no restrictions on how models are trained, so it is a suitable test bed for exploring training paradigms and representation learning.
arXiv Detail & Related papers (2019-12-29T07:27:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.