O-Dang! The Ontology of Dangerous Speech Messages
- URL: http://arxiv.org/abs/2207.10652v1
- Date: Wed, 13 Jul 2022 11:50:05 GMT
- Title: O-Dang! The Ontology of Dangerous Speech Messages
- Authors: Marco A. Stranisci, Simona Frenda, Mirko Lai, Oscar Araque, Alessandra
T. Cignarella, Valerio Basile, Viviana Patti, Cristina Bosco
- Abstract summary: We present O-Dang!: The Ontology of Dangerous Speech Messages, a systematic and interoperable Knowledge Graph (KG).
O-Dang! is designed to gather and organize Italian datasets into a structured KG, according to the principles shared within the Linguistic Linked Open Data community.
It provides a model for encoding both gold standard and single-annotator labels in the KG.
- Score: 53.15616413153125
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Within the NLP community, a considerable number of language resources
are created, annotated, and released every day with the aim of studying specific
linguistic phenomena. Despite a variety of attempts to organize such resources,
systematic methods and interoperability between resources are still lacking.
Furthermore, when storing linguistic information, the most common practice is
still the "gold standard", which is in contrast with recent trends in NLP that
stress the importance of different subjectivities and points of view when
training machine learning and deep learning methods. In this paper we present
O-Dang!: The Ontology of Dangerous Speech Messages, a systematic and
interoperable Knowledge Graph (KG) for the collection of linguistic annotated
data. O-Dang! is designed to gather and organize Italian datasets into a
structured KG, according to the principles shared within the Linguistic Linked
Open Data community. The ontology has also been designed to account for a
perspectivist approach, since it provides a model for encoding both gold
standard and single-annotator labels in the KG. The paper is structured as
follows. In Section 1 the motivations of our work are outlined. Section 2
describes the O-Dang! Ontology, that provides a common semantic model for the
integration of datasets in the KG. The Ontology Population stage with
information about corpora, users, and annotations is presented in Section 3.
Finally, in Section 4 an analysis of offensiveness across corpora is provided
as a first case study for the resource.
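The perspectivist model described in the abstract, where a knowledge graph keeps both an aggregated gold-standard label and each annotator's individual judgment, can be sketched with a minimal triple store in plain Python. All IRIs, class names, and property names below (`annotates`, `hasLabel`, `annotatedBy`, and so on) are illustrative assumptions, not the actual O-Dang! vocabulary.

```python
# Minimal sketch of a perspectivist annotation graph: triples are
# (subject, predicate, object) strings, as in an RDF-style KG.
EX = "http://example.org/odang#"  # hypothetical namespace

def iri(name: str) -> str:
    return EX + name

triples = []

def add(s: str, p: str, o: str) -> None:
    triples.append((s, p, o))

msg = iri("msg42")
add(msg, iri("hasText"), '"example message text"')

# Gold-standard annotation: one aggregated label for the message.
gold = iri("msg42_gold")
add(gold, iri("annotates"), msg)
add(gold, iri("hasLabel"), iri("offensive"))
add(gold, iri("isGoldStandard"), "true")

# Perspectivist annotations: one annotation node per annotator
# judgment, so disagreement is preserved rather than collapsed.
for annotator, label in [("ann1", "offensive"), ("ann2", "not_offensive")]:
    node = iri(f"msg42_{annotator}")
    add(node, iri("annotates"), msg)
    add(node, iri("hasLabel"), iri(label))
    add(node, iri("annotatedBy"), iri(annotator))

# Query: collect every label attached to msg42, gold and per-annotator.
labels = [o for s, p, o in triples
          if p == iri("hasLabel")
          and (s, iri("annotates"), msg) in triples]
print(labels)
```

The key design point is that the gold standard is just one more annotation node rather than a property of the message itself, so single-annotator labels and aggregated labels coexist in the same graph and can be queried uniformly.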
Related papers
- Multilingual corpora for the study of new concepts in the social sciences and humanities: [0.0]
This article presents a hybrid methodology for building a multilingual corpus designed to support the study of emerging concepts in the humanities and social sciences. The corpus relies on two complementary sources: (1) textual content automatically extracted from company websites, cleaned for French and English, and (2) annual reports collected and automatically filtered according to documentary criteria (year, format, duplication). The processing pipeline includes automatic language detection, filtering of non-relevant content, extraction of relevant segments, and enrichment with structural metadata.
arXiv Detail & Related papers (2025-12-08T10:04:50Z) - Towards Corpus-Grounded Agentic LLMs for Multilingual Grammatical Analysis [0.5545791216381869]
We explore how agentic large language models (LLMs) can streamline the systematic analysis of annotated corpora. We introduce an agentic framework for corpus-grounded grammatical analysis that integrates concepts such as natural-language task interpretation. We test the system on multilingual grammatical tasks inspired by the World Atlas of Language Structures (WALS).
arXiv Detail & Related papers (2025-11-28T21:27:58Z) - AWARE, Beyond Sentence Boundaries: A Contextual Transformer Framework for Identifying Cultural Capital in STEM Narratives [0.5514573274011145]
AWARE is a framework that attempts to improve a transformer model's awareness for this nuanced task. We show that by making the model explicitly aware of the properties of the input, AWARE outperforms a strong baseline by 2.1 percentage points in Macro-F1. This work provides a robust and generalizable methodology for any text classification task in which meaning depends on the context of the narrative.
arXiv Detail & Related papers (2025-10-06T16:19:57Z) - AgoraSpeech: A multi-annotated comprehensive dataset of political discourse through the lens of humans and AI [1.3060410279656598]
AgoraSpeech is a meticulously curated, high-quality dataset of 171 political speeches from six parties during the Greek national elections in 2023.
The dataset includes annotations (per paragraph) for six natural language processing (NLP) tasks: text classification, topic identification, sentiment analysis, named entity recognition, polarization and populism detection.
arXiv Detail & Related papers (2025-01-09T18:17:59Z) - Leveraging Ontologies to Document Bias in Data [1.0635248457021496]
Doc-BiasO is a resource that aims to create an integrated vocabulary of biases defined in the fair-ML literature and their measures.
Our main objective is to contribute towards clarifying existing terminology on bias research as it rapidly expands to all areas of AI.
arXiv Detail & Related papers (2024-06-29T18:41:07Z) - EventGround: Narrative Reasoning by Grounding to Eventuality-centric Knowledge Graphs [41.928535719157054]
We propose an initial comprehensive framework called EventGround to tackle the problem of grounding free-texts to eventuality-centric knowledge graphs.
We provide simple yet effective parsing and partial information extraction methods to tackle these problems.
Our framework, incorporating grounded knowledge, achieves state-of-the-art performance while providing interpretable evidence.
arXiv Detail & Related papers (2024-03-30T01:16:37Z) - Natural Language Processing for Dialects of a Language: A Survey [56.93337350526933]
State-of-the-art natural language processing (NLP) models are trained on massive training corpora, and report a superlative performance on evaluation datasets.
This survey delves into an important attribute of these datasets: the dialect of a language.
Motivated by the performance degradation of NLP models for dialectic datasets and its implications for the equity of language technologies, we survey past research in NLP for dialects in terms of datasets, and approaches.
arXiv Detail & Related papers (2024-01-11T03:04:38Z) - IXA/Cogcomp at SemEval-2023 Task 2: Context-enriched Multilingual Named
Entity Recognition using Knowledge Bases [53.054598423181844]
We present a novel NER cascade approach comprising three steps.
We empirically demonstrate the significance of external knowledge bases in accurately classifying fine-grained and emerging entities.
Our system exhibits robust performance in the MultiCoNER2 shared task, even in the low-resource language setting.
arXiv Detail & Related papers (2023-04-20T20:30:34Z) - Variational Cross-Graph Reasoning and Adaptive Structured Semantics
Learning for Compositional Temporal Grounding [143.5927158318524]
Temporal grounding is the task of locating a specific segment from an untrimmed video according to a query sentence.
We introduce a new Compositional Temporal Grounding task and construct two new dataset splits.
We argue that the inherent structured semantics inside the videos and language is the crucial factor to achieve compositional generalization.
arXiv Detail & Related papers (2023-01-22T08:02:23Z) - Topics as Entity Clusters: Entity-based Topics from Large Language Models and Graph Neural Networks [0.6486052012623045]
We propose a novel topic clustering approach using bimodal vector representations of entities.
Our approach is better suited to working with entities in comparison to state-of-the-art models.
arXiv Detail & Related papers (2023-01-06T10:54:54Z) - Knowledge Graph Augmented Network Towards Multiview Representation
Learning for Aspect-based Sentiment Analysis [96.53859361560505]
We propose a knowledge graph augmented network (KGAN) to incorporate external knowledge with explicitly syntactic and contextual information.
KGAN captures the sentiment feature representations from multiple perspectives, i.e., context-, syntax- and knowledge-based.
Experiments on three popular ABSA benchmarks demonstrate the effectiveness and robustness of our KGAN.
arXiv Detail & Related papers (2022-01-13T08:25:53Z) - Knowledge Graph Based Synthetic Corpus Generation for Knowledge-Enhanced
Language Model Pre-training [22.534866015730664]
We verbalize the entire English Wikidata KG.
We show that verbalizing a comprehensive, encyclopedic KG like Wikidata can be used to integrate structured KGs and natural language corpora.
arXiv Detail & Related papers (2020-10-23T22:14:50Z) - Computational linguistic assessment of textbook and online learning
media by means of threshold concepts in business education [59.003956312175795]
From a linguistic perspective, threshold concepts are instances of specialized vocabularies, exhibiting particular linguistic features.
The profiles of 63 threshold concepts from business education have been investigated in textbooks, newspapers, and Wikipedia.
The three kinds of resources can indeed be distinguished in terms of their threshold concepts' profiles.
arXiv Detail & Related papers (2020-08-05T12:56:16Z) - Cross-lingual Entity Alignment with Incidental Supervision [76.66793175159192]
We propose an incidentally supervised model, JEANS, which jointly represents multilingual KGs and text corpora in a shared embedding scheme.
Experiments on benchmark datasets show that JEANS leads to promising improvement on entity alignment with incidental supervision.
arXiv Detail & Related papers (2020-05-01T01:53:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.