O-Dang! The Ontology of Dangerous Speech Messages
- URL: http://arxiv.org/abs/2207.10652v1
- Date: Wed, 13 Jul 2022 11:50:05 GMT
- Title: O-Dang! The Ontology of Dangerous Speech Messages
- Authors: Marco A. Stranisci, Simona Frenda, Mirko Lai, Oscar Araque, Alessandra
T. Cignarella, Valerio Basile, Viviana Patti, Cristina Bosco
- Abstract summary: We present O-Dang!: The Ontology of Dangerous Speech Messages, a systematic and interoperable Knowledge Graph (KG).
O-Dang! is designed to gather and organize Italian datasets into a structured KG, according to the principles shared within the Linguistic Linked Open Data community.
It provides a model for encoding both gold standard and single-annotator labels in the KG.
- Score: 53.15616413153125
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Within the NLP community, a considerable number of language resources
are created, annotated, and released every day with the aim of studying specific
linguistic phenomena. Despite a variety of attempts to organize such resources,
systematic methods and interoperability between resources are still lacking.
Furthermore, when storing linguistic information, the most common practice is
still the "gold standard", which is in contrast with recent trends in NLP that
stress the importance of different subjectivities and points of view when
training machine learning and deep learning methods. In this paper we present
O-Dang!: The Ontology of Dangerous Speech Messages, a systematic and
interoperable Knowledge Graph (KG) for the collection of linguistic annotated
data. O-Dang! is designed to gather and organize Italian datasets into a
structured KG, according to the principles shared within the Linguistic Linked
Open Data community. The ontology has also been designed to account for a
perspectivist approach, since it provides a model for encoding both gold
standard and single-annotator labels in the KG. The paper is structured as
follows. In Section 1 the motivations of our work are outlined. Section 2
describes the O-Dang! Ontology, that provides a common semantic model for the
integration of datasets in the KG. The Ontology Population stage with
information about corpora, users, and annotations is presented in Section 3.
Finally, in Section 4 an analysis of offensiveness across corpora is provided
as a first case study for the resource.
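The perspectivist model described in the abstract, where a knowledge graph keeps both an aggregated gold-standard label and each annotator's individual judgment, can be sketched with a minimal triple store in plain Python. All IRIs, class names, and property names below (`annotates`, `hasLabel`, `annotatedBy`, and so on) are illustrative assumptions, not the actual O-Dang! vocabulary.

```python
# Minimal sketch of a perspectivist annotation graph: triples are
# (subject, predicate, object) strings, as in an RDF-style KG.
EX = "http://example.org/odang#"  # hypothetical namespace

def iri(name: str) -> str:
    return EX + name

triples = []

def add(s: str, p: str, o: str) -> None:
    triples.append((s, p, o))

msg = iri("msg42")
add(msg, iri("hasText"), '"example message text"')

# Gold-standard annotation: one aggregated label for the message.
gold = iri("msg42_gold")
add(gold, iri("annotates"), msg)
add(gold, iri("hasLabel"), iri("offensive"))
add(gold, iri("isGoldStandard"), "true")

# Perspectivist annotations: one annotation node per annotator
# judgment, so disagreement is preserved rather than collapsed.
for annotator, label in [("ann1", "offensive"), ("ann2", "not_offensive")]:
    node = iri(f"msg42_{annotator}")
    add(node, iri("annotates"), msg)
    add(node, iri("hasLabel"), iri(label))
    add(node, iri("annotatedBy"), iri(annotator))

# Query: collect every label attached to msg42, gold and per-annotator.
labels = [o for s, p, o in triples
          if p == iri("hasLabel")
          and (s, iri("annotates"), msg) in triples]
print(labels)
```

The key design point is that the gold standard is just one more annotation node rather than a property of the message itself, so single-annotator labels and aggregated labels coexist in the same graph and can be queried uniformly.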
Related papers
- Multilingual corpora for the study of new concepts in the social sciences and humanities: [0.0]
This article presents a hybrid methodology for building a multilingual corpus designed to support the study of emerging concepts in the humanities and social sciences. The corpus relies on two complementary sources: (1) textual content automatically extracted from company websites, cleaned for French and English, and (2) annual reports collected and automatically filtered according to documentary criteria (year, format, duplication). The processing pipeline includes automatic language detection, filtering of non-relevant content, extraction of relevant segments, and enrichment with structural metadata.
arXiv Detail & Related papers (2025-12-08T10:04:50Z) - Towards Corpus-Grounded Agentic LLMs for Multilingual Grammatical Analysis [0.5545791216381869]
We explore how agentic large language models (LLMs) can streamline the systematic analysis of annotated corpora. We introduce an agentic framework for corpus-grounded grammatical analysis that integrates concepts such as natural-language task interpretation. We test the system on multilingual grammatical tasks inspired by the World Atlas of Language Structures (WALS).
arXiv Detail & Related papers (2025-11-28T21:27:58Z) - AWARE, Beyond Sentence Boundaries: A Contextual Transformer Framework for Identifying Cultural Capital in STEM Narratives [0.5514573274011145]
AWARE is a framework that attempts to improve a transformer model's awareness for this nuanced task. We show that by making the model explicitly aware of the properties of the input, AWARE outperforms a strong baseline by 2.1 percentage points in Macro-F1. This work provides a robust and generalizable methodology for any text classification task in which meaning depends on the context of the narrative.
arXiv Detail & Related papers (2025-10-06T16:19:57Z) - AgoraSpeech: A multi-annotated comprehensive dataset of political discourse through the lens of humans and AI [1.3060410279656598]
AgoraSpeech is a meticulously curated, high-quality dataset of 171 political speeches from six parties during the Greek national elections in 2023.
The dataset includes annotations (per paragraph) for six natural language processing (NLP) tasks: text classification, topic identification, sentiment analysis, named entity recognition, polarization and populism detection.
arXiv Detail & Related papers (2025-01-09T18:17:59Z) - Leveraging Ontologies to Document Bias in Data [1.0635248457021496]
Doc-BiasO is a resource that aims to create an integrated vocabulary of biases defined in the fair-ML literature and their measures.
Our main objective is to contribute towards clarifying existing terminology on bias research as it rapidly expands to all areas of AI.
arXiv Detail & Related papers (2024-06-29T18:41:07Z) - EventGround: Narrative Reasoning by Grounding to Eventuality-centric Knowledge Graphs [41.928535719157054]
We propose an initial comprehensive framework called EventGround to tackle the problem of grounding free-texts to eventuality-centric knowledge graphs.
We provide simple yet effective parsing and partial information extraction methods to tackle these problems.
Our framework, incorporating grounded knowledge, achieves state-of-the-art performance while providing interpretable evidence.
arXiv Detail & Related papers (2024-03-30T01:16:37Z) - Natural Language Processing for Dialects of a Language: A Survey [56.93337350526933]
State-of-the-art natural language processing (NLP) models are trained on massive training corpora, and report a superlative performance on evaluation datasets.
This survey delves into an important attribute of these datasets: the dialect of a language.
Motivated by the performance degradation of NLP models for dialectic datasets and its implications for the equity of language technologies, we survey past research in NLP for dialects in terms of datasets, and approaches.
arXiv Detail & Related papers (2024-01-11T03:04:38Z) - IXA/Cogcomp at SemEval-2023 Task 2: Context-enriched Multilingual Named
Entity Recognition using Knowledge Bases [53.054598423181844]
We present a novel NER cascade approach comprising three steps.
We empirically demonstrate the significance of external knowledge bases in accurately classifying fine-grained and emerging entities.
Our system exhibits robust performance in the MultiCoNER2 shared task, even in the low-resource language setting.
arXiv Detail & Related papers (2023-04-20T20:30:34Z) - Variational Cross-Graph Reasoning and Adaptive Structured Semantics
Learning for Compositional Temporal Grounding [143.5927158318524]
Temporal grounding is the task of locating a specific segment from an untrimmed video according to a query sentence.
We introduce a new Compositional Temporal Grounding task and construct two new dataset splits.
We argue that the inherent structured semantics inside the videos and language is the crucial factor to achieve compositional generalization.
arXiv Detail & Related papers (2023-01-22T08:02:23Z) - Topics as Entity Clusters: Entity-based Topics from Large Language Models and Graph Neural Networks [0.6486052012623045]
We propose a novel topic clustering approach using bimodal vector representations of entities.
Our approach is better suited to working with entities in comparison to state-of-the-art models.
arXiv Detail & Related papers (2023-01-06T10:54:54Z) - Knowledge Graph Augmented Network Towards Multiview Representation
Learning for Aspect-based Sentiment Analysis [96.53859361560505]
We propose a knowledge graph augmented network (KGAN) to incorporate external knowledge with explicitly syntactic and contextual information.
KGAN captures the sentiment feature representations from multiple perspectives, i.e., context-, syntax- and knowledge-based.
Experiments on three popular ABSA benchmarks demonstrate the effectiveness and robustness of our KGAN.
arXiv Detail & Related papers (2022-01-13T08:25:53Z) - Knowledge Graph Based Synthetic Corpus Generation for Knowledge-Enhanced
Language Model Pre-training [22.534866015730664]
We verbalize the entire English Wikidata KG.
We show that verbalizing a comprehensive, encyclopedic KG like Wikidata can be used to integrate structured KGs and natural language corpora.
arXiv Detail & Related papers (2020-10-23T22:14:50Z) - Computational linguistic assessment of textbook and online learning
media by means of threshold concepts in business education [59.003956312175795]
From a linguistic perspective, threshold concepts are instances of specialized vocabularies, exhibiting particular linguistic features.
The profiles of 63 threshold concepts from business education have been investigated in textbooks, newspapers, and Wikipedia.
The three kinds of resources can indeed be distinguished in terms of their threshold concepts' profiles.
arXiv Detail & Related papers (2020-08-05T12:56:16Z) - Cross-lingual Entity Alignment with Incidental Supervision [76.66793175159192]
We propose an incidentally supervised model, JEANS, which jointly represents multilingual KGs and text corpora in a shared embedding scheme.
Experiments on benchmark datasets show that JEANS leads to promising improvement on entity alignment with incidental supervision.
arXiv Detail & Related papers (2020-05-01T01:53:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.