Leveraging Ontologies to Document Bias in Data
- URL: http://arxiv.org/abs/2407.00509v2
- Date: Fri, 9 Aug 2024 18:18:55 GMT
- Title: Leveraging Ontologies to Document Bias in Data
- Authors: Mayra Russo, Maria-Esther Vidal,
- Abstract summary: Doc-BiasO is a resource that aims to create an integrated vocabulary of biases defined in the textitfair-ML literature and their measures.
Our main objective is to contribute towards clarifying existing terminology on bias research as it rapidly expands to all areas of AI.
- Score: 1.0635248457021496
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Machine Learning (ML) systems are capable of reproducing and often amplifying undesired biases. This puts emphasis on the importance of operating under practices that enable the study and understanding of the intrinsic characteristics of ML pipelines, prompting the emergence of documentation frameworks with the idea that ``any remedy for bias starts with awareness of its existence''. However, a resource that can formally describe these pipelines in terms of biases detected is still amiss. To fill this gap, we present the Doc-BiasO ontology, a resource that aims to create an integrated vocabulary of biases defined in the \textit{fair-ML} literature and their measures, as well as to incorporate relevant terminology and the relationships between them. Overseeing ontology engineering best practices, we re-use existing vocabulary on machine learning and AI, to foster knowledge sharing and interoperability between the actors concerned with its research, development, regulation, among others. Overall, our main objective is to contribute towards clarifying existing terminology on bias research as it rapidly expands to all areas of AI and to improve the interpretation of bias in data and downstream impact.
Related papers
- Ontology Embedding: A Survey of Methods, Applications and Resources [54.3453925775069]
Ontologies are widely used for representing domain knowledge and meta data.
One straightforward solution is to integrate statistical analysis and machine learning.
Numerous papers have been published on embedding, but a lack of systematic reviews hinders researchers from gaining a comprehensive understanding of this field.
arXiv Detail & Related papers (2024-06-16T14:49:19Z) - Situated Ground Truths: Enhancing Bias-Aware AI by Situating Data Labels with SituAnnotate [0.1843404256219181]
SituAnnotate is a novel ontology-based approach to structured and context-aware data annotation.
It aims to anchor the ground truth data employed in training AI systems within the contextual and culturally-bound situations.
As a method to create, query, and compare label-based datasets, SituAnnotate empowers downstream AI systems to undergo training with explicit consideration of context and cultural bias.
arXiv Detail & Related papers (2024-06-10T09:33:13Z) - Data Set Terminology of Deep Learning in Medicine: A Historical Review and Recommendation [0.7897552065199818]
Medicine and deep learning-based artificial intelligence engineering represent two distinct fields each with decades of published history.
With such history comes a set of terminology that has a specific way in which it is applied.
This review aims to give historical context for these terms, accentuate the importance of clarity when these terms are used in medical AI contexts.
arXiv Detail & Related papers (2024-04-30T07:07:45Z) - Informed Meta-Learning [55.2480439325792]
Meta-learning and informed ML stand out as two approaches for incorporating prior knowledge into ML pipelines.
We formalise a hybrid paradigm, informed meta-learning, facilitating the incorporation of priors from unstructured knowledge representations.
We demonstrate the potential benefits of informed meta-learning in improving data efficiency, robustness to observational noise and task distribution shifts.
arXiv Detail & Related papers (2024-02-25T15:08:37Z) - Prompt-based Logical Semantics Enhancement for Implicit Discourse
Relation Recognition [4.7938839332508945]
We propose a Prompt-based Logical Semantics Enhancement (PLSE) method for Implicit Discourse Relation Recognition (IDRR)
Our method seamlessly injects knowledge relevant to discourse relation into pre-trained language models through prompt-based connective prediction.
Experimental results on PDTB 2.0 and CoNLL16 datasets demonstrate that our method achieves outstanding and consistent performance against the current state-of-the-art models.
arXiv Detail & Related papers (2023-11-01T08:38:08Z) - Improving Language Models Meaning Understanding and Consistency by
Learning Conceptual Roles from Dictionary [65.268245109828]
Non-human-like behaviour of contemporary pre-trained language models (PLMs) is a leading cause undermining their trustworthiness.
A striking phenomenon is the generation of inconsistent predictions, which produces contradictory results.
We propose a practical approach that alleviates the inconsistent behaviour issue by improving PLM awareness.
arXiv Detail & Related papers (2023-10-24T06:15:15Z) - Bias and Fairness in Large Language Models: A Survey [73.87651986156006]
We present a comprehensive survey of bias evaluation and mitigation techniques for large language models (LLMs)
We first consolidate, formalize, and expand notions of social bias and fairness in natural language processing.
We then unify the literature by proposing three intuitive, two for bias evaluation, and one for mitigation.
arXiv Detail & Related papers (2023-09-02T00:32:55Z) - Semantic Interactive Learning for Text Classification: A Constructive
Approach for Contextual Interactions [0.0]
We propose a novel interaction framework called Semantic Interactive Learning for the text domain.
We frame the problem of incorporating constructive and contextual feedback into the learner as a task to find an architecture that enables more semantic alignment between humans and machines.
We introduce a technique called SemanticPush that is effective for translating conceptual corrections of humans to non-extrapolating training examples.
arXiv Detail & Related papers (2022-09-07T08:13:45Z) - O-Dang! The Ontology of Dangerous Speech Messages [53.15616413153125]
We present O-Dang!: The Ontology of Dangerous Speech Messages, a systematic and interoperable Knowledge Graph (KG)
O-Dang! is designed to gather and organize Italian datasets into a structured KG, according to the principles shared within the Linguistic Linked Open Data community.
It provides a model for encoding both gold standard and single-annotator labels in the KG.
arXiv Detail & Related papers (2022-07-13T11:50:05Z) - ERICA: Improving Entity and Relation Understanding for Pre-trained
Language Models via Contrastive Learning [97.10875695679499]
We propose a novel contrastive learning framework named ERICA in pre-training phase to obtain a deeper understanding of the entities and their relations in text.
Experimental results demonstrate that our proposed ERICA framework achieves consistent improvements on several document-level language understanding tasks.
arXiv Detail & Related papers (2020-12-30T03:35:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.