Current Topological and Machine Learning Applications for Bias Detection in Text
- URL: http://arxiv.org/abs/2311.13495v1
- Date: Wed, 22 Nov 2023 16:12:42 GMT
- Title: Current Topological and Machine Learning Applications for Bias Detection in Text
- Authors: Colleen Farrelly, Yashbir Singh, Quincy A. Hathaway, Gunnar Carlsson,
Ashok Choudhary, Rahul Paul, Gianfranco Doretto, Yassine Himeur, Shadi Atalla
and Wathiq Mansoor
- Abstract summary: This study utilizes the RedditBias database to analyze textual biases.
Four transformer models, including BERT and RoBERTa variants, were explored.
Findings suggest BERT, particularly mini BERT, excels in bias classification, while multilingual models lag.
- Score: 4.799066966918178
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Institutional bias can impact patient outcomes, educational attainment, and
legal system navigation. Written records often reflect bias, and once bias is
identified, it is possible to refer individuals for training to reduce bias.
Many machine learning tools exist to explore text data and create predictive
models that can search written records to identify real-time bias. However, few
previous studies investigate large language model embeddings and geometric
models of biased text data to understand geometry's impact on bias modeling
accuracy. To overcome this issue, this study utilizes the RedditBias database
to analyze textual biases. Four transformer models, including BERT and RoBERTa
variants, were explored. Post-embedding, t-SNE allowed two-dimensional
visualization of the data. KNN classifiers differentiated bias types, with lower
k-values proving more effective. Findings suggest BERT, particularly mini BERT,
excels in bias classification, while multilingual models lag. The
recommendation emphasizes refining monolingual models and exploring
domain-specific biases.
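As a rough illustration of the pipeline the abstract describes, the sketch below embeds sentences with a BERT variant, projects the embeddings to two dimensions with t-SNE, and fits a low-k KNN classifier. The checkpoint name, mean pooling, and hyperparameters are illustrative assumptions, not the paper's exact setup.

```python
# Hedged sketch of the abstract's pipeline: transformer embeddings,
# t-SNE visualization, and a low-k KNN classifier. The checkpoint,
# pooling, and hyperparameters are assumptions, not the paper's setup.
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.manifold import TSNE
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

texts = ["sentence one", "sentence two", "sentence three", "sentence four"]
labels = [1, 0, 1, 0]  # bias-type labels; RedditBias rows in practice

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

with torch.no_grad():
    batch = tok(texts, padding=True, truncation=True, return_tensors="pt")
    hidden = model(**batch).last_hidden_state
    mask = batch["attention_mask"].unsqueeze(-1)
    emb = ((hidden * mask).sum(1) / mask.sum(1)).numpy()  # mean pooling

# Two-dimensional visualization of the embedding geometry.
xy = TSNE(n_components=2, perplexity=3).fit_transform(emb)
print(xy.shape)

# Low-k KNN, matching the finding that lower k-values work better.
X_tr, X_te, y_tr, y_te = train_test_split(emb, labels, test_size=0.5, random_state=0)
knn = KNeighborsClassifier(n_neighbors=1).fit(X_tr, y_tr)
print("accuracy:", knn.score(X_te, y_te))
```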
Related papers
- Fast Model Debias with Machine Unlearning [54.32026474971696]
Deep neural networks might behave in a biased manner in many real-world scenarios.
Existing debiasing methods suffer from high costs in bias labeling or model re-training.
We propose a fast model debiasing framework (FMD) which offers an efficient approach to identify, evaluate and remove biases.
arXiv Detail & Related papers (2023-10-19T08:10:57Z)
- Unlocking Bias Detection: Leveraging Transformer-Based Models for Content Analysis [1.8692054990918079]
We present the Contextualized Bi-Directional Dual Transformer (CBDT) classifier.
We have prepared a dataset specifically for training these models to identify and locate biases in texts.
Our evaluations across various datasets demonstrate CBDT's effectiveness in distinguishing biased narratives from neutral ones and identifying specific biased terms.
arXiv Detail & Related papers (2023-09-30T12:06:04Z)
- A survey on bias in machine learning research [0.0]
Current research on bias in machine learning often focuses on fairness, while overlooking the roots or causes of bias.
This article aims to bridge that gap by providing a taxonomy of potential sources of bias and errors in data and models.
arXiv Detail & Related papers (2023-08-22T07:56:57Z)
- Soft-prompt Tuning for Large Language Models to Evaluate Bias [0.03141085922386211]
Using soft-prompts to evaluate bias gives the added advantage of avoiding human-bias injection.
We check model biases across different sensitive attributes using group fairness (bias) measures and find interesting bias patterns.
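For a concrete sense of the group-fairness check this summary mentions, the sketch below computes a demographic parity gap between two groups of a sensitive attribute; the predictions and group labels are made-up placeholders, not the paper's data or metric suite.

```python
# Hedged sketch: demographic parity gap, one common group-fairness (bias)
# metric. Arrays are illustrative placeholders, not data from the paper.
import numpy as np

preds = np.array([1, 0, 1, 1, 0, 1])               # model decisions
groups = np.array(["a", "a", "a", "b", "b", "b"])  # sensitive attribute

rate_a = preds[groups == "a"].mean()  # positive rate for group a
rate_b = preds[groups == "b"].mean()  # positive rate for group b
print("demographic parity gap:", abs(rate_a - rate_b))
```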
arXiv Detail & Related papers (2023-06-07T19:11:25Z)
- Debiasing Vision-Language Models via Biased Prompts [79.04467131711775]
We propose a general approach for debiasing vision-language foundation models by projecting out biased directions in the text embedding.
We show that debiasing only the text embedding with a calibrated projection matrix suffices to yield robust classifiers and fair generative models.
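The projection idea lends itself to a short linear-algebra sketch: estimate a bias direction in the text-embedding space and project it out. The single-pair direction below is an illustrative assumption; the paper's calibrated projection matrix is more involved.

```python
# Hedged sketch: remove a bias direction from text embeddings by
# orthogonal projection. The direction here comes from one pair of
# prompt embeddings; the paper's calibrated projection is omitted.
import numpy as np

d = 8
emb_male = np.random.randn(d)    # stand-ins for paired prompt embeddings,
emb_female = np.random.randn(d)  # e.g. "a photo of a man" / "... a woman"

v = emb_male - emb_female
v /= np.linalg.norm(v)           # unit bias direction

P = np.eye(d) - np.outer(v, v)   # projector onto the orthogonal complement

text_emb = np.random.randn(d)
debiased = P @ text_emb
print(np.dot(debiased, v))       # ~0: bias direction projected out
```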
arXiv Detail & Related papers (2023-01-31T20:09:33Z)
- Debiasing Stance Detection Models with Counterfactual Reasoning and Adversarial Bias Learning [15.68462203989933]
Stance detection models tend to rely on dataset bias in the text part as a shortcut.
We propose an adversarial bias learning module to model the bias more accurately.
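A standard way to realize an adversarial bias-learning module is a gradient-reversal layer between the encoder and a bias-prediction head; the sketch below shows that generic pattern, not the paper's exact architecture.

```python
# Hedged sketch: adversarial bias learning via gradient reversal,
# a standard pattern; not the paper's exact module.
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        return x.clone()
    @staticmethod
    def backward(ctx, grad):
        return -grad  # reversed gradients push the encoder to drop the bias

encoder = nn.Linear(16, 8)     # stand-in for a text encoder
stance_head = nn.Linear(8, 3)  # main task: stance classes
bias_head = nn.Linear(8, 2)    # adversary: predicts the bias variable

x = torch.randn(4, 16)
h = encoder(x)
stance_logits = stance_head(h)
bias_logits = bias_head(GradReverse.apply(h))

loss = nn.functional.cross_entropy(stance_logits, torch.tensor([0, 1, 2, 0])) \
     + nn.functional.cross_entropy(bias_logits, torch.tensor([0, 1, 0, 1]))
loss.backward()  # encoder gets task gradients plus reversed bias gradients
```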
arXiv Detail & Related papers (2022-12-20T16:20:56Z)
- Pseudo Bias-Balanced Learning for Debiased Chest X-ray Classification [57.53567756716656]
We study the problem of developing debiased chest X-ray diagnosis models without knowing exactly the bias labels.
We propose a novel algorithm, pseudo bias-balanced learning, which first captures and predicts per-sample bias labels.
Our proposed method achieved consistent improvements over other state-of-the-art approaches.
arXiv Detail & Related papers (2022-03-18T11:02:18Z)
- Learning from others' mistakes: Avoiding dataset biases without modeling them [111.17078939377313]
State-of-the-art natural language processing (NLP) models often learn to model dataset biases and surface form correlations instead of features that target the intended task.
Previous work has demonstrated effective methods to circumvent these issues when knowledge of the bias is available.
We show a method for training models that learn to ignore these problematic correlations.
arXiv Detail & Related papers (2020-12-02T16:10:54Z)
- LOGAN: Local Group Bias Detection by Clustering [86.38331353310114]
We argue that evaluating bias at the corpus level is not enough for understanding how biases are embedded in a model.
We propose LOGAN, a new bias detection technique based on clustering.
Experiments on toxicity classification and object classification tasks show that LOGAN identifies bias in a local region.
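In the spirit of this local analysis, the sketch below clusters embeddings and compares per-group accuracy inside each cluster rather than over the whole corpus; the embeddings, group labels, and correctness flags are random placeholders rather than LOGAN's actual setup.

```python
# Hedged sketch of local (cluster-level) bias detection: cluster
# embeddings, then compare group accuracy within each cluster instead
# of over the whole corpus. Data are random placeholders.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
emb = rng.normal(size=(200, 16))         # example embeddings
groups = rng.integers(0, 2, size=200)    # demographic group per example
correct = rng.integers(0, 2, size=200)   # 1 if the classifier was right

clusters = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(emb)
for c in range(5):
    m = clusters == c
    acc = [correct[m & (groups == g)].mean() for g in (0, 1)]
    print(f"cluster {c}: group accuracies {acc}")  # large gaps flag local bias
```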
arXiv Detail & Related papers (2020-10-06T16:42:51Z)
- Towards Robustifying NLI Models Against Lexical Dataset Biases [94.79704960296108]
This paper explores both data-level and model-level debiasing methods to robustify models against lexical dataset biases.
First, we debias the dataset through data augmentation and enhancement, but show that the model bias cannot be fully removed via this method.
The second approach employs a bag-of-words sub-model to capture the features that are likely to exploit the bias and prevents the original model from learning these biased features.
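One common way to implement this second approach is a product-of-experts combination, where the frozen bag-of-words sub-model's log-probabilities are added to the main model's during training; the sketch below shows that generic pattern with placeholder shapes, not the paper's exact formulation.

```python
# Hedged sketch: product-of-experts debiasing with a bag-of-words bias
# sub-model, one common realization of the approach described above.
import torch
import torch.nn as nn

vocab, classes = 1000, 3
bow_model = nn.Linear(vocab, classes)  # bias sub-model over bag-of-words
main_model = nn.Linear(64, classes)    # stand-in for the full NLI model

bow_feats = torch.rand(8, vocab)       # bag-of-words counts
main_feats = torch.randn(8, 64)        # encoder features
labels = torch.randint(0, classes, (8,))

with torch.no_grad():                  # bias model trained separately, then frozen
    bias_logp = torch.log_softmax(bow_model(bow_feats), dim=-1)

# Summing log-probabilities means the main model is not rewarded for
# re-learning what the biased sub-model already explains; at test time
# only the main model is used.
combined = torch.log_softmax(main_model(main_feats), dim=-1) + bias_logp
loss = nn.functional.cross_entropy(combined, labels)
loss.backward()
```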
arXiv Detail & Related papers (2020-05-10T17:56:10Z)