It's All Relative: Interpretable Models for Scoring Bias in Documents
- URL: http://arxiv.org/abs/2307.08139v1
- Date: Sun, 16 Jul 2023 19:35:38 GMT
- Title: It's All Relative: Interpretable Models for Scoring Bias in Documents
- Authors: Aswin Suresh, Chi-Hsuan Wu, Matthias Grossglauser
- Abstract summary: We propose an interpretable model to score the bias present in web documents, based only on their textual content.
Our model incorporates assumptions reminiscent of the Bradley-Terry axioms and is trained on pairs of revisions of the same Wikipedia article.
We show that we can interpret the parameters of the trained model to discover the words most indicative of bias.
- Score: 10.678219157857946
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose an interpretable model to score the bias present in web documents,
based only on their textual content. Our model incorporates assumptions
reminiscent of the Bradley-Terry axioms and is trained on pairs of revisions of
the same Wikipedia article, where one version is more biased than the other.
While prior approaches based on absolute bias classification have struggled to
obtain a high accuracy for the task, we are able to develop a useful model for
scoring bias by learning to perform pairwise comparisons of bias accurately. We
show that we can interpret the parameters of the trained model to discover the
words most indicative of bias. We also apply our model in three different
settings - studying the temporal evolution of bias in Wikipedia articles,
comparing news sources based on bias, and scoring bias in law amendments. In
each case, we demonstrate that the outputs of the model can be explained and
validated, even for the two domains that are outside the training-data domain.
We also use the model to compare the general level of bias between domains,
where we see that legal texts are the least biased and news media are the most
biased, with Wikipedia articles in between. Given its high performance,
simplicity, interpretability, and wide applicability, we hope the model will be
useful for a large community, including Wikipedia and news editors, political
and social scientists, and the general public.
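To make the pairwise idea concrete: a Bradley-Terry-style model assigns each document d a latent bias score s(d) and models the probability that revision A is judged more biased than revision B as sigmoid(s(A) - s(B)). The sketch below is a minimal, hypothetical illustration of this setup, not the authors' released code; the vocabulary, training pairs, and linear bag-of-words scorer are assumptions made for the example. Because s(d) is linear in word counts here, the learned weights can be read off directly as per-word bias indicators, mirroring the interpretability claim in the abstract.

```python
# Minimal Bradley-Terry-style pairwise bias scorer.
# Illustrative sketch only; not the paper's released implementation.
import numpy as np
from collections import Counter

VOCAB = ["clearly", "terrible", "brilliant", "reported", "according", "allegedly"]

def features(doc: str) -> np.ndarray:
    """Bag-of-words counts over a small illustrative vocabulary."""
    counts = Counter(doc.lower().split())
    return np.array([counts[w] for w in VOCAB], dtype=float)

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + np.exp(-z))

def train(pairs, epochs: int = 200, lr: float = 0.1) -> np.ndarray:
    """Each pair is (more_biased_doc, less_biased_doc). Learns weights w such that
    s(d) = w . features(d) and P(A more biased than B) = sigmoid(s(A) - s(B))."""
    w = np.zeros(len(VOCAB))
    for _ in range(epochs):
        for biased, neutral in pairs:
            diff = features(biased) - features(neutral)
            p = sigmoid(w @ diff)        # predicted prob. first doc is more biased
            w += lr * (1.0 - p) * diff   # gradient ascent on the log-likelihood
    return w

# Hypothetical training pairs (biased revision, more neutral revision).
pairs = [
    ("the plan was clearly terrible", "critics said the plan had flaws"),
    ("a brilliant and visionary leader", "the leader was re-elected"),
]
w = train(pairs)
score = lambda doc: w @ features(doc)  # higher score = more biased
print(sorted(zip(VOCAB, w), key=lambda t: -t[1]))  # words ranked by bias weight
```

Running the sketch prints the vocabulary ranked by learned weight; in a model of this form, the highest-weight words are the ones most indicative of bias, which is how the parameter-inspection step described in the abstract would work.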
Related papers
- From Lists to Emojis: How Format Bias Affects Model Alignment [67.08430328350327]
We study format biases in reinforcement learning from human feedback.
Many widely-used preference models, including human evaluators, exhibit strong biases towards specific format patterns.
We show that with a small amount of biased data, we can inject significant bias into the reward model.
arXiv Detail & Related papers (2024-09-18T05:13:18Z)
- DocNet: Semantic Structure in Inductive Bias Detection Models [0.4779196219827508]
In this paper, we explore an often overlooked aspect of bias detection in documents: the semantic structure of news articles.
We present DocNet, a novel, inductive, and low-resource document embedding and bias detection model.
We also demonstrate that the semantic structure of news articles from opposing partisan sides, as represented in document-level graph embeddings, has significant similarities.
arXiv Detail & Related papers (2024-06-16T14:51:12Z)
- Quantifying Bias in Text-to-Image Generative Models [49.60774626839712]
Bias in text-to-image (T2I) models can propagate unfair social representations and may be used to aggressively market ideas or push controversial agendas.
Existing T2I model bias evaluation methods focus only on social biases.
We propose an evaluation methodology to quantify general biases in T2I generative models, without any preconceived notions.
arXiv Detail & Related papers (2023-12-20T14:26:54Z)
- GPTBIAS: A Comprehensive Framework for Evaluating Bias in Large Language Models [83.30078426829627]
Large language models (LLMs) have gained popularity and are being widely adopted by a large user community.
Existing evaluation methods have many constraints, and their results offer limited interpretability.
We propose a bias evaluation framework named GPTBIAS that leverages the high performance of LLMs to assess bias in models.
arXiv Detail & Related papers (2023-12-11T12:02:14Z)
- Bias in News Summarization: Measures, Pitfalls and Corpora [4.917075909999548]
We introduce definitions for biased behaviours in summarization models, along with practical operationalizations.
We measure gender bias in English summaries generated by both purpose-built summarization models and general purpose chat models.
We find content selection in single-document summarization to be largely unaffected by gender bias, while hallucinations exhibit evidence of bias.
arXiv Detail & Related papers (2023-09-14T22:20:27Z)
- Debiasing Vision-Language Models via Biased Prompts [79.04467131711775]
We propose a general approach for debiasing vision-language foundation models by projecting out biased directions in the text embedding.
We show that debiasing only the text embedding with a calibrated projection matrix suffices to yield robust classifiers and fair generative models; a generic sketch of this projection idea appears after the list below.
arXiv Detail & Related papers (2023-01-31T20:09:33Z)
- Gender Biases and Where to Find Them: Exploring Gender Bias in Pre-Trained Transformer-based Language Models Using Movement Pruning [32.62430731115707]
We present a novel framework for inspecting bias in transformer-based language models via movement pruning.
We implement our framework by pruning the model while fine-tuning it on the debiasing objective.
We re-discover a bias-performance trade-off: the better the model performs, the more bias it contains.
arXiv Detail & Related papers (2022-07-06T06:20:35Z)
- The SAME score: Improved cosine based bias score for word embeddings [49.75878234192369]
We introduce SAME, a novel bias score for semantic bias in embeddings.
We show that SAME can measure semantic bias and identify potential causes of social bias in downstream tasks.
arXiv Detail & Related papers (2022-03-28T09:28:13Z)
- Pseudo Bias-Balanced Learning for Debiased Chest X-ray Classification [57.53567756716656]
We study the problem of developing debiased chest X-ray diagnosis models without knowing the exact bias labels.
We propose a novel algorithm, pseudo bias-balanced learning, which first captures and predicts per-sample bias labels.
Our proposed method achieved consistent improvements over other state-of-the-art approaches.
arXiv Detail & Related papers (2022-03-18T11:02:18Z)
- The Authors Matter: Understanding and Mitigating Implicit Bias in Deep Text Classification [36.361778457307636]
Deep text classification models can produce biased outcomes for texts written by authors of certain demographic groups.
In this paper, we first demonstrate that implicit bias exists in different text classification tasks for different demographic groups.
We then build a learning-based interpretation method to deepen our knowledge of implicit bias.
arXiv Detail & Related papers (2021-05-06T16:17:38Z)
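As a generic illustration of the projection idea in the "Debiasing Vision-Language Models via Biased Prompts" entry above: a bias direction can be estimated from embedding pairs that differ only in the biased attribute, then projected out of each embedding. The sketch below shows the standard projection trick under that assumption; it is not the paper's calibrated projection matrix, and the pair construction and dimensions are illustrative.

```python
# Generic sketch of removing a bias direction from text embeddings.
# Illustrative only; not the calibrated projection from the paper above.
import numpy as np

def bias_direction(pairs) -> np.ndarray:
    """Estimate a unit bias direction as the mean difference between embeddings
    of prompt pairs that differ only in the biased attribute."""
    diffs = np.stack([a - b for a, b in pairs])
    v = diffs.mean(axis=0)
    return v / np.linalg.norm(v)

def project_out(e: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Remove the component of embedding e along the unit bias direction v."""
    return e - (e @ v) * v

# Hypothetical 8-dimensional embeddings standing in for paired prompt embeddings.
rng = np.random.default_rng(0)
pairs = [(rng.normal(size=8), rng.normal(size=8)) for _ in range(4)]
v = bias_direction(pairs)
e = rng.normal(size=8)
e_debiased = project_out(e, v)
print(abs(e_debiased @ v) < 1e-9)  # True: no remaining component along v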
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.