Impacts of Racial Bias in Historical Training Data for News AI
- URL: http://arxiv.org/abs/2512.16901v1
- Date: Thu, 18 Dec 2025 18:56:11 GMT
- Title: Impacts of Racial Bias in Historical Training Data for News AI
- Authors: Rahul Bhargava, Malene Hornstrup Jespersen, Emily Boardman Ndulue, Vivica Dsouza,
- Abstract summary: This paper investigates one such example trained on the broadly-used New York Times Annotated Corpus to create a multi-label classifier. We find that the "blacks" label operates partially as a general "racism detector" across some minoritized groups. However, it performs poorly against expectations on modern examples such as COVID-19 era anti-Asian hate stories, and reporting on the Black Lives Matter movement.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: AI technologies have rapidly moved into business and research applications that involve large text corpora, including computational journalism research and newsroom settings. These models, trained on extant data from various sources, can be conceptualized as historical artifacts that encode decades-old attitudes and stereotypes. This paper investigates one such example trained on the broadly-used New York Times Annotated Corpus to create a multi-label classifier. Our use in research settings surfaced the concerning "blacks" thematic topic label. Through quantitative and qualitative means we investigate this label's use in the training corpus, what concepts it might be encoding in the trained classifier, and how those concepts impact our model use. Via the application of explainable AI methods, we find that the "blacks" label operates partially as a general "racism detector" across some minoritized groups. However, it performs poorly against expectations on modern examples such as COVID-19 era anti-Asian hate stories, and reporting on the Black Lives Matter movement. This case study of interrogating embedded biases in a model reveals how similar applications in newsroom settings can lead to unexpected outputs that could impact a wide variety of potential uses of any large language model: story discovery, audience targeting, summarization, etc. The fundamental tension this exposes for newsrooms is how to adopt AI-enabled workflow tools while reducing the risk of reproducing historical biases in news coverage.
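The abstract describes a multi-label topic classifier trained on the NYT Annotated Corpus and then interrogated with explainable AI methods, but does not spell out the pipeline here. The following is only a minimal sketch under assumptions: toy documents and labels stand in for the annotated corpus, scikit-learn's one-vs-rest logistic regression stands in for the paper's classifier, and inspecting per-label coefficients stands in for its explainability step.

```python
# Hypothetical sketch: a multi-label news-topic classifier plus a crude
# per-label explanation step. Toy data stands in for the NYT Annotated Corpus;
# coefficient inspection stands in for the paper's explainable AI methods.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer

# Toy stand-in for annotated news articles and their thematic topic labels.
docs = [
    "city council debates policing and civil rights protests",
    "new vaccine rollout begins amid public health concerns",
    "report documents housing discrimination against minority renters",
    "stock markets rally after central bank announcement",
]
labels = [
    {"civil_rights", "protests"},
    {"public_health"},
    {"civil_rights", "housing"},
    {"business"},
]

# Binarize the label sets and fit one binary classifier per topic label.
mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(labels)
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(docs)
clf = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, Y)

def top_terms_for_label(label_name, k=5):
    """Return the most positively weighted terms for one label, a simple
    proxy for asking what concepts that topic label has come to encode."""
    idx = list(mlb.classes_).index(label_name)
    coefs = clf.estimators_[idx].coef_[0]
    terms = vectorizer.get_feature_names_out()
    ranked = sorted(zip(coefs, terms), reverse=True)[:k]
    return [(term, round(weight, 3)) for weight, term in ranked]

print(top_terms_for_label("civil_rights"))
```

In an audit like the one the paper describes, the label of interest (here, the corpus's "blacks" topic) would be probed the same way, asking which terms and stories most strongly activate it and whether those associations match editorial expectations.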
Related papers
- Profiling News Media for Factuality and Bias Using LLMs and the Fact-Checking Methodology of Human Experts [29.95198868148809]
We propose a novel methodology that emulates the criteria that professional fact-checkers use to assess the factuality and political bias of an entire outlet.
We provide an in-depth error analysis of the effect of media popularity and region on model performance.
arXiv Detail & Related papers (2025-06-14T15:49:20Z)
- Geopolitical biases in LLMs: what are the "good" and the "bad" countries according to contemporary language models [52.00270888041742]
We introduce a novel dataset with neutral event descriptions and contrasting viewpoints from different countries.
Our findings show significant geopolitical biases, with models favoring specific national narratives.
Simple debiasing prompts had a limited effect on reducing these biases.
arXiv Detail & Related papers (2025-06-07T10:45:17Z)
- Spoken Stereoset: On Evaluating Social Bias Toward Speaker in Speech Large Language Models [50.40276881893513]
This study introduces Spoken Stereoset, a dataset specifically designed to evaluate social biases in Speech Large Language Models (SLLMs).
By examining how different models respond to speech from diverse demographic groups, we aim to identify these biases.
The findings indicate that while most models show minimal bias, some still exhibit slightly stereotypical or anti-stereotypical tendencies.
arXiv Detail & Related papers (2024-08-14T16:55:06Z)
- Towards Auditing Large Language Models: Improving Text-based Stereotype Detection [5.3634450268516565]
This work introduces the Multi-Grain Stereotype dataset, which includes 52,751 instances of gender, race, profession, and religion stereotypic text.
We design several experiments to rigorously test the proposed model trained on the novel dataset.
Experiments show that training the model in a multi-class setting can outperform the one-vs-all binary counterpart.
arXiv Detail & Related papers (2023-11-23T17:47:14Z)
- Data-Augmented and Retrieval-Augmented Context Enrichment in Chinese Media Bias Detection [16.343223974292908]
We build a dataset with Chinese news reports about COVID-19 which is annotated by our newly designed system.
In Data-Augmented Context Enrichment (DACE), we enlarge the training data; while in Retrieval-Augmented Context Enrichment (RACE), we improve information retrieval methods to select valuable information.
Our results show that both methods outperform our baselines, while the RACE methods are more efficient and have more potential.
arXiv Detail & Related papers (2023-11-02T16:29:49Z)
- CBBQ: A Chinese Bias Benchmark Dataset Curated with Human-AI Collaboration for Large Language Models [52.25049362267279]
We present a Chinese Bias Benchmark dataset that consists of over 100K questions jointly constructed by human experts and generative language models.
The testing instances in the dataset are automatically derived from 3K+ high-quality templates manually authored with stringent quality control.
Extensive experiments demonstrate the effectiveness of the dataset in detecting model bias, with all 10 publicly available Chinese large language models exhibiting strong bias in certain categories.
arXiv Detail & Related papers (2023-06-28T14:14:44Z)
- Verifying the Robustness of Automatic Credibility Assessment [50.55687778699995]
We show that meaning-preserving changes in input text can mislead the models.
We also introduce BODEGA: a benchmark for testing both victim models and attack methods on misinformation detection tasks.
Our experimental results show that modern large language models are often more vulnerable to attacks than previous, smaller solutions.
arXiv Detail & Related papers (2023-03-14T16:11:47Z)
- NeuS: Neutral Multi-News Summarization for Mitigating Framing Bias [54.89737992911079]
We propose a new task: neutral summary generation from multiple news headlines spanning the political spectrum.
One of the most interesting observations is that generation models can hallucinate not only factually inaccurate or unverifiable content, but also politically biased content.
arXiv Detail & Related papers (2022-04-11T07:06:01Z)
- Viable Threat on News Reading: Generating Biased News Using Natural Language Models [49.90665530780664]
We show that publicly available language models can reliably generate biased news content based on an original input news article.
We also show that a large number of high-quality biased news articles can be generated using controllable text generation.
arXiv Detail & Related papers (2020-10-05T16:55:39Z)
- Detecting Cross-Modal Inconsistency to Defend Against Neural Fake News [57.9843300852526]
We introduce the more realistic and challenging task of defending against machine-generated news that also includes images and captions.
To identify the possible weaknesses that adversaries can exploit, we create a NeuralNews dataset composed of 4 different types of generated articles.
In addition to the valuable insights gleaned from our user study experiments, we provide a relatively effective approach based on detecting visual-semantic inconsistencies.
arXiv Detail & Related papers (2020-09-16T14:13:15Z)
- TextDecepter: Hard Label Black Box Attack on Text Classifiers [0.0]
We present a novel approach for hard-label black-box attacks against Natural Language Processing (NLP) classifiers.
Such an attack scenario applies to real-world black-box models being used for security-sensitive applications such as sentiment analysis and toxic content detection.
arXiv Detail & Related papers (2020-08-16T08:57:01Z)
- Hate Speech Detection and Racial Bias Mitigation in Social Media based on BERT model [1.9336815376402716]
We introduce a transfer learning approach for hate speech detection based on an existing pre-trained language model called BERT.
We evaluate the proposed model on two publicly available datasets annotated for racism, sexism, hate or offensive content on Twitter.
arXiv Detail & Related papers (2020-08-14T16:47:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.