Data-Augmented and Retrieval-Augmented Context Enrichment in Chinese
Media Bias Detection
- URL: http://arxiv.org/abs/2311.01372v2
- Date: Sat, 18 Nov 2023 09:45:01 GMT
- Title: Data-Augmented and Retrieval-Augmented Context Enrichment in Chinese
Media Bias Detection
- Authors: Luyang Lin, Jing Li, Kam-Fai Wong
- Abstract summary: We build a dataset of Chinese news reports about COVID-19, annotated by our newly designed system.
In Data-Augmented Context Enrichment (DACE), we enlarge the training data; while in Retrieval-Augmented Context Enrichment (RACE), we improve information retrieval methods to select valuable information.
Our results show that both methods outperform our baselines, while the RACE methods are more efficient and have more potential.
- Score: 16.343223974292908
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the increasing pursuit of objective reports, automatically understanding
media bias has drawn more attention in recent research. However, most of the
previous work examines media bias from Western ideology, such as the left and
right in the political spectrum, which is not applicable to Chinese outlets.
Building on the previous lexical bias and informational bias structure, we
refine it from the Chinese perspective and go one step further to craft data with 7
fine-grained labels. To be specific, we first construct a dataset of Chinese
news reports about COVID-19, annotated by our newly designed system,
and then conduct substantial experiments on it to detect media bias. However,
the annotated data is too small in scale for the latest deep-learning
techniques, and human annotation of media bias, which requires substantial
professional knowledge, is prohibitively expensive. Thus, we explore several
context enrichment methods to mitigate these problems automatically. In Data-Augmented
Context Enrichment (DACE), we enlarge the training data; while in
Retrieval-Augmented Context Enrichment (RACE), we improve information retrieval
methods to select valuable information and integrate it into our models to
better understand bias. Extensive experiments are conducted on both our dataset
and the English dataset BASIL. Our results show that both methods outperform our
baselines, while the RACE methods are more efficient and have more potential.
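To make the two approaches concrete, here is a minimal, illustrative sketch of the DACE and RACE ideas as the abstract describes them. The toy corpus, the token-dropout augmenter, the tiny TF-IDF retriever, and the placeholder classifier are all assumptions made for illustration; the paper's actual models, retriever, and label definitions are not specified at this level of detail in the abstract.

```python
# Illustrative sketch only: DACE enlarges training data; RACE retrieves extra
# context per sentence. All names and the toy corpus here are hypothetical.
import math
import random
from collections import Counter

CORPUS = [  # hypothetical background corpus of news sentences
    "The city reported a sharp rise in COVID-19 cases this week.",
    "Officials praised the swift response of local hospitals.",
    "Critics argued the lockdown measures were announced too late.",
]

def dace_augment(sentence: str, n_variants: int = 2, drop_p: float = 0.1) -> list:
    """DACE (assumed form): enlarge training data with token-dropout variants."""
    tokens = sentence.split()
    return [
        " ".join(t for t in tokens if random.random() > drop_p)
        for _ in range(n_variants)
    ]

def tfidf_score(query: str, doc: str, corpus: list) -> float:
    """Tiny TF-IDF overlap score standing in for a real retriever."""
    q, d, n = Counter(query.lower().split()), Counter(doc.lower().split()), len(corpus)
    score = 0.0
    for term, q_tf in q.items():
        if term in d:
            df = sum(term in c.lower().split() for c in corpus)
            idf = math.log((n + 1) / (df + 1)) + 1.0
            score += q_tf * d[term] * idf * idf
    return score

def race_enrich(sentence: str, corpus: list, top_k: int = 2) -> str:
    """RACE (assumed form): prepend the top-k retrieved sentences as context."""
    ranked = sorted(corpus, key=lambda doc: tfidf_score(sentence, doc, corpus), reverse=True)
    return " ".join(ranked[:top_k]) + " [SEP] " + sentence

def classify_bias(text: str) -> str:
    """Placeholder for a classifier over the paper's 7 fine-grained labels."""
    return "no-bias"  # a real model would be fine-tuned on the annotated data

sentence = "Critics argued the lockdown measures were announced too late."
print(dace_augment(sentence))
print(classify_bias(race_enrich(sentence, CORPUS)))
```

The contrast mirrors the abstract's framing: DACE spends effort enlarging the training set, while RACE spends it selecting extra context for each sentence, which is consistent with the reported finding that the RACE methods are more efficient.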
Related papers
- Exploring the Jungle of Bias: Political Bias Attribution in Language Models via Dependency Analysis [86.49858739347412]
Large Language Models (LLMs) have sparked intense debate regarding the prevalence of bias in these models and its mitigation.
We propose a prompt-based method for the extraction of confounding and mediating attributes which contribute to the decision process.
We find that the observed disparate treatment can at least in part be attributed to confounding and mediating attributes and model misalignment.
arXiv Detail & Related papers (2023-11-15T00:02:25Z)
- Target-Aware Contextual Political Bias Detection in News [22.396285428304083]
Sentence-level political bias detection in news is a challenging task that requires an understanding of bias in consideration of the context.
Previous work in media bias detection has proposed augmentation techniques to exploit this fact.
We propose techniques to more carefully search for context using a bias-sensitive, target-aware approach for data augmentation.
arXiv Detail & Related papers (2023-10-02T12:25:05Z)
- Bias and Fairness in Large Language Models: A Survey [73.87651986156006]
We present a comprehensive survey of bias evaluation and mitigation techniques for large language models (LLMs).
We first consolidate, formalize, and expand notions of social bias and fairness in natural language processing.
We then unify the literature by proposing three intuitive taxonomies: two for bias evaluation and one for mitigation.
arXiv Detail & Related papers (2023-09-02T00:32:55Z)
- CBBQ: A Chinese Bias Benchmark Dataset Curated with Human-AI Collaboration for Large Language Models [52.25049362267279]
We present a Chinese Bias Benchmark dataset that consists of over 100K questions jointly constructed by human experts and generative language models.
The testing instances in the dataset are automatically derived from 3K+ high-quality templates manually authored with stringent quality control; a minimal sketch of this template-to-instance step follows this entry.
Extensive experiments demonstrate the effectiveness of the dataset in detecting model bias, with all 10 publicly available Chinese large language models exhibiting strong bias in certain categories.
arXiv Detail & Related papers (2023-06-28T14:14:44Z)
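To illustrate the template-to-instance step the CBBQ summary mentions, here is a minimal, hypothetical sketch. The real CBBQ templates are in Chinese and far richer; the template text, slot names, and group terms below are invented for readability.

```python
# Hypothetical sketch of deriving benchmark instances from slotted templates,
# in the spirit of the CBBQ summary (3K+ templates -> 100K+ questions).
from itertools import product

TEMPLATES = [  # invented example; real CBBQ templates are Chinese
    "A {group} applicant and a {other_group} applicant applied for the same job. Who is less qualified?",
]
GROUPS = ["rural", "urban", "older", "younger"]  # hypothetical demographic terms

def instantiate(template: str, groups: list) -> list:
    """Fill both slots with every ordered pair of distinct group terms."""
    return [
        template.format(group=a, other_group=b)
        for a, b in product(groups, repeat=2)
        if a != b
    ]

instances = [q for t in TEMPLATES for q in instantiate(t, GROUPS)]
print(len(instances), "test instances from", len(TEMPLATES), "template(s)")
```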
- Neural Media Bias Detection Using Distant Supervision With BABE -- Bias Annotations By Experts [24.51774048437496]
This paper presents BABE, a robust and diverse data set for media bias research.
It consists of 3,700 sentences balanced among topics and outlets, containing media bias labels on the word and sentence level.
Based on our data, we also introduce a way to detect bias-inducing sentences in news articles automatically.
arXiv Detail & Related papers (2022-09-29T05:32:55Z)
- Mitigating Representation Bias in Action Recognition: Algorithms and Benchmarks [76.35271072704384]
Deep learning models perform poorly when applied to videos with rare scenes or objects.
We tackle this problem from two different angles: algorithm and dataset.
We show that the debiased representation can generalize better when transferred to other datasets and tasks.
arXiv Detail & Related papers (2022-09-20T00:30:35Z)
- Towards A Reliable Ground-Truth For Biased Language Detection [3.2202224129197745]
Existing methods to detect bias mostly rely on annotated data to train machine learning models.
We evaluate data collection options and compare labels obtained from two popular crowdsourcing platforms.
We conclude that detailed annotator training increases data quality, improving the performance of existing bias detection systems.
arXiv Detail & Related papers (2021-12-14T14:13:05Z)
- Studying Up Machine Learning Data: Why Talk About Bias When We Mean Power? [0.0]
We argue that reducing societal problems to "bias" misses the context-based nature of data.
We highlight the corporate forces and market imperatives involved in the labor of data workers that subsequently shape ML datasets.
arXiv Detail & Related papers (2021-09-16T17:38:26Z)
- On the Language Coverage Bias for Neural Machine Translation [81.81456880770762]
Language coverage bias is important for neural machine translation (NMT) because the target-original training data is not well exploited in current practice.
By carefully designing experiments, we provide comprehensive analyses of the language coverage bias in the training data.
We propose two simple and effective approaches to alleviate the language coverage bias problem.
arXiv Detail & Related papers (2021-06-07T01:55:34Z)
- Context in Informational Bias Detection [4.386026071380442]
We explore four kinds of context for informational bias in English news articles.
We find that integrating event context improves classification performance over a very strong baseline.
We find that the best-performing context-inclusive model outperforms the baseline on longer sentences.
arXiv Detail & Related papers (2020-12-03T15:50:20Z)
- REVISE: A Tool for Measuring and Mitigating Bias in Visual Datasets [64.76453161039973]
REVISE (REvealing VIsual biaSEs) is a tool that assists in the investigation of a visual dataset.
It surfaces potential biases along three dimensions: (1) object-based, (2) person-based, and (3) geography-based; a minimal sketch of such an audit follows this entry.
arXiv Detail & Related papers (2020-04-16T23:54:37Z)
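As a closing illustration of the kind of distributional audit the REVISE entry describes, the following sketch counts annotation frequencies along one dimension and flags dominant values. The record format, threshold, and toy data are assumptions; REVISE itself operates on full visual datasets with far richer statistics.

```python
# Hypothetical sketch of a REVISE-style audit: flag values that dominate one
# annotation dimension. The record format and threshold are assumptions.
from collections import Counter

annotations = [  # hypothetical (image_id, object_label, region) records
    (1, "car", "North America"), (2, "car", "North America"),
    (3, "bicycle", "Europe"), (4, "car", "North America"),
]

def audit(records: list, dimension_index: int, threshold: float = 0.5) -> dict:
    """Return values that cover more than `threshold` of all records."""
    counts = Counter(r[dimension_index] for r in records)
    total = sum(counts.values())
    return {v: c / total for v, c in counts.items() if c / total > threshold}

print(audit(annotations, dimension_index=2))  # geography skew: {'North America': 0.75}
```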