Data-Augmented and Retrieval-Augmented Context Enrichment in Chinese
Media Bias Detection
- URL: http://arxiv.org/abs/2311.01372v2
- Date: Sat, 18 Nov 2023 09:45:01 GMT
- Title: Data-Augmented and Retrieval-Augmented Context Enrichment in Chinese
Media Bias Detection
- Authors: Luyang Lin, Jing Li, Kam-Fai Wong
- Abstract summary: We build a dataset of Chinese news reports about COVID-19, annotated by our newly designed system.
In Data-Augmented Context Enrichment (DACE), we enlarge the training data; while in Retrieval-Augmented Context Enrichment (RACE), we improve information retrieval methods to select valuable information.
Our results show that both methods outperform our baselines, while the RACE methods are more efficient and have more potential.
- Score: 16.343223974292908
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the increasing pursuit of objective reports, automatically understanding
media bias has drawn more attention in recent research. However, most of the
previous work examines media bias from Western ideology, such as the left and
right in the political spectrum, which is not applicable to Chinese outlets.
Building on the previous lexical bias and informational bias structure, we
refine it from the Chinese perspective and go one step further to craft data with 7
fine-grained labels. To be specific, we first construct a dataset of Chinese
news reports about COVID-19, annotated by our newly designed system,
and then conduct substantial experiments on it to detect media bias. However,
the annotated data is too small in scale for the latest deep-learning
techniques, and human annotation of media bias, which requires substantial
professional knowledge, is prohibitively expensive. Thus, we explore several
context enrichment methods to mitigate these problems automatically. In Data-Augmented
Context Enrichment (DACE), we enlarge the training data; while in
Retrieval-Augmented Context Enrichment (RACE), we improve information retrieval
methods to select valuable information and integrate it into our models to
better understand bias. Extensive experiments are conducted on both our dataset
and the English dataset BASIL. Our results show that both methods outperform our
baselines, while the RACE methods are more efficient and have more potential.
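To make the two approaches concrete, here is a minimal, illustrative sketch of the DACE and RACE ideas as the abstract describes them. The toy corpus, the token-dropout augmenter, the tiny TF-IDF retriever, and the placeholder classifier are all assumptions made for illustration; the paper's actual models, retriever, and label definitions are not specified at this level of detail in the abstract.

```python
# Illustrative sketch only: DACE enlarges training data; RACE retrieves extra
# context per sentence. All names and the toy corpus here are hypothetical.
import math
import random
from collections import Counter

CORPUS = [  # hypothetical background corpus of news sentences
    "The city reported a sharp rise in COVID-19 cases this week.",
    "Officials praised the swift response of local hospitals.",
    "Critics argued the lockdown measures were announced too late.",
]

def dace_augment(sentence: str, n_variants: int = 2, drop_p: float = 0.1) -> list:
    """DACE (assumed form): enlarge training data with token-dropout variants."""
    tokens = sentence.split()
    return [
        " ".join(t for t in tokens if random.random() > drop_p)
        for _ in range(n_variants)
    ]

def tfidf_score(query: str, doc: str, corpus: list) -> float:
    """Tiny TF-IDF overlap score standing in for a real retriever."""
    q, d, n = Counter(query.lower().split()), Counter(doc.lower().split()), len(corpus)
    score = 0.0
    for term, q_tf in q.items():
        if term in d:
            df = sum(term in c.lower().split() for c in corpus)
            idf = math.log((n + 1) / (df + 1)) + 1.0
            score += q_tf * d[term] * idf * idf
    return score

def race_enrich(sentence: str, corpus: list, top_k: int = 2) -> str:
    """RACE (assumed form): prepend the top-k retrieved sentences as context."""
    ranked = sorted(corpus, key=lambda doc: tfidf_score(sentence, doc, corpus), reverse=True)
    return " ".join(ranked[:top_k]) + " [SEP] " + sentence

def classify_bias(text: str) -> str:
    """Placeholder for a classifier over the paper's 7 fine-grained labels."""
    return "no-bias"  # a real model would be fine-tuned on the annotated data

sentence = "Critics argued the lockdown measures were announced too late."
print(dace_augment(sentence))
print(classify_bias(race_enrich(sentence, CORPUS)))
```

The contrast mirrors the abstract's framing: DACE spends effort enlarging the training set, while RACE spends it selecting extra context for each sentence, which is consistent with the reported finding that the RACE methods are more efficient.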
Related papers
- Exploring the Jungle of Bias: Political Bias Attribution in Language Models via Dependency Analysis [86.49858739347412]
Large Language Models (LLMs) have sparked intense debate regarding the prevalence of bias in these models and its mitigation.
We propose a prompt-based method for the extraction of confounding and mediating attributes which contribute to the decision process.
We find that the observed disparate treatment can at least in part be attributed to confounding and mediating attributes and model misalignment.
arXiv Detail & Related papers (2023-11-15T00:02:25Z)
- Target-Aware Contextual Political Bias Detection in News [22.396285428304083]
Sentence-level political bias detection in news is a challenging task that requires an understanding of bias in consideration of the context.
Previous work in media bias detection has proposed augmentation techniques to exploit this fact.
We propose techniques to more carefully search for context using a bias-sensitive, target-aware approach for data augmentation.
arXiv Detail & Related papers (2023-10-02T12:25:05Z)
- Bias and Fairness in Large Language Models: A Survey [73.87651986156006]
We present a comprehensive survey of bias evaluation and mitigation techniques for large language models (LLMs).
We first consolidate, formalize, and expand notions of social bias and fairness in natural language processing.
We then unify the literature by proposing three intuitive taxonomies: two for bias evaluation and one for mitigation.
arXiv Detail & Related papers (2023-09-02T00:32:55Z)
- CBBQ: A Chinese Bias Benchmark Dataset Curated with Human-AI Collaboration for Large Language Models [52.25049362267279]
We present a Chinese Bias Benchmark dataset that consists of over 100K questions jointly constructed by human experts and generative language models.
The testing instances in the dataset are automatically derived from 3K+ high-quality templates manually authored with stringent quality control; a minimal sketch of this template-to-instance step follows this entry.
Extensive experiments demonstrate the effectiveness of the dataset in detecting model bias, with all 10 publicly available Chinese large language models exhibiting strong bias in certain categories.
arXiv Detail & Related papers (2023-06-28T14:14:44Z)
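To illustrate the template-to-instance step the CBBQ summary mentions, here is a minimal, hypothetical sketch. The real CBBQ templates are in Chinese and far richer; the template text, slot names, and group terms below are invented for readability.

```python
# Hypothetical sketch of deriving benchmark instances from slotted templates,
# in the spirit of the CBBQ summary (3K+ templates -> 100K+ questions).
from itertools import product

TEMPLATES = [  # invented example; real CBBQ templates are Chinese
    "A {group} applicant and a {other_group} applicant applied for the same job. Who is less qualified?",
]
GROUPS = ["rural", "urban", "older", "younger"]  # hypothetical demographic terms

def instantiate(template: str, groups: list) -> list:
    """Fill both slots with every ordered pair of distinct group terms."""
    return [
        template.format(group=a, other_group=b)
        for a, b in product(groups, repeat=2)
        if a != b
    ]

instances = [q for t in TEMPLATES for q in instantiate(t, GROUPS)]
print(len(instances), "test instances from", len(TEMPLATES), "template(s)")
```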
- Neural Media Bias Detection Using Distant Supervision With BABE -- Bias Annotations By Experts [24.51774048437496]
This paper presents BABE, a robust and diverse data set for media bias research.
It consists of 3,700 sentences balanced among topics and outlets, containing media bias labels on the word and sentence level.
Based on our data, we also introduce a way to detect bias-inducing sentences in news articles automatically.
arXiv Detail & Related papers (2022-09-29T05:32:55Z)
- Mitigating Representation Bias in Action Recognition: Algorithms and Benchmarks [76.35271072704384]
Deep learning models perform poorly when applied to videos with rare scenes or objects.
We tackle this problem from two different angles: algorithm and dataset.
We show that the debiased representation can generalize better when transferred to other datasets and tasks.
arXiv Detail & Related papers (2022-09-20T00:30:35Z)
- Towards A Reliable Ground-Truth For Biased Language Detection [3.2202224129197745]
Existing methods to detect bias mostly rely on annotated data to train machine learning models.
We evaluate data collection options and compare labels obtained from two popular crowdsourcing platforms.
We conclude that detailed annotator training increases data quality, improving the performance of existing bias detection systems.
arXiv Detail & Related papers (2021-12-14T14:13:05Z)
- Studying Up Machine Learning Data: Why Talk About Bias When We Mean Power? [0.0]
We argue that reducing societal problems to "bias" misses the context-based nature of data.
We highlight the corporate forces and market imperatives involved in the labor of data workers that subsequently shape ML datasets.
arXiv Detail & Related papers (2021-09-16T17:38:26Z)
- On the Language Coverage Bias for Neural Machine Translation [81.81456880770762]
Language coverage bias is important for neural machine translation (NMT) because the target-original training data is not well exploited in current practice.
By carefully designing experiments, we provide comprehensive analyses of the language coverage bias in the training data.
We propose two simple and effective approaches to alleviate the language coverage bias problem.
arXiv Detail & Related papers (2021-06-07T01:55:34Z)
- Context in Informational Bias Detection [4.386026071380442]
We explore four kinds of context for informational bias in English news articles.
We find that integrating event context improves classification performance over a very strong baseline.
We find that the best-performing context-inclusive model outperforms the baseline on longer sentences.
arXiv Detail & Related papers (2020-12-03T15:50:20Z)
- REVISE: A Tool for Measuring and Mitigating Bias in Visual Datasets [64.76453161039973]
REVISE (REvealing VIsual biaSEs) is a tool that assists in the investigation of a visual dataset.
It surfaces potential biases along three dimensions: (1) object-based, (2) person-based, and (3) geography-based; a minimal sketch of such an audit follows this entry.
arXiv Detail & Related papers (2020-04-16T23:54:37Z)
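As a closing illustration of the kind of distributional audit the REVISE entry describes, the following sketch counts annotation frequencies along one dimension and flags dominant values. The record format, threshold, and toy data are assumptions; REVISE itself operates on full visual datasets with far richer statistics.

```python
# Hypothetical sketch of a REVISE-style audit: flag values that dominate one
# annotation dimension. The record format and threshold are assumptions.
from collections import Counter

annotations = [  # hypothetical (image_id, object_label, region) records
    (1, "car", "North America"), (2, "car", "North America"),
    (3, "bicycle", "Europe"), (4, "car", "North America"),
]

def audit(records: list, dimension_index: int, threshold: float = 0.5) -> dict:
    """Return values that cover more than `threshold` of all records."""
    counts = Counter(r[dimension_index] for r in records)
    total = sum(counts.values())
    return {v: c / total for v, c in counts.items() if c / total > threshold}

print(audit(annotations, dimension_index=2))  # geography skew: {'North America': 0.75}
```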