Unsupervised Bias Detection in College Student Newspapers
- URL: http://arxiv.org/abs/2309.06557v1
- Date: Mon, 11 Sep 2023 06:51:09 GMT
- Title: Unsupervised Bias Detection in College Student Newspapers
- Authors: Adam M. Lehavi, William McCormack, Noah Kornfeld and Solomon Glazer
- Abstract summary: This paper introduces a framework for scraping complex archive sites and generates a dataset of 14 student papers with 23,154 entries.
The advantages of this approach are that it is less comparative than reconstruction bias and requires less labelled data than generating keyword sentiment.
Results are calculated on politically charged words as well as control words to show how conclusions can be drawn.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: This paper presents a pipeline with minimal human influence for scraping and
detecting bias on college newspaper archives. This paper introduces a framework
for scraping complex archive sites that automated tools fail to grab data from,
and subsequently generates a dataset of 14 student papers with 23,154 entries.
This data can also then be queried by keyword to calculate bias by comparing
the sentiment of a large language model summary to the original article. The
advantages of this approach are that it is less comparative than reconstruction
bias and requires less labelled data than generating keyword sentiment. Results
are calculated on politically charged words as well as control words to show
how conclusions can be drawn. The complete method facilitates the extraction of
nuanced insights with minimal assumptions and categorizations, paving the way
for a more objective understanding of bias within student newspaper sources.
Related papers
- Data-Augmented and Retrieval-Augmented Context Enrichment in Chinese
Media Bias Detection [16.343223974292908]
We build a dataset with Chinese news reports about COVID-19 which is annotated by our newly designed system.
In Data-Augmented Context Enrichment (DACE), we enlarge the training data; while in Retrieval-Augmented Context Enrichment (RACE), we improve information retrieval methods to select valuable information.
Our results show that both methods outperform our baselines, while the RACE methods are more efficient and have more potential.
arXiv Detail & Related papers (2023-11-02T16:29:49Z) - Debiasing Vision-Language Models via Biased Prompts [79.04467131711775]
We propose a general approach for debiasing vision-language foundation models by projecting out biased directions in the text embedding.
We show that debiasing only the text embedding with a calibrated projection matrix suffices to yield robust classifiers and fair generative models.
arXiv Detail & Related papers (2023-01-31T20:09:33Z) - Revisiting text decomposition methods for NLI-based factuality scoring
of summaries [9.044665059626958]
We show that fine-grained decomposition is not always a winning strategy for factuality scoring.
We also show that small changes to previously proposed entailment-based scoring methods can result in better performance.
arXiv Detail & Related papers (2022-11-30T09:54:37Z) - An Approach to Ensure Fairness in News Articles [1.2349542674006961]
This paper introduces Dbias, which is a Python package to ensure fairness in news articles.
Dbias is a trained Machine Learning pipeline that can take a text and detects if the text is biased or not.
We show in experiments that this pipeline can be effective for mitigating biases and outperforms the common neural network architectures.
arXiv Detail & Related papers (2022-07-08T14:43:56Z) - Towards A Reliable Ground-Truth For Biased Language Detection [3.2202224129197745]
Existing methods to detect bias mostly rely on annotated data to train machine learning models.
We evaluate data collection options and compare labels obtained from two popular crowdsourcing platforms.
We conclude that detailed annotator training increases data quality, improving the performance of existing bias detection systems.
arXiv Detail & Related papers (2021-12-14T14:13:05Z) - Balancing out Bias: Achieving Fairness Through Training Reweighting [58.201275105195485]
Bias in natural language processing arises from models learning characteristics of the author such as gender and race.
Existing methods for mitigating and measuring bias do not directly account for correlations between author demographics and linguistic variables.
This paper introduces a very simple but highly effective method for countering bias using instance reweighting.
arXiv Detail & Related papers (2021-09-16T23:40:28Z) - Competency Problems: On Finding and Removing Artifacts in Language Data [50.09608320112584]
We argue that for complex language understanding tasks, all simple feature correlations are spurious.
We theoretically analyze the difficulty of creating data for competency problems when human bias is taken into account.
arXiv Detail & Related papers (2021-04-17T21:34:10Z) - Context in Informational Bias Detection [4.386026071380442]
We explore four kinds of context for informational bias in English news articles.
We find that integrating event context improves classification performance over a very strong baseline.
We find that the best-performing context-inclusive model outperforms the baseline on longer sentences.
arXiv Detail & Related papers (2020-12-03T15:50:20Z) - Improving Robustness by Augmenting Training Sentences with
Predicate-Argument Structures [62.562760228942054]
Existing approaches to improve robustness against dataset biases mostly focus on changing the training objective.
We propose to augment the input sentences in the training data with their corresponding predicate-argument structures.
We show that without targeting a specific bias, our sentence augmentation improves the robustness of transformer models against multiple biases.
arXiv Detail & Related papers (2020-10-23T16:22:05Z) - Compressive Summarization with Plausibility and Salience Modeling [54.37665950633147]
We propose to relax the rigid syntactic constraints on candidate spans and instead leave compression decisions to two data-driven criteria: plausibility and salience.
Our method achieves strong in-domain results on benchmark summarization datasets, and human evaluation shows that the plausibility model generally selects for grammatical and factual deletions.
arXiv Detail & Related papers (2020-10-15T17:07:10Z) - Towards Robustifying NLI Models Against Lexical Dataset Biases [94.79704960296108]
This paper explores both data-level and model-level debiasing methods to robustify models against lexical dataset biases.
First, we debias the dataset through data augmentation and enhancement, but show that the model bias cannot be fully removed via this method.
The second approach employs a bag-of-words sub-model to capture the features that are likely to exploit the bias and prevents the original model from learning these biased features.
arXiv Detail & Related papers (2020-05-10T17:56:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.