No Word Embedding Model Is Perfect: Evaluating the Representation Accuracy for Social Bias in the Media
- URL: http://arxiv.org/abs/2211.03634v1
- Date: Mon, 7 Nov 2022 15:45:52 GMT
- Title: No Word Embedding Model Is Perfect: Evaluating the Representation Accuracy for Social Bias in the Media
- Authors: Maximilian Spliethöver, Maximilian Keiff, Henning Wachsmuth
- Abstract summary: We study what kind of embedding algorithm serves best to accurately measure types of social bias known to exist in US online news articles.
We collect 500k articles and review psychology literature with respect to expected social bias.
We compare how models trained with the algorithms on news articles represent the expected social bias.
- Score: 17.4812995898078
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: News articles both shape and reflect public opinion across the political
spectrum. Analyzing them for social bias can thus provide valuable insights,
such as prevailing stereotypes in society and the media, which are often
adopted by NLP models trained on respective data. Recent work has relied on
word embedding bias measures, such as WEAT. However, several representation
issues of embeddings can harm the measures' accuracy, including low-resource
settings and token frequency differences. In this work, we study what kind of
embedding algorithm serves best to accurately measure types of social bias
known to exist in US online news articles. To cover the whole spectrum of
political bias in the US, we collect 500k articles and review psychology
literature with respect to expected social bias. We then quantify social bias
using WEAT along with embedding algorithms that account for the aforementioned
issues. We compare how models trained with the algorithms on news articles
represent the expected social bias. Our results suggest that the standard way
to quantify bias does not align well with knowledge from psychology. While the
proposed algorithms reduce the gap, they still do not fully match the
literature.
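For context, the WEAT measure referenced in the abstract is a cosine-based association test. Below is a minimal sketch of its effect size, following the common definition by Caliskan et al. (2017); the function and variable names are illustrative and are not taken from the paper's code.

```python
import numpy as np

def cosine(u, v):
    # Cosine similarity between two embedding vectors.
    return (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

def association(w, A, B):
    # s(w, A, B): how much more strongly w associates with
    # attribute set A than with attribute set B.
    return np.mean([cosine(w, a) for a in A]) - np.mean([cosine(w, b) for b in B])

def weat_effect_size(X, Y, A, B):
    # WEAT effect size (Caliskan et al., 2017): a Cohen's-d-style
    # statistic comparing how two target word sets X and Y differ in
    # their association with two attribute sets A and B. Values near 0
    # indicate little measured bias; note that implementations differ
    # on whether the sample or population standard deviation is used.
    s_X = [association(x, A, B) for x in X]
    s_Y = [association(y, A, B) for y in Y]
    return (np.mean(s_X) - np.mean(s_Y)) / np.std(s_X + s_Y, ddof=1)
```

Here X and Y would hold embedding vectors for two contrasted target word sets and A and B for two attribute sets; the paper applies such measures together with embedding algorithms adapted to low-resource and token-frequency issues.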
Related papers
- DocNet: Semantic Structure in Inductive Bias Detection Models [0.4779196219827508]
In this paper, we explore an often overlooked aspect of bias detection in documents: the semantic structure of news articles.
We present DocNet, a novel, inductive, and low-resource document embedding and bias detection model.
We also demonstrate that the semantic structures of news articles from opposing partisan sides, as represented in document-level graph embeddings, have significant similarities.
arXiv Detail & Related papers (2024-06-16T14:51:12Z)
- A Principled Approach for a New Bias Measure [7.352247786388098]
We propose the definition of Uniform Bias (UB), the first bias measure with a clear and simple interpretation in the full range of bias values.
Our approach is experimentally validated on nine publicly available datasets and analyzed theoretically, providing novel insights into the problem.
Based on our approach, we also design a bias mitigation model that might be useful to policymakers.
arXiv Detail & Related papers (2024-05-20T18:14:33Z)
- Quantifying Bias in Text-to-Image Generative Models [49.60774626839712]
Bias in text-to-image (T2I) models can propagate unfair social representations and may be used to aggressively market ideas or push controversial agendas.
Existing T2I model bias evaluation methods focus only on social biases.
We propose an evaluation methodology to quantify general biases in T2I generative models, without any preconceived notions.
arXiv Detail & Related papers (2023-12-20T14:26:54Z)
- Unveiling the Hidden Agenda: Biases in News Reporting and Consumption [59.55900146668931]
We build a six-year dataset on the Italian vaccine debate and adopt a Bayesian latent space model to identify narrative and selection biases.
We found a nonlinear relationship between biases and engagement, with higher engagement for extreme positions.
Analysis of news consumption on Twitter reveals common audiences among news outlets with similar ideological positions.
arXiv Detail & Related papers (2023-01-14T18:58:42Z)
- The Tail Wagging the Dog: Dataset Construction Biases of Social Bias Benchmarks [75.58692290694452]
We compare social biases with non-social biases stemming from choices made during dataset construction that might not even be discernible to the human eye.
We observe that these shallow modifications have a surprising effect on the resulting degree of bias across various models.
arXiv Detail & Related papers (2022-10-18T17:58:39Z)
- The SAME score: Improved cosine based bias score for word embeddings [49.75878234192369]
We introduce SAME, a novel bias score for semantic bias in embeddings.
We show that SAME is capable of measuring semantic bias and identify potential causes for social bias in downstream tasks.
arXiv Detail & Related papers (2022-03-28T09:28:13Z)
- Towards Understanding and Mitigating Social Biases in Language Models [107.82654101403264]
Large-scale pretrained language models (LMs) can be potentially dangerous in manifesting undesirable representational biases.
We propose steps towards mitigating social biases during text generation.
Our empirical results and human evaluation demonstrate effectiveness in mitigating bias while retaining crucial contextual information.
arXiv Detail & Related papers (2021-06-24T17:52:43Z)
- Argument from Old Man's View: Assessing Social Bias in Argumentation [20.65183968971417]
Social bias in language poses a problem with ethical impact for many NLP applications.
Recent research has shown that machine learning models trained on such data may not only adopt but even amplify the bias.
We study the existence of social biases in large English debate portals.
arXiv Detail & Related papers (2020-11-24T10:39:44Z)
- "Thy algorithm shalt not bear false witness": An Evaluation of Multiclass Debiasing Methods on Word Embeddings [3.0204693431381515]
The paper investigates state-of-the-art multiclass debiasing techniques: Hard debiasing, SoftWEAT debiasing, and Conceptor debiasing.
It evaluates their performance at removing religious bias on a common basis, quantifying bias removal via the Word Embedding Association Test (WEAT), Mean Average Cosine Similarity (MAC), and Relative Negative Sentiment Bias (RNSB); a minimal MAC sketch follows this entry.
arXiv Detail & Related papers (2020-10-30T12:49:39Z)
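Since the entry above quantifies bias removal via MAC, here is a minimal sketch of that score, assuming the common definition from Manzini et al. (2019), which averages cosine distances between each target vector and each attribute set; the names are illustrative, not the evaluated papers' code.

```python
import numpy as np

def cosine_distance(u, v):
    # Cosine distance: 1 minus cosine similarity.
    return 1.0 - (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

def mac_score(targets, attribute_sets):
    # Mean Average Cosine similarity (MAC), assuming the definition of
    # Manzini et al. (2019): the mean, over all target vectors and all
    # attribute sets, of the average cosine distance between one target
    # and the vectors of one attribute set. Scores near 1.0 suggest the
    # targets are roughly "neutral" with respect to the attribute sets.
    distances = [
        np.mean([cosine_distance(t, a) for a in A])
        for t in targets
        for A in attribute_sets
    ]
    return float(np.mean(distances))
```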
- Towards Debiasing Sentence Representations [109.70181221796469]
We show that Sent-Debias is effective in removing biases, and at the same time, preserves performance on sentence-level downstream tasks.
We hope that our work will inspire future research on characterizing and removing social biases from widely adopted sentence representations for fairer NLP.
arXiv Detail & Related papers (2020-07-16T04:22:30Z)