Identification of Biased Terms in News Articles by Comparison of
Outlet-specific Word Embeddings
- URL: http://arxiv.org/abs/2112.07384v1
- Date: Tue, 14 Dec 2021 13:23:49 GMT
- Authors: Timo Spinde, Lada Rudnitckaia, Felix Hamborg, Bela Gipp
- Abstract summary: We train two word embedding models, one on texts of left-wing, the other on right-wing news outlets.
Our hypothesis is that a word's representations in both word embedding spaces are more similar for non-biased words than biased words.
This paper presents the first in-depth look at the context of bias words measured by word embeddings.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Slanted news coverage, also called media bias, can heavily influence how news
consumers interpret and react to the news. To automatically identify biased
language, we present an exploratory approach that compares the context of
related words. We train two word embedding models, one on texts of left-wing,
the other on right-wing news outlets. Our hypothesis is that a word's
representations in both word embedding spaces are more similar for non-biased
words than biased words. The underlying idea is that the context of biased
words varies more strongly across news outlets than that of non-biased words,
since whether a word is perceived as biased depends on its context. While the
results do not reach statistical significance, they show the effectiveness of
the approach. For example, after a linear mapping of both word embedding
spaces, 31% of the words with the largest distances potentially induce bias. To
improve the results, we find that the dataset needs to be significantly
larger, and we derive further methodology as future research directions. To our
knowledge, this paper presents
the first in-depth look at the context of bias words measured by word
embeddings.
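The pipeline described in the abstract (train two outlet-specific embedding spaces, linearly map one onto the other, then rank shared words by distance) can be sketched as follows. This is a minimal sketch under assumptions: the abstract only says "linear mapping", so orthogonal Procrustes is used here as one common choice of linear alignment, and the function name, vocabulary, and toy vectors are illustrative, not from the paper.

```python
import numpy as np

def align_and_rank(vocab, emb_left, emb_right, k=5):
    """Align emb_right onto emb_left with an orthogonal Procrustes mapping,
    then rank the shared vocabulary by post-alignment cosine distance.
    Words with the largest distances are candidate bias-inducing terms."""
    # Orthogonal Procrustes: W = argmin ||emb_right @ W - emb_left||_F
    # over orthogonal W, solved via SVD of emb_right^T @ emb_left.
    u, _, vt = np.linalg.svd(emb_right.T @ emb_left)
    w = u @ vt
    mapped = emb_right @ w

    # Cosine distance between each word's two representations.
    a = emb_left / np.linalg.norm(emb_left, axis=1, keepdims=True)
    b = mapped / np.linalg.norm(mapped, axis=1, keepdims=True)
    dist = 1.0 - (a * b).sum(axis=1)

    order = np.argsort(-dist)  # largest distance first
    return [(vocab[i], float(dist[i])) for i in order[:k]]
```

In practice the two matrices would hold vectors from embedding models trained separately on left-wing and right-wing outlet corpora, restricted to their shared vocabulary; the top-ranked words are the ones a human annotator would then inspect for bias.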
Related papers
- Mitigating Gender Bias in Contextual Word Embeddings [1.208453901299241]
We propose a novel objective function for Lipstick (Masked-Language Modeling), which largely mitigates the gender bias in contextual embeddings.
We also propose new methods for debiasing static embeddings and provide empirical proof via extensive analysis and experiments.
arXiv Detail & Related papers (2024-11-18T21:36:44Z)
- Neighboring Words Affect Human Interpretation of Saliency Explanations [65.29015910991261]
Word-level saliency explanations are often used to communicate feature-attribution in text-based models.
Recent studies found that superficial factors such as word length can distort human interpretation of the communicated saliency scores.
We investigate how the marking of a word's neighboring words affects the explainee's perception of that word's importance in the context of a saliency explanation.
arXiv Detail & Related papers (2023-05-04T09:50:25Z)
- Addressing Biases in the Texts using an End-to-End Pipeline Approach [0.0]
We propose a fair ML pipeline that takes a text as input and determines whether it contains biases and toxic content.
It suggests a set of new words by substituting the biased words; the idea is to lessen the effects of those biases by replacing them with alternative words.
The results show that our proposed pipeline can detect, identify, and mitigate biases in social media data.
arXiv Detail & Related papers (2023-03-13T11:41:28Z)
- Discovering and Mitigating Visual Biases through Keyword Explanation [66.71792624377069]
We propose the Bias-to-Text (B2T) framework, which interprets visual biases as keywords.
B2T can identify known biases, such as gender bias in CelebA, background bias in Waterbirds, and distribution shifts in ImageNet-R/C.
B2T uncovers novel biases in larger datasets, such as Dollar Street and ImageNet.
arXiv Detail & Related papers (2023-01-26T13:58:46Z)
- Unveiling the Hidden Agenda: Biases in News Reporting and Consumption [59.55900146668931]
We build a six-year dataset on the Italian vaccine debate and adopt a Bayesian latent space model to identify narrative and selection biases.
We found a nonlinear relationship between biases and engagement, with higher engagement for extreme positions.
Analysis of news consumption on Twitter reveals common audiences among news outlets with similar ideological positions.
arXiv Detail & Related papers (2023-01-14T18:58:42Z)
- Lost in Context? On the Sense-wise Variance of Contextualized Word Embeddings [11.475144702935568]
We quantify how much the contextualized embeddings of each word sense vary across contexts in typical pre-trained models.
We find that word representations are position-biased, where the first words in different contexts tend to be more similar.
arXiv Detail & Related papers (2022-08-20T12:27:25Z)
- The SAME score: Improved cosine based bias score for word embeddings [49.75878234192369]
We introduce SAME, a novel bias score for semantic bias in embeddings.
We show that SAME is capable of measuring semantic bias and identify potential causes for social bias in downstream tasks.
arXiv Detail & Related papers (2022-03-28T09:28:13Z)
- "Thy algorithm shalt not bear false witness": An Evaluation of Multiclass Debiasing Methods on Word Embeddings [3.0204693431381515]
The paper investigates the state-of-the-art multiclass debiasing techniques: Hard debiasing, SoftWEAT debiasing and Conceptor debiasing.
It evaluates their performance when removing religious bias on a common basis by quantifying bias removal via the Word Embedding Association Test (WEAT), Mean Average Cosine Similarity (MAC), and the Relative Negative Sentiment Bias (RNSB).
arXiv Detail & Related papers (2020-10-30T12:49:39Z)
- Speakers Fill Lexical Semantic Gaps with Context [65.08205006886591]
We operationalise the lexical ambiguity of a word as the entropy of meanings it can take.
We find significant correlations between our estimate of ambiguity and the number of synonyms a word has in WordNet.
This suggests that, in the presence of ambiguity, speakers compensate by making contexts more informative.
arXiv Detail & Related papers (2020-10-05T17:19:10Z)
- Double-Hard Debias: Tailoring Word Embeddings for Gender Bias Mitigation [94.98656228690233]
We propose a technique that purifies the word embeddings against corpus regularities prior to inferring and removing the gender subspace.
Our approach preserves the distributional semantics of the pre-trained word embeddings while reducing gender bias to a significantly larger degree than prior approaches.
arXiv Detail & Related papers (2020-05-03T02:33:20Z)
- Joint Multiclass Debiasing of Word Embeddings [5.1135133995376085]
We present a joint multiclass debiasing approach capable of debiasing multiple bias dimensions simultaneously.
We show that our concepts can both reduce or even completely eliminate bias, while maintaining meaningful relationships between vectors in word embeddings.
arXiv Detail & Related papers (2020-03-09T22:06:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.