Bias at a Second Glance: A Deep Dive into Bias for German Educational
Peer-Review Data Modeling
- URL: http://arxiv.org/abs/2209.10335v2
- Date: Thu, 22 Sep 2022 13:08:04 GMT
- Title: Bias at a Second Glance: A Deep Dive into Bias for German Educational
Peer-Review Data Modeling
- Authors: Thiemo Wambsganss, Vinitra Swamy, Roman Rietsche, Tanja Käser
- Abstract summary: We analyze bias across text and through multiple architectures on a corpus of 9,165 German peer-reviews over five years.
Our collected corpus does not reveal many biases in the co-occurrence analysis or in the GloVe embeddings.
Pre-trained German language models, however, exhibit substantial conceptual, racial, and gender bias.
- Score: 10.080007569933331
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Natural Language Processing (NLP) has become increasingly utilized to provide
adaptivity in educational applications. However, recent research has
highlighted a variety of biases in pre-trained language models. While existing
studies investigate bias in different domains, they rarely offer fine-grained
analysis of educational and multilingual corpora. In this work, we
analyze bias across text and through multiple architectures on a corpus of
9,165 German peer-reviews collected from university students over five years.
Notably, our corpus includes labels such as helpfulness, quality, and critical
aspect ratings from the peer-review recipient as well as demographic
attributes. We conduct a Word Embedding Association Test (WEAT) analysis on (1)
our collected corpus in connection with the clustered labels, (2) the most
common pre-trained German language models (T5, BERT, and GPT-2) and GloVe
embeddings, and (3) the language models after fine-tuning on our collected
dataset. In contrast to our initial expectations, we found that our collected
corpus does not reveal many biases in the co-occurrence analysis or in the
GloVe embeddings. However, the pre-trained German language models exhibit
substantial conceptual, racial, and gender bias and show significant changes in
bias across conceptual and racial axes during fine-tuning on the peer-review
data. With our research, we aim to contribute to the fourth UN sustainability
goal (quality education) with a novel dataset, an understanding of biases in
natural language education data, and the potential harms of not counteracting
biases in language models for educational tasks.
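For readers unfamiliar with the Word Embedding Association Test, the statistic reduces to a difference of mean cosine similarities between target and attribute word sets, normalized into an effect size. Below is a minimal Python/NumPy sketch assuming GloVe-style vectors stored as plain text, one word per line; the file name and German word lists are illustrative placeholders, not the target and attribute sets used in the paper.

    import numpy as np

    def cosine(u, v):
        # Cosine similarity between two embedding vectors.
        return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

    def association(w, A, B):
        # s(w, A, B): mean similarity of w to attribute set A minus to set B.
        return np.mean([cosine(w, a) for a in A]) - np.mean([cosine(w, b) for b in B])

    def weat_effect_size(X, Y, A, B):
        # Effect size d (Caliskan et al., 2017): difference of mean associations
        # of target sets X and Y, normalized by the std. dev. over both sets.
        sx = [association(x, A, B) for x in X]
        sy = [association(y, A, B) for y in Y]
        return (np.mean(sx) - np.mean(sy)) / np.std(sx + sy, ddof=1)

    def load_vectors(path):
        # Parse GloVe-style text vectors: "word v1 v2 ... vn" per line.
        vecs = {}
        with open(path, encoding="utf-8") as f:
            for line in f:
                parts = line.rstrip().split(" ")
                vecs[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
        return vecs

    # Hypothetical usage with placeholder German word lists.
    vecs = load_vectors("glove_german.txt")                   # assumed file name
    X = [vecs[w] for w in ("er", "mann", "vater")]            # target set 1
    Y = [vecs[w] for w in ("sie", "frau", "mutter")]          # target set 2
    A = [vecs[w] for w in ("karriere", "beruf", "gehalt")]    # attribute set A
    B = [vecs[w] for w in ("familie", "haushalt", "kinder")]  # attribute set B
    print(f"WEAT effect size d = {weat_effect_size(X, Y, A, B):.3f}")

An effect size near zero indicates little measured association between the target and attribute sets; WEAT is conventionally paired with a permutation test over the target sets to assess significance, which is omitted here for brevity.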
Related papers
- Spoken Stereoset: On Evaluating Social Bias Toward Speaker in Speech Large Language Models [50.40276881893513]
This study introduces Spoken Stereoset, a dataset specifically designed to evaluate social biases in Speech Large Language Models (SLLMs).
By examining how different models respond to speech from diverse demographic groups, we aim to identify these biases.
The findings indicate that while most models show minimal bias, some still exhibit slightly stereotypical or anti-stereotypical tendencies.
arXiv Detail & Related papers (2024-08-14T16:55:06Z)
- Disco-Bench: A Discourse-Aware Evaluation Benchmark for Language Modelling [70.23876429382969]
We propose a benchmark that can evaluate intra-sentence discourse properties across a diverse set of NLP tasks.
Disco-Bench consists of 9 document-level testsets in the literature domain, which contain rich discourse phenomena.
For linguistic analysis, we also design a diagnostic test suite that can examine whether the target models learn discourse knowledge.
arXiv Detail & Related papers (2023-07-16T15:18:25Z)
- How Different Is Stereotypical Bias Across Languages? [1.0467550794914122]
Recent studies have demonstrated how to assess the stereotypical bias in pre-trained English language models.
We make use of the English StereoSet dataset (Nadeem et al., 2021), which we semi-automatically translate into German, French, Spanish, and Turkish.
The main takeaways from our analysis are that mGPT-2 shows surprising anti-stereotypical behavior across languages, English (monolingual) models exhibit the strongest bias, and the stereotypes reflected in the data set are least present in Turkish models.
arXiv Detail & Related papers (2023-07-14T13:17:11Z)
- Evaluating Biased Attitude Associations of Language Models in an Intersectional Context [2.891314299138311]
Language models are trained on large-scale corpora that embed implicit biases documented in psychology.
We study biases related to age, education, gender, height, intelligence, literacy, race, religion, sex, sexual orientation, social class, and weight.
We find that language models exhibit the most biased attitudes against gender identity, social class, and sexual orientation signals in language.
arXiv Detail & Related papers (2023-07-07T03:01:56Z)
- CBBQ: A Chinese Bias Benchmark Dataset Curated with Human-AI Collaboration for Large Language Models [52.25049362267279]
We present a Chinese Bias Benchmark dataset that consists of over 100K questions jointly constructed by human experts and generative language models.
The testing instances in the dataset are automatically derived from 3K+ high-quality templates manually authored with stringent quality control.
Extensive experiments demonstrate the effectiveness of the dataset in detecting model bias, with all 10 publicly available Chinese large language models exhibiting strong bias in certain categories.
arXiv Detail & Related papers (2023-06-28T14:14:44Z)
- Comparing Biases and the Impact of Multilingual Training across Multiple Languages [70.84047257764405]
We present a bias analysis across Italian, Chinese, English, Hebrew, and Spanish on the downstream sentiment analysis task.
We adapt existing sentiment bias templates in English to Italian, Chinese, Hebrew, and Spanish for four attributes: race, religion, nationality, and gender.
Our results reveal similarities in bias expression such as favoritism of groups that are dominant in each language's culture.
arXiv Detail & Related papers (2023-05-18T18:15:07Z)
- An Analysis of Social Biases Present in BERT Variants Across Multiple Languages [0.0]
We investigate the bias present in monolingual BERT models across a diverse set of languages.
We propose a template-based method to measure any kind of bias, based on sentence pseudo-likelihood (a minimal sketch of this scoring appears after this list).
We conclude that current methods of probing for bias are highly language-dependent.
arXiv Detail & Related papers (2022-11-25T23:38:08Z)
- Challenges in Measuring Bias via Open-Ended Language Generation [1.5552869983952944]
We analyze how specific choices of prompt sets, metrics, automatic tools and sampling strategies affect bias results.
We provide recommendations for reporting biases in open-ended language generation.
arXiv Detail & Related papers (2022-05-23T19:57:15Z)
- Towards an Enhanced Understanding of Bias in Pre-trained Neural Language Models: A Survey with Special Emphasis on Affective Bias [2.6304695993930594]
We present a survey to comprehend bias in large pre-trained language models, analyze the stages at which these biases occur, and review the ways in which they can be quantified and mitigated.
Considering the wide applicability of affective computing to downstream tasks in real-world systems such as business, healthcare, and education, we place special emphasis on investigating bias in the context of affect (emotion), i.e., Affective Bias.
We also summarize various bias evaluation corpora that can aid future research and discuss open challenges in studying bias in pre-trained language models.
arXiv Detail & Related papers (2022-04-21T18:51:19Z)
- A Massively Multilingual Analysis of Cross-linguality in Shared Embedding Space [61.18554842370824]
In cross-lingual language models, representations for many different languages live in the same space.
We compute a task-based measure of cross-lingual alignment in the form of bitext retrieval performance.
We examine a range of linguistic, quasi-linguistic, and training-related features as potential predictors of these alignment metrics.
arXiv Detail & Related papers (2021-09-13T21:05:37Z)
- Gender Bias in Multilingual Embeddings and Cross-Lingual Transfer [101.58431011820755]
We study gender bias in multilingual embeddings and how it affects transfer learning for NLP applications.
We create a multilingual dataset for bias analysis and propose several ways for quantifying bias in multilingual representations.
arXiv Detail & Related papers (2020-05-02T04:34:37Z)
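The sentence pseudo-likelihood scoring used by the BERT-variants paper above can be sketched in a few lines. Assuming a masked language model loaded through the Hugging Face transformers library (the checkpoint name and the template pair below are illustrative assumptions, not the paper's materials), each non-special token is masked in turn and its log-probability is accumulated; comparing the scores of a stereotypical and an anti-stereotypical sentence yields a bias signal.

    import torch
    from transformers import AutoModelForMaskedLM, AutoTokenizer

    MODEL = "bert-base-german-cased"  # assumed checkpoint; any masked LM works
    tokenizer = AutoTokenizer.from_pretrained(MODEL)
    model = AutoModelForMaskedLM.from_pretrained(MODEL)
    model.eval()

    def pseudo_log_likelihood(sentence):
        # Sum log P(token_i | rest) with each non-special token masked in turn.
        ids = tokenizer(sentence, return_tensors="pt")["input_ids"][0]
        total = 0.0
        for i in range(1, len(ids) - 1):  # skip [CLS] and [SEP]
            masked = ids.clone()
            masked[i] = tokenizer.mask_token_id
            with torch.no_grad():
                logits = model(masked.unsqueeze(0)).logits[0, i]
            total += torch.log_softmax(logits, dim=-1)[ids[i]].item()
        return total

    # Illustrative minimal pair; real probes use curated template sets.
    s1 = "Der Arzt ist kompetent."   # "The (male) doctor is competent."
    s2 = "Die Ärztin ist kompetent." # "The (female) doctor is competent."
    print(pseudo_log_likelihood(s1) - pseudo_log_likelihood(s2))

A consistently positive difference over many such pairs would indicate that the model assigns higher likelihood to one demographic variant; as the entry above cautions, the behavior of this probe is highly language-dependent.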