Hollywood Identity Bias Dataset: A Context Oriented Bias Analysis of
Movie Dialogues
- URL: http://arxiv.org/abs/2205.15951v2
- Date: Wed, 1 Jun 2022 05:43:53 GMT
- Title: Hollywood Identity Bias Dataset: A Context Oriented Bias Analysis of
Movie Dialogues
- Authors: Sandhya Singh, Prapti Roy, Nihar Sahoo, Niteesh Mallela, Himanshu
Gupta, Pushpak Bhattacharyya, Milind Savagaonkar, Nidhi, Roshni Ramnani,
Anutosh Maitra, Shubhashis Sengupta
- Abstract summary: Social biases and stereotypes present in movies can cause extensive damage due to their reach.
We introduce a new dataset of movie scripts that are annotated for identity bias.
The dataset contains dialogue turns annotated with bias labels for seven categories: gender, race/ethnicity, religion, age, occupation, LGBTQ, and other.
- Score: 20.222820874864748
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Movies reflect society and also hold power to transform opinions. Social
biases and stereotypes present in movies can cause extensive damage due to
their reach. These biases are not always demanded by the storyline; they can also
creep in as the author's own bias. Movie production houses would prefer to
ascertain that any bias present in a script is required by the story. Now that
deep learning models can reach human-level accuracy on many tasks, an AI solution
that identifies biases in a script at the writing stage can help studios avoid
stalled releases, lawsuits, and similar fallout. Since AI solutions are
data-intensive and no domain-specific data exists for the problem of bias in
scripts, we introduce a new dataset of movie scripts annotated for identity bias.
The dataset contains dialogue turns annotated with (i) bias labels for seven
categories, viz., gender, race/ethnicity, religion, age, occupation, LGBTQ, and
other (which covers biases such as body shaming and personality bias); (ii) labels
for sensitivity, stereotype, sentiment, emotion, and emotion intensity; (iii) all
labels annotated with context awareness; (iv) target groups and the reason for
each bias label; and (v) an expert-driven group-validation process for
high-quality annotations. We also report baseline performances for bias
identification and category detection on our dataset.
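To make the annotation scheme concrete, below is a minimal sketch of what one annotated dialogue turn could look like and how a context-aware baseline input might be assembled from it. The field names, label values, and the separator-based concatenation are illustrative assumptions, not the dataset's released schema.
```python
# Hypothetical example of one annotated dialogue turn; field names and values
# are assumptions made for illustration, not the dataset's actual schema.
from dataclasses import dataclass
from typing import List

@dataclass
class DialogueTurn:
    movie: str
    speaker: str
    text: str
    context: List[str]          # preceding turns, since labels are context-aware
    bias_label: str             # e.g. "biased" / "not biased"
    bias_categories: List[str]  # subset of the seven categories
    sensitivity: str
    stereotype: str
    sentiment: str
    emotion: str
    emotion_intensity: str
    target_groups: List[str]
    reason: str

turn = DialogueTurn(
    movie="<movie title>",
    speaker="CHARACTER_A",
    text="Women can't handle this kind of pressure.",
    context=["We need someone to lead the night shift.", "What about Dana?"],
    bias_label="biased",
    bias_categories=["gender"],
    sensitivity="sensitive",
    stereotype="stereotype",
    sentiment="negative",
    emotion="disgust",
    emotion_intensity="high",
    target_groups=["women"],
    reason="Attributes incompetence under pressure to an entire gender.",
)

def baseline_input(t: DialogueTurn, sep: str = " [SEP] ") -> str:
    """One plausible way a context-aware classifier could see the turn:
    prior dialogue concatenated with the turn to be labelled."""
    return sep.join(t.context + [t.text])

print(baseline_input(turn))
```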
Related papers
- Intertwined Biases Across Social Media Spheres: Unpacking Correlations in Media Bias Dimensions [12.588239777597847]
Media bias significantly shapes public perception by reinforcing stereotypes and exacerbating societal divisions.
We introduce a novel dataset collected from YouTube and Reddit over the past five years.
Our dataset includes automated annotations for YouTube content across a broad spectrum of bias dimensions.
arXiv Detail & Related papers (2024-08-27T21:03:42Z)
- Spoken Stereoset: On Evaluating Social Bias Toward Speaker in Speech Large Language Models [50.40276881893513]
This study introduces Spoken Stereoset, a dataset specifically designed to evaluate social biases in Speech Large Language Models (SLLMs).
By examining how different models respond to speech from diverse demographic groups, we aim to identify these biases.
The findings indicate that while most models show minimal bias, some still exhibit slightly stereotypical or anti-stereotypical tendencies.
arXiv Detail & Related papers (2024-08-14T16:55:06Z)
- "Fifty Shades of Bias": Normative Ratings of Gender Bias in GPT Generated English Text [11.085070600065801]
Language serves as a powerful tool for the manifestation of societal belief systems.
Gender bias is one of the most pervasive biases in our society.
We create the first dataset of GPT-generated English text with normative ratings of gender bias.
arXiv Detail & Related papers (2023-10-26T14:34:06Z)
- CBBQ: A Chinese Bias Benchmark Dataset Curated with Human-AI Collaboration for Large Language Models [52.25049362267279]
We present a Chinese Bias Benchmark dataset that consists of over 100K questions jointly constructed by human experts and generative language models.
The testing instances in the dataset are automatically derived from 3K+ high-quality templates manually authored with stringent quality control.
Extensive experiments demonstrate the effectiveness of the dataset in detecting model bias, with all 10 publicly available Chinese large language models exhibiting strong bias in certain categories.
arXiv Detail & Related papers (2023-06-28T14:14:44Z)
- VisoGender: A dataset for benchmarking gender bias in image-text pronoun resolution [80.57383975987676]
VisoGender is a novel dataset for benchmarking gender bias in vision-language models.
We focus on occupation-related biases within a hegemonic system of binary gender, inspired by Winograd and Winogender schemas.
We benchmark several state-of-the-art vision-language models and find that they demonstrate bias in resolving binary gender in complex scenes.
arXiv Detail & Related papers (2023-06-21T17:59:51Z)
- Unveiling the Hidden Agenda: Biases in News Reporting and Consumption [59.55900146668931]
We build a six-year dataset on the Italian vaccine debate and adopt a Bayesian latent space model to identify narrative and selection biases.
We found a nonlinear relationship between biases and engagement, with higher engagement for extreme positions.
Analysis of news consumption on Twitter reveals common audiences among news outlets with similar ideological positions.
arXiv Detail & Related papers (2023-01-14T18:58:42Z)
- Detecting Unintended Social Bias in Toxic Language Datasets [32.724030288421474]
This paper introduces ToxicBias, a new dataset curated from the existing dataset of the Kaggle competition "Jigsaw Unintended Bias in Toxicity Classification".
The dataset contains instances annotated for five different bias categories, viz., gender, race/ethnicity, religion, political, and LGBTQ.
We train transformer-based models using our curated datasets and report baseline performance for bias identification, target generation, and bias implications (an illustrative fine-tuning sketch for such baselines appears after this list).
arXiv Detail & Related papers (2022-10-21T06:50:12Z)
- Neural Media Bias Detection Using Distant Supervision With BABE -- Bias Annotations By Experts [24.51774048437496]
This paper presents BABE, a robust and diverse data set for media bias research.
It consists of 3,700 sentences balanced among topics and outlets, containing media bias labels on the word and sentence level.
Based on our data, we also introduce a way to detect bias-inducing sentences in news articles automatically.
arXiv Detail & Related papers (2022-09-29T05:32:55Z)
- PowerTransformer: Unsupervised Controllable Revision for Biased Language Correction [62.46299488465803]
We formulate a new revision task that aims to rewrite a given text to correct the implicit and potentially undesirable bias in character portrayals.
We introduce PowerTransformer as an approach that debiases text through the lens of connotation frames.
We demonstrate that our approach outperforms ablations and existing methods from related tasks.
arXiv Detail & Related papers (2020-10-26T18:05:48Z)
- Multi-Dimensional Gender Bias Classification [67.65551687580552]
Machine learning models can inadvertently learn socially undesirable patterns when training on gender biased text.
We propose a general framework that decomposes gender bias in text along several pragmatic and semantic dimensions.
Using this fine-grained framework, we automatically annotate eight large scale datasets with gender information.
arXiv Detail & Related papers (2020-05-01T21:23:20Z)
- Detection and Mitigation of Bias in Ted Talk Ratings [3.3598755777055374]
Implicit bias is a form of behavioral conditioning that leads us to attribute predetermined characteristics to members of certain groups.
This paper quantifies implicit bias in viewer ratings of TEDTalks, a diverse social platform assessing social and professional performance.
arXiv Detail & Related papers (2020-03-02T06:13:24Z)
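Several of the papers above, including the Hollywood Identity Bias Dataset and ToxicBias, report transformer-based baselines for bias identification. The following is a minimal sketch of how such a baseline might be fine-tuned; the backbone model, binary label set, hyperparameters, and toy examples are illustrative assumptions, not any paper's reported configuration.
```python
# Assumption-laden sketch of a transformer-based bias-identification baseline.
import torch
from torch.optim import AdamW
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "bert-base-uncased"  # assumed backbone
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)

# Hypothetical (text, label) pairs; 1 = biased, 0 = not biased.
train_pairs = [
    ("Women can't handle this kind of pressure.", 1),
    ("The committee will meet again on Friday.", 0),
]

optimizer = AdamW(model.parameters(), lr=2e-5)
model.train()
for _ in range(3):  # a few passes over the toy data
    for text, label in train_pairs:
        batch = tokenizer(text, truncation=True, return_tensors="pt")
        loss = model(**batch, labels=torch.tensor([label])).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

# Inference on a new dialogue turn.
model.eval()
with torch.no_grad():
    logits = model(**tokenizer("Good morning, everyone.", return_tensors="pt")).logits
print("predicted label:", logits.argmax(dim=-1).item())
```
In practice, such a baseline would be trained on the full annotated dialogue turns (including context) rather than isolated toy sentences.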