Automatic Classification of News Subjects in Broadcast News: Application to a Gender Bias Representation Analysis
- URL: http://arxiv.org/abs/2407.14180v1
- Date: Fri, 19 Jul 2024 10:15:45 GMT
- Title: Automatic Classification of News Subjects in Broadcast News: Application to a Gender Bias Representation Analysis
- Authors: Valentin Pelloin, Lena Dodson, Émile Chapuis, Nicolas Hervé, David Doukhan
- Abstract summary: This paper introduces a computational framework designed to delineate gender distribution biases in topics covered by French TV and radio news.
We transcribe a dataset of 11.7k hours, broadcasted in 2023 on 21 French channels.
We show that women are notably underrepresented in subjects such as sports, politics and conflicts.
- Score: 1.4100823284870105
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: This paper introduces a computational framework designed to delineate gender distribution biases in topics covered by French TV and radio news. We transcribe a dataset of 11.7k hours, broadcasted in 2023 on 21 French channels. A Large Language Model (LLM) is used in few-shot conversation mode to obtain a topic classification on those transcriptions. Using the generated LLM annotations, we explore the finetuning of a specialized smaller classification model, to reduce the computational cost. To evaluate the performance of these models, we construct and annotate a dataset of 804 dialogues. This dataset is made available free of charge for research purposes. We show that women are notably underrepresented in subjects such as sports, politics and conflicts. Conversely, on topics such as weather, commercials and health, women have more speaking time than their overall average across all subjects. We also observe representation differences between private and public service channels.
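The comparison described in the abstract (women's speaking time on a topic versus their overall average across all subjects) can be illustrated with a minimal sketch. This is not the authors' code: the function name `women_share_by_topic`, the `(topic, gender, seconds)` tuple layout, and the toy numbers are all assumptions made for illustration only.

```python
from collections import defaultdict


def women_share_by_topic(segments):
    """Return (overall_share, {topic: share}) of women's speaking time.

    `segments` is an iterable of (topic, gender, seconds) tuples with
    gender "F" or "M". This layout is hypothetical, not the paper's format.
    """
    women = defaultdict(float)  # women's speaking seconds per topic
    total = defaultdict(float)  # all speaking seconds per topic
    for topic, gender, seconds in segments:
        total[topic] += seconds
        if gender == "F":
            women[topic] += seconds
    overall = sum(women.values()) / sum(total.values())
    per_topic = {topic: women[topic] / total[topic] for topic in total}
    return overall, per_topic


# Made-up toy data, not results from the paper.
segments = [
    ("sports", "M", 90.0),
    ("sports", "F", 10.0),
    ("weather", "F", 60.0),
    ("weather", "M", 40.0),
]

overall, per_topic = women_share_by_topic(segments)
for topic, share in sorted(per_topic.items()):
    position = "above" if share > overall else "below"
    print(f"{topic}: {share:.2f} ({position} the overall share of {overall:.2f})")
```

In this toy input, a topic counts as over- or underrepresenting women relative to the overall share computed over all topics, which mirrors the kind of comparison the abstract reports.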
Related papers
- Everyone deserves their voice to be heard: Analyzing Predictive Gender Bias in ASR Models Applied to Dutch Speech Data [13.91630413828167]
This study focuses on identifying the performance disparities of Whisper models on Dutch speech data.
We analyzed the word error rate, character error rate and a BERT-based semantic similarity across gender groups.
arXiv Detail & Related papers (2024-11-14T13:29:09Z)
- Spoken Stereoset: On Evaluating Social Bias Toward Speaker in Speech Large Language Models [50.40276881893513]
This study introduces Spoken Stereoset, a dataset specifically designed to evaluate social biases in Speech Large Language Models (SLLMs).
By examining how different models respond to speech from diverse demographic groups, we aim to identify these biases.
The findings indicate that while most models show minimal bias, some still exhibit slightly stereotypical or anti-stereotypical tendencies.
arXiv Detail & Related papers (2024-08-14T16:55:06Z)
- GenderBias-VL: Benchmarking Gender Bias in Vision Language Models via Counterfactual Probing [72.0343083866144]
This paper introduces the GenderBias-VL benchmark to evaluate occupation-related gender bias in Large Vision-Language Models.
Using our benchmark, we extensively evaluate 15 commonly used open-source LVLMs and state-of-the-art commercial APIs.
Our findings reveal widespread gender biases in existing LVLMs.
arXiv Detail & Related papers (2024-06-30T05:55:15Z)
- Gender Representation in TV and Radio: Automatic Information Extraction methods versus Manual Analyses [1.1708479580628022]
This study investigates the relationship between automatic information extraction descriptors and manual analyses to describe gender representation disparities in TV and Radio.
Findings reveal systemic gender imbalances, with women underrepresented compared to men across all descriptors.
arXiv Detail & Related papers (2024-06-14T16:05:43Z)
- VisoGender: A dataset for benchmarking gender bias in image-text pronoun resolution [80.57383975987676]
VisoGender is a novel dataset for benchmarking gender bias in vision-language models.
We focus on occupation-related biases within a hegemonic system of binary gender, inspired by Winograd and Winogender schemas.
We benchmark several state-of-the-art vision-language models and find that they demonstrate bias in resolving binary gender in complex scenes.
arXiv Detail & Related papers (2023-06-21T17:59:51Z)
- GenderedNews: Une approche computationnelle des écarts de représentation des genres dans la presse française [0.0]
We present GenderedNews (https://gendered-news.imag.fr), an online dashboard which gives weekly measures of gender imbalance in French online press.
We use Natural Language Processing (NLP) methods to quantify gender inequalities in the media.
We describe the data collected daily (seven main titles of French online news media) and the methodology behind our metrics.
arXiv Detail & Related papers (2022-02-11T15:16:49Z)
- Gender bias in magazines oriented to men and women: a computational approach [58.720142291102135]
We compare the content of a women-oriented magazine with that of a men-oriented one, both produced by the same editorial group over a decade.
With Topic Modelling techniques we identify the main themes discussed in the magazines and quantify how much the presence of these topics differs between magazines over time.
Our results show that the frequency of appearance of the topics Family, Business and Women as sex objects presents an initial bias that tends to disappear over time.
arXiv Detail & Related papers (2020-11-24T14:02:49Z)
- Multi-Dimensional Gender Bias Classification [67.65551687580552]
Machine learning models can inadvertently learn socially undesirable patterns when training on gender biased text.
We propose a general framework that decomposes gender bias in text along several pragmatic and semantic dimensions.
Using this fine-grained framework, we automatically annotate eight large scale datasets with gender information.
arXiv Detail & Related papers (2020-05-01T21:23:20Z)
- VIOLIN: A Large-Scale Dataset for Video-and-Language Inference [103.7457132841367]
We introduce a new task, Video-and-Language Inference, for joint multimodal understanding of video and text.
Given a video clip with subtitles aligned as premise, paired with a natural language hypothesis based on the video content, a model needs to infer whether the hypothesis is entailed or contradicted by the given video clip.
A new large-scale dataset, named Violin (VIdeO-and-Language INference), is introduced for this task, which consists of 95,322 video-hypothesis pairs from 15,887 video clips.
arXiv Detail & Related papers (2020-03-25T20:39:05Z)