Radar de Parité: An NLP system to measure gender representation in
French news stories
- URL: http://arxiv.org/abs/2304.09982v1
- Date: Wed, 19 Apr 2023 21:33:59 GMT
- Title: Radar de Parité: An NLP system to measure gender representation in
French news stories
- Authors: Valentin-Gabriel Soumah, Prashanth Rao, Philipp Eibl, Maite Taboada
- Abstract summary: Radar de Parité measures the proportion of women and men quoted daily in six Canadian French-language media outlets.
We outline the system's architecture and detail the challenges we overcame to address French-specific issues.
- Score: 0.05735035463793007
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present the Radar de Parité, an automated Natural Language Processing
(NLP) system that measures the proportion of women and men quoted daily in six
Canadian French-language media outlets. We outline the system's architecture
and detail the challenges we overcame to address French-specific issues, in
particular regarding coreference resolution, a new contribution to the NLP
literature on French. We also showcase statistics covering over one year's
worth of data (282,512 news articles). Our results highlight the
underrepresentation of women in news stories, while also illustrating the
application of modern NLP methods to measure gender representation and address
societal issues.
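The daily measure described in the abstract can be illustrated with a minimal sketch. It assumes each quoted person has already been assigned a gender label by an upstream step; the record layout, the labels "F" and "M", the outlet names, and the function name `daily_proportions` are hypothetical and not taken from the paper, and the sample records are invented for illustration.

```python
from collections import Counter, defaultdict


def daily_proportions(quotes):
    """Return, per day, the share of quoted people labelled "F" and "M".

    `quotes` is an iterable of (day, outlet, gender) tuples. This layout is
    an assumption made for illustration, not the system's actual schema.
    """
    counts = defaultdict(Counter)
    for day, _outlet, gender in quotes:
        counts[day][gender] += 1

    result = {}
    for day, c in counts.items():
        total = c["F"] + c["M"]  # other labels are ignored in this sketch
        result[day] = {
            g: (c[g] / total if total else 0.0) for g in ("F", "M")
        }
    return result


# Invented sample records, for illustration only.
sample = [
    ("2023-01-01", "outlet_a", "M"),
    ("2023-01-01", "outlet_a", "M"),
    ("2023-01-01", "outlet_b", "F"),
    ("2023-01-01", "outlet_b", "M"),
    ("2023-01-02", "outlet_a", "F"),
    ("2023-01-02", "outlet_b", "M"),
]

proportions = daily_proportions(sample)
for day in sorted(proportions):
    print(day, proportions[day])
```

Grouping by outlet as well as by day would only require using (day, outlet) as the dictionary key.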
Related papers
- GenderCARE: A Comprehensive Framework for Assessing and Reducing Gender Bias in Large Language Models [73.23743278545321]
Large language models (LLMs) have exhibited remarkable capabilities in natural language generation, but have also been observed to magnify societal biases.
GenderCARE is a comprehensive framework that encompasses innovative Criteria, bias Assessment, Reduction techniques, and Evaluation metrics.
arXiv Detail & Related papers (2024-08-22T15:35:46Z)
- Automatic Classification of News Subjects in Broadcast News: Application to a Gender Bias Representation Analysis [1.4100823284870105]
This paper introduces a computational framework designed to delineate gender distribution biases in topics covered by French TV and radio news.
We transcribe a dataset of 11.7k hours, broadcasted in 2023 on 21 French channels.
We show that women are notably underrepresented in subjects such as sports, politics and conflicts.
arXiv Detail & Related papers (2024-07-19T10:15:45Z)
- Towards Systematic Monolingual NLP Surveys: GenA of Greek NLP [2.3499129784547663]
This study fills the gap by introducing a method for creating systematic and comprehensive monolingual NLP surveys.
Characterized by a structured search protocol, it can be used to select publications and organize them through a taxonomy of NLP tasks.
By applying our method, we conducted a systematic literature review of Greek NLP from 2012 to 2022.
arXiv Detail & Related papers (2024-07-13T12:01:52Z)
- Leveraging Large Language Models to Measure Gender Bias in Gendered Languages [9.959039325564744]
This paper introduces a novel methodology that leverages the contextual understanding capabilities of large language models (LLMs) to quantitatively analyze gender representation in Spanish corpora.
We empirically validate our method on four widely-used benchmark datasets, uncovering significant gender disparities with a male-to-female ratio ranging from 4:1.
arXiv Detail & Related papers (2024-06-19T16:30:58Z)
- Natural Language Processing for Dialects of a Language: A Survey [56.93337350526933]
State-of-the-art natural language processing (NLP) models are trained on massive training corpora, and report a superlative performance on evaluation datasets.
This survey delves into an important attribute of these datasets: the dialect of a language.
Motivated by the performance degradation of NLP models for dialectic datasets and its implications for the equity of language technologies, we survey past research in NLP for dialects in terms of datasets, and approaches.
arXiv Detail & Related papers (2024-01-11T03:04:38Z)
- The Gender-GAP Pipeline: A Gender-Aware Polyglot Pipeline for Gender
Characterisation in 55 Languages [51.2321117760104]
This paper describes the Gender-GAP Pipeline, an automatic pipeline to characterize gender representation in large-scale datasets for 55 languages.
The pipeline uses a multilingual lexicon of gendered person-nouns to quantify the gender representation in text.
We showcase it to report gender representation in WMT training data and development data for the News task, confirming that current data is skewed towards masculine representation.
arXiv Detail & Related papers (2023-08-31T17:20:50Z)
- VisoGender: A dataset for benchmarking gender bias in image-text pronoun
resolution [80.57383975987676]
VisoGender is a novel dataset for benchmarking gender bias in vision-language models.
We focus on occupation-related biases within a hegemonic system of binary gender, inspired by Winograd and Winogender schemas.
We benchmark several state-of-the-art vision-language models and find that they demonstrate bias in resolving binary gender in complex scenes.
arXiv Detail & Related papers (2023-06-21T17:59:51Z)
- Auditing Gender Presentation Differences in Text-to-Image Models [54.16959473093973]
We study how gender is presented differently in text-to-image models.
By probing gender indicators in the input text, we quantify the frequency differences of presentation-centric attributes.
We propose an automatic method to estimate such differences.
arXiv Detail & Related papers (2023-02-07T18:52:22Z)
- One Country, 700+ Languages: NLP Challenges for Underrepresented
Languages and Dialects in Indonesia [60.87739250251769]
We provide an overview of the current state of NLP research for Indonesia's 700+ languages.
We highlight challenges in Indonesian NLP and how these affect the performance of current NLP systems.
arXiv Detail & Related papers (2022-03-24T22:07:22Z)
- GenderedNews: Une approche computationnelle des écarts de
représentation des genres dans la presse française [0.0]
We present GenderedNews (https://gendered-news.imag.fr), an online dashboard which gives weekly measures of gender imbalance in French online press.
We use Natural Language Processing (NLP) methods to quantify gender inequalities in the media.
We describe the data collected daily (seven main titles of French online news media) and the methodology behind our metrics.
arXiv Detail & Related papers (2022-02-11T15:16:49Z)
- Generating Gender Augmented Data for NLP [3.5557219875516655]
Gender bias is a frequent occurrence in NLP-based applications, especially in gender-inflected languages.
This paper proposes an automatic and generalisable rewriting approach for short conversational sentences.
The proposed approach is based on a neural machine translation (NMT) system trained to 'translate' from one gender alternative to another.
arXiv Detail & Related papers (2021-07-13T11:13:21Z)