Robust Quantification of Gender Disparity in Pre-Modern English
Literature using Natural Language Processing
- URL: http://arxiv.org/abs/2204.05872v1
- Date: Tue, 12 Apr 2022 15:11:22 GMT
- Title: Robust Quantification of Gender Disparity in Pre-Modern English
Literature using Natural Language Processing
- Authors: Akarsh Nagaraj and Mayank Kejriwal
- Abstract summary: We demonstrate the significant discrepancy between the prevalence of female characters and male characters in pre-modern literature.
The discrepancy seems to be relatively stable as we plot data over the decades in this century-long period.
We aim to carefully describe both the limitations and ethical caveats associated with this study, and others like it.
- Score: 8.185725740857594
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Research has continued to shed light on the extent and significance of gender
disparity in social, cultural and economic spheres. More recently,
computational tools from the Natural Language Processing (NLP) literature have
been proposed for measuring such disparity using relatively extensive datasets
and empirically rigorous methodologies. In this paper, we contribute to this
line of research by studying gender disparity, at scale, in copyright-expired
literary texts published in the pre-modern period (defined in this work as the
period ranging from the mid-nineteenth through the mid-twentieth century). One
of the challenges in using such tools is to ensure quality control, and by
extension, trustworthy statistical analysis. Another challenge is in using
materials and methods that are publicly available and have been established for
some time, both to ensure that they can be used and vetted in the future, and
also, to add confidence to the methodology itself. We present our solution to
addressing these challenges, and using multiple measures, demonstrate the
significant discrepancy between the prevalence of female characters and male
characters in pre-modern literature. The evidence suggests that the discrepancy
declines when the author is female. The discrepancy seems to be relatively
stable as we plot data over the decades in this century-long period. Finally,
we aim to carefully describe both the limitations and ethical caveats
associated with this study, and others like it.
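The abstract describes measuring the prevalence of female versus male characters at scale with established NLP tools. As a deliberately minimal sketch (not the authors' actual pipeline, which includes quality control and vetted methods), the core idea of counting gendered tokens can be illustrated with only the standard library; the word lists below are illustrative assumptions, not curated lexicons:

```python
import re
from collections import Counter

# Hypothetical word lists for illustration only; a rigorous study would
# use vetted lexicons and character-level resolution, not raw tokens.
FEMALE_TERMS = {"she", "her", "hers", "mrs", "miss", "lady"}
MALE_TERMS = {"he", "him", "his", "mr", "sir", "lord"}

def gender_term_ratio(text: str) -> dict:
    """Count gendered tokens and return simple prevalence statistics."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    female = sum(counts[t] for t in FEMALE_TERMS)
    male = sum(counts[t] for t in MALE_TERMS)
    total = female + male
    return {
        "female": female,
        "male": male,
        "female_share": female / total if total else 0.0,
    }

sample = "Mr. Darcy bowed. She smiled at him, and he returned her gaze."
print(gender_term_ratio(sample))  # 2 female tokens, 3 male tokens
```

Aggregating such shares per book, per author gender, and per decade would yield the kind of stability-over-time plots the abstract mentions, though the paper's own measures are more careful than this token-level proxy.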
Related papers
- Divided by discipline? A systematic literature review on the quantification of online sexism and misogyny using a semi-automated approach [1.1599570446840546]
We present a semi-automated way to narrow down the search results in the different phases of selection stage in the PRISMA flowchart.
We examine literature from computer science and the social sciences from 2012 to 2022.
We discuss the challenges and opportunities for future research dedicated to measuring online sexism and misogyny.
arXiv Detail & Related papers (2024-09-30T11:34:39Z)
- GenderCARE: A Comprehensive Framework for Assessing and Reducing Gender Bias in Large Language Models [73.23743278545321]

Large language models (LLMs) have exhibited remarkable capabilities in natural language generation, but have also been observed to magnify societal biases.
GenderCARE is a comprehensive framework that encompasses innovative Criteria, bias Assessment, Reduction techniques, and Evaluation metrics.
arXiv Detail & Related papers (2024-08-22T15:35:46Z)
- Inclusivity in Large Language Models: Personality Traits and Gender Bias in Scientific Abstracts [49.97673761305336]
We evaluate three large language models (LLMs) for their alignment with human narrative styles and potential gender biases.
Our findings indicate that, while these models generally produce text closely resembling human-authored content, variations in stylistic features suggest significant gender biases.
arXiv Detail & Related papers (2024-06-27T19:26:11Z)
- Leveraging Large Language Models to Measure Gender Bias in Gendered Languages [9.959039325564744]
This paper introduces a novel methodology that leverages the contextual understanding capabilities of large language models (LLMs) to quantitatively analyze gender representation in Spanish corpora.
We empirically validate our method on four widely-used benchmark datasets, uncovering significant gender disparities, with male-to-female ratios starting from 4:1.
arXiv Detail & Related papers (2024-06-19T16:30:58Z)
- Evaluation of Faithfulness Using the Longest Supported Subsequence [52.27522262537075]
We introduce a novel approach to evaluate faithfulness of machine-generated text by computing the longest noncontiguous subsequence of the claim that is supported by the context.
Using a new human-annotated dataset, we finetune a model to generate Longest Supported Subsequence (LSS)
Our proposed metric demonstrates an 18% enhancement over the prevailing state-of-the-art metric for faithfulness on our dataset.
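A simplified token-level reading of the metric described above is the classic longest-common-subsequence computation between claim and context, normalized by claim length. This is an assumption about the metric's spirit for illustration only; the paper itself fine-tunes a model on human annotations rather than using exact token matching:

```python
def lss_score(claim_tokens: list, context_tokens: list) -> float:
    """Length of the longest (possibly noncontiguous) subsequence of the
    claim that also appears, in order, in the context, normalized by
    claim length. Standard LCS dynamic program, O(m * n)."""
    m, n = len(claim_tokens), len(context_tokens)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if claim_tokens[i - 1] == context_tokens[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[m][n] / m if m else 0.0

claim = "the cat sat on the mat".split()
context = "yesterday the black cat quietly sat down on a soft mat".split()
print(lss_score(claim, context))  # 5 of 6 claim tokens are supported
```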
arXiv Detail & Related papers (2023-08-23T14:18:44Z)
- Unveiling Gender Bias in Terms of Profession Across LLMs: Analyzing and Addressing Sociological Implications [0.0]
The study examines existing research on gender bias in AI language models and identifies gaps in the current knowledge.
The findings shed light on gendered word associations, language usage, and biased narratives present in the outputs of Large Language Models.
The paper presents strategies for reducing gender bias in LLMs, including algorithmic approaches and data augmentation techniques.
arXiv Detail & Related papers (2023-07-18T11:38:45Z)
- Measuring Intersectional Biases in Historical Documents [37.03904311548859]
We investigate the continuities and transformations of bias in historical newspapers published in the Caribbean during the colonial era (18th to 19th centuries).
Our analyses are performed along the axes of gender, race, and their intersection.
We find that there is a trade-off between the stability of the word embeddings and their compatibility with the historical dataset.
arXiv Detail & Related papers (2023-05-21T07:10:31Z)
- Auditing Gender Presentation Differences in Text-to-Image Models [54.16959473093973]
We study how gender is presented differently in text-to-image models.
By probing gender indicators in the input text, we quantify the frequency differences of presentation-centric attributes.
We propose an automatic method to estimate such differences.
arXiv Detail & Related papers (2023-02-07T18:52:22Z)
- Towards Understanding Gender-Seniority Compound Bias in Natural Language Generation [64.65911758042914]
We investigate how seniority impacts the degree of gender bias exhibited in pretrained neural generation models.
Our results show that GPT-2 amplifies bias by considering women as junior and men as senior more often than the ground truth in both domains.
These results suggest that NLP applications built using GPT-2 may harm women in professional capacities.
arXiv Detail & Related papers (2022-05-19T20:05:02Z)
- Evaluating Gender Bias in Natural Language Inference [5.034017602990175]
We propose an evaluation methodology to measure gender bias in natural language understanding through inference.
We use our challenge task to investigate state-of-the-art NLI models on the presence of gender stereotypes using occupations.
Our findings suggest that three models trained on MNLI and SNLI datasets are significantly prone to gender-induced prediction errors.
arXiv Detail & Related papers (2021-05-12T09:41:51Z)
- Gender bias in magazines oriented to men and women: a computational approach [58.720142291102135]
We compare the content of a women-oriented magazine with that of a men-oriented one, both produced by the same editorial group over a decade.
With Topic Modelling techniques we identify the main themes discussed in the magazines and quantify how much the presence of these topics differs between magazines over time.
Our results show that the topics Family, Business, and Women as sex objects present an initial bias in their frequency of appearance that tends to disappear over time.
arXiv Detail & Related papers (2020-11-24T14:02:49Z)
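The topic-prevalence comparison described in the last entry can be illustrated in a deliberately simplified form: measure the share of tokens falling under fixed keyword lists and compare the shares across two corpora. The actual paper learns topics from data with Topic Modelling rather than using fixed lists, and the lexicons and magazine snippets below are hypothetical:

```python
from collections import Counter

# Illustrative keyword lexicons standing in for learned topics; a real
# analysis would induce topics with a model such as LDA, not fix them.
TOPICS = {
    "Family": {"family", "children", "home", "mother", "father"},
    "Business": {"business", "career", "money", "work", "company"},
}

def topic_shares(tokens: list) -> dict:
    """Fraction of tokens that belong to each keyword topic."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {
        topic: sum(counts[w] for w in words) / total if total else 0.0
        for topic, words in TOPICS.items()
    }

# Hypothetical snippets standing in for two magazine corpora.
women_mag = "her career and her company kept her from home and family".split()
men_mag = "his business his money his work his company his career".split()
print(topic_shares(women_mag))
print(topic_shares(men_mag))
```

Computing such shares per year for each magazine, then tracking the gap between the two series, mirrors the over-time comparison the abstract describes.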
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences of its use.