Gender Bias in Big Data Analysis
- URL: http://arxiv.org/abs/2211.09865v1
- Date: Thu, 17 Nov 2022 20:13:04 GMT
- Title: Gender Bias in Big Data Analysis
- Authors: Thomas J. Misa
- Abstract summary: It measures gender bias when gender prediction software tools are used in historical big data research.
Gender bias is measured by contrasting personally identified computer science authors in the well-regarded DBLP dataset with comparable results from the software tools.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: This article combines humanistic "data critique" with informed inspection of
big data analysis. It measures gender bias when gender prediction software
tools (Gender API, Namsor, and Genderize.io) are used in historical big data
research. Gender bias is measured by contrasting personally identified computer
science authors in the well-regarded DBLP dataset (1950-1980) with exactly
comparable results from the software tools. Implications for public
understanding of gender bias in computing and the nature of the computing
profession are outlined. Preliminary assessment of the Semantic Scholar dataset
is presented. The conclusion combines humanistic approaches with selective use
of big data methods.
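The measurement design described in the abstract, querying name-to-gender services for author first names and contrasting the predictions with hand-identified DBLP authors, can be illustrated with a short script. This is a minimal sketch, not the article's actual pipeline: the labeled file 'dblp_authors_labeled.csv' and its column names are hypothetical, and Genderize.io's public endpoint stands in for the three tools the article evaluates.

```python
"""Minimal sketch: query a name-to-gender service and contrast its predictions
with hand-identified authors.

Assumptions not taken from the article: the labeled file 'dblp_authors_labeled.csv'
and its columns ('first_name', 'gender') are hypothetical; Genderize.io's public
endpoint is used as the example service."""
import csv
from typing import Optional

import requests

GENDERIZE_URL = "https://api.genderize.io"


def predict_gender(first_name: str) -> Optional[str]:
    """Ask Genderize.io for a prediction; returns 'male', 'female', or None."""
    resp = requests.get(GENDERIZE_URL, params={"name": first_name}, timeout=10)
    resp.raise_for_status()
    return resp.json().get("gender")  # None when the service does not know the name


def main() -> None:
    # Per-gender tallies of hand-labeled authors and of tool errors on them.
    tallies = {"female": {"total": 0, "wrong": 0}, "male": {"total": 0, "wrong": 0}}
    with open("dblp_authors_labeled.csv", newline="", encoding="utf-8") as fh:
        for row in csv.DictReader(fh):
            truth = row["gender"].strip().lower()
            if truth not in tallies:
                continue  # skip rows without a usable hand label
            predicted = predict_gender(row["first_name"])
            tallies[truth]["total"] += 1
            if predicted != truth:
                tallies[truth]["wrong"] += 1  # unknowns and misses both counted as errors

    for label, t in tallies.items():
        if t["total"]:
            print(f"{label}: {t['wrong']}/{t['total']} misclassified "
                  f"({t['wrong'] / t['total']:.1%})")


if __name__ == "__main__":
    main()
```

A real comparison would batch requests where the service allows it, respect rate limits, and report unknown predictions separately from outright misclassifications, since the two error modes have different consequences for historical counts.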
Related papers
- The Impact of Debiasing on the Performance of Language Models in Downstream Tasks is Underestimated [70.23064111640132]
We compare the impact of debiasing on performance across multiple downstream tasks using a wide range of benchmark datasets.
Experiments show that the effects of debiasing are consistently underestimated across all tasks.
arXiv Detail & Related papers (2023-09-16T20:25:34Z)
- The Gender-GAP Pipeline: A Gender-Aware Polyglot Pipeline for Gender Characterisation in 55 Languages [51.2321117760104]
This paper describes the Gender-GAP Pipeline, an automatic pipeline to characterize gender representation in large-scale datasets for 55 languages.
The pipeline uses a multilingual lexicon of gendered person-nouns to quantify gender representation in text; a toy illustration of this lexicon-matching idea appears after the related papers list.
We showcase it to report gender representation in WMT training data and development data for the News task, confirming that current data is skewed towards masculine representation.
arXiv Detail & Related papers (2023-08-31T17:20:50Z)
- Unveiling Gender Bias in Terms of Profession Across LLMs: Analyzing and Addressing Sociological Implications [0.0]
The study examines existing research on gender bias in AI language models and identifies gaps in the current knowledge.
The findings shed light on gendered word associations, language usage, and biased narratives present in the outputs of Large Language Models.
The paper presents strategies for reducing gender bias in LLMs, including algorithmic approaches and data augmentation techniques.
arXiv Detail & Related papers (2023-07-18T11:38:45Z)
- Gender Biases in Automatic Evaluation Metrics for Image Captioning [87.15170977240643]
We conduct a systematic study of gender biases in model-based evaluation metrics for image captioning tasks.
We demonstrate the negative consequences of using these biased metrics, including the inability to differentiate between biased and unbiased generations.
We present a simple and effective way to mitigate the metric bias without hurting the correlations with human judgments.
arXiv Detail & Related papers (2023-05-24T04:27:40Z)
- Dynamics of Gender Bias in Computing [0.0]
This article presents a new dataset focusing on the formative years of computing as a profession (1950-1980), when U.S. government workforce statistics are thin or nonexistent.
It revises commonly held conjectures that gender bias in computing emerged during the professionalization of computer science in the 1960s or 1970s.
arXiv Detail & Related papers (2022-11-07T23:29:56Z)
- Evaluating Gender Bias in Natural Language Inference [5.034017602990175]
We propose an evaluation methodology to measure gender bias in natural language understanding through inference.
We use our challenge task to investigate state-of-the-art NLI models on the presence of gender stereotypes using occupations.
Our findings suggest that three models trained on MNLI and SNLI datasets are significantly prone to gender-induced prediction errors.
arXiv Detail & Related papers (2021-05-12T09:41:51Z)
- Gender Stereotype Reinforcement: Measuring the Gender Bias Conveyed by Ranking Algorithms [68.85295025020942]
We propose the Gender Stereotype Reinforcement (GSR) measure, which quantifies the tendency of a search engine to support gender stereotypes.
GSR is the first specifically tailored measure for Information Retrieval, capable of quantifying representational harms.
arXiv Detail & Related papers (2020-09-02T20:45:04Z)
- Mitigating Gender Bias in Machine Learning Data Sets [5.075506385456811]
Gender bias has been identified in the context of employment advertising and recruitment tools.
This paper proposes a framework for the identification of gender bias in training data for machine learning.
arXiv Detail & Related papers (2020-05-14T12:06:02Z)
- Multi-Dimensional Gender Bias Classification [67.65551687580552]
Machine learning models can inadvertently learn socially undesirable patterns when training on gender biased text.
We propose a general framework that decomposes gender bias in text along several pragmatic and semantic dimensions.
Using this fine-grained framework, we automatically annotate eight large scale datasets with gender information.
arXiv Detail & Related papers (2020-05-01T21:23:20Z)
- REVISE: A Tool for Measuring and Mitigating Bias in Visual Datasets [64.76453161039973]
REVISE (REvealing VIsual biaSEs) is a tool that assists in the investigation of a visual dataset.
It surfaces potential biases along three dimensions: (1) object-based, (2) person-based, and (3) geography-based.
arXiv Detail & Related papers (2020-04-16T23:54:37Z)
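As noted in the Gender-GAP Pipeline entry above, its lexicon-matching step can be sketched with a toy example. The miniature English lexicon below is purely illustrative and is not the pipeline's multilingual lexicon of gendered person-nouns; real use would rely on the resources released by that paper's authors.

```python
"""Toy sketch of lexicon-based gender representation counting.

The miniature English lexicon below is illustrative only; the actual
Gender-GAP Pipeline uses a multilingual lexicon of gendered person-nouns."""
import re
from collections import Counter

# Hypothetical miniature lexicon mapping person-nouns/pronouns to a gender class.
LEXICON = {
    "she": "feminine", "her": "feminine", "woman": "feminine", "mother": "feminine",
    "he": "masculine", "his": "masculine", "man": "masculine", "father": "masculine",
}


def gender_counts(text: str) -> Counter:
    """Count lexicon matches per gender class over lowercased word tokens."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(LEXICON[t] for t in tokens if t in LEXICON)


if __name__ == "__main__":
    sample = "He said his mother met the man who wrote to her."
    counts = gender_counts(sample)
    total = sum(counts.values()) or 1
    for label in ("feminine", "masculine"):
        print(f"{label}: {counts[label]} ({counts[label] / total:.0%} of gendered mentions)")
```

Applied to a corpus such as WMT training data, the same counting logic, driven by a full multilingual lexicon, yields the kind of representation skew the Gender-GAP paper reports.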