Related papers: Gender Disparities in StackOverflow's Community-Based Question Answering: A Matter of Quantity versus Quality

URL: http://arxiv.org/abs/2601.23063v1
Date: Fri, 30 Jan 2026 15:16:01 GMT
Title: Gender Disparities in StackOverflow's Community-Based Question Answering: A Matter of Quantity versus Quality
Authors: Maddalena Amendola, Cosimo Rulli, Carlos Castillo, Andrea Passarella, Raffaele Perego
Abstract summary: We investigate whether answer quality is influenced by gender using a combination of human evaluations and automated assessments powered by Large Language Models. Our findings reveal no significant gender differences in answer quality, nor any substantial influence of gender bias on the selection of "best answers." Our results have important implications for the design of scoring systems in community question-answering platforms.
Score: 7.02751685276625
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Community Question-Answering platforms, such as Stack Overflow (SO), are valuable knowledge exchange and problem-solving resources. These platforms incorporate mechanisms to assess the quality of answers and participants' expertise, ideally free from discriminatory biases. However, prior research has highlighted persistent gender biases, raising concerns about the inclusivity and fairness of these systems. Addressing such biases is crucial for fostering equitable online communities. While previous studies focus on detecting gender bias by comparing male and female user characteristics, they often overlook the interaction between genders, inherent answer quality, and the selection of "best answers" by question askers. In this study, we investigate whether answer quality is influenced by gender using a combination of human evaluations and automated assessments powered by Large Language Models. Our findings reveal no significant gender differences in answer quality, nor any substantial influence of gender bias on the selection of "best answers." Instead, we find that the significant gender disparities in SO's reputation scores are primarily attributable to differences in users' activity levels, e.g., the number of questions and answers they write. Our results have important implications for the design of scoring systems in community question-answering platforms. In particular, reputation systems that heavily emphasize activity volume risk amplifying gender disparities that do not reflect actual differences in answer quality, calling for more equitable design strategies.
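The abstract's central argument is that a score which accumulates over activity can diverge between users whose answers are of equal quality. A toy calculation makes this concrete. The sketch below is neither the authors' method nor Stack Overflow's actual reputation formula. The upvote counts, the flat `POINTS_PER_UPVOTE` value, and the function names are all hypothetical, chosen only to isolate the volume effect.

```python
# Hypothetical per-answer upvote counts. Both users have identical answer quality
# (the same upvotes on each kind of answer), but user B writes four times as many answers.
user_a = [3, 5, 4]            # 3 answers
user_b = [3, 5, 4] * 4        # 12 answers

# Assumption: a simplified, purely volume-based scoring rule (not SO's real rules).
POINTS_PER_UPVOTE = 10

def volume_score(upvotes):
    """Reputation-style score: sums points over every answer, so it grows with activity."""
    return POINTS_PER_UPVOTE * sum(upvotes)

def quality_score(upvotes):
    """Per-answer average: divides by the number of answers, so it ignores activity volume."""
    return sum(upvotes) / len(upvotes)

print(volume_score(user_a), volume_score(user_b))    # 120 480
print(quality_score(user_a), quality_score(user_b))  # 4.0 4.0
```

User B ends up with four times user A's volume-based score purely from writing more answers, while the per-answer average is identical. Normalizing by activity separates quality from volume in this toy case. However, per-answer averages are noisy for users with very few answers.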

Related papers

Bias in Gender Bias Benchmarks: How Spurious Features Distort Evaluation [116.86965910589775]
We show that even minimal perturbations, such as masking just 10% of objects or weakly blurring backgrounds, can dramatically alter bias scores. This suggests that current bias evaluations reflect model responses to spurious features rather than gender bias.
arXiv (2025-09-09T11:14:11Z)
Do LLMs have a Gender (Entropy) Bias? [3.2225437367979763]
We define and study entropy bias: a discrepancy in the amount of information generated by an LLM in response to real questions users have asked. Our analyses suggest that there is no significant bias in LLM responses for men and women at a category level. We suggest a simple debiasing approach that iteratively merges the responses for the two genders to produce a final result.
arXiv (2025-05-24T23:06:41Z)
Exploring Gender Disparities in Automatic Speech Recognition Technology [22.729651340592586]
We analyze how performance varies across different gender representations in training data. Our findings suggest a complex interplay between the gender ratio in training data and ASR performance.
arXiv (2025-02-25T18:29:38Z)
The Root Shapes the Fruit: On the Persistence of Gender-Exclusive Harms in Aligned Language Models [91.86718720024825]
We center transgender, nonbinary, and other gender-diverse identities to investigate how alignment procedures interact with pre-existing gender-diverse bias. Our findings reveal that DPO-aligned models are particularly sensitive to supervised finetuning. We conclude with recommendations tailored to DPO and broader alignment practices.
arXiv (2024-11-06T06:50:50Z)
GenderCARE: A Comprehensive Framework for Assessing and Reducing Gender Bias in Large Language Models [73.23743278545321]
Large language models (LLMs) have exhibited remarkable capabilities in natural language generation, but have also been observed to magnify societal biases. GenderCARE is a comprehensive framework that encompasses innovative Criteria, bias Assessment, Reduction techniques, and Evaluation metrics.
arXiv (2024-08-22T15:35:46Z)
Understanding and Addressing Gender Bias in Expert Finding Task [6.239590365208578]
This study investigates gender bias in state-of-the-art Expert Finding (EF) models. Our findings reveal that models relying on reputation metrics and activity levels disproportionately favor male users. We propose adjustments to EF models that incorporate a more balanced preprocessing strategy and leverage content-based and social network-based information.
arXiv (2024-07-07T11:35:23Z)
Diverse, but Divisive: LLMs Can Exaggerate Gender Differences in Opinion Related to Harms of Misinformation [8.066880413153187]
This paper examines whether a large language model (LLM) can reflect the views of various groups when assessing the harms of misinformation. We present the TopicMisinfo dataset, containing 160 fact-checked claims from diverse topics. We find that GPT-3.5-Turbo reflects empirically observed gender differences in opinion but amplifies the extent of these differences.
arXiv (2024-01-29T20:50:28Z)
Covering Uncommon Ground: Gap-Focused Question Generation for Answer Assessment [75.59538732476346]
We focus on the problem of automatically generating gap-focused questions (GFQs). We define the task, highlight key desired aspects of a good GFQ, and propose a model that satisfies these.
arXiv (2023-07-06T22:21:42Z)
VisoGender: A dataset for benchmarking gender bias in image-text pronoun resolution [80.57383975987676]
VisoGender is a novel dataset for benchmarking gender bias in vision-language models. We focus on occupation-related biases within a hegemonic system of binary gender, inspired by Winograd and Winogender schemas. We benchmark several state-of-the-art vision-language models and find that they demonstrate bias in resolving binary gender in complex scenes.
arXiv (2023-06-21T17:59:51Z)
Gender Stereotype Reinforcement: Measuring the Gender Bias Conveyed by Ranking Algorithms [68.85295025020942]
We propose the Gender Stereotype Reinforcement (GSR) measure, which quantifies the tendency of search engines to support gender stereotypes. GSR is the first measure specifically tailored to Information Retrieval that is capable of quantifying representational harms.
arXiv (2020-09-02T20:45:04Z)
Multi-Dimensional Gender Bias Classification [67.65551687580552]
Machine learning models can inadvertently learn socially undesirable patterns when training on gender-biased text. We propose a general framework that decomposes gender bias in text along several pragmatic and semantic dimensions. Using this fine-grained framework, we automatically annotate eight large-scale datasets with gender information.
arXiv (2020-05-01T21:23:20Z)
Review-guided Helpful Answer Identification in E-commerce [38.276241153439955]
Product-specific community question-answering platforms can greatly help address the concerns of potential customers. However, the user-provided answers on such platforms often vary widely in quality. Helpfulness votes from the community can indicate the overall quality of an answer, but they are often missing.
arXiv (2020-03-13T11:34:29Z)

This list is automatically generated from the titles and abstracts of the papers on this site.