Nigerian Software Engineer or American Data Scientist? GitHub Profile Recruitment Bias in Large Language Models
- URL: http://arxiv.org/abs/2409.12544v2
- Date: Tue, 14 Jan 2025 05:20:53 GMT
- Title: Nigerian Software Engineer or American Data Scientist? GitHub Profile Recruitment Bias in Large Language Models
- Authors: Takashi Nakano, Kazumasa Shimari, Raula Gaikovina Kula, Christoph Treude, Marc Cheong, Kenichi Matsumoto,
- Abstract summary: We use OpenAI's ChatGPT to conduct an initial set of experiments using GitHub User Profiles from four regions to recruit a six-person software development team.
Results indicate that ChatGPT shows preference for some regions over others, even when swapping the location strings of two profiles.
ChatGPT was more likely to assign certain developer roles to users from a specific country, revealing an implicit bias.
- Score: 9.040645392561196
- License:
- Abstract: Large Language Models (LLMs) have taken the world by storm, demonstrating their ability not only to automate tedious tasks, but also to show some degree of proficiency in completing software engineering tasks. A key concern with LLMs is their "black-box" nature, which obscures their internal workings and could lead to societal biases in their outputs. In the software engineering context, in this early results paper, we empirically explore how well LLMs can automate recruitment tasks for a geographically diverse software team. We use OpenAI's ChatGPT to conduct an initial set of experiments using GitHub User Profiles from four regions to recruit a six-person software development team, analyzing a total of 3,657 profiles over a five-year period (2019-2023). Results indicate that ChatGPT shows preference for some regions over others, even when swapping the location strings of two profiles (counterfactuals). Furthermore, ChatGPT was more likely to assign certain developer roles to users from a specific country, revealing an implicit bias. Overall, this study reveals insights into the inner workings of LLMs and has implications for mitigating such societal biases in these models.
Related papers
- What Does a Software Engineer Look Like? Exploring Societal Stereotypes in LLMs [9.007321855123882]
This study investigates how OpenAI's GPT-4 and Microsoft Copilot can reinforce gender and racial stereotypes.
We used each LLM to generate 300 profiles, consisting of 100 gender-based and 50 gender-neutral profiles.
Our analysis reveals that both models preferred male and Caucasian profiles, particularly for senior roles.
arXiv Detail & Related papers (2025-01-07T06:44:41Z) - LLMs are Imperfect, Then What? An Empirical Study on LLM Failures in Software Engineering [38.20696656193963]
We conducted an observational study with 22 participants using ChatGPT as a coding assistant in a non-trivial software engineering task.
We identified the cases where ChatGPT failed, their root causes, and the corresponding mitigation solutions used by users.
arXiv Detail & Related papers (2024-11-15T03:29:41Z) - Unveiling ChatGPT's Usage in Open Source Projects: A Mining-based Study [17.952085678503362]
Large Language Models (LLMs) have gained significant attention in the software engineering community.
We mine 1,501 commits, pull requests, and issues from open-source projects by matching regular expressions likely to indicate the usage of ChatGPT to accomplish the task.
This resulted in a taxonomy of 45 tasks which developers automate via ChatGPT.
arXiv Detail & Related papers (2024-02-26T10:58:51Z) - Pride and Prejudice: LLM Amplifies Self-Bias in Self-Refinement [75.7148545929689]
Large language models (LLMs) improve their performance through self-feedback on certain tasks while degrade on others.
We formally define LLM's self-bias - the tendency to favor its own generation.
We analyze six LLMs on translation, constrained text generation, and mathematical reasoning tasks.
arXiv Detail & Related papers (2024-02-18T03:10:39Z) - ChatGPT's One-year Anniversary: Are Open-Source Large Language Models
Catching up? [71.12709925152784]
ChatGPT has brought a seismic shift in the entire landscape of AI.
It showed that a model could answer human questions and follow instructions on a broad panel of tasks.
While closed-source LLMs generally outperform their open-source counterparts, the progress on the latter has been rapid.
This has crucial implications not only on research but also on business.
arXiv Detail & Related papers (2023-11-28T17:44:51Z) - SeqGPT: An Out-of-the-box Large Language Model for Open Domain Sequence
Understanding [103.34092301324425]
Large language models (LLMs) have shown impressive ability for open-domain NLP tasks.
We present SeqGPT, a bilingual (i.e., English and Chinese) open-source autoregressive model specially enhanced for open-domain natural language understanding.
arXiv Detail & Related papers (2023-08-21T07:31:19Z) - Do Large Language Models Pay Similar Attention Like Human Programmers When Generating Code? [10.249771123421432]
We investigate whether Large Language Models (LLMs) attend to the same parts of a task description as human programmers during code generation.
We manually analyzed 211 incorrect code snippets and found five attention patterns that can be used to explain many code generation errors.
Our findings highlight the need for human-aligned LLMs for better interpretability and programmer trust.
arXiv Detail & Related papers (2023-06-02T00:57:03Z) - Analysis of ChatGPT on Source Code [1.3381749415517021]
This paper explores the use of Large Language Models (LLMs) and in particular ChatGPT in programming, source code analysis, and code generation.
LLMs and ChatGPT are built using machine learning and artificial intelligence techniques, and they offer several benefits to developers and programmers.
arXiv Detail & Related papers (2023-06-01T12:12:59Z) - Is ChatGPT Good at Search? Investigating Large Language Models as Re-Ranking Agents [53.78782375511531]
Large Language Models (LLMs) have demonstrated remarkable zero-shot generalization across various language-related tasks.
This paper investigates generative LLMs for relevance ranking in Information Retrieval (IR)
To address concerns about data contamination of LLMs, we collect a new test set called NovelEval.
To improve efficiency in real-world applications, we delve into the potential for distilling the ranking capabilities of ChatGPT into small specialized models.
arXiv Detail & Related papers (2023-04-19T10:16:03Z) - Can Large Language Models Transform Computational Social Science? [79.62471267510963]
Large Language Models (LLMs) are capable of performing many language processing tasks zero-shot (without training data)
This work provides a road map for using LLMs as Computational Social Science tools.
arXiv Detail & Related papers (2023-04-12T17:33:28Z) - A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on
Reasoning, Hallucination, and Interactivity [79.12003701981092]
We carry out an extensive technical evaluation of ChatGPT using 23 data sets covering 8 different common NLP application tasks.
We evaluate the multitask, multilingual and multi-modal aspects of ChatGPT based on these data sets and a newly designed multimodal dataset.
ChatGPT is 63.41% accurate on average in 10 different reasoning categories under logical reasoning, non-textual reasoning, and commonsense reasoning.
arXiv Detail & Related papers (2023-02-08T12:35:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.