Evaluation of LLMs Biases Towards Elite Universities: A Persona-Based Exploration
- URL: http://arxiv.org/abs/2407.12801v2
- Date: Sun, 21 Jul 2024 23:23:13 GMT
- Title: Evaluation of LLMs Biases Towards Elite Universities: A Persona-Based Exploration
- Authors: Shailja Gupta, Rajesh Ranjan
- Abstract summary: This study investigates whether popular LLMs exhibit bias towards elite universities when generating personas for technology industry professionals.
We generated 432 personas across GPT-3.5, Gemini, and Claude 3 Sonnet with actual data from LinkedIn.
Results showed that LLMs significantly overrepresented elite universities, with 72.45% of generated personas featuring these institutions, compared to only 8.56% in the actual LinkedIn data.
This research highlights the need to address educational bias in LLMs and suggests strategies for mitigating such biases in AI-driven recruitment processes.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Elite universities are a dream destination not just for students but also for top employers seeking a supply of exceptional talent. When we hear about top universities, the first things that come to mind are their academic rigor, prestigious reputation, and highly successful alumni. However, society at large is represented not just by a few elite universities but by many others. There are many examples of people who built large businesses without a formal education, and many talented people who could not attend elite universities because of resource constraints. In recruitment, candidates from a few elite universities are indeed well represented at top technology companies; however, we found during our study that LLMs overrepresent them far beyond reality. This study investigates whether popular LLMs exhibit bias towards elite universities when generating personas for technology industry professionals. We employed a novel persona-based approach to compare the educational background predictions of GPT-3.5, Gemini, and Claude 3 Sonnet with actual data from LinkedIn. The study focused on various roles at Microsoft, Meta, and Google, including VP Product, Director of Engineering, and Software Engineer. We generated 432 personas across the three LLMs and analyzed the frequency of elite universities (Stanford, MIT, UC Berkeley, and Harvard) in these personas compared to LinkedIn data. Results showed that LLMs significantly overrepresented elite universities, with 72.45% of generated personas featuring these institutions, compared to only 8.56% in the actual LinkedIn data. GPT-3.5 exhibited the highest bias, followed by Claude 3 Sonnet, while Gemini performed best. This research highlights the need to address educational bias in LLMs and suggests strategies for mitigating such biases in AI-driven recruitment processes.
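The paper's core measurement is frequency counting: the share of generated personas naming an elite university compared against the share observed in LinkedIn profiles. A minimal sketch of that comparison follows; the persona records, field names, and helper function are illustrative assumptions, not the study's actual code or dataset (only the four elite universities and the 8.56% LinkedIn baseline come from the abstract):

```python
# Hypothetical sketch of the frequency comparison described in the abstract.
# The persona records below are toy data, not the study's dataset.

ELITE = {"Stanford", "MIT", "UC Berkeley", "Harvard"}

def elite_rate(personas):
    """Fraction of personas whose education field names an elite university."""
    hits = sum(1 for p in personas if p["university"] in ELITE)
    return hits / len(personas)

# Toy example: 3 of 4 generated personas cite an elite school.
generated = [
    {"role": "VP Product", "university": "Stanford"},
    {"role": "Director of Engineering", "university": "MIT"},
    {"role": "Software Engineer", "university": "Ohio State"},
    {"role": "Software Engineer", "university": "Harvard"},
]

linkedin_baseline = 0.0856  # 8.56% elite-university share reported in the paper

bias_gap = elite_rate(generated) - linkedin_baseline
print(f"elite rate: {elite_rate(generated):.2%}, gap vs LinkedIn: {bias_gap:.2%}")
```

In the study itself, this comparison was run per model and per role, which is what supports the per-model ranking (GPT-3.5 most biased, Gemini least).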
Related papers
- Open-Source LLMs Collaboration Beats Closed-Source LLMs: A Scalable Multi-Agent System [51.04535721779685]
This paper aims to demonstrate the potential and strengths of open-source collectives. We propose SMACS, a scalable multi-agent collaboration system (MACS) framework with high performance. Experiments on eight mainstream benchmarks validate the effectiveness of SMACS.
arXiv Detail & Related papers (2025-07-14T16:17:11Z) - Whose Name Comes Up? Auditing LLM-Based Scholar Recommendations [2.548716674644006]
This paper evaluates the performance of six open-weight LLMs in recommending experts in physics across five tasks. The evaluation examines consistency, factuality, and biases related to gender, ethnicity, academic popularity, and scholar similarity.
arXiv Detail & Related papers (2025-05-29T20:11:11Z) - ArxivBench: Can LLMs Assist Researchers in Conducting Research? [6.586119023242877]
Large language models (LLMs) have demonstrated remarkable effectiveness in completing various tasks such as reasoning, translation, and question answering.
In this study, we evaluate both proprietary and open-source LLMs on their ability to respond with relevant research papers and accurate links to articles hosted on the arXiv platform.
Our findings reveal concerning variation in the accuracy of LLM-generated responses depending on the subject, with some subjects showing significantly lower accuracy than others.
arXiv Detail & Related papers (2025-04-06T05:00:10Z) - What Does a Software Engineer Look Like? Exploring Societal Stereotypes in LLMs [9.007321855123882]
This study investigates how OpenAI's GPT-4 and Microsoft Copilot can reinforce gender and racial stereotypes.
We used each LLM to generate 300 profiles, consisting of 100 gender-based and 50 gender-neutral profiles.
Our analysis reveals that both models preferred male and Caucasian profiles, particularly for senior roles.
arXiv Detail & Related papers (2025-01-07T06:44:41Z) - Desert Camels and Oil Sheikhs: Arab-Centric Red Teaming of Frontier LLMs [15.432107289828194]
Large language models (LLMs) are widely used but raise ethical concerns due to embedded social biases.
This study examines LLM biases against Arabs versus Westerners across eight domains, including women's rights, terrorism, and anti-Semitism.
We evaluate six LLMs -- GPT-4, GPT-4o, Llama 3.1 (8B & 405B), Mistral 7B, and Claude 3.5 Sonnet.
arXiv Detail & Related papers (2024-10-31T15:45:23Z) - Nigerian Software Engineer or American Data Scientist? GitHub Profile Recruitment Bias in Large Language Models [9.040645392561196]
We use OpenAI's ChatGPT to conduct an initial set of experiments using GitHub User Profiles from four regions to recruit a six-person software development team.
Results indicate that ChatGPT shows preference for some regions over others, even when swapping the location strings of two profiles.
ChatGPT was more likely to assign certain developer roles to users from a specific country, revealing an implicit bias.
arXiv Detail & Related papers (2024-09-19T08:04:30Z) - White Men Lead, Black Women Help? Benchmarking Language Agency Social Biases in LLMs [58.27353205269664]
Social biases can manifest in language agency.
We introduce the novel Language Agency Bias Evaluation benchmark.
We unveil language agency social biases in content generated by 3 recent Large Language Models (LLMs).
arXiv Detail & Related papers (2024-04-16T12:27:54Z) - Pride and Prejudice: LLM Amplifies Self-Bias in Self-Refinement [75.7148545929689]
Large language models (LLMs) improve their performance through self-feedback on certain tasks while degrading on others.
We formally define LLM's self-bias - the tendency to favor its own generation.
We analyze six LLMs on translation, constrained text generation, and mathematical reasoning tasks.
arXiv Detail & Related papers (2024-02-18T03:10:39Z) - LLM360: Towards Fully Transparent Open-Source LLMs [89.05970416013403]
The goal of LLM360 is to support open and collaborative AI research by making the end-to-end training process transparent and reproducible by everyone.
As a first step of LLM360, we release two 7B parameter LLMs pre-trained from scratch, Amber and CrystalCoder, including their training code, data, intermediate checkpoints, and analyses.
arXiv Detail & Related papers (2023-12-11T17:39:00Z) - Bias Runs Deep: Implicit Reasoning Biases in Persona-Assigned LLMs [67.51906565969227]
We study the unintended side-effects of persona assignment on the ability of LLMs to perform basic reasoning tasks.
Our study covers 24 reasoning datasets, 4 LLMs, and 19 diverse personas (e.g. an Asian person) spanning 5 socio-demographic groups.
arXiv Detail & Related papers (2023-11-08T18:52:17Z) - Are Emily and Greg Still More Employable than Lakisha and Jamal? Investigating Algorithmic Hiring Bias in the Era of ChatGPT [24.496590819263865]
Large Language Models (LLMs) such as GPT-3.5, Bard, and Claude exhibit applicability across numerous tasks.
We evaluate LLMs on two tasks: (1) matching resumes to job categories; and (2) summarizing resumes with employment relevant information.
Overall, LLMs are robust across race and gender, but their performance differs on pregnancy status and political affiliation.
arXiv Detail & Related papers (2023-10-08T12:08:48Z) - Is ChatGPT Good at Search? Investigating Large Language Models as Re-Ranking Agents [56.104476412839944]
Large Language Models (LLMs) have demonstrated remarkable zero-shot generalization across various language-related tasks.
This paper investigates generative LLMs for relevance ranking in Information Retrieval (IR).
To address concerns about data contamination of LLMs, we collect a new test set called NovelEval.
To improve efficiency in real-world applications, we delve into the potential for distilling the ranking capabilities of ChatGPT into small specialized models.
arXiv Detail & Related papers (2023-04-19T10:16:03Z) - Can Large Language Models Transform Computational Social Science? [79.62471267510963]
Large Language Models (LLMs) are capable of performing many language processing tasks zero-shot (without training data).
This work provides a road map for using LLMs as Computational Social Science tools.
arXiv Detail & Related papers (2023-04-12T17:33:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.