BAD: BiAs Detection for Large Language Models in the context of
candidate screening
- URL: http://arxiv.org/abs/2305.10407v1
- Date: Wed, 17 May 2023 17:47:31 GMT
- Title: BAD: BiAs Detection for Large Language Models in the context of
candidate screening
- Authors: Nam Ho Koh, Joseph Plata, Joyce Chai
- Abstract summary: This project aims to quantify the instances of social bias in ChatGPT and other OpenAI LLMs in the context of candidate screening.
We will show how the use of these models could perpetuate existing biases and inequalities in the hiring process.
- Score: 6.47452771256903
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Application Tracking Systems (ATS) have allowed talent managers, recruiters,
and college admissions committees to process large volumes of potential
candidate applications efficiently. Traditionally, this screening process was
conducted manually, creating major bottlenecks due to the quantity of
applications and introducing many instances of human bias. The advent of large
language models (LLMs) such as ChatGPT, and the prospect of adopting them in
current automated application screening, raise additional bias and fairness
issues that must be addressed. In this project, we wish to identify and
quantify the instances of social bias in ChatGPT and other OpenAI LLMs in the
context of candidate screening in order to demonstrate how the use of these
models could perpetuate existing biases and inequalities in the hiring process.
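One way to quantify such bias, sketched below, is to score otherwise identical resumes that differ only in a demographic signal (e.g. a name) and compare group means. This is a minimal illustration, not the paper's protocol: `score_candidate` is a deterministic stub standing in for a call to ChatGPT or another OpenAI model, and the names and resume template are hypothetical.

```python
from statistics import mean

# Hypothetical stand-in for an LLM-based screener; in the study this
# would be a query to ChatGPT or another OpenAI model.
def score_candidate(resume_text: str) -> float:
    # Deterministic stub: placeholder score based on text length.
    return min(len(resume_text) / 100.0, 1.0)

def demographic_gap(template: str,
                    group_a_names: list[str],
                    group_b_names: list[str]) -> float:
    """Mean score difference between two name groups on an otherwise
    identical resume template ({name} is the only varying field)."""
    a = mean(score_candidate(template.format(name=n)) for n in group_a_names)
    b = mean(score_candidate(template.format(name=n)) for n in group_b_names)
    return a - b

resume = "Name: {name}. 5 years of Python experience; BSc in CS."
gap = demographic_gap(resume, ["Emily", "Greg"], ["Lakisha", "Jamal"])
```

A real audit would replace the stub with live model calls and test whether the measured gap is statistically distinguishable from zero.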
Related papers
- On Speeding Up Language Model Evaluation [48.51924035873411]
Development of prompt-based methods with Large Language Models (LLMs) requires making numerous decisions.
We propose a novel method to address this challenge.
We show that it can identify the top-performing method using only 5-15% of the typically needed resources.
arXiv Detail & Related papers (2024-07-08T17:48:42Z)
- Fairness in AI-Driven Recruitment: Challenges, Metrics, Methods, and Future Directions [0.0]
Big data and machine learning have led to a rapid transformation of the traditional recruitment process.
Given the prevalence of AI-based recruitment, there is growing concern that human biases may carry over to decisions made by these systems.
This paper provides a comprehensive overview of this emerging field by discussing the types of biases encountered in AI-driven recruitment.
arXiv Detail & Related papers (2024-05-30T05:25:14Z) - Prompt Optimization with EASE? Efficient Ordering-aware Automated Selection of Exemplars [66.823588073584]
Large language models (LLMs) have shown impressive capabilities in real-world applications.
The quality of these exemplars in the prompt greatly impacts performance.
Existing methods fail to adequately account for the impact of exemplar ordering on the performance.
arXiv Detail & Related papers (2024-05-25T08:23:05Z) - LangBiTe: A Platform for Testing Bias in Large Language Models [1.9744907811058787]
Large Language Models (LLMs) are trained on a vast amount of data scraped from forums, websites, social media and other internet sources.
LangBiTe enables development teams to tailor their test scenarios, and automatically generate and execute the test cases according to a set of user-defined ethical requirements.
LangBiTe provides users with the bias evaluation of LLMs, and end-to-end traceability between the initial ethical requirements and the insights obtained.
arXiv Detail & Related papers (2024-04-29T10:02:45Z) - Auditing the Use of Language Models to Guide Hiring Decisions [2.949890760187898]
Regulatory efforts to protect against algorithmic bias have taken on increased urgency with rapid advances in large language models.
Current regulations -- as well as the scientific literature -- provide little guidance on how to conduct these assessments.
Here we propose and investigate one approach for auditing algorithms: correspondence experiments.
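A correspondence experiment of this kind can be sketched as a matched-pair test: submit applications that are identical except for one demographic signal and compare the resulting callback rates. The sketch below is illustrative only; `screen` is a deterministic stub, whereas the audit the abstract describes would query an LLM-based screener.

```python
# Hypothetical binary screening decision (interview / no interview).
# A real audit would query a language model here; this stub is deterministic.
def screen(application: str) -> bool:
    return "Python" in application

def correspondence_gap(template: str,
                       signals_a: list[str],
                       signals_b: list[str]) -> float:
    """Matched-pair correspondence test: applications identical except
    for one demographic signal; returns the callback-rate difference."""
    rate = lambda sigs: sum(screen(template.format(signal=s))
                            for s in sigs) / len(sigs)
    return rate(signals_a) - rate(signals_b)

app = "Applicant: {signal}. Skills: Python, SQL."
gap = correspondence_gap(app, ["he/him"], ["she/her"])
```

Because each pair differs in a single controlled attribute, any systematic gap in callback rates can be attributed to that attribute rather than to qualifications.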
arXiv Detail & Related papers (2024-04-03T22:01:26Z) - The Shifted and The Overlooked: A Task-oriented Investigation of
User-GPT Interactions [114.67699010359637]
We analyze a large-scale collection of real user queries to GPT.
We find that tasks such as "design" and "planning" are prevalent in user interactions but are largely neglected by, or differ from, traditional NLP benchmarks.
arXiv Detail & Related papers (2023-10-19T02:12:17Z) - Generative Judge for Evaluating Alignment [84.09815387884753]
We propose a generative judge with 13B parameters, Auto-J, designed to address these challenges.
Our model is trained on user queries and LLM-generated responses under massive real-world scenarios.
Experimentally, Auto-J outperforms a series of strong competitors, including both open-source and closed-source models.
arXiv Detail & Related papers (2023-10-09T07:27:15Z) - Analysis of the Cambridge Multiple-Choice Questions Reading Dataset with
a Focus on Candidate Response Distribution [38.58190457533888]
We introduce the task of candidate distribution matching, propose several evaluation metrics for the task, and demonstrate that automatic systems trained on RACE++ can be leveraged as baselines for our task.
We further demonstrate that these automatic systems can be used for practical pre-test evaluation tasks such as detecting underperforming distractors.
arXiv Detail & Related papers (2023-06-22T17:13:08Z) - Large Language Models are Not Yet Human-Level Evaluators for Abstractive
Summarization [66.08074487429477]
We investigate the stability and reliability of large language models (LLMs) as automatic evaluators for abstractive summarization.
We find that while ChatGPT and GPT-4 outperform the commonly used automatic metrics, they are not ready as human replacements.
arXiv Detail & Related papers (2023-05-22T14:58:13Z) - Large Language Models are Zero-Shot Rankers for Recommender Systems [76.02500186203929]
This work aims to investigate the capacity of large language models (LLMs) to act as the ranking model for recommender systems.
We show that LLMs have promising zero-shot ranking abilities but struggle to perceive the order of historical interactions.
We demonstrate that these issues can be alleviated using specially designed prompting and bootstrapping strategies.
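The zero-shot ranking setup can be illustrated by the shape of the prompt: the model receives a user's interaction history and a lettered candidate list, and is asked to return an ordering. The wording below is a hypothetical illustration of this style of prompt, not the paper's exact template.

```python
def build_ranking_prompt(history: list[str], candidates: list[str]) -> str:
    """Builds a zero-shot ranking prompt: interaction history plus a
    lettered candidate list the LLM is asked to order."""
    hist = "\n".join(f"{i + 1}. {h}" for i, h in enumerate(history))
    cands = "\n".join(f"[{chr(65 + i)}] {c}"
                      for i, c in enumerate(candidates))
    return (
        "I've watched the following movies, in order:\n"
        f"{hist}\n\n"
        "Rank these candidates by how likely I am to watch them next:\n"
        f"{cands}\n"
        "Answer with the bracketed letters only, most likely first."
    )

prompt = build_ranking_prompt(["Alien"], ["Blade Runner", "Up"])
```

Numbering the history and asking for bracketed letters follows the paper's observation that LLMs need explicit cues to respect interaction order; the response is then parsed back into a ranked item list.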
arXiv Detail & Related papers (2023-05-15T17:57:39Z) - Toward a traceable, explainable, and fairJD/Resume recommendation system [10.820022470618234]
Developing an automatic recruitment system remains one of the main challenges.
Our aim is to explore how modern language models can be combined with knowledge bases and datasets to enhance the JD/Resume matching process.
arXiv Detail & Related papers (2022-02-02T18:17:05Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.