Fairness of ChatGPT
- URL: http://arxiv.org/abs/2305.18569v2
- Date: Sun, 5 May 2024 19:15:30 GMT
- Title: Fairness of ChatGPT
- Authors: Yunqi Li, Lanjing Zhang, Yongfeng Zhang
- Abstract summary: This work aims to provide a systematic evaluation of the effectiveness and fairness of LLMs using ChatGPT as a study case.
We focus on assessing ChatGPT's performance in high-stakes fields including education, criminology, finance and healthcare.
This work contributes to a deeper understanding of LLMs' fairness performance, facilitates bias mitigation and fosters the development of responsible AI systems.
- Score: 30.969927447499405
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Understanding and addressing unfairness in LLMs are crucial for responsible AI deployment. However, there is a limited number of quantitative analyses and in-depth studies regarding fairness evaluations in LLMs, especially when applying LLMs to high-stakes fields. This work aims to fill this gap by providing a systematic evaluation of the effectiveness and fairness of LLMs using ChatGPT as a study case. We focus on assessing ChatGPT's performance in high-stakes fields including education, criminology, finance and healthcare. To conduct a thorough evaluation, we consider both group fairness and individual fairness metrics. We also observe the disparities in ChatGPT's outputs under a set of biased or unbiased prompts. This work contributes to a deeper understanding of LLMs' fairness performance, facilitates bias mitigation and fosters the development of responsible AI systems.
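As an illustration of the two metric families named in the abstract, the sketch below computes one common group fairness measure (statistical parity difference) and one simple individual fairness check (whether a prediction stays the same when only the sensitive attribute is flipped). The abstract does not say which metrics the paper uses, so the metric choices, function names and toy data here are assumptions for illustration, not the authors' evaluation code.

```python
from collections import defaultdict


def statistical_parity_difference(predictions, groups, positive=1):
    """Group fairness (assumed metric): gap between the highest and lowest
    positive-prediction rate across groups. 0.0 means equal rates."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for pred, group in zip(predictions, groups):
        totals[group] += 1
        if pred == positive:
            positives[group] += 1
    rates = {g: positives[g] / totals[g] for g in totals}
    return max(rates.values()) - min(rates.values()), rates


def counterfactual_consistency(original_preds, flipped_preds):
    """Individual fairness (assumed metric): share of individuals whose
    prediction is unchanged after flipping only the sensitive attribute."""
    if len(original_preds) != len(flipped_preds):
        raise ValueError("prediction lists must have the same length")
    if not original_preds:
        raise ValueError("prediction lists must not be empty")
    unchanged = sum(a == b for a, b in zip(original_preds, flipped_preds))
    return unchanged / len(original_preds)


# Hypothetical toy data: model outputs for eight people in two groups.
preds = [1, 0, 1, 1, 0, 0, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
gap, rates = statistical_parity_difference(preds, groups)
print(rates, gap)

# Hypothetical outputs for four prompts before and after swapping
# only the sensitive attribute in the prompt text.
print(counterfactual_consistency([1, 0, 1, 1], [1, 0, 0, 1]))
```

A gap near zero and a consistency near one would suggest fairer behaviour under these two definitions; a real evaluation would obtain the prediction lists by querying the model on paired prompts.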
Related papers
- Once Upon a Team: Investigating Bias in LLM-Driven Software Team Composition and Task Allocation [48.2168236140771]
This study investigates whether LLMs exhibit bias in team composition and task assignment.
Using three LLMs and 3,000 simulated decisions, we find systematic disparities.
Our findings indicate that LLMs exacerbate demographic inequities in software engineering contexts.
arXiv Detail & Related papers (2026-01-07T12:13:22Z) - The Other Side of the Coin: Exploring Fairness in Retrieval-Augmented Generation [73.16564415490113]
Retrieval-Augmented Generation (RAG) enhances Large Language Models (LLMs) by retrieving relevant documents from external knowledge sources.
We propose two approaches, FairFT and FairFilter, to mitigate the fairness issues introduced by RAG for small-scale LLMs.
arXiv Detail & Related papers (2025-04-11T10:17:10Z) - Justice or Prejudice? Quantifying Biases in LLM-as-a-Judge [84.34545223897578]
Despite their excellence in many domains, potential issues are under-explored, undermining their reliability and the scope of their utility.
We identify 12 key potential biases and propose a new automated bias quantification framework, CALM, which quantifies and analyzes each type of bias in LLM-as-a-Judge.
Our work highlights the need for stakeholders to address these issues and remind users to exercise caution in LLM-as-a-Judge applications.
arXiv Detail & Related papers (2024-10-03T17:53:30Z) - Fairness in Large Language Models in Three Hours [2.443957114877221]
This tutorial provides a systematic overview of recent advances in the literature concerning large language models.
The concept of fairness in LLMs is then explored, summarizing the strategies for evaluating bias and the algorithms designed to promote fairness.
arXiv Detail & Related papers (2024-08-02T03:44:14Z) - The Impossibility of Fair LLMs [59.424918263776284]
The need for fair AI is increasingly clear in the era of large language models (LLMs).
We review the technical frameworks that machine learning researchers have used to evaluate fairness.
We develop guidelines for the more realistic goal of achieving fairness in particular use cases.
arXiv Detail & Related papers (2024-05-28T04:36:15Z) - Decompose and Aggregate: A Step-by-Step Interpretable Evaluation Framework [75.81096662788254]
Large Language Models (LLMs) are scalable and economical evaluators.
The question of how reliable these evaluators are has emerged as a crucial research question.
We propose Decompose and Aggregate, which breaks down the evaluation process into different stages based on pedagogical practices.
arXiv Detail & Related papers (2024-05-24T08:12:30Z) - Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing [56.75702900542643]
We introduce AlphaLLM for the self-improvements of Large Language Models.
It integrates Monte Carlo Tree Search (MCTS) with LLMs to establish a self-improving loop.
Our experimental results show that AlphaLLM significantly enhances the performance of LLMs without additional annotations.
arXiv Detail & Related papers (2024-04-18T15:21:34Z) - Do Large Language Models Rank Fairly? An Empirical Study on the Fairness of LLMs as Rankers [27.66626125248612]
This paper presents an empirical study evaluating Large Language Models (LLMs) using the TREC Fair Ranking dataset.
We focus on the representation of binary protected attributes such as gender and geographic location, which are historically underrepresented in search outcomes.
Our analysis delves into how these LLMs handle queries and documents related to these attributes, aiming to uncover biases in their ranking algorithms.
arXiv Detail & Related papers (2024-04-04T04:23:19Z) - Fairness in Large Language Models: A Taxonomic Survey [2.669847575321326]
Large Language Models (LLMs) have demonstrated remarkable success across various domains.
Despite their promising performance in numerous real-world applications, most of these algorithms lack fairness considerations.
arXiv Detail & Related papers (2024-03-31T22:22:53Z) - Few-Shot Fairness: Unveiling LLM's Potential for Fairness-Aware Classification [7.696798306913988]
We introduce a framework outlining fairness regulations aligned with various fairness definitions.
We explore the configuration for in-context learning and the procedure for selecting in-context demonstrations using RAG.
Experiments conducted with different LLMs indicate that GPT-4 delivers superior results in terms of both accuracy and fairness compared to other models.
arXiv Detail & Related papers (2024-02-28T17:29:27Z) - Rethinking the Roles of Large Language Models in Chinese Grammatical Error Correction [62.409807640887834]
Chinese Grammatical Error Correction (CGEC) aims to correct all potential grammatical errors in the input sentences.
LLMs' performance as correctors on CGEC remains unsatisfactory due to its challenging task focus.
We rethink the roles of LLMs in the CGEC task so that they can be better utilized and explored in CGEC.
arXiv Detail & Related papers (2024-02-18T01:40:34Z) - A Survey on Fairness in Large Language Models [28.05516809190299]
Large Language Models (LLMs) have shown powerful performance and development prospects.
LLMs can capture social biases from unprocessed training data and propagate the biases to downstream tasks.
Unfair LLM systems have undesirable social impacts and potential harms.
arXiv Detail & Related papers (2023-08-20T03:30:22Z) - A Survey on Evaluation of Large Language Models [87.60417393701331]
Large language models (LLMs) are gaining increasing popularity in both academia and industry.
This paper focuses on three key dimensions: what to evaluate, where to evaluate, and how to evaluate.
arXiv Detail & Related papers (2023-07-06T16:28:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.