Few-Shot Fairness: Unveiling LLM's Potential for Fairness-Aware
Classification
- URL: http://arxiv.org/abs/2402.18502v1
- Date: Wed, 28 Feb 2024 17:29:27 GMT
- Title: Few-Shot Fairness: Unveiling LLM's Potential for Fairness-Aware
Classification
- Authors: Garima Chhikara, Anurag Sharma, Kripabandhu Ghosh, Abhijnan
Chakraborty
- Abstract summary: We introduce a framework outlining fairness regulations aligned with various fairness definitions.
We explore the configuration for in-context learning and the procedure for selecting in-context demonstrations using RAG.
Experiments conducted with different LLMs indicate that GPT-4 delivers superior results in terms of both accuracy and fairness compared to other models.
- Score: 7.696798306913988
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Employing Large Language Models (LLMs) in various downstream applications such
as classification is crucial, especially for smaller companies lacking the
expertise and resources required for fine-tuning a model. Fairness in LLMs
helps ensure inclusivity and equal representation across factors such as race
and gender, and promotes responsible AI deployment. As the use of LLMs has become
increasingly prevalent, it is essential to assess whether LLMs can generate
fair outcomes when subjected to considerations of fairness. In this study, we
introduce a framework outlining fairness regulations aligned with various
fairness definitions, with each definition being modulated by varying degrees
of abstraction. We explore the configuration for in-context learning and the
procedure for selecting in-context demonstrations using RAG, while
incorporating fairness rules into the process. Experiments conducted with
different LLMs indicate that GPT-4 delivers superior results in terms of both
accuracy and fairness compared to other models. This work is one of the early
attempts to achieve fairness in prediction tasks by utilizing LLMs through
in-context learning.
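To make the described pipeline concrete, the sketch below shows fairness-rule-augmented few-shot prompting with RAG-style demonstration selection. The embedding model, the wording of the fairness rule, and the prompt template are illustrative assumptions, not the paper's exact implementation.
```python
# Minimal sketch: retrieve similar labeled demonstrations, prepend a fairness
# rule, and assemble an in-context classification prompt.
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model

FAIRNESS_RULE = (  # illustrative rule text, not the paper's exact wording
    "Treat individuals equally regardless of sensitive attributes such as "
    "race or gender; base the label only on task-relevant information."
)

def select_demonstrations(query: str, pool: list[dict], k: int = 4) -> list[dict]:
    """Retrieve the k labeled examples most similar to the query (RAG-style)."""
    texts = [ex["text"] for ex in pool]
    emb = encoder.encode([query] + texts, normalize_embeddings=True)
    sims = emb[1:] @ emb[0]            # cosine similarity to the query
    top = np.argsort(-sims)[:k]
    return [pool[i] for i in top]

def build_prompt(query: str, demos: list[dict]) -> str:
    """Assemble fairness rule + in-context demonstrations + test instance."""
    shots = "\n".join(f"Text: {d['text']}\nLabel: {d['label']}" for d in demos)
    return (
        f"Fairness rule: {FAIRNESS_RULE}\n\n"
        f"{shots}\n\n"
        f"Text: {query}\nLabel:"
    )

# Usage: pass build_prompt(test_text, select_demonstrations(test_text, train_pool))
# to an LLM such as GPT-4 and parse the returned label.
```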
Related papers
- Justice or Prejudice? Quantifying Biases in LLM-as-a-Judge [84.34545223897578]
Despite the excellence of LLM-as-a-Judge systems in many domains, their potential issues remain under-explored, undermining their reliability and the scope of their utility.
We identify 12 key potential biases and propose CALM, a new automated bias quantification framework that quantifies and analyzes each type of bias in LLM-as-a-Judge.
Our work highlights the need for stakeholders to address these issues and reminds users to exercise caution in LLM-as-a-Judge applications.
arXiv Detail & Related papers (2024-10-03T17:53:30Z) - Fairness in Large Language Models in Three Hours [2.443957114877221]
This tutorial provides a systematic overview of recent advances in the literature concerning fairness in large language models.
The concept of fairness in LLMs is then explored, summarizing the strategies for evaluating bias and the algorithms designed to promote fairness.
arXiv Detail & Related papers (2024-08-02T03:44:14Z) - Inducing Group Fairness in LLM-Based Decisions [12.368678951470162]
Group fairness, well studied for conventional classifiers, is less explored when classifiers are built by prompting Large Language Models (LLMs).
We show that prompt-based classifiers may lead to unfair decisions.
We introduce several remediation techniques and benchmark their fairness and performance trade-offs.
arXiv Detail & Related papers (2024-06-24T15:45:20Z) - Do Large Language Models Rank Fairly? An Empirical Study on the Fairness of LLMs as Rankers [27.66626125248612]
This paper presents an empirical study evaluating Large Language Models (LLMs) using the TREC Fair Ranking dataset.
We focus on the representation of binary protected attributes such as gender and geographic location, which are historically underrepresented in search outcomes.
Our analysis delves into how these LLMs handle queries and documents related to these attributes, aiming to uncover biases in their ranking algorithms.
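As a rough illustration of the kind of quantity such a study can examine, the sketch below computes position-discounted exposure per protected group in a ranked list; the specific fairness metrics used in the paper may differ.
```python
# Sketch: share of position-discounted exposure received by each protected
# group in a ranking (DCG-style discount); illustrative, not the paper's metric.
import math
from collections import defaultdict

def group_exposure(ranking: list[str]) -> dict[str, float]:
    """ranking: protected-attribute value of the item at each rank (top first)."""
    exposure = defaultdict(float)
    for rank, group in enumerate(ranking, start=1):
        exposure[group] += 1.0 / math.log2(rank + 1)  # higher ranks get more exposure
    total = sum(exposure.values())
    return {g: v / total for g, v in exposure.items()}

# e.g. group_exposure(["male", "female", "male", "female"])
```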
arXiv Detail & Related papers (2024-04-04T04:23:19Z) - Fairness in Large Language Models: A Taxonomic Survey [2.669847575321326]
Large Language Models (LLMs) have demonstrated remarkable success across various domains.
Despite their promising performance in numerous real-world applications, most of these models lack fairness considerations.
arXiv Detail & Related papers (2024-03-31T22:22:53Z) - Unveiling the Generalization Power of Fine-Tuned Large Language Models [81.70754292058258]
We investigate whether fine-tuning affects the generalization ability intrinsic to Large Language Models (LLMs).
Our main findings reveal that models fine-tuned on generation and classification tasks exhibit dissimilar behaviors in generalizing to different domains and tasks.
We observe that integrating the in-context learning strategy during fine-tuning on generation tasks can enhance the model's generalization ability.
arXiv Detail & Related papers (2024-03-14T08:18:59Z) - Exploring Value Biases: How LLMs Deviate Towards the Ideal [57.99044181599786]
Large Language Models (LLMs) are deployed in a wide range of applications, and their responses have an increasing social impact.
We show that value bias is strong in LLMs across different categories, similar to the results found in human studies.
arXiv Detail & Related papers (2024-02-16T18:28:43Z) - Learning to Generate Explainable Stock Predictions using Self-Reflective
Large Language Models [54.21695754082441]
We propose a framework to teach Large Language Models (LLMs) to generate explainable stock predictions.
A reflective agent learns how to explain past stock movements through self-reasoning, while the PPO trainer trains the model to generate the most likely explanations.
Our framework can outperform both traditional deep-learning and LLM methods in prediction accuracy and Matthews correlation coefficient.
arXiv Detail & Related papers (2024-02-06T03:18:58Z) - Selecting Shots for Demographic Fairness in Few-Shot Learning with Large
Language Models [14.772568847965408]
We explore the effect of shots, which directly affect the performance of models, on the fairness of large language models (LLMs) as NLP classification systems.
We consider how different shot selection strategies, both existing and new demographically sensitive methods, affect model fairness across three standard fairness datasets.
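One plausible demographically sensitive selection strategy is to balance demonstrations across groups; the sketch below, with an assumed `group` field and round-robin sampling, illustrates the idea and is not the paper's exact method.
```python
# Sketch: pick k demonstrations spread as evenly as possible across demographic
# groups (assumed strategy for illustration only).
import random
from collections import defaultdict

def balanced_shots(pool: list[dict], k: int, seed: int = 0) -> list[dict]:
    """Round-robin sample k demonstrations across demographic groups."""
    rng = random.Random(seed)
    by_group = defaultdict(list)
    for ex in pool:
        by_group[ex["group"]].append(ex)  # hypothetical group label, e.g. "female"
    for members in by_group.values():
        rng.shuffle(members)
    shots, groups, i = [], list(by_group), 0
    while len(shots) < k and any(by_group.values()):
        g = groups[i % len(groups)]
        if by_group[g]:
            shots.append(by_group[g].pop())
        i += 1
    return shots
```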
arXiv Detail & Related papers (2023-11-14T19:02:03Z) - Assessing the Reliability of Large Language Model Knowledge [78.38870272050106]
Large language models (LLMs) have been treated as knowledge bases due to their strong performance in knowledge probing tasks.
How do we evaluate the capabilities of LLMs to consistently produce factually correct answers?
We propose MOdel kNowledge relIabiliTy scORe (MONITOR), a novel metric designed to directly measure LLMs' factual reliability.
arXiv Detail & Related papers (2023-10-15T12:40:30Z) - Fair Few-shot Learning with Auxiliary Sets [53.30014767684218]
In many machine learning (ML) tasks, only very few labeled data samples can be collected, which can lead to inferior fairness performance.
In this paper, we define the fairness-aware learning task with limited training samples as the fair few-shot learning problem.
We devise a novel framework that accumulates fairness-aware knowledge across different meta-training tasks and then generalizes the learned knowledge to meta-test tasks.
arXiv Detail & Related papers (2023-08-28T06:31:37Z)