Related papers: LangFair: A Python Package for Assessing Bias and Fairness in Large Language Model Use Cases

URL: http://arxiv.org/abs/2501.03112v1
Date: Mon, 06 Jan 2025 16:20:44 GMT
Title: LangFair: A Python Package for Assessing Bias and Fairness in Large Language Model Use Cases
Authors: Dylan Bouchard, Mohit Singh Chauhan, David Skarbrevik, Viren Bajaj, Zeya Ahmad,
Abstract summary: LangFair aims to equip LLM practitioners with the tools to evaluate bias and fairness risks relevant to their specific use cases. The package offers functionality to easily generate evaluation datasets, comprised of LLM responses to use-case-specific prompts. To guide in metric selection, LangFair offers an actionable decision framework.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large Language Models (LLMs) have been observed to exhibit bias in numerous ways, potentially creating or worsening outcomes for specific groups identified by protected attributes such as sex, race, sexual orientation, or age. To help address this gap, we introduce LangFair, an open-source Python package that aims to equip LLM practitioners with the tools to evaluate bias and fairness risks relevant to their specific use cases. The package offers functionality to easily generate evaluation datasets, comprised of LLM responses to use-case-specific prompts, and subsequently calculate applicable metrics for the practitioner's use case. To guide in metric selection, LangFair offers an actionable decision framework.
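
The workflow the abstract describes (collect LLM responses to use-case-specific prompts, then score them with a metric) can be illustrated with a minimal sketch. This is not LangFair's API: `generate_responses`, `flagged_fraction`, and `stub_llm` are hypothetical names introduced only for illustration, the metric is a toy stand-in, and the model call is replaced by a deterministic stub so the example runs offline.

```python
from typing import Callable, Dict, List


def generate_responses(
    prompts: List[str], llm: Callable[[str], str], count: int = 3
) -> List[Dict[str, str]]:
    """Query the model `count` times per prompt and collect prompt/response rows."""
    return [{"prompt": p, "response": llm(p)} for p in prompts for _ in range(count)]


def flagged_fraction(rows: List[Dict[str, str]], flagged_terms: List[str]) -> float:
    """Toy metric: fraction of responses containing at least one flagged term."""
    if not rows:
        return 0.0
    hits = sum(
        any(term in row["response"].lower() for term in flagged_terms)
        for row in rows
    )
    return hits / len(rows)


def stub_llm(prompt: str) -> str:
    """Deterministic stand-in for a real LLM call (assumption for this sketch)."""
    return f"Summary of: {prompt}"


if __name__ == "__main__":
    # Hypothetical use-case-specific prompts.
    prompts = [
        "Summarize the loan application from applicant A.",
        "Summarize the loan application from applicant B.",
    ]
    rows = generate_responses(prompts, stub_llm, count=2)
    print(len(rows), flagged_fraction(rows, ["applicant a"]))
```

In practice the stub would be swapped for a real model client and the toy metric for one chosen according to the use case, which is the role the paper assigns to its decision framework.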

Related papers

FairLangProc: A Python package for fairness in NLP [0.0]
This paper presents a Python package providing a common implementation of some of the more recent advances in fairness in Natural Language Processing. FairLangProc aims to encourage the widespread use and democratization of bias mitigation techniques.
arXiv Detail & Related papers (2025-08-05T17:47:53Z)
Hey, That's My Data! Label-Only Dataset Inference in Large Language Models [63.35066172530291]
CatShift is a label-only dataset-inference framework. It capitalizes on catastrophic forgetting: the tendency of an LLM to overwrite previously learned knowledge when exposed to new data.
arXiv Detail & Related papers (2025-06-06T13:02:59Z)
Improving Preference Extraction In LLMs By Identifying Latent Knowledge Through Classifying Probes [20.20764453136706]
Large Language Models (LLMs) are often used as automated judges to evaluate text. We propose using linear classifying probes, trained by leveraging differences between contrasting pairs of prompts, to access latent knowledge and extract more accurate preferences.
arXiv Detail & Related papers (2025-03-22T12:35:25Z)
Towards Resource Efficient and Interpretable Bias Mitigation in Large Language Models [1.787433808079955]
Large language models (LLMs) have been observed to perpetuate unwanted biases in training data. In this paper, we mitigate bias by leveraging small biased and anti-biased expert models to obtain a debiasing signal. Experiments on mitigating gender, race, and religion biases show a reduction in bias on several local and global bias metrics.
arXiv Detail & Related papers (2024-12-02T16:56:08Z)
An Actionable Framework for Assessing Bias and Fairness in Large Language Model Use Cases [0.0]
This paper aims to provide a technical guide for practitioners to assess bias and fairness risks in large language models. The main contribution of this work is a decision framework that allows practitioners to determine which metrics to use for a specific LLM use case.
arXiv Detail & Related papers (2024-07-15T16:04:44Z)
LLM-Select: Feature Selection with Large Language Models [64.5099482021597]
Large language models (LLMs) are capable of selecting the most predictive features, with performance rivaling the standard tools of data science. Our findings suggest that LLMs may be useful not only for selecting the best features for training but also for deciding which features to collect in the first place.
arXiv Detail & Related papers (2024-07-02T22:23:40Z)
ELCoRec: Enhance Language Understanding with Co-Propagation of Numerical and Categorical Features for Recommendation [38.64175351885443]
Large language models have been flourishing in the natural language processing (NLP) domain. Despite the intelligence shown by the recommendation-oriented finetuned models, LLMs struggle to fully understand the user behavior patterns. Existing works only fine-tune a sole LLM on given text data without introducing that important information to it.
arXiv Detail & Related papers (2024-06-27T01:37:57Z)
Prompt Optimization with EASE? Efficient Ordering-aware Automated Selection of Exemplars [66.823588073584]
Large language models (LLMs) have shown impressive capabilities in real-world applications. The quality of these exemplars in the prompt greatly impacts performance. Existing methods fail to adequately account for the impact of exemplar ordering on the performance.
arXiv Detail & Related papers (2024-05-25T08:23:05Z)
LangBiTe: A Platform for Testing Bias in Large Language Models [1.9744907811058787]
Large Language Models (LLMs) are trained on a vast amount of data scraped from forums, websites, social media and other internet sources. LangBiTe enables development teams to tailor their test scenarios, and automatically generate and execute the test cases according to a set of user-defined ethical requirements. LangBiTe provides users with the bias evaluation of LLMs, and end-to-end traceability between the initial ethical requirements and the insights obtained.
arXiv Detail & Related papers (2024-04-29T10:02:45Z)
Self-Evaluation Improves Selective Generation in Large Language Models [54.003992911447696]
We reformulate open-ended generation tasks into token-level prediction tasks. We instruct an LLM to self-evaluate its answers. We benchmark a range of scoring methods based on self-evaluation.
arXiv Detail & Related papers (2023-12-14T19:09:22Z)
GPTBIAS: A Comprehensive Framework for Evaluating Bias in Large Language Models [83.30078426829627]
Large language models (LLMs) have gained popularity and are being widely adopted by a large user community. The existing evaluation methods have many constraints, and their results exhibit a limited degree of interpretability. We propose a bias evaluation framework named GPTBIAS that leverages the high performance of LLMs to assess bias in models.
arXiv Detail & Related papers (2023-12-11T12:02:14Z)
Large Language Model-Aware In-Context Learning for Code Generation [75.68709482932903]
Large language models (LLMs) have shown impressive in-context learning (ICL) ability in code generation. We propose a novel learning-based selection approach named LAIL (LLM-Aware In-context Learning) for code generation.
arXiv Detail & Related papers (2023-10-15T06:12:58Z)
Large Language Models Are Not Robust Multiple Choice Selectors [117.72712117510953]
Multiple choice questions (MCQs) serve as a common yet important task format in the evaluation of large language models (LLMs). This work shows that modern LLMs are vulnerable to option position changes due to their inherent "selection bias". We propose a label-free, inference-time debiasing method, called PriDe, which separates the model's prior bias for option IDs from the overall prediction distribution.
arXiv Detail & Related papers (2023-09-07T17:44:56Z)
FairPy: A Toolkit for Evaluation of Prediction Biases and their Mitigation in Large Language Models [12.62204775625353]
Recent studies have demonstrated that large pretrained language models (LLMs) such as BERT and GPT-2 exhibit biases in token prediction. We present a comprehensive survey of such techniques tailored towards widely used LLMs such as BERT and GPT-2. We additionally introduce Fairpy, a modular toolkit that provides plug-and-play interfaces for integrating these mathematical tools.
arXiv Detail & Related papers (2023-02-10T20:54:10Z)
