FairLangProc: A Python package for fairness in NLP
- URL: http://arxiv.org/abs/2508.03677v1
- Date: Tue, 05 Aug 2025 17:47:53 GMT
- Title: FairLangProc: A Python package for fairness in NLP
- Authors: Arturo Pérez-Peralta, Sandra Benítez-Peña, Rosa E. Lillo,
- Abstract summary: This paper presents a Python package providing a common implementation of some of the more recent advances in fairness in Natural Language Processing. FairLangProc aims to encourage the widespread use and democratization of bias mitigation techniques.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The rise of Large Language Models to near-ubiquity in recent years has raised societal concerns about their applications in decision-making contexts, such as organizational justice or healthcare. This, in turn, poses questions about the fairness of these models in critical settings, which has led to the development of different procedures to address bias in Natural Language Processing. Although many datasets, metrics and algorithms have been proposed to measure and mitigate harmful prejudice in Natural Language Processing, their implementations are diverse and far from centralized. In response, this paper presents FairLangProc, a comprehensive Python package providing a common implementation of some of the more recent advances in fairness in Natural Language Processing, with an interface compatible with the popular Hugging Face transformers library, aiming to encourage the widespread use and democratization of bias mitigation techniques. The implementation can be found at https://github.com/arturo-perez-peralta/FairLangProc.
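The abstract describes centralizing bias mitigation techniques behind one interface. As a hedged illustration of the kind of preprocessing method such a package gathers, here is a minimal sketch of counterfactual data augmentation (CDA); the word pairs and function name are illustrative assumptions, not FairLangProc's actual API.

```python
# Hypothetical sketch of counterfactual data augmentation (CDA), a common
# bias mitigation preprocessing step. Not FairLangProc's real interface.

# Toy, unambiguous demographic word pairs (both directions).
GENDER_PAIRS = {
    "he": "she", "she": "he",
    "man": "woman", "woman": "man",
    "father": "mother", "mother": "father",
}

def counterfactual_augment(sentence: str) -> str:
    """Swap gendered terms to produce a counterfactual training example."""
    swapped = []
    for tok in sentence.split():
        core = tok.strip(".,!?").lower()
        if core in GENDER_PAIRS:
            swapped.append(tok.lower().replace(core, GENDER_PAIRS[core]))
        else:
            swapped.append(tok)
    return " ".join(swapped)

# Augmenting doubles the corpus with counterfactual variants, so a model
# fine-tuned on both sees demographic terms in symmetric contexts.
corpus = ["he is a doctor", "the woman is an engineer"]
augmented = corpus + [counterfactual_augment(s) for s in corpus]
```

Real implementations must also handle ambiguous terms (e.g. possessive "her") and named entities, which this toy table deliberately avoids.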
Related papers
- Langformers: Unified NLP Pipelines for Language Models [3.690904966341072]
Langformers is an open-source Python library designed to streamline NLP pipelines. It integrates conversational AI, pretraining, text classification, sentence embedding/reranking, data labelling, semantic search, and knowledge distillation into a cohesive API.
arXiv Detail & Related papers (2025-04-12T10:17:49Z)
- LangFair: A Python Package for Assessing Bias and Fairness in Large Language Model Use Cases [0.0]
LangFair aims to equip LLM practitioners with the tools to evaluate bias and fairness risks relevant to their specific use cases. The package offers functionality to easily generate evaluation datasets, comprising LLM responses to use-case-specific prompts. To guide metric selection, LangFair offers an actionable decision framework.
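The summary mentions computing fairness metrics over LLM responses to use-case-specific prompts. A hedged sketch of one such response-level metric follows: the gap in refusal rates between two demographic prompt groups. The heuristic scoring function is a stand-in assumption, not LangFair's actual implementation.

```python
# Hedged sketch of a use-case-level fairness metric: the gap in a simple
# response score between two prompt groups. Not LangFair's real API.

def refusal_rate(responses: list[str]) -> float:
    """Fraction of responses that look like refusals (toy heuristic)."""
    markers = ("i cannot", "i can't", "unable to")
    hits = sum(1 for r in responses if r.lower().startswith(markers))
    return hits / len(responses)

def refusal_gap(group_a: list[str], group_b: list[str]) -> float:
    """Absolute difference in refusal rates between two prompt groups."""
    return abs(refusal_rate(group_a) - refusal_rate(group_b))

# Responses to the same prompt template instantiated for two groups.
responses_a = ["Sure, here is the plan.", "I cannot help with that."]
responses_b = ["Sure, here is the plan.", "Happy to help."]
print(refusal_gap(responses_a, responses_b))  # 0.5
```

A gap near zero suggests the model treats the two groups' prompts similarly on this axis; a large gap flags a use-case-specific disparity worth investigating.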
arXiv Detail & Related papers (2025-01-06T16:20:44Z)
- The GUS Framework: Benchmarking Social Bias Classification with Discriminative (Encoder-Only) and Generative (Decoder-Only) Language Models [3.7716682697752506]
Generalizations, Unfairness, and Stereotypes (the GUS framework) are considered key linguistic components underlying social bias. The GUS framework employs a semi-automated approach to create a comprehensive synthetic dataset, which is verified by humans to maintain ethical standards. Our methodology, which combines discriminative (encoder-only) models and generative (auto-regressive) large language models, identifies biased entities in text.
arXiv Detail & Related papers (2024-10-10T21:51:22Z)
- Collapsed Language Models Promote Fairness [88.48232731113306]
We find that debiased language models exhibit collapsed alignment between token representations and word embeddings. We design a principled fine-tuning method that can effectively improve fairness in a wide range of debiasing methods.
arXiv Detail & Related papers (2024-10-06T13:09:48Z)
- Forcing Diffuse Distributions out of Language Models [70.28345569190388]
Despite being trained specifically to follow user instructions, today's instruction-tuned language models perform poorly when instructed to produce random outputs.
We propose a fine-tuning method that encourages language models to output distributions that are diffuse over valid outcomes.
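The idea of encouraging diffuse distributions over valid outcomes can be illustrated with a small numerical sketch: a loss measuring divergence from the uniform distribution over valid outcomes. This is a hedged illustration of the general principle, not the paper's exact objective.

```python
import math

# Hedged sketch (not the paper's exact objective): a loss that is zero
# when the model spreads probability uniformly over valid outcomes,
# e.g. when asked to "pick a random number from 1 to 4".

def diffuseness_loss(probs: dict[str, float], valid: set[str]) -> float:
    """KL(uniform over valid outcomes || model distribution). Lower is better."""
    u = 1.0 / len(valid)
    return sum(u * math.log(u / probs[v]) for v in valid)

# A model that collapses onto one outcome incurs a high loss ...
collapsed = {"1": 0.85, "2": 0.05, "3": 0.05, "4": 0.05}
# ... while a diffuse model's loss approaches zero.
diffuse = {"1": 0.25, "2": 0.25, "3": 0.25, "4": 0.25}

print(diffuseness_loss(collapsed, {"1", "2", "3", "4"}))  # ~0.9
print(diffuseness_loss(diffuse, {"1", "2", "3", "4"}))    # ~0.0
```

During fine-tuning, a term like this would be minimized over the model's next-token probabilities for the valid outcome tokens.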
arXiv Detail & Related papers (2024-04-16T19:17:23Z)
- Fine-tuning Language Models for Factuality [96.5203774943198]
The capabilities of large pre-trained language models (LLMs) have led to their widespread use, sometimes even as a replacement for traditional search engines. Yet language models are prone to making convincing but factually inaccurate claims, often referred to as 'hallucinations'.
In this work, we fine-tune language models to be more factual, without human labeling.
arXiv Detail & Related papers (2023-11-14T18:59:15Z) - Native Language Identification with Big Bird Embeddings [0.3069335774032178]
Native Language Identification (NLI) aims to classify an author's native language based on their writing in another language.
The current work investigates if input size is a limiting factor, and shows that classifiers trained using Big Bird embeddings outperform linguistic feature engineering models by a large margin on the Reddit-L2 dataset.
arXiv Detail & Related papers (2023-09-13T12:47:40Z) - A Trip Towards Fairness: Bias and De-Biasing in Large Language Models [1.987426401990999]
Cheap-to-Build Very Large-Language Models (CtB-LLMs) with affordable training are emerging as the next big revolution in natural language processing and understanding.
In this paper, we performed a large investigation of the bias of three families of CtB-LLMs.
We show that debiasing techniques are effective and usable.
arXiv Detail & Related papers (2023-05-23T09:35:37Z)
- A New Generation of Perspective API: Efficient Multilingual Character-level Transformers [66.9176610388952]
We present the fundamentals behind the next version of the Perspective API from Google Jigsaw.
At the heart of the approach is a single multilingual token-free Charformer model.
We demonstrate that by forgoing static vocabularies, we gain flexibility across a variety of settings.
arXiv Detail & Related papers (2022-02-22T20:55:31Z)
- SML: a new Semantic Embedding Alignment Transformer for efficient cross-lingual Natural Language Inference [71.57324258813674]
The ability of Transformers to perform a variety of tasks with precision, such as question answering, Natural Language Inference (NLI) or summarisation, has enabled them to rank among the best paradigms for addressing these kinds of tasks at present.
NLI is one of the best scenarios for testing these architectures, due to the knowledge required to understand complex sentences and establish a relation between a hypothesis and a premise.
In this paper, we propose a new architecture, siamese multilingual transformer, to efficiently align multilingual embeddings for Natural Language Inference.
arXiv Detail & Related papers (2021-03-17T13:23:53Z)
- Revisiting Language Encoding in Learning Multilingual Representations [70.01772581545103]
We propose a new approach called Cross-lingual Language Projection (XLP) to replace language embedding.
XLP projects the word embeddings into a language-specific semantic space; the projected embeddings are then fed into the Transformer model.
Experiments show that XLP can freely and significantly boost the model performance on extensive multilingual benchmark datasets.
arXiv Detail & Related papers (2021-02-16T18:47:10Z)
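The XLP mechanism described above, replacing an additive language embedding with a per-language projection of shared word embeddings, can be sketched in a few lines. The dimensions and matrices below are toy assumptions, not the paper's actual setup.

```python
# Hedged sketch of the idea behind Cross-lingual Language Projection (XLP):
# each language gets its own projection matrix applied to shared word
# embeddings before the Transformer, instead of an added language embedding.
# Toy values only; not the paper's configuration.

def project(matrix: list[list[float]], vec: list[float]) -> list[float]:
    """Apply a language-specific projection matrix to a word embedding."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]

# Shared word embedding (toy, 2-dimensional).
word_emb = [1.0, 2.0]

# One projection matrix per language maps into a language-specific space.
lang_proj = {
    "en": [[1.0, 0.0], [0.0, 1.0]],  # identity: "English" space unchanged
    "fr": [[0.0, 1.0], [1.0, 0.0]],  # toy rotation into a "French" space
}

# The projected embeddings, not the raw ones, would feed the Transformer.
print(project(lang_proj["en"], word_emb))  # [1.0, 2.0]
print(project(lang_proj["fr"], word_emb))  # [2.0, 1.0]
```

In the real method the projection matrices are learned jointly with the model; the point of the sketch is only the per-language linear map over a shared embedding table.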
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences.