LLM-Assisted Content Conditional Debiasing for Fair Text Embedding
- URL: http://arxiv.org/abs/2402.14208v3
- Date: Mon, 24 Jun 2024 04:49:16 GMT
- Title: LLM-Assisted Content Conditional Debiasing for Fair Text Embedding
- Authors: Wenlong Deng, Blair Chen, Beidi Zhao, Chiyu Zhang, Xiaoxiao Li, Christos Thrampoulidis
- Abstract summary: This paper proposes a novel method for learning fair text embeddings.
We define a novel content-conditional equal distance (CCED) fairness for text embeddings.
We also introduce a content-conditional debiasing (CCD) loss to ensure that embeddings of texts with different sensitive attributes but identical content maintain the same distance from the embedding of their corresponding neutral text.
- Score: 37.92120550031469
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Mitigating biases in machine learning models has become an increasing concern in Natural Language Processing (NLP), particularly in developing fair text embeddings, which are crucial yet challenging for real-world applications like search engines. In response, this paper proposes a novel method for learning fair text embeddings. First, we define a novel content-conditional equal distance (CCED) fairness for text embeddings, ensuring content-conditional independence between sensitive attributes and text embeddings. Building on CCED, we introduce a content-conditional debiasing (CCD) loss to ensure that embeddings of texts with different sensitive attributes but identical content maintain the same distance from the embedding of their corresponding neutral text. Additionally, we tackle the issue of insufficient training data by using Large Language Models (LLMs) with instructions to fairly augment texts into different sensitive groups. Our extensive evaluations show that our approach effectively enhances fairness while maintaining the utility of embeddings. Furthermore, our augmented dataset, combined with the CCED metric, serves as a new benchmark for evaluating fairness.
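As a rough illustration of the CCD idea, the sketch below penalizes unequal distances between attribute-specific embeddings and the embedding of their neutral counterpart. The function name, batch layout, and squared-deviation penalty are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def ccd_loss(group_embs: torch.Tensor, neutral_emb: torch.Tensor) -> torch.Tensor:
    """Content-conditional debiasing sketch (illustrative, not the paper's code).

    group_embs:  (G, D) embeddings of G sensitive-attribute rewrites of
                 the SAME content (e.g., male/female versions of a text).
    neutral_emb: (D,) embedding of the attribute-neutral version.

    Penalizes the spread of the G distances to the neutral embedding,
    so that every rewrite stays equidistant from it.
    """
    dists = torch.norm(group_embs - neutral_emb.unsqueeze(0), dim=1)  # (G,)
    return ((dists - dists.mean()) ** 2).mean()

# Toy usage: three attribute rewrites of one sentence, 768-d embeddings.
group = F.normalize(torch.randn(3, 768), dim=1)
neutral = F.normalize(torch.randn(768), dim=0)
print(ccd_loss(group, neutral))  # scalar; zero when all distances match
```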
Related papers
- Semantic Token Reweighting for Interpretable and Controllable Text Embeddings in CLIP [46.53595526049201]
A text encoder within Vision-Language Models (VLMs) like CLIP plays a crucial role in translating textual input into an embedding space shared with images.
We propose a framework of Semantic Token Reweighting to build Interpretable text embeddings (SToRI).
SToRI refines the text encoding process in CLIP by differentially weighting semantic elements based on contextual importance.
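A minimal sketch of the reweighting idea, assuming per-token importance weights and simple weighted pooling; the actual SToRI weighting scheme may differ.

```python
import torch

def reweighted_text_embedding(token_embs: torch.Tensor,
                              weights: torch.Tensor) -> torch.Tensor:
    """Pool token embeddings with per-token importance weights.

    token_embs: (T, D) token embeddings from a text encoder such as CLIP's.
    weights:    (T,) non-negative importance scores; raising one token's
                weight steers the sentence embedding toward its meaning.
    """
    w = weights / weights.sum()                       # normalize to sum to 1
    return (w.unsqueeze(1) * token_embs).sum(dim=0)   # (D,) pooled embedding

# Toy usage: emphasize the third token of a five-token sentence.
toks = torch.randn(5, 512)
w = torch.tensor([1.0, 1.0, 3.0, 1.0, 1.0])
emb = reweighted_text_embedding(toks, w)
```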
arXiv Detail & Related papers (2024-10-11T02:42:13Z)
- Analysing Zero-Shot Readability-Controlled Sentence Simplification [54.09069745799918]
We investigate how different types of contextual information affect a model's ability to generate sentences with the desired readability.
Results show that all tested models struggle to simplify sentences, due both to model limitations and to characteristics of the source sentences.
Our experiments also highlight the need for better automatic evaluation metrics tailored to readability-controlled text simplification (RCTS).
arXiv Detail & Related papers (2024-09-30T12:36:25Z)
- An efficient text augmentation approach for contextualized Mandarin speech recognition [4.600045052545344]
Our study proposes to leverage extensive text-only datasets and contextualize pre-trained ASR models.
To contextualize a pre-trained CIF-based ASR model, we construct a codebook using limited speech-text data.
Our experiments on diverse Mandarin test sets demonstrate that our TA approach significantly boosts recognition performance.
arXiv Detail & Related papers (2024-06-14T11:53:14Z)
- Improving Text Embeddings with Large Language Models [59.930513259982725]
We introduce a novel and simple method for obtaining high-quality text embeddings using only synthetic data and less than 1k training steps.
We leverage proprietary LLMs to generate diverse synthetic data for hundreds of thousands of text embedding tasks across 93 languages.
Experiments demonstrate that our method achieves strong performance on highly competitive text embedding benchmarks without using any labeled data.
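A toy sketch of how the synthetic-data generation step might be orchestrated; the prompt template, JSON schema, and `call_llm` wrapper are hypothetical, not the paper's actual pipeline.

```python
import json

def build_synthesis_prompt(task: str, language: str) -> str:
    """Ask an LLM for one synthetic retrieval triple (template is illustrative)."""
    return (
        f"Brainstorm one '{task}' text-embedding example in {language}. "
        'Reply as JSON: {"query": ..., "positive": ..., "negative": ...}'
    )

def parse_triple(llm_reply: str) -> tuple[str, str, str]:
    """Turn the model's JSON reply into a (query, positive, negative) triple."""
    d = json.loads(llm_reply)
    return d["query"], d["positive"], d["negative"]

# call_llm(prompt) stands in for any chat-completion API (hypothetical):
# triples = [parse_triple(call_llm(build_synthesis_prompt(t, lang)))
#            for t in tasks for lang in languages]
```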
arXiv Detail & Related papers (2023-12-31T02:13:18Z)
- Constructing Vec-tionaries to Extract Message Features from Texts: A Case Study of Moral Appeals [5.336592570916432]
We present an approach to construct vec-tionary measurement tools that boost validated dictionaries with word embeddings.
A vec-tionary can produce additional metrics to capture the ambivalence of a message feature beyond its strength in texts.
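One way to read the vec-tionary idea in code, assuming the dictionary is distilled into a single embedding-space axis (a simplification, not the paper's construction); strength and ambivalence then fall out as the mean and spread of per-word projections.

```python
import numpy as np

def vectionary_scores(word_vecs: np.ndarray, axis: np.ndarray):
    """Score one document against a vec-tionary axis.

    word_vecs: (N, D) embeddings of the document's words.
    axis:      (D,) direction distilled from a validated dictionary,
               e.g. the mean embedding of its seed words (an assumption).

    Returns (strength, ambivalence): the mean projection onto the axis
    and the spread of projections, the latter capturing mixed signals.
    """
    axis = axis / np.linalg.norm(axis)
    proj = word_vecs @ axis            # per-word alignment with the axis
    return float(proj.mean()), float(proj.std())

# Toy usage with random 300-d vectors standing in for a 20-word document.
strength, ambivalence = vectionary_scores(np.random.randn(20, 300),
                                          np.random.randn(300))
```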
arXiv Detail & Related papers (2023-12-10T20:37:29Z)
- Text Attribute Control via Closed-Loop Disentanglement [72.2786244367634]
We propose a novel approach that achieves robust control of attributes while enhancing content preservation.
In this paper, we use a semi-supervised contrastive learning method to encourage the disentanglement of attributes in latent spaces.
We conducted experiments on three text datasets, including the Yelp Service review dataset, the Amazon Product review dataset, and the GoEmotions dataset.
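A bare-bones sketch of the contrastive ingredient described above, reduced here to a supervised contrastive loss on the attribute slice of the latent code; the closed-loop mechanism and the paper's exact objective are omitted.

```python
import torch
import torch.nn.functional as F

def attribute_contrastive_loss(z_attr: torch.Tensor,
                               labels: torch.Tensor,
                               tau: float = 0.1) -> torch.Tensor:
    """Supervised contrastive loss on the attribute part of a latent code.

    z_attr: (B, D) attribute latents; labels: (B,) attribute ids.
    Pulls same-attribute latents together and pushes others apart, which
    concentrates attribute information in this subspace. Assumes every
    attribute value appears at least twice in the batch.
    """
    z = F.normalize(z_attr, dim=1)
    sim = (z @ z.t()) / tau                            # (B, B) similarities
    eye = torch.eye(len(z), dtype=torch.bool)
    sim = sim.masked_fill(eye, float("-inf"))          # exclude self-pairs
    log_prob = sim - sim.logsumexp(dim=1, keepdim=True)
    pos = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~eye
    return -log_prob[pos].mean()

# Toy usage: six latents covering three attribute values, twice each.
print(attribute_contrastive_loss(torch.randn(6, 32),
                                 torch.tensor([0, 0, 1, 1, 2, 2])))
```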
arXiv Detail & Related papers (2023-12-01T01:26:38Z)
- Demonstrations Are All You Need: Advancing Offensive Content Paraphrasing using In-Context Learning [10.897468059705238]
Supervised paraphrasers rely heavily on large quantities of labelled data to help preserve meaning and intent.
In this paper, we aim to assist practitioners in developing usable paraphrasers by exploring In-Context Learning (ICL) with large language models (LLMs).
Our study focuses on key factors such as the number and order of demonstrations, the exclusion of the prompt instruction, and the reduction in measured toxicity.
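A minimal sketch of assembling such an ICL prompt, with the demonstration count, their order, and the instruction toggle exposed as exactly the knobs the study varies; the template wording is illustrative.

```python
def build_icl_prompt(demos: list[tuple[str, str]], query: str,
                     include_instruction: bool = True) -> str:
    """Assemble an in-context-learning prompt for detoxifying paraphrase.

    demos: (offensive_text, polite_paraphrase) pairs; reordering or
    truncating this list reproduces the study's demonstration factors.
    """
    parts = []
    if include_instruction:
        parts.append("Paraphrase the text politely while preserving its meaning.")
    for src, tgt in demos:
        parts.append(f"Text: {src}\nParaphrase: {tgt}")
    parts.append(f"Text: {query}\nParaphrase:")
    return "\n\n".join(parts)

# Toy usage with two placeholder demonstrations.
prompt = build_icl_prompt([("<offensive 1>", "<polite 1>"),
                           ("<offensive 2>", "<polite 2>")],
                          "<text to paraphrase>")
```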
arXiv Detail & Related papers (2023-10-16T16:18:55Z)
- Conditional Supervised Contrastive Learning for Fair Text Classification [59.813422435604025]
We study learning fair representations that satisfy a notion of fairness known as equalized odds for text classification via contrastive learning.
Specifically, we first theoretically analyze the connections between learning representations with a fairness constraint and conditional supervised contrastive objectives.
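A hedged sketch of one plausible instantiation: within each class, pairs drawn from different sensitive groups are treated as positives, so that given the label the representation carries no group signal. This pairing rule is an illustrative reading, not the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def fair_supcon_loss(z: torch.Tensor, y: torch.Tensor, a: torch.Tensor,
                     tau: float = 0.1) -> torch.Tensor:
    """Conditional supervised contrastive sketch aimed at equalized odds.

    z: (B, D) representations; y: (B,) class labels; a: (B,) sensitive
    attributes. Positives are same-class examples from a DIFFERENT group.
    Assumes every class in the batch spans more than one group.
    """
    z = F.normalize(z, dim=1)
    sim = (z @ z.t()) / tau
    eye = torch.eye(len(z), dtype=torch.bool)
    sim = sim.masked_fill(eye, float("-inf"))          # exclude self-pairs
    log_prob = sim - sim.logsumexp(dim=1, keepdim=True)
    pos = (y.unsqueeze(0) == y.unsqueeze(1)) & (a.unsqueeze(0) != a.unsqueeze(1))
    return -log_prob[pos].mean()

# Toy usage: four examples, two classes, two groups per class.
print(fair_supcon_loss(torch.randn(4, 16),
                       torch.tensor([0, 0, 1, 1]),
                       torch.tensor([0, 1, 0, 1])))
```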
arXiv Detail & Related papers (2022-05-23T17:38:30Z)
- Improving Disentangled Text Representation Learning with Information-Theoretic Guidance [99.68851329919858]
The discrete nature of natural language makes disentangling textual representations more challenging.
Inspired by information theory, we propose a novel method that effectively yields disentangled representations of text.
Experiments on both conditional text generation and text-style transfer demonstrate the high quality of our disentangled representation.
arXiv Detail & Related papers (2020-06-01T03:36:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.