Counterfactual Multi-Token Fairness in Text Classification
- URL: http://arxiv.org/abs/2202.03792v2
- Date: Wed, 9 Feb 2022 04:29:13 GMT
- Title: Counterfactual Multi-Token Fairness in Text Classification
- Authors: Pranay Lohia
- Abstract summary: The concept of Counterfactual Generation is extended to multi-token support, valid over all forms of texts and documents.
The method of generating counterfactuals by perturbing multiple sensitive tokens is defined as Counterfactual Multi-token Generation.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Counterfactual token generation has so far been limited to perturbing a
single token in texts that are generally short, single sentences. These tokens
are often associated with one of many sensitive attributes. With only a limited
set of counterfactuals generated, the goal of making machine learning
classification models invariant to any sensitive attribute remains out of
reach, and the formulation of Counterfactual Fairness is correspondingly
narrow. In this paper, we overcome these limitations by addressing their root
causes and broadening the scope of the problem. We curate a resource of
sensitive tokens and their corresponding perturbation tokens, extending support
beyond the traditionally used sensitive attributes of Age, Gender, and Race to
Nationality, Disability, and Religion. We extend counterfactual generation to
multi-token support, valid over all forms of texts and documents, and define
the method of generating counterfactuals by perturbing multiple sensitive
tokens as Counterfactual Multi-token Generation. The method shows significant
performance improvement over single-token methods and is validated on multiple
benchmark datasets. This improvement in counterfactual generation in turn
yields improved Counterfactual Multi-token Fairness.
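The core idea of the abstract can be illustrated with a minimal sketch: given a lexicon mapping sensitive tokens to perturbation tokens, perturb every sensitive token in the input simultaneously rather than one at a time. The lexicon below and the function name are hypothetical stand-ins, not the paper's curated resource or implementation.

```python
from itertools import product

# Hypothetical miniature lexicon mapping sensitive tokens to perturbation
# tokens; the paper's curated resource covers attributes such as Age,
# Gender, Race, Nationality, Disability, and Religion.
PERTURBATIONS = {
    "he": ["she", "they"],
    "christian": ["muslim", "jewish"],
}

def multi_token_counterfactuals(text):
    """Generate counterfactuals by perturbing every sensitive token found."""
    words = text.split()
    # Positions of sensitive tokens and their candidate replacements.
    slots = [(i, PERTURBATIONS[w.lower()]) for i, w in enumerate(words)
             if w.lower() in PERTURBATIONS]
    if not slots:
        return []
    counterfactuals = []
    # The Cartesian product perturbs all sensitive tokens at once, unlike
    # single-token methods that replace only one token per counterfactual.
    for combo in product(*(reps for _, reps in slots)):
        new_words = list(words)
        for (i, _), rep in zip(slots, combo):
            new_words[i] = rep
        counterfactuals.append(" ".join(new_words))
    return counterfactuals

cfs = multi_token_counterfactuals("he is a christian doctor")
# Two sensitive tokens with two replacements each yield 2 x 2 = 4 texts.
```

A classifier would then be probed for Counterfactual Fairness by checking that its prediction is invariant across all generated texts.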
Related papers
- ElasticTok: Adaptive Tokenization for Image and Video [109.75935878130582]
We introduce ElasticTok, a method that conditions on prior frames to adaptively encode a frame into a variable number of tokens.
During inference, ElasticTok can dynamically allocate tokens when needed.
Our evaluations on images and video demonstrate the effectiveness of our approach in efficient token usage.
arXiv Detail & Related papers (2024-10-10T20:54:15Z)
- STORE: Streamlining Semantic Tokenization and Generative Recommendation with A Single LLM [59.08493154172207]
We propose a unified framework to streamline the semantic tokenization and generative recommendation process.
We formulate semantic tokenization as a text-to-token task and generative recommendation as a token-to-token task, supplemented by a token-to-text reconstruction task and a text-to-token auxiliary task.
All these tasks are framed in a generative manner and trained using a single large language model (LLM) backbone.
arXiv Detail & Related papers (2024-09-11T13:49:48Z)
- SEP: Self-Enhanced Prompt Tuning for Visual-Language Model [93.94454894142413]
We introduce a novel approach named Self-Enhanced Prompt Tuning (SEP).
SEP explicitly incorporates discriminative prior knowledge to enhance both textual-level and visual-level embeddings.
Comprehensive evaluations across various benchmarks and tasks confirm SEP's efficacy in prompt tuning.
arXiv Detail & Related papers (2024-05-24T13:35:56Z)
- Token Alignment via Character Matching for Subword Completion [34.76794239097628]
This paper examines a technique to alleviate the tokenization artifact on text completion in generative models.
The method, termed token alignment, involves backtracking to the last complete tokens and ensuring the model's generation aligns with the prompt.
arXiv Detail & Related papers (2024-03-13T16:44:39Z)
- Token Fusion: Bridging the Gap between Token Pruning and Token Merging [71.84591084401458]
Vision Transformers (ViTs) have emerged as powerful backbones in computer vision, outperforming many traditional CNNs.
However, their computational overhead, largely attributed to the self-attention mechanism, makes deployment on resource-constrained edge devices challenging.
We introduce "Token Fusion" (ToFu), a method that amalgamates the benefits of both token pruning and token merging.
arXiv Detail & Related papers (2023-12-02T04:29:19Z)
- mCL-NER: Cross-Lingual Named Entity Recognition via Multi-view Contrastive Learning [54.523172171533645]
Cross-lingual named entity recognition (CrossNER) faces challenges stemming from uneven performance due to the scarcity of multilingual corpora.
We propose Multi-view Contrastive Learning for Cross-lingual Named Entity Recognition (mCL-NER).
Our experiments on the XTREME benchmark, spanning 40 languages, demonstrate the superiority of mCL-NER over prior data-driven and model-based approaches.
arXiv Detail & Related papers (2023-08-17T16:02:29Z)
- Beyond Attentive Tokens: Incorporating Token Importance and Diversity for Efficient Vision Transformers [32.972945618608726]
Vision transformers have achieved significant improvements on various vision tasks but their quadratic interactions between tokens significantly reduce computational efficiency.
We propose an efficient token decoupling and merging method that can jointly consider the token importance and diversity for token pruning.
Our method can even improve the accuracy of DeiT-T by 0.1% after reducing its FLOPs by 40%.
arXiv Detail & Related papers (2022-11-21T09:57:11Z)
- Practical Approaches for Fair Learning with Multitype and Multivariate Sensitive Attributes [70.6326967720747]
It is important to guarantee that machine learning algorithms deployed in the real world do not result in unfairness or unintended social consequences.
We introduce FairCOCCO, a fairness measure built on cross-covariance operators on reproducing kernel Hilbert Spaces.
We empirically demonstrate consistent improvements against state-of-the-art techniques in balancing predictive power and fairness on real-world datasets.
arXiv Detail & Related papers (2022-11-11T11:28:46Z)
- Flexible text generation for counterfactual fairness probing [8.262741696221143]
A common approach for testing fairness issues in text-based classifiers is through the use of counterfactuals.
Existing counterfactual generation methods rely on wordlists or templates, producing simple counterfactuals that don't take into account grammar, context, or subtle sensitive attribute references.
In this paper, we introduce a task for generating counterfactuals that overcomes these shortcomings, and demonstrate how large language models (LLMs) can be leveraged to make progress on this task.
arXiv Detail & Related papers (2022-06-28T05:07:20Z)
- Token Manipulation Generative Adversarial Network for Text Generation [0.0]
We decompose conditional text generation problem into two tasks, make-a-blank and fill-in-the-blank, and extend the former to handle more complex manipulations on the given tokens.
We show that the proposed model not only addresses the limitations but also provides good results without compromising the performance in terms of quality and diversity.
arXiv Detail & Related papers (2020-05-06T13:10:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.