Fairness for Text Classification Tasks with Identity Information Data
Augmentation Methods
- URL: http://arxiv.org/abs/2203.03541v1
- Date: Fri, 4 Feb 2022 07:08:30 GMT
- Title: Fairness for Text Classification Tasks with Identity Information Data
Augmentation Methods
- Authors: Mohit Wadhwa, Mohan Bhambhani, Ashvini Jindal, Uma Sawant, Ramanujam
Madhavan
- Abstract summary: These methods are entirely based on generating counterfactuals for the given training and test set instances.
We empirically show that the two-stage augmentation process leads to diverse identity pairs and an enhanced training set.
- Score: 2.5199066832791535
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Counterfactual fairness methods address the question: How would the
prediction change if the sensitive identity attributes referenced in the text
instance were different? These methods are entirely based on generating
counterfactuals for the given training and test set instances. Counterfactual
instances are commonly prepared by replacing sensitive identity terms, i.e.,
the identity terms present in the instance are replaced with other identity
terms that fall under the same sensitive category. Therefore, the efficacy of
these methods depends heavily on the quality and comprehensiveness of identity
pairs. In this paper, we offer a two-stage data augmentation process: (1) the
first stage is a novel method for preparing a comprehensive list of identity
pairs from word embeddings, and (2) the second stage uses the prepared
identity-pair list to enhance the training instances by applying three simple
operations (namely identity pair replacement, identity term blindness, and
identity pair swap). We empirically show that the two-stage
augmentation process leads to diverse identity pairs and an enhanced training
set, with an improved counterfactual token-based fairness metric score on two
well-known text classification tasks.
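Below is a minimal sketch of the two-stage augmentation described in the abstract, assuming a toy embedding dictionary and a simple cosine-similarity pairing heuristic; the function names, similarity threshold, and placeholder token are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch (not the authors' code) of the two-stage augmentation:
# Stage 1 pairs identity terms of the same sensitive category via word-embedding
# similarity; Stage 2 applies the three operations named in the abstract
# (identity pair replacement, identity term blindness, identity pair swap).
import numpy as np

def build_identity_pairs(terms, embeddings, threshold=0.5):
    """Stage 1 (assumed heuristic): pair terms whose cosine similarity exceeds a threshold."""
    pairs = []
    for i, a in enumerate(terms):
        for b in terms[i + 1:]:
            va, vb = embeddings[a], embeddings[b]
            sim = float(va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb)))
            if sim > threshold:
                pairs.append((a, b))
    return pairs

def pair_replacement(tokens, pairs):
    """Replace each identity term with a paired counterpart."""
    lookup = {}
    for a, b in pairs:
        lookup[a], lookup[b] = b, a
    return [lookup.get(t, t) for t in tokens]

def term_blindness(tokens, pairs, placeholder="[IDENTITY]"):
    """Blind every identity term by masking it with a neutral placeholder."""
    identity_terms = {t for pair in pairs for t in pair}
    return [placeholder if t in identity_terms else t for t in tokens]

def pair_swap(tokens, pairs):
    """Swap the two terms of a pair when both occur in the same instance."""
    out = list(tokens)
    for a, b in pairs:
        if a in out and b in out:
            out = [b if t == a else a if t == b else t for t in out]
    return out

# Toy usage with made-up 2-d embeddings for one sensitive category (religion).
emb = {"muslim": np.array([0.9, 0.1]),
       "christian": np.array([0.8, 0.2]),
       "jewish": np.array([0.85, 0.15])}
pairs = build_identity_pairs(list(emb), emb)
tokens = "the muslim applicant met the christian interviewer".split()
print(pair_replacement(tokens, pairs))
print(term_blindness(tokens, pairs))
print(pair_swap(tokens, pairs))
```

Instances produced this way can be added to the training set, and a counterfactual token-based fairness score can then be computed by comparing model predictions on an instance against its counterfactual variants.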
Related papers
- Multi-Class Textual-Inversion Secretly Yields a Semantic-Agnostic Classifier [20.95472997379712]
Text-to-Image (T2I) personalization methods aim to adapt T2I models to unseen concepts by learning new tokens.
We show that new concept tokens possess both generation and classification capabilities by regarding each category as a single concept.
We propose Multi-Class textual inversion, which includes a discriminative regularization term for the token updating process.
arXiv Detail & Related papers (2024-10-29T17:55:02Z)
- Generative Retrieval Meets Multi-Graded Relevance [104.75244721442756]
We introduce a framework called GRaded Generative Retrieval (GR$2$)
GR$2$ focuses on two key components: ensuring relevant and distinct identifiers, and implementing multi-graded constrained contrastive training.
Experiments on datasets with both multi-graded and binary relevance demonstrate the effectiveness of GR$2$.
arXiv Detail & Related papers (2024-09-27T02:55:53Z)
- A Fixed-Point Approach to Unified Prompt-Based Counting [51.20608895374113]
This paper aims to establish a comprehensive prompt-based counting framework capable of generating density maps for objects indicated by various prompt types, such as box, point, and text.
Our model excels in prominent class-agnostic datasets and exhibits superior performance in cross-dataset adaptation tasks.
arXiv Detail & Related papers (2024-03-15T12:05:44Z)
- Language Models As Semantic Indexers [78.83425357657026]
We introduce LMIndexer, a self-supervised framework to learn semantic IDs with a generative language model.
We show the high quality of the learned IDs and demonstrate their effectiveness on three tasks including recommendation, product search, and document retrieval.
arXiv Detail & Related papers (2023-10-11T18:56:15Z)
- Multiview Identifiers Enhanced Generative Retrieval [78.38443356800848]
Generative retrieval generates identifier strings of passages as the retrieval target.
We propose a new type of identifier, synthetic identifiers, that are generated based on the content of a passage.
Our proposed approach performs the best in generative retrieval, demonstrating its effectiveness and robustness.
arXiv Detail & Related papers (2023-05-26T06:50:21Z)
- Improving Self-training for Cross-lingual Named Entity Recognition with Contrastive and Prototype Learning [80.08139343603956]
In cross-lingual named entity recognition, self-training is commonly used to bridge the linguistic gap.
In this work, we aim to improve self-training for cross-lingual NER by combining representation learning and pseudo label refinement.
Our proposed method, ContProto, mainly comprises two components: (1) contrastive self-training and (2) prototype-based pseudo-labeling.
arXiv Detail & Related papers (2023-05-23T02:52:16Z)
- X-ReID: Cross-Instance Transformer for Identity-Level Person Re-Identification [53.047542904329866]
Cross Intra-Identity Instances module (IntraX) fuses different intra-identity instances to transfer Identity-Level knowledge.
Cross Inter-Identity Instances module (InterX) involves hard positive and hard negative instances to improve the attention response to the same identity.
arXiv Detail & Related papers (2023-02-04T03:16:18Z)
- Identity Documents Authentication based on Forgery Detection of Guilloche Pattern [2.606834301724095]
An authentication model for identity documents based on forgery detection of guilloche patterns is proposed.
Experiments are conducted in order to analyze and identify the most proper parameters to achieve higher authentication performance.
arXiv Detail & Related papers (2022-06-22T11:37:10Z)
- Single versus Multiple Annotation for Named Entity Recognition of Mutations [4.213427823201119]
We address the impact of using a single annotator vs two annotators, in order to measure whether multiple annotators are required.
Once we evaluate the performance loss when using a single annotator, we apply different methods to sample the training data for second annotation.
We use held-out double-annotated data to build two scenarios with different types of rankings: similarity-based and confidence-based.
We evaluate both approaches on (i) their ability to identify erroneous training instances, and (ii) Mutation NER performance for state-of-the-art
arXiv Detail & Related papers (2021-01-19T03:54:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.