Chameleon: Foundation Models for Fairness-aware Multi-modal Data
Augmentation to Enhance Coverage of Minorities
- URL: http://arxiv.org/abs/2402.01071v1
- Date: Fri, 2 Feb 2024 00:16:45 GMT
- Title: Chameleon: Foundation Models for Fairness-aware Multi-modal Data
Augmentation to Enhance Coverage of Minorities
- Authors: Mahdi Erfanian and H. V. Jagadish and Abolfazl Asudeh
- Abstract summary: Underrepresentation of minorities in training data is a well-recognized concern.
We propose Chameleon, a system that augments a data set with a minimal addition of settings to enhance the coverage of under-represented groups.
Our experiment, in addition to confirming the efficiency of our proposed algorithms, illustrate the effectiveness of our approach.
- Score: 25.215178019059874
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The potential harms of the under-representation of minorities in training
data, particularly in multi-modal settings, is a well-recognized concern. While
there has been extensive effort in detecting such under-representation,
resolution has remained a challenge. With recent advancements in generative AI,
large language models and foundation models have emerged as versatile tools
across various domains. In this paper, we propose Chameleon, a system that
efficiently utilizes these tools to augment a data set with a minimal addition
of synthetically generated tuples, in order to enhance the coverage of the
under-represented groups. Our system follows a rejection sampling approach to
ensure the generated tuples have a high quality and follow the underlying
distribution. In order to minimize the rejection chance of the generated
tuples, we propose multiple strategies for providing a guide for the foundation
model. Our experiment results, in addition to confirming the efficiency of our
proposed algorithms, illustrate the effectiveness of our approach, as the
unfairness of the model in a downstream task significantly dropped after data
repair using Chameleon.
Related papers
- Enhancing Unsupervised Sentence Embeddings via Knowledge-Driven Data Augmentation and Gaussian-Decayed Contrastive Learning [37.54523122932728]
We propose a pipeline-based data augmentation method via large language models (LLMs)
To tackle the issue of low data diversity, our pipeline utilizes knowledge graphs (KGs) to extract entities and quantities.
To address high data noise, the GCSE model uses a Gaussian-decayed function to limit the impact of false hard negative samples.
arXiv Detail & Related papers (2024-09-19T16:29:58Z) - A Simple Background Augmentation Method for Object Detection with Diffusion Model [53.32935683257045]
In computer vision, it is well-known that a lack of data diversity will impair model performance.
We propose a simple yet effective data augmentation approach by leveraging advancements in generative models.
Background augmentation, in particular, significantly improves the models' robustness and generalization capabilities.
arXiv Detail & Related papers (2024-08-01T07:40:00Z) - Tackling Diverse Minorities in Imbalanced Classification [80.78227787608714]
Imbalanced datasets are commonly observed in various real-world applications, presenting significant challenges in training classifiers.
We propose generating synthetic samples iteratively by mixing data samples from both minority and majority classes.
We demonstrate the effectiveness of our proposed framework through extensive experiments conducted on seven publicly available benchmark datasets.
arXiv Detail & Related papers (2023-08-28T18:48:34Z) - Negative Data Augmentation [127.28042046152954]
We show that negative data augmentation samples provide information on the support of the data distribution.
We introduce a new GAN training objective where we use NDA as an additional source of synthetic data for the discriminator.
Empirically, models trained with our method achieve improved conditional/unconditional image generation along with improved anomaly detection capabilities.
arXiv Detail & Related papers (2021-02-09T20:28:35Z) - Learning Diverse Representations for Fast Adaptation to Distribution
Shift [78.83747601814669]
We present a method for learning multiple models, incorporating an objective that pressures each to learn a distinct way to solve the task.
We demonstrate our framework's ability to facilitate rapid adaptation to distribution shift.
arXiv Detail & Related papers (2020-06-12T12:23:50Z) - Inclusive GAN: Improving Data and Minority Coverage in Generative Models [101.67587566218928]
We formalize the problem of minority inclusion as one of data coverage.
We then propose to improve data coverage by harmonizing adversarial training with reconstructive generation.
We develop an extension that allows explicit control over the minority subgroups that the model should ensure to include.
arXiv Detail & Related papers (2020-04-07T13:31:33Z) - Diversity inducing Information Bottleneck in Model Ensembles [73.80615604822435]
In this paper, we target the problem of generating effective ensembles of neural networks by encouraging diversity in prediction.
We explicitly optimize a diversity inducing adversarial loss for learning latent variables and thereby obtain diversity in the output predictions necessary for modeling multi-modal data.
Compared to the most competitive baselines, we show significant improvements in classification accuracy, under a shift in the data distribution.
arXiv Detail & Related papers (2020-03-10T03:10:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.