Improving Fairness and Mitigating MADness in Generative Models
- URL: http://arxiv.org/abs/2405.13977v3
- Date: Thu, 03 Oct 2024 21:46:36 GMT
- Title: Improving Fairness and Mitigating MADness in Generative Models
- Authors: Paul Mayer, Lorenzo Luzi, Ali Siahkoohi, Don H. Johnson, Richard G. Baraniuk
- Abstract summary: We show that training generative models with intentionally designed hypernetworks leads to models that are more fair when generating datapoints belonging to minority classes.
We introduce a regularization term that penalizes discrepancies between a generative model's estimated weights when trained on real data versus its own synthetic data.
- Score: 21.024727486615646
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Generative models unfairly penalize data belonging to minority classes, suffer from model autophagy disorder (MADness), and learn biased estimates of the underlying distribution parameters. Our theoretical and empirical results show that training generative models with intentionally designed hypernetworks leads to models that 1) are more fair when generating datapoints belonging to minority classes, 2) are more stable in a self-consumed (i.e., MAD) setting, and 3) learn parameters that are less statistically biased. To further mitigate unfairness, MADness, and bias, we introduce a regularization term that penalizes discrepancies between a generative model's estimated weights when trained on real data versus its own synthetic data. To facilitate training existing deep generative models within our framework, we offer a scalable implementation of hypernetworks that automatically generates a hypernetwork architecture for any given generative model.
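
The regularization idea in the abstract (penalizing the gap between weights estimated from real data and weights estimated from the model's own samples) can be illustrated with a toy sketch. This is not the paper's method: it assumes a 1-D Gaussian as the "generative model", maximum-likelihood fitting as "training", and a squared Euclidean distance as the discrepancy. The names fit_gaussian and weight_discrepancy_penalty are hypothetical and chosen only for this illustration.

```python
import numpy as np


def fit_gaussian(x):
    """Maximum-likelihood estimate of (mean, std) for 1-D data."""
    return np.array([x.mean(), x.std()])


def weight_discrepancy_penalty(real, rng, n_synth=None):
    """Toy penalty (an assumption, not the paper's regularizer):
    squared distance between parameters fit on real data and
    parameters refit on samples drawn from that first fit."""
    n_synth = len(real) if n_synth is None else n_synth
    theta_real = fit_gaussian(real)
    # Sample synthetic data from the model fit on real data.
    synthetic = rng.normal(theta_real[0], theta_real[1], size=n_synth)
    # "Retrain" on the model's own output.
    theta_synth = fit_gaussian(synthetic)
    return float(np.sum((theta_real - theta_synth) ** 2))


rng = np.random.default_rng(0)
real = rng.normal(loc=2.0, scale=1.5, size=500)
penalty = weight_discrepancy_penalty(real, rng)
print(f"penalty = {penalty:.6f}")
```

In a deep generative model the two parameter vectors would presumably be network weights rather than a mean and standard deviation, and the penalty would be added to the training loss; how the paper computes and weights that term is not stated in this summary.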
 
      
Related papers
- MeDi: Metadata-Guided Diffusion Models for Mitigating Biases in Tumor Classification [13.350688594462214]
 We propose a novel approach explicitly modeling such metadata into a generative Diffusion model framework (MeDi).
MeDi allows for a targeted augmentation of underrepresented subpopulations with synthetic data.
We experimentally show that MeDi generates high-quality histopathology images for unseen subpopulations in TCGA.
 arXiv  Detail & Related papers  (2025-06-20T16:41:25Z)
- Synthetic Tabular Data Generation for Imbalanced Classification: The Surprising Effectiveness of an Overlap Class [20.606333546028516]
 We show that state-of-the-art deep generative models yield significantly lower-quality minority examples than majority examples.
We propose a novel technique of converting the binary class labels to ternary class labels by introducing a class for the region where minority and majority distributions overlap.
 arXiv  Detail & Related papers  (2024-12-20T08:15:20Z)
- Constrained Diffusion Models via Dual Training [80.03953599062365]
 Diffusion processes are prone to generating samples that reflect biases in a training dataset.
We develop constrained diffusion models by imposing diffusion constraints based on desired distributions.
We show that our constrained diffusion models generate new data from a mixture data distribution that achieves the optimal trade-off among objective and constraints.
 arXiv  Detail & Related papers  (2024-08-27T14:25:42Z)
- Fairness Feedback Loops: Training on Synthetic Data Amplifies Bias [47.79659355705916]
 Model-induced distribution shifts (MIDS) occur as previous model outputs pollute new model training sets over generations of models.
We introduce a framework that allows us to track multiple MIDS over many generations, finding that they can lead to loss in performance, fairness, and minoritized group representation.
Despite these negative consequences, we identify how models might be used for positive, intentional, interventions in their data ecosystems.
 arXiv  Detail & Related papers  (2024-03-12T17:48:08Z)
- Towards Theoretical Understandings of Self-Consuming Generative Models [56.84592466204185]
 This paper tackles the emerging challenge of training generative models within a self-consuming loop.
We construct a theoretical framework to rigorously evaluate how this training procedure impacts the data distributions learned by future models.
We present results for kernel density estimation, delivering nuanced insights such as the impact of mixed data training on error propagation.
 arXiv  Detail & Related papers  (2024-02-19T02:08:09Z)
- On the Stability of Iterative Retraining of Generative Models on their own Data [56.153542044045224]
 We study the impact of training generative models on mixed datasets.
We first prove the stability of iterative training under the condition that the initial generative models approximate the data distribution well enough.
We empirically validate our theory on both synthetic and natural images by iteratively training normalizing flows and state-of-the-art diffusion models.
 arXiv  Detail & Related papers  (2023-09-30T16:41:04Z)
- Fair GANs through model rebalancing for extremely imbalanced class distributions [5.463417677777276]
 We present an approach to construct an unbiased generative adversarial network (GAN) from an existing biased GAN.
We show results for the StyleGAN2 models while training on the Flickr Faces High Quality (FFHQ) dataset for racial fairness.
We further validate our approach by applying it to an imbalanced CIFAR10 dataset which is also twice as large.
 arXiv  Detail & Related papers  (2023-08-16T19:20:06Z)
- Non-Invasive Fairness in Learning through the Lens of Data Drift [88.37640805363317]
 We show how to improve the fairness of Machine Learning models without altering the data or the learning algorithm.
We use a simple but key insight: the divergence of trends between different populations, and, consecutively, between a learned model and minority populations, is analogous to data drift.
We explore two strategies (model-splitting and reweighing) to resolve this drift, aiming to improve the overall conformance of models to the underlying data.
 arXiv  Detail & Related papers  (2023-03-30T17:30:42Z)
- Fairness Reprogramming [42.65700878967251]
 We propose a new generic fairness learning paradigm, called FairReprogram, which incorporates the model reprogramming technique.
Specifically, FairReprogram considers the case where models cannot be changed and appends to the input a set of perturbations, called the fairness trigger.
We show both theoretically and empirically that the fairness trigger can effectively obscure demographic biases in the output prediction of fixed ML models.
 arXiv  Detail & Related papers  (2022-09-21T09:37:00Z)
- Self-Damaging Contrastive Learning [92.34124578823977]
 Unlabeled data in reality is commonly imbalanced and shows a long-tail distribution.
This paper proposes a principled framework called Self-Damaging Contrastive Learning to automatically balance the representation learning without knowing the classes.
Our experiments show that SDCLR significantly improves not only overall accuracies but also balancedness.
 arXiv  Detail & Related papers  (2021-06-06T00:04:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
       
     