FairFinGAN: Fairness-aware Synthetic Financial Data Generation
- URL: http://arxiv.org/abs/2603.05327v1
- Date: Thu, 05 Mar 2026 16:09:19 GMT
- Title: FairFinGAN: Fairness-aware Synthetic Financial Data Generation
- Authors: Tai Le Quy, Dung Nguyen Tuan, Trung Nguyen Thanh, Duy Tran Cong, Huyen Giang Thi Thu, Frank Hopfgartner
- Abstract summary: We propose FairFinGAN, a WGAN-based framework designed to generate synthetic financial data while mitigating bias with respect to the protected attribute. We evaluate our proposed model on five real-world financial datasets and compare it with existing GAN-based data generation methods. Experimental results show that our approach achieves superior fairness metrics without significant loss in data utility.
- Score: 0.3544442162078764
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Financial datasets often suffer from bias that can lead to unfair decision-making in automated systems. In this work, we propose FairFinGAN, a WGAN-based framework designed to generate synthetic financial data while mitigating bias with respect to the protected attribute. Our approach incorporates fairness constraints directly into the training process through a classifier, ensuring that the synthetic data is both fair and preserves utility for downstream predictive tasks. We evaluate our proposed model on five real-world financial datasets and compare it with existing GAN-based data generation methods. Experimental results show that our approach achieves superior fairness metrics without significant loss in data utility, demonstrating its potential as a tool for bias-aware data generation in financial applications.
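The abstract describes adding a classifier-driven fairness constraint to WGAN training but does not spell out the loss. The following is a minimal sketch of that general idea, not the authors' implementation: it assumes a binary label, a binary protected attribute, statistical parity difference as the fairness measure, and a hypothetical weight `lam`. The function names are illustrative only.

```python
from typing import Sequence


def positive_rate(labels: Sequence[int], groups: Sequence[int], group: int) -> float:
    """Share of positive outcomes (label == 1) within one protected group."""
    members = [y for y, g in zip(labels, groups) if g == group]
    if not members:
        raise ValueError(f"no records for group {group}")
    return sum(members) / len(members)


def statistical_parity_difference(labels: Sequence[int], groups: Sequence[int]) -> float:
    """P(y=1 | s=0) - P(y=1 | s=1); 0 means both groups receive positives equally often."""
    return positive_rate(labels, groups, 0) - positive_rate(labels, groups, 1)


def penalized_generator_loss(
    critic_scores: Sequence[float],
    labels: Sequence[int],
    groups: Sequence[int],
    lam: float = 1.0,
) -> float:
    """Hypothetical objective: standard WGAN generator term plus a weighted fairness penalty.

    critic_scores are the critic's outputs on generated samples; labels are the
    classifier's predictions on those samples (both assumed for illustration).
    """
    wgan_term = -sum(critic_scores) / len(critic_scores)  # -E[D(G(z))]
    return wgan_term + lam * abs(statistical_parity_difference(labels, groups))
```

In actual training the penalty would have to be computed from differentiable classifier outputs (for example, predicted probabilities) so that gradients reach the generator; the hard labels here only keep the arithmetic easy to follow.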
Related papers
- Synthetic Financial Data Generation for Enhanced Financial Modelling [0.0]
This paper presents a unified multi-criteria evaluation framework for synthetic financial data. Using historical S&P 500 daily data, we evaluate fidelity (Maximum Mean Discrepancy, MMD), temporal structure (autocorrelation and volatility clustering), and practical utility in downstream tasks. We articulate practical guidelines for selecting generative models according to application needs and computational constraints.
arXiv Detail & Related papers (2025-12-25T21:43:16Z) - Reliable and Reproducible Demographic Inference for Fairness in Face Analysis [63.46525489354455]
We propose a fully reproducible DAI pipeline that replaces conventional end-to-end training with a modular transfer learning approach. We audit this pipeline across three dimensions: accuracy, fairness, and a newly introduced notion of robustness, defined via intra-identity consistency. Our results show that the proposed method outperforms strong baselines, particularly on ethnicity, which is the more challenging attribute.
arXiv Detail & Related papers (2025-10-23T12:22:02Z) - TABFAIRGDT: A Fast Fair Tabular Data Generator using Autoregressive Decision Trees [11.0044761900691]
We introduce TABFAIRGDT, a novel method for generating fair synthetic data using autoregressive decision trees. We evaluate TABFAIRGDT on benchmark fairness datasets and demonstrate that it outperforms state-of-the-art (SOTA) deep generative models. Remarkably, TABFAIRGDT achieves a 72% average speedup over the fastest SOTA baseline across various dataset sizes.
arXiv Detail & Related papers (2025-09-24T09:35:52Z) - FairTabGen: Unifying Counterfactual and Causal Fairness in Synthetic Tabular Data Generation [4.044506553590468]
We present FairTabGen, a fairness-aware large language model-based framework for synthetic data generation. We use in-context learning, prompt refinement, and fairness-aware data curation to balance fairness and utility.
arXiv Detail & Related papers (2025-08-15T21:36:07Z) - Valid Inference with Imperfect Synthetic Data [39.10587411316875]
We introduce a new estimator based on generalized method of moments. We find that interactions between the moment residuals of synthetic data and those of real data can greatly improve estimates of the target parameter.
arXiv Detail & Related papers (2025-08-08T18:32:52Z) - FairCauseSyn: Towards Causally Fair LLM-Augmented Synthetic Data Generation [4.392938909804638]
Synthetic data generation creates data based on real-world data using generative models. We develop the first LLM-augmented synthetic data generation method to enhance causal fairness using real-world health data. When trained on causally fair predictors, synthetic data reduces bias on the sensitive attribute by 70% compared to real data.
arXiv Detail & Related papers (2025-06-23T19:59:26Z) - Targeted Learning for Data Fairness [52.59573714151884]
We expand fairness inference by evaluating fairness in the data generating process itself. We derive estimators for demographic parity, equal opportunity, and conditional mutual information. To validate our approach, we perform several simulations and apply our estimators to real data.
arXiv Detail & Related papers (2025-02-06T18:51:28Z) - Unveiling the Flaws: Exploring Imperfections in Synthetic Data and Mitigation Strategies for Large Language Models [89.88010750772413]
Synthetic data has been proposed as a solution to address the issue of high-quality data scarcity in the training of large language models (LLMs).
Our work delves into these specific flaws associated with question-answer (Q-A) pairs, a prevalent type of synthetic data, and presents a method based on unlearning techniques to mitigate these flaws.
Our work has yielded key insights into the effective use of synthetic data, aiming to promote more robust and efficient LLM training.
arXiv Detail & Related papers (2024-06-18T08:38:59Z) - Advancing Anomaly Detection: Non-Semantic Financial Data Encoding with LLMs [49.57641083688934]
We introduce a novel approach to anomaly detection in financial data using Large Language Model (LLM) embeddings.
Our experiments demonstrate that LLMs contribute valuable information to anomaly detection as our models outperform the baselines.
arXiv Detail & Related papers (2024-06-05T20:19:09Z) - Reimagining Synthetic Tabular Data Generation through Data-Centric AI: A Comprehensive Benchmark [56.8042116967334]
Synthetic data serves as an alternative in training machine learning models.
Ensuring that synthetic data mirrors the complex nuances of real-world data is a challenging task.
This paper explores the potential of integrating data-centric AI techniques to guide the synthetic data generation process.
arXiv Detail & Related papers (2023-10-25T20:32:02Z) - FinDiff: Diffusion Models for Financial Tabular Data Generation [5.824064631226058]
FinDiff is a diffusion model designed to generate real-world financial data for a variety of regulatory downstream tasks.
It is evaluated against state-of-the-art baseline models using three real-world financial datasets.
arXiv Detail & Related papers (2023-09-04T09:30:15Z) - Auditing and Generating Synthetic Data with Controllable Trust Trade-offs [54.262044436203965]
We introduce a holistic auditing framework that comprehensively evaluates synthetic datasets and AI models.
It focuses on preventing bias and discrimination, ensuring fidelity to the source data, and assessing utility, robustness, and privacy preservation.
We demonstrate the framework's effectiveness by auditing various generative models across diverse use cases.
arXiv Detail & Related papers (2023-04-21T09:03:18Z) - DECAF: Generating Fair Synthetic Data Using Causally-Aware Generative Networks [71.6879432974126]
We introduce DECAF: a GAN-based fair synthetic data generator for tabular data.
We show that DECAF successfully removes undesired bias and is capable of generating high-quality synthetic data.
We provide theoretical guarantees on the generator's convergence and the fairness of downstream models.
arXiv Detail & Related papers (2021-10-25T12:39:56Z)