Generating synthetic transactional profiles
- URL: http://arxiv.org/abs/2111.01531v1
- Date: Thu, 28 Oct 2021 18:52:04 GMT
- Title: Generating synthetic transactional profiles
- Authors: Hadrien Lautraite, Patrick Mesana
- Abstract summary: In this paper, we generate synthetic transactional profiles using machine learning techniques.
We measured data utility by calculating common insights used by the banking industry on both the original and the synthetic data-set.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Financial institutions use clients' payment transactions in numerous banking
applications. Transactions are very personal and rich in behavioural patterns,
often unique to individuals, which makes them equivalent to personally
identifiable information in some cases. In this paper, we generate synthetic
transactional profiles using machine learning techniques with the goal of
preserving both data utility and privacy. A challenge we faced was to deal with
sparse vectors due to the few spending categories a client uses compared to all
the ones available. We measured data utility by calculating common insights
used by the banking industry on both the original and the synthetic data-set.
Our approach shows that neural network models can generate valuable synthetic
data in such a context. Finally, we tried privacy-preserving techniques and
observed their effect on the models' performance.
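The abstract's two technical points (sparse per-client category vectors, and utility measured by computing the same insight on original and synthetic data) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the category list, the helper functions, the toy amounts, and the choice of "share of spend per category" as the insight. None of it is taken from the paper.

```python
from typing import Dict, List

# Hypothetical spending categories; the paper's actual taxonomy is not given here.
CATEGORIES = ["groceries", "restaurants", "travel", "fuel", "pharmacy", "electronics"]


def to_profile(spend: Dict[str, float]) -> List[float]:
    """Turn a client's per-category spend into a fixed-length vector."""
    return [spend.get(c, 0.0) for c in CATEGORIES]


def sparsity(profile: List[float]) -> float:
    """Fraction of categories in which the client spent nothing."""
    return sum(1 for v in profile if v == 0.0) / len(profile)


def category_shares(profiles: List[List[float]]) -> List[float]:
    """Share of total spend per category: one simple aggregate insight."""
    totals = [sum(col) for col in zip(*profiles)]
    grand = sum(totals)
    return [t / grand if grand else 0.0 for t in totals]


def utility_gap(real: List[List[float]], synth: List[List[float]]) -> float:
    """Mean absolute difference between category shares (0.0 means identical)."""
    r, s = category_shares(real), category_shares(synth)
    return sum(abs(a - b) for a, b in zip(r, s)) / len(r)


# Toy data, invented for this example only.
real = [
    to_profile({"groceries": 120.0, "fuel": 40.0}),
    to_profile({"restaurants": 60.0, "groceries": 80.0}),
]
synth = [
    to_profile({"groceries": 110.0, "fuel": 50.0}),
    to_profile({"restaurants": 70.0, "groceries": 70.0}),
]

print(f"sparsity of first real profile: {sparsity(real[0]):.2f}")
print(f"utility gap (real vs synthetic): {utility_gap(real, synth):.4f}")
```

A smaller gap suggests the synthetic profiles reproduce this particular insight more closely; it says nothing about privacy, which would need a separate evaluation.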
Related papers
- Assessment of Differentially Private Synthetic Data for Utility and Fairness in End-to-End Machine Learning Pipelines for Tabular Data [3.555830838738963]
Differentially private (DP) synthetic data sets are a solution for sharing data while preserving the privacy of individual data providers.
We identify the most effective synthetic data generation techniques for training and evaluating machine learning models.
arXiv Detail & Related papers (2023-10-30T03:37:16Z)
- Reimagining Synthetic Tabular Data Generation through Data-Centric AI: A Comprehensive Benchmark [56.8042116967334]
Synthetic data serves as an alternative in training machine learning models.
Ensuring that synthetic data mirrors the complex nuances of real-world data is a challenging task.
This paper explores the potential of integrating data-centric AI techniques to guide the synthetic data generation process.
arXiv Detail & Related papers (2023-10-25T20:32:02Z)
- Privacy-Preserving Financial Anomaly Detection via Federated Learning & Multi-Party Computation [17.314619091307343]
We describe a privacy-preserving framework that allows financial institutions to jointly train highly accurate anomaly detection models.
We show that our solution enables the network to train a highly accurate anomaly detection model while preserving privacy of customer data.
arXiv Detail & Related papers (2023-10-06T19:16:41Z)
- Synthetic Demographic Data Generation for Card Fraud Detection Using GANs [4.651915393462367]
We build a deep-learning Generative Adversarial Network (GAN), called DGGAN, which will be used for demographic data generation.
Our model generates samples during model training, which we found important to overcome class imbalance issues.
arXiv Detail & Related papers (2023-06-29T17:08:57Z)
- Towards Generalizable Data Protection With Transferable Unlearnable Examples [50.628011208660645]
We present a novel, generalizable data protection method by generating transferable unlearnable examples.
To the best of our knowledge, this is the first solution that examines data privacy from the perspective of data distribution.
arXiv Detail & Related papers (2023-05-18T04:17:01Z)
- Beyond Privacy: Navigating the Opportunities and Challenges of Synthetic Data [91.52783572568214]
Synthetic data may become a dominant force in the machine learning world, promising a future where datasets can be tailored to individual needs.
We discuss which fundamental challenges the community needs to overcome for wider relevance and application of synthetic data.
arXiv Detail & Related papers (2023-04-07T16:38:40Z)
- Generating Realistic Synthetic Relational Data through Graph Variational Autoencoders [47.89542334125886]
We combine the variational autoencoder framework with graph neural networks to generate realistic synthetic relational databases.
The results indicate that real databases' structures are accurately preserved in the resulting synthetic datasets.
arXiv Detail & Related papers (2022-11-30T10:40:44Z)
- Private Set Generation with Discriminative Information [63.851085173614]
Differentially private data generation is a promising solution to the data privacy challenge.
Existing private generative models are struggling with the utility of synthetic samples.
We introduce a simple yet effective method that greatly improves the sample utility of state-of-the-art approaches.
arXiv Detail & Related papers (2022-11-07T10:02:55Z)
- Bias Mitigated Learning from Differentially Private Synthetic Data: A Cautionary Tale [13.881022208028751]
Bias can affect all analyses as the synthetic data distribution is an inconsistent estimate of the real-data distribution.
We propose several bias mitigation strategies using privatized likelihood ratios.
We show that bias mitigation provides simple and effective privacy-compliant augmentation for general applications of synthetic data.
arXiv Detail & Related papers (2021-08-24T19:56:44Z)
- Differentially Private Synthetic Data: Applied Evaluations and Enhancements [4.749807065324706]
Differentially private data synthesis protects personal details from exposure.
We evaluate four differentially private generative adversarial networks for data synthesis.
We propose QUAIL, an ensemble-based modeling approach to generating synthetic data.
arXiv Detail & Related papers (2020-11-11T04:03:08Z)
- Super-App Behavioral Patterns in Credit Risk Models: Financial, Statistical and Regulatory Implications [110.54266632357673]
We present the impact of alternative data that originates from an app-based marketplace, in contrast to traditional bureau data, upon credit scoring models.
Our results, validated across two countries, show that these new sources of data are particularly useful for predicting financial behavior in low-wealth and young individuals.
arXiv Detail & Related papers (2020-05-09T01:32:03Z)
This list is automatically generated from the titles and abstracts of the papers on this site.