Related papers: A supervised generative optimization approach for tabular data

A supervised generative optimization approach for tabular data

URL: http://arxiv.org/abs/2309.05079v2
Date: Thu, 9 May 2024 18:29:06 GMT
Title: A supervised generative optimization approach for tabular data
Authors: Shinpei Nakamura-Sakai, Fadi Hamad, Saheed Obitayo, Vamsi K. Potluru,
Abstract summary: This work presents a novel synthetic data generation framework. It integrates a supervised component tailored to the specific downstream task and employs a meta-learning approach to learn the optimal mixture distribution of existing synthetic distributions.
Score: 2.5311562666866494
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Synthetic data generation has emerged as a crucial topic for financial institutions, driven by multiple factors, such as privacy protection and data augmentation. Many algorithms have been proposed for synthetic data generation but reaching the consensus on which method we should use for the specific data sets and use cases remains challenging. Moreover, the majority of existing approaches are ``unsupervised'' in the sense that they do not take into account the downstream task. To address these issues, this work presents a novel synthetic data generation framework. The framework integrates a supervised component tailored to the specific downstream task and employs a meta-learning approach to learn the optimal mixture distribution of existing synthetic distributions.

Related papers

Improving the Generation and Evaluation of Synthetic Data for Downstream Medical Causal Inference [89.5628648718851]
Causal inference is essential for developing and evaluating medical interventions.<n>Real-world medical datasets are often difficult to access due to regulatory barriers.<n>We present STEAM: a novel method for generating Synthetic data for Treatment Effect Analysis in Medicine.
arXiv Detail & Related papers (2025-10-21T16:16:00Z)
Synthetic Tabular Data Generation: A Comparative Survey for Modern Techniques [6.744437741221969]
As privacy regulations become more stringent and access to real-world data becomes increasingly constrained, synthetic data generation has emerged as a vital solution.<n>This review prioritizes the actionable goals that drive synthetic data creation, including conditional generation and risk-sensitive modeling.
arXiv Detail & Related papers (2025-07-15T14:57:23Z)
A Comprehensive Survey of Synthetic Tabular Data Generation [27.112327373017457]
Tabular data is one of the most prevalent and critical data formats across diverse real-world applications. It is often constrained by challenges such as data scarcity, privacy concerns, and class imbalance. Synthetic data generation has emerged as a promising solution, leveraging generative models to learn the distribution of real datasets.
arXiv Detail & Related papers (2025-04-23T08:33:34Z)
Opinion: Revisiting synthetic data classifications from a privacy perspective [42.12937192948916]
Synthetic data is emerging as a cost-effective solution to meet the increasing data demands of AI development. Traditional classification of synthetic data types does not reflect the ever-increasing methods to generate synthetic data. We make a case for an alternative approach to grouping synthetic data types that better reflect privacy perspectives.
arXiv Detail & Related papers (2025-03-05T13:54:13Z)
Exploring the Landscape for Generative Sequence Models for Specialized Data Synthesis [0.0]
This paper introduces a novel approach that leverages three generative models of varying complexity to synthesize Malicious Network Traffic. Our approach transforms numerical data into text, re-framing data generation as a language modeling task. Our method surpasses state-of-the-art generative models in producing high-fidelity synthetic data.
arXiv Detail & Related papers (2024-11-04T09:51:10Z)
A Survey on Data Synthesis and Augmentation for Large Language Models [35.59526251210408]
This paper reviews and summarizes data generation techniques throughout the lifecycle of Large Language Models. We discuss the current constraints faced by these methods and investigate potential pathways for future development and research.
arXiv Detail & Related papers (2024-10-16T16:12:39Z)
Reimagining Synthetic Tabular Data Generation through Data-Centric AI: A Comprehensive Benchmark [56.8042116967334]
Synthetic data serves as an alternative in training machine learning models. ensuring that synthetic data mirrors the complex nuances of real-world data is a challenging task. This paper explores the potential of integrating data-centric AI techniques to guide the synthetic data generation process.
arXiv Detail & Related papers (2023-10-25T20:32:02Z)
Statistical properties and privacy guarantees of an original distance-based fully synthetic data generation method [0.0]
This work shows the technical feasibility of generating publicly releasable synthetic data using a multi-step framework. By successfully assessing the quality of data produced using a novel multi-step synthetic data generation framework, we showed the technical and conceptual soundness of the Open-CESP initiative.
arXiv Detail & Related papers (2023-10-10T12:29:57Z)
Synthetic data, real errors: how (not) to publish and use synthetic data [86.65594304109567]
We show how the generative process affects the downstream ML task. We introduce Deep Generative Ensemble (DGE) to approximate the posterior distribution over the generative process model parameters.
arXiv Detail & Related papers (2023-05-16T07:30:29Z)
Beyond Privacy: Navigating the Opportunities and Challenges of Synthetic Data [91.52783572568214]
Synthetic data may become a dominant force in the machine learning world, promising a future where datasets can be tailored to individual needs. We discuss which fundamental challenges the community needs to overcome for wider relevance and application of synthetic data.
arXiv Detail & Related papers (2023-04-07T16:38:40Z)
Private Set Generation with Discriminative Information [63.851085173614]
Differentially private data generation is a promising solution to the data privacy challenge. Existing private generative models are struggling with the utility of synthetic samples. We introduce a simple yet effective method that greatly improves the sample utility of state-of-the-art approaches.
arXiv Detail & Related papers (2022-11-07T10:02:55Z)
Delving into High-Quality Synthetic Face Occlusion Segmentation Datasets [83.749895930242]
We propose two techniques for producing high-quality naturalistic synthetic occluded faces. We empirically show the effectiveness and robustness of both methods, even for unseen occlusions. We present two high-resolution real-world occluded face datasets with fine-grained annotations, RealOcc and RealOcc-Wild.
arXiv Detail & Related papers (2022-05-12T17:03:57Z)
Foundations of Bayesian Learning from Synthetic Data [1.6249267147413522]
We use a Bayesian paradigm to characterise the updating of model parameters when learning on synthetic data. Recent results from general Bayesian updating support a novel and robust approach to synthetic-learning founded on decision theory.
arXiv Detail & Related papers (2020-11-16T21:49:17Z)
On synthetic data generation for anomaly detection in complex social networks [1.1602089225841632]
This paper studies the feasibility of synthetic data generation for mission-critical applications. In particular, the development of a generative model, capable of creating data for anomalous rare activities in complex social networks is sought.
arXiv Detail & Related papers (2020-10-25T03:53:19Z)
Partially Conditioned Generative Adversarial Networks [75.08725392017698]
Generative Adversarial Networks (GANs) let one synthesise artificial datasets by implicitly modelling the underlying probability distribution of a real-world training dataset. With the introduction of Conditional GANs and their variants, these methods were extended to generating samples conditioned on ancillary information available for each sample within the dataset. In this work, we argue that standard Conditional GANs are not suitable for such a task and propose a new Adversarial Network architecture and training strategy.
arXiv Detail & Related papers (2020-07-06T15:59:28Z)

This list is automatically generated from the titles and abstracts of the papers in this site.