OCT-GAN: Neural ODE-based Conditional Tabular GANs
- URL: http://arxiv.org/abs/2105.14969v1
- Date: Mon, 31 May 2021 13:58:55 GMT
- Title: OCT-GAN: Neural ODE-based Conditional Tabular GANs
- Authors: Jayoung Kim, Jinsung Jeon, Jaehoon Lee, Jihyeon Hyeong, Noseong Park
- Abstract summary: We introduce our generator and discriminator based on neural ordinary differential equations (NODEs)
We conduct experiments with 13 datasets, including insurance fraud detection and online news article prediction.
- Score: 8.062118111791495
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Synthesizing tabular data is attracting much attention these days for various
purposes. With sophisticated synthetic data, for instance, one can augment one's
training data. For the past couple of years, tabular data synthesis techniques
have been greatly improved. Recent work made progress to address many problems
in synthesizing tabular data, such as the imbalanced distribution and
multimodality problems. However, the data utility of state-of-the-art methods
is not satisfactory yet. In this work, we significantly improve the utility by
designing our generator and discriminator based on neural ordinary differential
equations (NODEs). After showing that NODEs have theoretically preferred
characteristics for generating tabular data, we introduce our designs. The
NODE-based discriminator performs a hidden vector evolution trajectory-based
classification rather than classifying with a hidden vector at the last layer
only. Our generator also adopts an ODE layer at the very beginning of its
architecture to transform its initial input vector (i.e., the concatenation of
a noisy vector and a condition vector in our case) onto another latent vector
space suitable for the generation process. We conduct experiments with 13
datasets, including insurance fraud detection and online news article
prediction, and our method outperforms other state-of-the-art tabular data
synthesis methods in many cases of our classification, regression, and
clustering experiments.
Related papers
- Generating Realistic Tabular Data with Large Language Models [49.03536886067729]
Large language models (LLMs) have been used for diverse tasks, but do not capture the correct correlation between the features and the target variable.
We propose an LLM-based method with three important improvements to correctly capture the ground-truth feature-class correlation in the real data.
Our experiments show that our method significantly outperforms 10 SOTA baselines on 20 datasets in downstream tasks.
arXiv Detail & Related papers (2024-10-29T04:14:32Z) - Multi-objective evolutionary GAN for tabular data synthesis [0.873811641236639]
Synthetic data has a key role to play in data sharing by statistical agencies and other generators of statistical data products.
This paper proposes a smart MO evolutionary conditional GAN (SMOE-CTGAN) for synthetic data.
Our results indicate that SMOE-CTGAN is able to discover synthetic datasets with different risk and utility levels for multiple national census datasets.
arXiv Detail & Related papers (2024-04-15T23:07:57Z) - Synthetic data, real errors: how (not) to publish and use synthetic data [86.65594304109567]
We show how the generative process affects the downstream ML task.
We introduce Deep Generative Ensemble (DGE) to approximate the posterior distribution over the generative process model parameters.
arXiv Detail & Related papers (2023-05-16T07:30:29Z) - Convex space learning improves deep-generative oversampling for tabular
imbalanced classification on smaller datasets [0.0]
We show that existing deep generative models perform poorly compared to linear approaches that generate synthetic samples from the convex space of the minority class.
We propose ConvGeN, a deep generative model that combines the idea of convex space learning with deep generative models.
arXiv Detail & Related papers (2022-06-20T14:42:06Z) - Truncated tensor Schatten p-norm based approach for spatiotemporal
traffic data imputation with complicated missing patterns [77.34726150561087]
We introduce four complicated missing patterns, including random missing and three fiber-like missing cases according to the mode-driven fibers.
Despite the nonconvexity of the objective function in our model, we derive the optimal solutions by integrating the alternating direction method of multipliers (ADMM).
arXiv Detail & Related papers (2022-05-19T08:37:56Z) - DATGAN: Integrating expert knowledge into deep learning for synthetic
tabular data [0.0]
Synthetic data can be used in various applications, such as correcting biased datasets or replacing scarce original data for simulation purposes.
Deep learning models are data-driven and it is difficult to control the generation process.
This article presents the Directed Acyclic Tabular GAN (DATGAN) to address these limitations.
arXiv Detail & Related papers (2022-03-07T16:09:03Z) - Mitigating Generation Shifts for Generalized Zero-Shot Learning [52.98182124310114]
Generalized Zero-Shot Learning (GZSL) is the task of leveraging semantic information (e.g., attributes) to recognize the seen and unseen samples, where unseen classes are not observable during training.
We propose a novel Generation Shifts Mitigating Flow framework for learning unseen data synthesis efficiently and effectively.
Experimental results demonstrate that GSMFlow achieves state-of-the-art recognition performance in both conventional and generalized zero-shot settings.
arXiv Detail & Related papers (2021-07-07T11:43:59Z) - Tensor feature hallucination for few-shot learning [17.381648488344222]
Few-shot classification addresses the challenge of classifying examples given limited supervision and limited data.
Previous works on synthetic data generation for few-shot classification focus on exploiting complex models.
We investigate how a simple and straightforward synthetic data generation method can be used effectively.
arXiv Detail & Related papers (2021-06-09T18:25:08Z) - Contrastive Model Inversion for Data-Free Knowledge Distillation [60.08025054715192]
We propose Contrastive Model Inversion, where the data diversity is explicitly modeled as an optimizable objective.
Our main observation is that, under the constraint of the same amount of data, higher data diversity usually indicates stronger instance discrimination.
Experiments on CIFAR-10, CIFAR-100, and Tiny-ImageNet demonstrate that CMI achieves significantly superior performance when the generated data are used for knowledge distillation.
arXiv Detail & Related papers (2021-05-18T15:13:00Z) - Knowledge transfer across cell lines using Hybrid Gaussian Process
models with entity embedding vectors [62.997667081978825]
A large number of experiments are performed to develop a biochemical process.
If we could exploit data from already developed processes to make predictions for a novel process, we could significantly reduce the number of experiments needed.
arXiv Detail & Related papers (2020-11-27T17:38:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.