FCT-GAN: Enhancing Table Synthesis via Fourier Transform
- URL: http://arxiv.org/abs/2210.06239v1
- Date: Wed, 12 Oct 2022 14:25:29 GMT
- Title: FCT-GAN: Enhancing Table Synthesis via Fourier Transform
- Authors: Zilong Zhao, Robert Birke, Lydia Y. Chen
- Abstract summary: Synthetic data emerges as an alternative sharing knowledge while adhering to restrictive regulations, e.g. General Data Protection Regulation.
We introduce feature tokenization and Fourier networks to construct a transformer-style generator and discriminator, and capture both local and global dependencies across columns.
- Score: 13.277332691308395
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Synthetic tabular data emerges as an alternative for sharing knowledge while
adhering to restrictive data access regulations, e.g., European General Data
Protection Regulation (GDPR). Mainstream state-of-the-art tabular data
synthesizers draw methodologies from Generative Adversarial Networks (GANs),
which are composed of a generator and a discriminator. While convolution neural
networks are shown to be a better architecture than fully connected networks
for tabular data synthesizing, two key properties of tabular data are
overlooked: (i) the global correlation across columns, and (ii) invariant
synthesizing to column permutations of input data. To address the above
problems, we propose a Fourier conditional tabular generative adversarial
network (FCT-GAN). We introduce feature tokenization and Fourier networks to
construct a transformer-style generator and discriminator, and capture both
local and global dependencies across columns. The tokenizer captures local
spatial features and transforms original data into tokens. Fourier networks
transform tokens to frequency domains and element-wisely multiply a learnable
filter. Extensive evaluation on benchmarks and real-world data shows that
FCT-GAN can synthesize tabular data with high machine learning utility (up to
27.8% better than state-of-the-art baselines) and high statistical similarity
to the original data (up to 26.5% better), while maintaining the global
correlation across columns, especially on high dimensional dataset.
Related papers
- Generating Realistic Tabular Data with Large Language Models [49.03536886067729]
Large language models (LLM) have been used for diverse tasks, but do not capture the correct correlation between the features and the target variable.
We propose a LLM-based method with three important improvements to correctly capture the ground-truth feature-class correlation in the real data.
Our experiments show that our method significantly outperforms 10 SOTA baselines on 20 datasets in downstream tasks.
arXiv Detail & Related papers (2024-10-29T04:14:32Z) - TabDiff: a Multi-Modal Diffusion Model for Tabular Data Generation [91.50296404732902]
We introduce TabDiff, a joint diffusion framework that models all multi-modal distributions of tabular data in one model.
Our key innovation is the development of a joint continuous-time diffusion process for numerical and categorical data.
TabDiff achieves superior average performance over existing competitive baselines, with up to $22.5%$ improvement over the state-of-the-art model on pair-wise column correlation estimations.
arXiv Detail & Related papers (2024-10-27T22:58:47Z) - A Survey on Deep Tabular Learning [0.0]
Tabular data presents unique challenges for deep learning due to its heterogeneous nature and lack of spatial structure.
This survey reviews the evolution of deep learning models for Tabular data, from early fully connected networks (FCNs) to advanced architectures like TabNet, SAINT, TabTranSELU, and MambaNet.
arXiv Detail & Related papers (2024-10-15T20:08:08Z) - Fake It Till Make It: Federated Learning with Consensus-Oriented
Generation [52.82176415223988]
We propose federated learning with consensus-oriented generation (FedCOG)
FedCOG consists of two key components at the client side: complementary data generation and knowledge-distillation-based model training.
Experiments on classical and real-world FL datasets show that FedCOG consistently outperforms state-of-the-art methods.
arXiv Detail & Related papers (2023-12-10T18:49:59Z) - Permutation-Invariant Tabular Data Synthesis [14.55825097637513]
We show that changing the input column order worsens the statistical difference between real and synthetic data by up to 38.67%.
We propose AE-GAN, a synthesizer that uses an autoencoder network to represent the tabular data and GAN networks to synthesize the latent representation.
We evaluate the proposed solutions on five datasets in terms of the sensitivity to the column permutation, the quality of synthetic data, and the utility in downstream analyses.
arXiv Detail & Related papers (2022-11-17T01:14:19Z) - Rethinking Data Heterogeneity in Federated Learning: Introducing a New
Notion and Standard Benchmarks [65.34113135080105]
We show that not only the issue of data heterogeneity in current setups is not necessarily a problem but also in fact it can be beneficial for the FL participants.
Our observations are intuitive.
Our code is available at https://github.com/MMorafah/FL-SC-NIID.
arXiv Detail & Related papers (2022-09-30T17:15:19Z) - Fed-TGAN: Federated Learning Framework for Synthesizing Tabular Data [8.014848609114154]
We propose Fed-TGAN, the first Federated learning framework for Tabular GANs.
To effectively learn a complex GAN on non-identical participants, Fed-TGAN designs two novel features.
Results show that Fed-TGAN accelerates training time per epoch up to 200%.
arXiv Detail & Related papers (2021-08-18T01:47:36Z) - Global Filter Networks for Image Classification [90.81352483076323]
We present a conceptually simple yet computationally efficient architecture that learns long-term spatial dependencies in the frequency domain with log-linear complexity.
Our results demonstrate that GFNet can be a very competitive alternative to transformer-style models and CNNs in efficiency, generalization ability and robustness.
arXiv Detail & Related papers (2021-07-01T17:58:16Z) - Message Passing in Graph Convolution Networks via Adaptive Filter Banks [81.12823274576274]
We present a novel graph convolution operator, termed BankGCN.
It decomposes multi-channel signals on graphs into subspaces and handles particular information in each subspace with an adapted filter.
It achieves excellent performance in graph classification on a collection of benchmark graph datasets.
arXiv Detail & Related papers (2021-06-18T04:23:34Z) - CTAB-GAN: Effective Table Data Synthesizing [7.336728307626645]
We develop CTAB-GAN, a conditional table GAN architecture that can model diverse data types.
We show that CTAB-GAN remarkably resembles the real data for all three types of variables and results into higher accuracy for five machine learning algorithms, by up 17%.
arXiv Detail & Related papers (2021-02-16T18:53:57Z) - Tabular Transformers for Modeling Multivariate Time Series [30.717890753132824]
Tabular datasets are ubiquitous in data science applications. Given their importance, it seems natural to apply state-of-the-art deep learning algorithms in order to fully unlock their potential.
Here we propose neural network models that represent tabular time series that can leverage their hierarchical structure.
We demonstrate our models on two datasets: a synthetic credit card transaction dataset, where the learned representations are used for fraud detection and synthetic data generation, and on a real pollution dataset, where the learned encodings are used to predict atmospheric pollutant concentrations.
arXiv Detail & Related papers (2020-11-03T16:58:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.