Related papers: FCT-GAN: Enhancing Table Synthesis via Fourier Transform

URL: http://arxiv.org/abs/2210.06239v1
Date: Wed, 12 Oct 2022 14:25:29 GMT
Title: FCT-GAN: Enhancing Table Synthesis via Fourier Transform
Authors: Zilong Zhao, Robert Birke, Lydia Y. Chen
Abstract summary: Synthetic data emerges as an alternative for sharing knowledge while adhering to restrictive regulations, e.g., the General Data Protection Regulation. We introduce feature tokenization and Fourier networks to construct a transformer-style generator and discriminator, and capture both local and global dependencies across columns.
Score: 13.277332691308395
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Synthetic tabular data emerges as an alternative for sharing knowledge while adhering to restrictive data access regulations, e.g., the European General Data Protection Regulation (GDPR). Mainstream state-of-the-art tabular data synthesizers draw methodologies from Generative Adversarial Networks (GANs), which are composed of a generator and a discriminator. While convolutional neural networks are shown to be a better architecture than fully connected networks for tabular data synthesis, two key properties of tabular data are overlooked: (i) the global correlation across columns, and (ii) synthesis that is invariant to column permutations of the input data. To address the above problems, we propose a Fourier conditional tabular generative adversarial network (FCT-GAN). We introduce feature tokenization and Fourier networks to construct a transformer-style generator and discriminator, and capture both local and global dependencies across columns. The tokenizer captures local spatial features and transforms original data into tokens. Fourier networks transform tokens to the frequency domain and multiply them element-wise by a learnable filter. Extensive evaluation on benchmarks and real-world data shows that FCT-GAN can synthesize tabular data with high machine learning utility (up to 27.8% better than state-of-the-art baselines) and high statistical similarity to the original data (up to 26.5% better), while maintaining the global correlation across columns, especially on high-dimensional datasets.
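
Illustration: the Fourier step described in the abstract (transform tokens to the frequency domain, multiply by a learnable filter) can be sketched in a few lines of NumPy. This is a minimal sketch, not the authors' implementation. The function name fourier_filter_layer, the array shapes, and the choice of a real FFT along the token axis are all assumptions made for illustration, and the filter here is a fixed array standing in for learned parameters.

```python
import numpy as np


def fourier_filter_layer(tokens, filt):
    """Hypothetical frequency-domain mixing step (illustrative only).

    tokens: real array of shape (n_tokens, dim), one row per token.
    filt:   complex array of shape (n_tokens // 2 + 1, dim); in a trained
            model these values would be learned, here they are supplied.
    """
    n_tokens = tokens.shape[0]
    # Real FFT along the token axis: each frequency bin depends on every token.
    freq = np.fft.rfft(tokens, axis=0)
    # Element-wise multiplication by the filter.
    freq = freq * filt
    # Back to the token domain, keeping the original number of tokens.
    return np.fft.irfft(freq, n=n_tokens, axis=0)


rng = np.random.default_rng(0)
n_tokens, dim = 8, 4
tokens = rng.normal(size=(n_tokens, dim))

# An all-ones filter leaves the tokens unchanged.
identity = np.ones((n_tokens // 2 + 1, dim), dtype=complex)
assert np.allclose(fourier_filter_layer(tokens, identity), tokens)

# Keeping only the zero-frequency bin replaces every token by the mean
# over all tokens, which shows that the mixing is global.
dc_only = np.zeros((n_tokens // 2 + 1, dim), dtype=complex)
dc_only[0] = 1.0
mixed = fourier_filter_layer(tokens, dc_only)
assert np.allclose(mixed, np.tile(tokens.mean(axis=0), (n_tokens, 1)))
```

Because every frequency bin is a sum over all tokens, a single element-wise multiplication in the frequency domain lets each output token depend on every input token, which is one way to read the abstract's claim about capturing global dependencies across columns.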

Related papers

LLM-TabLogic: Preserving Inter-Column Logical Relationships in Synthetic Tabular Data via Prompt-Guided Latent Diffusion [49.898152180805454]
Synthetic datasets must maintain domain-specific logical consistency. Existing generative models often overlook these inter-column relationships. This study presents the first method to effectively preserve inter-column relationships without requiring domain knowledge.
arXiv Detail & Related papers (2025-03-04T00:47:52Z)
Variable-size Symmetry-based Graph Fourier Transforms for image compression [65.7352685872625]
We introduce a new family of Symmetry-based Graph Fourier Transforms (SBGFTs) of variable sizes into a coding framework. Our proposed algorithm generates symmetric graphs on the grid by adding specific symmetrical connections between nodes. Experiments show that SBGFTs outperform the primary transforms integrated in the explicit Multiple Transform Selection.
arXiv Detail & Related papers (2024-11-24T13:00:44Z)
Generating Realistic Tabular Data with Large Language Models [49.03536886067729]
Large language models (LLMs) have been used for diverse tasks, but do not capture the correct correlation between the features and the target variable. We propose an LLM-based method with three important improvements to correctly capture the ground-truth feature-class correlation in the real data. Our experiments show that our method significantly outperforms 10 SOTA baselines on 20 datasets in downstream tasks.
arXiv Detail & Related papers (2024-10-29T04:14:32Z)
TabDiff: a Multi-Modal Diffusion Model for Tabular Data Generation [91.50296404732902]
We introduce TabDiff, a joint diffusion framework that models all multi-modal distributions of tabular data in one model. Our key innovation is the development of a joint continuous-time diffusion process for numerical and categorical data. TabDiff achieves superior average performance over existing competitive baselines, with up to 22.5% improvement over the state-of-the-art model on pair-wise column correlation estimations.
arXiv Detail & Related papers (2024-10-27T22:58:47Z)
A Survey on Deep Tabular Learning [0.0]
Tabular data presents unique challenges for deep learning due to its heterogeneous nature and lack of spatial structure. This survey reviews the evolution of deep learning models for tabular data, from early fully connected networks (FCNs) to advanced architectures like TabNet, SAINT, TabTranSELU, and MambaNet.
arXiv Detail & Related papers (2024-10-15T20:08:08Z)
Fake It Till Make It: Federated Learning with Consensus-Oriented Generation [52.82176415223988]
We propose federated learning with consensus-oriented generation (FedCOG). FedCOG consists of two key components at the client side: complementary data generation and knowledge-distillation-based model training. Experiments on classical and real-world FL datasets show that FedCOG consistently outperforms state-of-the-art methods.
arXiv Detail & Related papers (2023-12-10T18:49:59Z)
Rethinking Pre-Training in Tabular Data: A Neighborhood Embedding Perspective [71.45945607871715]
We propose Tabular data Pre-Training via Meta-representation (TabPTM). The core idea is to embed data instances into a shared feature space, where each instance is represented by its distance to a fixed number of nearest neighbors and their labels. Extensive experiments on 101 datasets confirm TabPTM's effectiveness in both classification and regression tasks, with and without fine-tuning.
arXiv Detail & Related papers (2023-10-31T18:03:54Z)
Permutation-Invariant Tabular Data Synthesis [14.55825097637513]
We show that changing the input column order worsens the statistical difference between real and synthetic data by up to 38.67%. We propose AE-GAN, a synthesizer that uses an autoencoder network to represent the tabular data and GAN networks to synthesize the latent representation. We evaluate the proposed solutions on five datasets in terms of the sensitivity to the column permutation, the quality of synthetic data, and the utility in downstream analyses.
arXiv Detail & Related papers (2022-11-17T01:14:19Z)
Rethinking Data Heterogeneity in Federated Learning: Introducing a New Notion and Standard Benchmarks [65.34113135080105]
We show that data heterogeneity in current setups is not necessarily a problem and can in fact be beneficial for the FL participants. Our observations are intuitive. Our code is available at https://github.com/MMorafah/FL-SC-NIID.
arXiv Detail & Related papers (2022-09-30T17:15:19Z)
Fed-TGAN: Federated Learning Framework for Synthesizing Tabular Data [8.014848609114154]
We propose Fed-TGAN, the first Federated learning framework for Tabular GANs. To effectively learn a complex GAN on non-identical participants, Fed-TGAN designs two novel features. Results show that Fed-TGAN accelerates training time per epoch by up to 200%.
arXiv Detail & Related papers (2021-08-18T01:47:36Z)
Global Filter Networks for Image Classification [90.81352483076323]
We present a conceptually simple yet computationally efficient architecture that learns long-term spatial dependencies in the frequency domain with log-linear complexity. Our results demonstrate that GFNet can be a very competitive alternative to transformer-style models and CNNs in efficiency, generalization ability and robustness.
arXiv Detail & Related papers (2021-07-01T17:58:16Z)
CTAB-GAN: Effective Table Data Synthesizing [7.336728307626645]
We develop CTAB-GAN, a conditional table GAN architecture that can model diverse data types. We show that CTAB-GAN remarkably resembles the real data for all three types of variables and results in higher accuracy for five machine learning algorithms, by up to 17%.
arXiv Detail & Related papers (2021-02-16T18:53:57Z)
Tabular Transformers for Modeling Multivariate Time Series [30.717890753132824]
Tabular datasets are ubiquitous in data science applications. Given their importance, it seems natural to apply state-of-the-art deep learning algorithms in order to fully unlock their potential. Here we propose neural network models that represent tabular time series that can leverage their hierarchical structure. We demonstrate our models on two datasets: a synthetic credit card transaction dataset, where the learned representations are used for fraud detection and synthetic data generation, and on a real pollution dataset, where the learned encodings are used to predict atmospheric pollutant concentrations.
arXiv Detail & Related papers (2020-11-03T16:58:08Z)

This list is automatically generated from the titles and abstracts of the papers on this site.