Exponential Family Variational Flow Matching for Tabular Data Generation
- URL: http://arxiv.org/abs/2506.05940v4
- Date: Fri, 03 Oct 2025 17:32:46 GMT
- Title: Exponential Family Variational Flow Matching for Tabular Data Generation
- Authors: Andrés Guzmán-Cordero, Floor Eijkelboom, Jan-Willem van de Meent,
- Abstract summary: We develop TabbyFlow, a variational Flow Matching (VFM) method for tabular data generation. We introduce Exponential Family Variational Flow Matching (EF-VFM), which represents heterogeneous data types. We also establish a connection between variational flow matching and generalized flow matching objectives based on Bregman divergences.
- Score: 10.161936647987517
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: While denoising diffusion and flow matching have driven major advances in generative modeling, their application to tabular data remains limited, despite its ubiquity in real-world applications. To this end, we develop TabbyFlow, a variational Flow Matching (VFM) method for tabular data generation. To apply VFM to data with mixed continuous and discrete features, we introduce Exponential Family Variational Flow Matching (EF-VFM), which represents heterogeneous data types using a general exponential family distribution. We thereby obtain an efficient, data-driven objective based on moment matching, enabling principled learning of probability paths over mixed continuous and discrete variables. We also establish a connection between variational flow matching and generalized flow matching objectives based on Bregman divergences. Evaluation on tabular data benchmarks demonstrates state-of-the-art performance compared to baselines.
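The abstract ties moment matching to Bregman-divergence objectives. What follows is a minimal NumPy sketch of the standard facts behind that link, not TabbyFlow's implementation; every helper name and the toy mixed-type table are illustrative assumptions. Half the squared Euclidean distance (for numeric columns) and the KL divergence, which reduces to cross-entropy for one-hot categories, are both Bregman divergences, and for any Bregman divergence the average loss is minimized by the mean, so a model only needs to predict expected sufficient statistics. The sketch also assumes a linear path x_t = (1 - t) x0 + t x1, under which the velocity depends on x1 only through its posterior mean.

```python
import numpy as np

# Illustrative sketch only: helper names and toy data are hypothetical,
# not taken from TabbyFlow / EF-VFM.


def bregman(phi, grad_phi, x, y):
    """Bregman divergence D_phi(x, y) = phi(x) - phi(y) - <grad phi(y), x - y>."""
    return phi(x) - phi(y) - np.dot(grad_phi(y), x - y)


def sq_phi(x):
    """phi(x) = 0.5 * ||x||^2, giving D(x, y) = 0.5 * ||x - y||^2 (numeric columns)."""
    return 0.5 * np.dot(x, x)


def sq_grad(x):
    return x


def negent_phi(p):
    """Negative entropy sum_k p_k log p_k, with 0 log 0 = 0 (categorical columns)."""
    safe = np.where(p > 0, p, 1.0)  # avoid log(0) for one-hot inputs
    return np.sum(np.where(p > 0, p * np.log(safe), 0.0))


def negent_grad(p):
    """Gradient log p + 1; the prediction y must be strictly positive."""
    return np.log(p) + 1.0


def mean_loss(phi, grad_phi, data, y):
    """Average Bregman divergence between each row of data and one prediction y."""
    return np.mean([bregman(phi, grad_phi, x, y) for x in data])


def velocity_from_mean(x_t, mu, t):
    """Velocity of the linear path x_t = (1 - t) * x0 + t * x1, given mu = E[x1 | x_t].

    The conditional velocity (x1 - x_t) / (1 - t) is linear in x1, so averaging it
    over the posterior of x1 only needs the posterior mean mu.
    """
    return (mu - x_t) / (1.0 - t)


rng = np.random.default_rng(0)

# Toy mixed-type table: a 2-D numeric block and a 3-class categorical column.
numeric = rng.normal(loc=[1.0, -2.0], scale=1.0, size=(500, 2))
labels = rng.choice(3, size=500, p=[0.2, 0.5, 0.3])
onehot = np.eye(3)[labels]

# Moment matching: the empirical mean of the sufficient statistics
# (x for numeric, the one-hot vector for categorical) minimizes each average loss.
mu_num = numeric.mean(axis=0)
mu_cat = onehot.mean(axis=0)
uniform = np.full(3, 1.0 / 3.0)

print(mean_loss(sq_phi, sq_grad, numeric, mu_num), mean_loss(sq_phi, sq_grad, numeric, mu_num + 0.1))
print(mean_loss(negent_phi, negent_grad, onehot, mu_cat), mean_loss(negent_phi, negent_grad, onehot, uniform))
```

In this toy setting a single mean-predicting model serves both column types, since each loss is minimized by a mean; the exact parameterization EF-VFM uses may differ.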
Related papers
- Flow Matching for Tabular Data Synthesis [6.009900118732673]
Flow matching is an important tool for privacy-preserving data sharing.
This paper compares flow matching with a state-of-the-art diffusion method.
We find that flow matching, particularly TabbyFlow, outperforms diffusion baselines.
(2025-11-30T02:18:04Z)
- TabDiff: a Mixed-type Diffusion Model for Tabular Data Generation [91.50296404732902]
We introduce TabDiff, a joint diffusion framework that models all mixed-type distributions of tabular data in one model.
Our key innovation is the development of a joint continuous-time diffusion process for numerical and categorical data.
TabDiff achieves superior average performance over existing competitive baselines, with up to 22.5% improvement over the state-of-the-art model on pair-wise column correlation estimations.
(2024-10-27T22:58:47Z)
- Learning Divergence Fields for Shift-Robust Graph Representations [73.11818515795761]
In this work, we propose a geometric diffusion model with learnable divergence fields for the challenging problem of learning from interdependent data.
We derive a new learning objective through causal inference, which guides the model toward generalizable patterns of interdependence that are insensitive to domain shifts.
(2024-06-07T14:29:21Z)
- Generative Modeling of Discrete Joint Distributions by E-Geodesic Flow Matching on Assignment Manifolds [0.8594140167290099]
General non-factorizing discrete distributions can be approximated by embedding the submanifold into the meta-simplex of all joint discrete distributions.
Efficient training of the generative model is demonstrated by matching the flow of geodesics of factorizing discrete distributions.
(2024-02-12T17:56:52Z)
- MissDiff: Training Diffusion Models on Tabular Data with Missing Values [29.894691645801597]
This work presents a unified and principled diffusion-based framework for learning from data with missing values.
We first observe that the widely adopted "impute-then-generate" pipeline may lead to a biased learning objective.
We prove the proposed method is consistent in learning the score of data distributions, and the proposed training objective serves as an upper bound for the negative likelihood in certain cases.
(2023-07-02T03:49:47Z)
- Score-based Continuous-time Discrete Diffusion Models [102.65769839899315]
We extend diffusion models to discrete variables by introducing a Markov jump process where the reverse process denoises via a continuous-time Markov chain.
We show that an unbiased estimator can be obtained by simply matching the conditional marginal distributions.
We demonstrate the effectiveness of the proposed method on a set of synthetic and real-world music and image benchmarks.
(2022-11-30T05:33:29Z)
- Normalizing Flow with Variational Latent Representation [20.038183566389794]
We propose a new framework based on variational latent representation to improve the practical performance of normalizing flows (NFs).
The idea is to replace the standard normal latent variable with a more general latent representation, jointly learned via Variational Bayes.
The resulting method is significantly more powerful than the standard normalizing flow approach for generating data distributions with multiple modes.
(2022-11-21T16:51:49Z)
- Training Normalizing Flows from Dependent Data [31.42053454078623]
We propose a likelihood objective for normalizing flows that incorporates dependencies between the data points.
We show that respecting dependencies between observations can improve empirical results on both synthetic and real-world data.
(2022-09-29T16:50:34Z)
- Bayesian Structure Learning with Generative Flow Networks [85.84396514570373]
In Bayesian structure learning, we are interested in inferring a distribution over the directed acyclic graph (DAG) from data.
Recently, a class of probabilistic models called Generative Flow Networks (GFlowNets) has been introduced as a general framework for generative modeling.
We show that our approach, called DAG-GFlowNet, provides an accurate approximation of the posterior over DAGs.
(2022-02-28T15:53:10Z)
- Semi-Supervised Learning with Normalizing Flows [54.376602201489995]
FlowGMM is an end-to-end approach to generative semi-supervised learning with normalizing flows.
We show promising results on a wide range of applications, including AG-News and Yahoo Answers text data.
(2019-12-30T17:36:33Z)
- Learning Likelihoods with Conditional Normalizing Flows [54.60456010771409]
Conditional normalizing flows (CNFs) are efficient in sampling and inference.
We present a study of CNFs in which the mapping from the base density to the output space is conditioned on an input x, to model conditional densities p(y|x).
(2019-11-29T19:17:58Z)