Rethinking the Diffusion Models for Numerical Tabular Data Imputation from the Perspective of Wasserstein Gradient Flow
- URL: http://arxiv.org/abs/2406.15762v1
- Date: Sat, 22 Jun 2024 06:59:32 GMT
- Title: Rethinking the Diffusion Models for Numerical Tabular Data Imputation from the Perspective of Wasserstein Gradient Flow
- Authors: Zhichao Chen, Haoxuan Li, Fangyikang Wang, Odin Zhang, Hu Xu, Xiaoyu Jiang, Zhihuan Song, Eric H. Wang,
- Abstract summary: We introduce a principled approach termed Kernelized Negative Entropy-regularized Wasserstein gradient flow Imputation (KnewImp)
Our proposed KnewImp approach significantly outperforms existing state-of-the-art methods.
- Score: 13.109101873881063
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Diffusion models (DMs) have gained attention in Missing Data Imputation (MDI), but there remain two long-neglected issues to be addressed: (1). Inaccurate Imputation, which arises from inherently sample-diversification-pursuing generative process of DMs. (2). Difficult Training, which stems from intricate design required for the mask matrix in model training stage. To address these concerns within the realm of numerical tabular datasets, we introduce a novel principled approach termed Kernelized Negative Entropy-regularized Wasserstein gradient flow Imputation (KnewImp). Specifically, based on Wasserstein gradient flow (WGF) framework, we first prove that issue (1) stems from the cost functionals implicitly maximized in DM-based MDI are equivalent to the MDI's objective plus diversification-promoting non-negative terms. Based on this, we then design a novel cost functional with diversification-discouraging negative entropy and derive our KnewImp approach within WGF framework and reproducing kernel Hilbert space. After that, we prove that the imputation procedure of KnewImp can be derived from another cost functional related to the joint distribution, eliminating the need for the mask matrix and hence naturally addressing issue (2). Extensive experiments demonstrate that our proposed KnewImp approach significantly outperforms existing state-of-the-art methods.
Related papers
- A local squared Wasserstein-2 method for efficient reconstruction of models with uncertainty [0.0]
We propose a local squared Wasserstein-2 (W_2) method to solve the inverse problem of reconstructing models with uncertain latent variables or parameters.
A key advantage of our approach is that it does not require prior information on the distribution of the latent variables or parameters in the underlying models.
arXiv Detail & Related papers (2024-06-10T22:15:55Z) - Deep Latent Force Models: ODE-based Process Convolutions for Bayesian
Deep Learning [0.0]
The deep latent force model (DLFM) is a deep Gaussian process with physics-informed kernels at each layer.
We present empirical evidence of the capability of the DLFM to capture the dynamics present in highly nonlinear real-world time series data.
We find that the DLFM is capable of achieving comparable performance to a range of non-physics-informed probabilistic models.
arXiv Detail & Related papers (2023-11-24T19:55:57Z) - Mirror Diffusion Models for Constrained and Watermarked Generation [41.27274841596343]
Mirror Diffusion Models (MDM) is a new class of diffusion models that generate data on convex constrained sets without losing tractability.
For safety and privacy purposes, we also explore constrained sets as a new mechanism to embed invisible but quantitative information in generated data.
Our work brings new algorithmic opportunities for learning tractable diffusion on complex domains.
arXiv Detail & Related papers (2023-10-02T14:26:31Z) - Heterogeneous Multi-Task Gaussian Cox Processes [61.67344039414193]
We present a novel extension of multi-task Gaussian Cox processes for modeling heterogeneous correlated tasks jointly.
A MOGP prior over the parameters of the dedicated likelihoods for classification, regression and point process tasks can facilitate sharing of information between heterogeneous tasks.
We derive a mean-field approximation to realize closed-form iterative updates for estimating model parameters.
arXiv Detail & Related papers (2023-08-29T15:01:01Z) - AdjointDPM: Adjoint Sensitivity Method for Gradient Backpropagation of Diffusion Probabilistic Models [103.41269503488546]
Existing customization methods require access to multiple reference examples to align pre-trained diffusion probabilistic models with user-provided concepts.
This paper aims to address the challenge of DPM customization when the only available supervision is a differentiable metric defined on the generated contents.
We propose a novel method AdjointDPM, which first generates new samples from diffusion models by solving the corresponding probability-flow ODEs.
It then uses the adjoint sensitivity method to backpropagate the gradients of the loss to the models' parameters.
arXiv Detail & Related papers (2023-07-20T09:06:21Z) - Eliminating Lipschitz Singularities in Diffusion Models [51.806899946775076]
We show that diffusion models frequently exhibit the infinite Lipschitz near the zero point of timesteps.
This poses a threat to the stability and accuracy of the diffusion process, which relies on integral operations.
We propose a novel approach, dubbed E-TSDM, which eliminates the Lipschitz of the diffusion model near zero.
arXiv Detail & Related papers (2023-06-20T03:05:28Z) - Reconstructing Graph Diffusion History from a Single Snapshot [87.20550495678907]
We propose a novel barycenter formulation for reconstructing Diffusion history from A single SnapsHot (DASH)
We prove that estimation error of diffusion parameters is unavoidable due to NP-hardness of diffusion parameter estimation.
We also develop an effective solver named DIffusion hiTting Times with Optimal proposal (DITTO)
arXiv Detail & Related papers (2023-06-01T09:39:32Z) - Uncertainty quantification of two-phase flow in porous media via
coupled-TgNN surrogate model [6.705438773768439]
Uncertainty quantification (UQ) of subsurface two-phase flow usually requires numerous executions of forward simulations under varying conditions.
In this work, a novel coupled theory-guided neural network (TgNN) based surrogate model is built to facilitate efficiency under the premise of satisfactory accuracy.
arXiv Detail & Related papers (2022-05-28T02:33:46Z) - Log-based Sparse Nonnegative Matrix Factorization for Data
Representation [55.72494900138061]
Nonnegative matrix factorization (NMF) has been widely studied in recent years due to its effectiveness in representing nonnegative data with parts-based representations.
We propose a new NMF method with log-norm imposed on the factor matrices to enhance the sparseness.
A novel column-wisely sparse norm, named $ell_2,log$-(pseudo) norm, is proposed to enhance the robustness of the proposed method.
arXiv Detail & Related papers (2022-04-22T11:38:10Z) - Inverting brain grey matter models with likelihood-free inference: a
tool for trustable cytoarchitecture measurements [62.997667081978825]
characterisation of the brain grey matter cytoarchitecture with quantitative sensitivity to soma density and volume remains an unsolved challenge in dMRI.
We propose a new forward model, specifically a new system of equations, requiring a few relatively sparse b-shells.
We then apply modern tools from Bayesian analysis known as likelihood-free inference (LFI) to invert our proposed model.
arXiv Detail & Related papers (2021-11-15T09:08:27Z) - Low-rank statistical finite elements for scalable model-data synthesis [0.8602553195689513]
statFEM acknowledges a priori model misspecification, by embedding forcing within the governing equations.
The method reconstructs the observed data-generating processes with minimal loss of information.
This article overcomes this hurdle by embedding a low-rank approximation of the underlying dense covariance matrix.
arXiv Detail & Related papers (2021-09-10T09:51:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.