Missing Data Imputation by Reducing Mutual Information with Rectified Flows
- URL: http://arxiv.org/abs/2505.11749v2
- Date: Mon, 09 Jun 2025 16:46:35 GMT
- Title: Missing Data Imputation by Reducing Mutual Information with Rectified Flows
- Authors: Jiahao Yu, Qizhen Ying, Leyang Wang, Ziyue Jiang, Song Liu
- Abstract summary: This paper introduces a novel iterative method for missing data imputation that sequentially reduces the mutual information between data and their corresponding missing mask. Our algorithm iteratively minimizes the KL divergence between the joint distribution of the imputed data and missing mask, and the product of their marginals from the previous iteration. We show that the optimal imputation under this framework corresponds to solving an ODE, whose velocity field minimizes a rectified flow training objective.
- Score: 10.922921698547261
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper introduces a novel iterative method for missing data imputation that sequentially reduces the mutual information between data and their corresponding missing mask. Inspired by GAN-based approaches, which train generators to decrease the predictability of missingness patterns, our method explicitly targets the reduction of mutual information. Specifically, our algorithm iteratively minimizes the KL divergence between the joint distribution of the imputed data and missing mask, and the product of their marginals from the previous iteration. We show that the optimal imputation under this framework corresponds to solving an ODE, whose velocity field minimizes a rectified flow training objective. We further illustrate that some existing imputation techniques can be interpreted as approximate special cases of our mutual-information-reducing framework. Comprehensive experiments on synthetic and real-world datasets validate the efficacy of our proposed approach, demonstrating superior imputation performance.
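The rectified flow objective mentioned in the abstract can be illustrated with a generic toy sketch. This is not the authors' imputation algorithm: it only shows the standard rectified flow recipe of regressing the displacement `x1 - x0` on straight-line interpolants and then solving the resulting ODE with forward Euler steps. The linear velocity model, the function names `fit_linear_velocity` and `integrate`, and the 1-D Gaussian data are all assumptions made for brevity.

```python
import numpy as np

def fit_linear_velocity(x0, x1, n_times=50, seed=0):
    """Least-squares fit of v(x, t) = [x, t, 1] @ W to the target x1 - x0.

    Assumption: a linear velocity model stands in for a neural network.
    """
    rng = np.random.default_rng(seed)
    feats, targets = [], []
    for _ in range(n_times):
        t = rng.uniform(size=(x0.shape[0], 1))
        xt = t * x1 + (1.0 - t) * x0            # straight-line interpolant
        feats.append(np.hstack([xt, t, np.ones_like(t)]))
        targets.append(x1 - x0)                 # regression target: displacement
    A = np.vstack(feats)
    B = np.vstack(targets)
    W, *_ = np.linalg.lstsq(A, B, rcond=None)
    return W

def integrate(x, W, n_steps=100):
    """Forward Euler solve of dx/dt = v(x, t) from t = 0 to t = 1."""
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = np.full((x.shape[0], 1), k * dt)
        x = x + dt * (np.hstack([x, t, np.ones_like(t)]) @ W)
    return x

# Toy data (hypothetical): move samples of N(0, 1) toward N(3, 1).
rng = np.random.default_rng(0)
x0 = rng.normal(size=(2000, 1))
x1 = 3.0 + rng.normal(size=(2000, 1))

W = fit_linear_velocity(x0, x1)
moved = integrate(x0, W)
print("mean before:", float(x0.mean()), "mean after:", float(moved.mean()))
```

In this toy setting the transported samples should end up centred near the target mean. A practical implementation would replace the linear model with a trained network and, per the abstract, choose the source and target distributions from the imputed data and missing mask at each iteration.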
Related papers
- A Deep Bayesian Nonparametric Framework for Robust Mutual Information Estimation [9.68824512279232]
Mutual Information (MI) is a crucial measure for capturing dependencies between variables. We present a solution for training an MI estimator by constructing the MI loss with a finite representation of the Dirichlet process posterior to incorporate regularization. We explore the application of our estimator in maximizing MI between the data space and the latent space of a variational autoencoder.
arXiv Detail & Related papers (2025-03-11T21:27:48Z) - Enabling Tensor Decomposition for Time-Series Classification via A Simple Pseudo-Laplacian Contrast [26.28414569796961]
We propose a novel Pseudo Laplacian Contrast (PLC) tensor decomposition framework.
It integrates the data augmentation and cross-view Laplacian to enable the extraction of class-aware representations.
Experiments on various datasets demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2024-09-23T16:48:13Z) - Amortized Posterior Sampling with Diffusion Prior Distillation [55.03585818289934]
Amortized Posterior Sampling is a novel variational inference approach for efficient posterior sampling in inverse problems. Our method trains a conditional flow model to minimize the divergence between the variational distribution and the posterior distribution implicitly defined by the diffusion model. Unlike existing methods, our approach is unsupervised, requires no paired training data, and is applicable to both Euclidean and non-Euclidean domains.
arXiv Detail & Related papers (2024-07-25T09:53:12Z) - Iterative Ensemble Training with Anti-Gradient Control for Mitigating Memorization in Diffusion Models [20.550324116099357]
Diffusion models are known for their tremendous ability to generate novel and high-quality samples. Recent approaches for memory mitigation either only focused on the text modality problem in cross-modal generation tasks or utilized data augmentation strategies. We propose a novel training framework for diffusion models from the perspective of visual modality, which is more generic and fundamental for mitigating memorization.
arXiv Detail & Related papers (2024-07-22T02:19:30Z) - Disparate Impact on Group Accuracy of Linearization for Private Inference [48.27026603581436]
We show that reducing the number of ReLU activations disproportionately decreases the accuracy for minority groups compared to majority groups.
We also show how a simple procedure altering the fine-tuning step for linearized models can serve as an effective mitigation strategy.
arXiv Detail & Related papers (2024-02-06T01:56:29Z) - Erasing Undesirable Influence in Diffusion Models [51.225365010401006]
Diffusion models are highly effective at generating high-quality images but pose risks, such as the unintentional generation of NSFW (not safe for work) content.
In this work, we introduce EraseDiff, an algorithm designed to preserve the utility of the diffusion model on retained data while removing the unwanted information associated with the data to be forgotten.
arXiv Detail & Related papers (2024-01-11T09:30:36Z) - Adversarial contamination of networks in the setting of vertex nomination: a new trimming method [5.915837770869619]
Spectral graph embeddings provide good algorithmic performance and flexible settings.
We propose a new trimming method that operates in model space which can address both block structure contamination and white noise contamination.
This model trimming is more amenable to theoretical analysis while also demonstrating superior performance in a number of simulations.
arXiv Detail & Related papers (2022-08-20T15:32:04Z) - An Accelerated Doubly Stochastic Gradient Method with Faster Explicit Model Identification [97.28167655721766]
We propose a novel doubly accelerated gradient descent (ADSGD) method for sparsity regularized loss minimization problems.
We first prove that ADSGD can achieve a linear convergence rate and lower overall computational complexity.
arXiv Detail & Related papers (2022-08-11T22:27:22Z) - Multiple Imputation via Generative Adversarial Network for High-dimensional Blockwise Missing Value Problems [6.123324869194195]
We propose Multiple Imputation via Generative Adversarial Network (MI-GAN), a deep learning-based (specifically, GAN-based) multiple imputation method.
MI-GAN shows strong performance matching existing state-of-the-art imputation methods on high-dimensional datasets.
In particular, MI-GAN significantly outperforms other imputation methods in the sense of statistical inference and computational speed.
arXiv Detail & Related papers (2021-12-21T20:19:37Z) - Influence Estimation and Maximization via Neural Mean-Field Dynamics [60.91291234832546]
We propose a novel learning framework using neural mean-field (NMF) dynamics for inference and estimation problems.
Our framework can simultaneously learn the structure of the diffusion network and the evolution of node infection probabilities.
arXiv Detail & Related papers (2021-06-03T00:02:05Z) - DEALIO: Data-Efficient Adversarial Learning for Imitation from Observation [57.358212277226315]
In imitation learning from observation (IfO), a learning agent seeks to imitate a demonstrating agent using only observations of the demonstrated behavior without access to the control signals generated by the demonstrator.
Recent methods based on adversarial imitation learning have led to state-of-the-art performance on IfO problems, but they typically suffer from high sample complexity due to a reliance on data-inefficient, model-free reinforcement learning algorithms.
This issue makes them impractical to deploy in real-world settings, where gathering samples can incur high costs in terms of time, energy, and risk.
We propose a more data-efficient IfO algorithm.
arXiv Detail & Related papers (2021-03-31T23:46:32Z) - Model Fusion with Kullback--Leibler Divergence [58.20269014662046]
We propose a method to fuse posterior distributions learned from heterogeneous datasets.
Our algorithm relies on a mean field assumption for both the fused model and the individual dataset posteriors.
arXiv Detail & Related papers (2020-07-13T03:27:45Z) - Differentiable Causal Discovery from Interventional Data [141.41931444927184]
We propose a theoretically-grounded method based on neural networks that can leverage interventional data.
We show that our approach compares favorably to the state of the art in a variety of settings.
arXiv Detail & Related papers (2020-07-03T15:19:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.