Impute-MACFM: Imputation based on Mask-Aware Flow Matching
- URL: http://arxiv.org/abs/2509.23126v1
- Date: Sat, 27 Sep 2025 05:15:09 GMT
- Title: Impute-MACFM: Imputation based on Mask-Aware Flow Matching
- Authors: Dengyi Liu, Honggang Wang, Hua Fang
- Abstract summary: Impute-MACFM is a conditional flow matching framework for tabular imputation. It addresses three missingness mechanisms: missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR). It builds trajectories only on missing entries while constraining the predicted velocity to remain near zero on observed entries.
- Score: 1.9483189922830135
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Tabular data are central to many applications, especially longitudinal data in healthcare, where missing values are common, undermining model fidelity and reliability. Prior imputation methods either impose restrictive assumptions or struggle with complex cross-feature structure, while recent generative approaches suffer from instability and costly inference. We propose Impute-MACFM, a mask-aware conditional flow matching framework for tabular imputation that addresses missingness mechanisms, missing completely at random, missing at random, and missing not at random. Its mask-aware objective builds trajectories only on missing entries while constraining predicted velocity to remain near zero on observed entries, using flexible nonlinear schedules. Impute-MACFM combines: (i) stability penalties on observed positions, (ii) consistency regularization enforcing local invariance, and (iii) time-decayed noise injection for numeric features. Inference uses constraint-preserving ordinary differential equation integration with per-step projection to fix observed values, optionally aggregating multiple trajectories for robustness. Across diverse benchmarks, Impute-MACFM achieves state-of-the-art results while delivering more robust, efficient, and higher-quality imputation than competing approaches, establishing flow matching as a promising direction for tabular missing-data problems, including longitudinal data.
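The mask-aware objective and the projected ODE integration described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names (`mask_aware_path`, `mask_aware_loss`, `projected_euler`), the linear interpolation schedule (the abstract mentions flexible nonlinear schedules), the weight `lam`, and the plain Euler integrator are all assumptions made for illustration. The consistency regularization and time-decayed noise injection are omitted.

```python
import numpy as np


def mask_aware_path(x1, observed, t, rng):
    """Build a training point and target velocity (assumed linear schedule).

    x1: complete data, shape (n, d); observed: boolean mask, True = observed;
    t: times in [0, 1], shape (n, 1).
    """
    x0 = rng.standard_normal(x1.shape)  # noise endpoint of the path
    # Missing entries move from noise to data; observed entries stay fixed.
    x_t = np.where(observed, x1, (1.0 - t) * x0 + t * x1)
    # Target velocity is exactly zero on observed entries.
    v_target = np.where(observed, 0.0, x1 - x0)
    return x_t, v_target


def _mean_or_zero(values):
    # Avoid NaN when a mask selects no entries.
    return float(values.mean()) if values.size else 0.0


def mask_aware_loss(v_pred, v_target, observed, lam=1.0):
    """Flow matching error on missing entries plus a stability penalty
    that pushes predicted velocity toward zero on observed entries."""
    fm = _mean_or_zero((v_pred - v_target)[~observed] ** 2)
    stability = _mean_or_zero(v_pred[observed] ** 2)
    return fm + lam * stability


def projected_euler(velocity_fn, x_obs, observed, n_steps=20, rng=None):
    """Euler integration from t=0 to t=1 with per-step projection.

    velocity_fn(x, t, observed) stands in for a trained network (hypothetical).
    x_obs may hold NaN at missing positions; they are never read.
    """
    rng = rng if rng is not None else np.random.default_rng(0)
    x = np.where(observed, x_obs, rng.standard_normal(x_obs.shape))
    dt = 1.0 / n_steps
    for k in range(n_steps):
        x = x + dt * velocity_fn(x, k * dt, observed)
        x = np.where(observed, x_obs, x)  # projection: restore observed values
    return x
```

Running `projected_euler` several times with different random seeds and averaging the results would be one simple way to approximate the multi-trajectory aggregation the abstract mentions.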
Related papers
- MissHDD: Hybrid Deterministic Diffusion for Hetrogeneous Incomplete Data Imputation [4.935498694293104]
We propose a hybrid deterministic diffusion framework that separates heterogeneous features into two complementary generative channels. A continuous DDIM-based channel provides efficient and stable deterministic denoising for numerical variables. A discrete latent-path diffusion channel, inspired by loopholing-based discrete diffusion, models categorical and discrete features without leaving their valid sample. The two channels are trained under a unified conditional imputation objective, enabling coherent reconstruction of mixed-type incomplete data.
arXiv Detail & Related papers (2025-11-18T14:44:49Z) - Revisiting Multivariate Time Series Forecasting with Missing Values [74.56971641937771]
Missing values are common in real-world time series. Current approaches have developed an imputation-then-prediction framework that uses imputation modules to fill in missing values, followed by forecasting on the imputed data. This framework overlooks a critical issue: there is no ground truth for the missing values, making the imputation process susceptible to errors that can degrade prediction accuracy. We introduce Consistency-Regularized Information Bottleneck (CRIB), a novel framework built on the Information Bottleneck principle.
arXiv Detail & Related papers (2025-09-27T20:57:48Z) - MissDDIM: Deterministic and Efficient Conditional Diffusion for Tabular Data Imputation [2.124791625488617]
We present MissDDIM, a conditional diffusion framework that adapts Denoising Diffusion Implicit Models (DDIM) for tabular imputation. While sampling enables diverse completions, it also introduces output variability that complicates downstream processing.
arXiv Detail & Related papers (2025-08-05T04:55:26Z) - Error-quantified Conformal Inference for Time Series [55.11926160774831]
Uncertainty quantification in time series prediction is challenging due to the temporal dependence and distribution shift on sequential data. We propose Error-quantified Conformal Inference (ECI) by smoothing the quantile loss function. ECI can achieve valid miscoverage control and output tighter prediction sets than other baselines.
arXiv Detail & Related papers (2025-02-02T15:02:36Z) - BRATI: Bidirectional Recurrent Attention for Time-Series Imputation [0.14999444543328289]
Missing data in time-series analysis poses significant challenges, affecting the reliability of downstream applications. This paper introduces BRATI, a novel deep-learning model designed to address multivariate time-series imputation. BRATI processes temporal dependencies and feature correlations across long and short time horizons, utilizing two imputation blocks that operate in opposite temporal directions.
arXiv Detail & Related papers (2025-01-09T17:50:56Z) - MTSCI: A Conditional Diffusion Model for Multivariate Time Series Consistent Imputation [41.681869408967586]
A key research question is how to ensure imputation consistency, i.e., intra-consistency between observed and imputed values.
Previous methods rely solely on the inductive bias of the imputation targets to guide the learning process.
arXiv Detail & Related papers (2024-08-11T10:24:53Z) - Fast Semisupervised Unmixing Using Nonconvex Optimization [80.11512905623417]
We introduce a novel convex convex model for semi/library-based unmixing.
We demonstrate the efficacy of Alternating Methods of sparse unsupervised unmixing.
arXiv Detail & Related papers (2024-01-23T10:07:41Z) - Uncertainty-Aware Deep Attention Recurrent Neural Network for Heterogeneous Time Series Imputation [0.25112747242081457]
Missingness is ubiquitous in multivariate time series and poses an obstacle to reliable downstream analysis.
We propose DEep Attention Recurrent Imputation (DEARI), which jointly estimates missing values and their associated uncertainty.
Experiments show that DEARI surpasses the SOTA in diverse imputation tasks using real-world datasets.
arXiv Detail & Related papers (2024-01-04T13:21:11Z) - It's All in the Mix: Wasserstein Classification and Regression with Mixed Features [5.106912532044251]
We develop and analyze distributionally robust prediction models that faithfully account for the presence of discrete features. We demonstrate that our models can significantly outperform existing methods that are agnostic to the presence of discrete features.
arXiv Detail & Related papers (2023-12-19T15:15:52Z) - Deep Ensembles Meets Quantile Regression: Uncertainty-aware Imputation for Time Series [45.76310830281876]
We propose Quantile Sub-Ensembles, a novel method to estimate uncertainty with an ensemble of quantile-regression-based task networks.
Our method not only produces accurate imputations that are robust to high missing rates, but is also computationally efficient due to the fast training of its non-generative model.
arXiv Detail & Related papers (2023-12-03T05:52:30Z) - MIRACLE: Causally-Aware Imputation via Learning Missing Data Mechanisms [82.90843777097606]
We propose a causally-aware imputation algorithm (MIRACLE) for missing data.
MIRACLE iteratively refines the imputation of a baseline by simultaneously modeling the missingness generating mechanism.
We conduct extensive experiments on synthetic and a variety of publicly available datasets to show that MIRACLE is able to consistently improve imputation.
arXiv Detail & Related papers (2021-11-04T22:38:18Z) - Learning while Respecting Privacy and Robustness to Distributional Uncertainties and Adversarial Data [66.78671826743884]
The distributionally robust optimization framework is considered for training a parametric model.
The objective is to endow the trained model with robustness against adversarially manipulated input data.
Proposed algorithms offer robustness with little overhead.
arXiv Detail & Related papers (2020-07-07T18:25:25Z) - Learning Likelihoods with Conditional Normalizing Flows [54.60456010771409]
Conditional normalizing flows (CNFs) are efficient in sampling and inference.
We present a study of CNFs where the mapping from base density to output space is conditioned on an input x, to model conditional densities p(y|x).
arXiv Detail & Related papers (2019-11-29T19:17:58Z)