Theoretical Analysis of Measure Consistency Regularization for Partially Observed Data
- URL: http://arxiv.org/abs/2602.01437v1
- Date: Sun, 01 Feb 2026 21:03:42 GMT
- Title: Theoretical Analysis of Measure Consistency Regularization for Partially Observed Data
- Authors: Yinsong Wang, Shahin Shahrampour
- Abstract summary: Measure Consistency Regularization (MCR) methods enforce consistency between imputed and fully observed data. This paper offers theoretical insights into why, when, and how MCR enhances imputation quality under partial observability. We propose a novel training protocol that monitors the duality gap to determine an early stopping point that preserves the generalization benefit.
- Score: 11.201029351368092
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The problem of corrupted data, missing features, or missing modalities continues to plague the modern machine learning landscape. To address this issue, a class of regularization methods that enforce consistency between imputed and fully observed data has emerged as a promising approach for improving model generalization, particularly in partially observed settings. We refer to this class of methods as Measure Consistency Regularization (MCR). Despite its empirical success in various applications, such as image inpainting, data imputation and semi-supervised learning, a fundamental understanding of the theoretical underpinnings of MCR remains limited. This paper bridges this gap by offering theoretical insights into why, when, and how MCR enhances imputation quality under partial observability, viewed through the lens of neural network distance. Our theoretical analysis identifies the term responsible for MCR's generalization advantage and extends to the imperfect training regime, demonstrating that this advantage is not always guaranteed. Guided by these insights, we propose a novel training protocol that monitors the duality gap to determine an early stopping point that preserves the generalization benefit. We then provide detailed empirical evidence to support our theoretical claims and to show the effectiveness and accuracy of our proposed stopping condition. We further provide a set of real-world data simulations to show the versatility of MCR under different model architectures designed for different data sources.
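To make the setup concrete, below is a minimal PyTorch sketch of the general MCR idea described in the abstract: an imputer is trained against a critic that measures a neural network distance between imputed and fully observed samples, and a crude duality-gap proxy is monitored as a stopping signal. All names (`imputer`, `critic`, `nn_distance`, the toy data, the masking rate, and the stopping threshold) are hypothetical illustrations, not the authors' actual architectures or protocol.

```python
import torch
import torch.nn as nn

# Toy stand-ins; the paper's actual architectures and protocol differ.
imputer = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 10))
critic = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 1))
opt_imp = torch.optim.Adam(imputer.parameters(), lr=1e-3)
opt_cri = torch.optim.Adam(critic.parameters(), lr=1e-3)

def impute(x, mask):
    # Keep observed entries (mask == 1); fill the rest with the imputer's output.
    return x * mask + imputer(x * mask) * (1 - mask)

def nn_distance(full, imputed):
    # IPM-style neural network distance: the critic tries to separate
    # fully observed samples from imputed ones.
    return critic(full).mean() - critic(imputed).mean()

gap_trace = []
for step in range(5000):
    x_full = torch.randn(128, 10)            # stand-in: fully observed batch
    x_part = torch.randn(128, 10)            # stand-in: partially observed batch
    mask = (torch.rand(128, 10) > 0.3).float()

    # Inner maximization: the critic ascends the distance.
    d = nn_distance(x_full, impute(x_part, mask).detach())
    opt_cri.zero_grad(); (-d).backward(); opt_cri.step()

    # Outer minimization: the imputer descends the same distance (the MCR term).
    d = nn_distance(x_full, impute(x_part, mask))
    opt_imp.zero_grad(); d.backward(); opt_imp.step()

    # Crude duality-gap proxy: stop once the min-max value plateaus.
    gap_trace.append(float(d.detach()))
    if step > 200 and abs(gap_trace[-1] - gap_trace[-200]) < 1e-4:
        break
```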
Related papers
- A Unified Framework for Inference with General Missingness Patterns and Machine Learning Imputation [12.350330523619336]
This paper develops a novel method that delivers a valid statistical inference framework for general Z-estimation problems. The core technical idea is to stratify observations by distinct missingness patterns and construct an estimator by appropriately weighting and aggregating pattern-specific information. We provide theoretical guarantees of normality of the proposed estimator and efficiency dominance over weighted complete-case analyses.
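As a toy illustration of the stratify-and-aggregate idea, the NumPy sketch below groups rows by their missingness pattern and combines pattern-specific estimates with size-proportional weights; the paper's Z-estimation machinery and efficient weighting scheme are more general than this.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
X[rng.random(X.shape) < 0.2] = np.nan     # inject missingness

# Stratify rows by their missingness pattern (a boolean tuple per row).
patterns = {}
for i, row in enumerate(X):
    patterns.setdefault(tuple(np.isnan(row)), []).append(i)

# Pattern-specific estimates of the mean of column 0, aggregated over all
# strata where that column is observed, with size-proportional weights
# (a crude stand-in for the paper's efficient weighting).
num, den = 0.0, 0
for key, idx in patterns.items():
    if not key[0]:                        # column 0 observed in this stratum
        num += X[idx, 0].sum()
        den += len(idx)
print("stratified estimate:", num / den)
```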
arXiv Detail & Related papers (2025-08-21T01:59:59Z)
- MIRRAMS: Learning Robust Tabular Models under Unseen Missingness Shifts [2.5357049657770516]
Missing values often reflect variations in data collection policies, which may shift across time or locations. Such shifts in the missingness distribution between training and test inputs pose a significant challenge to achieving robust predictive performance. We propose a novel deep learning framework designed to address this challenge, particularly in the common yet challenging scenario where the test-time dataset is unseen.
arXiv Detail & Related papers (2025-07-11T03:03:30Z)
- Model Reprogramming Demystified: A Neural Tangent Kernel Perspective [49.42322600160337]
We present a comprehensive theoretical analysis of Model Reprogramming (MR) through the lens of the Neural Tangent Kernel (NTK) framework. We demonstrate that the success of MR is governed by the eigenvalue spectrum of the NTK matrix on the target dataset. Our contributions include a novel theoretical framework for MR, insights into the relationship between source and target models, and extensive experiments validating our findings.
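For intuition, the PyTorch sketch below computes the empirical NTK of a toy network on a small stand-in "target" dataset and inspects its eigenvalue spectrum, the quantity the analysis says governs MR's success; the network and data here are placeholders, not the paper's setup.

```python
import torch

net = torch.nn.Sequential(torch.nn.Linear(5, 32), torch.nn.Tanh(),
                          torch.nn.Linear(32, 1))
X = torch.randn(20, 5)                    # stand-in for a small target dataset
params = list(net.parameters())

# Per-example Jacobians of the scalar output w.r.t. all parameters.
rows = []
for i in range(X.shape[0]):
    grads = torch.autograd.grad(net(X[i:i + 1]).squeeze(), params)
    rows.append(torch.cat([g.reshape(-1) for g in grads]))
J = torch.stack(rows)                     # shape: (n, num_params)

# Empirical NTK on the target data and its eigenvalue spectrum.
K = J @ J.T
eigvals = torch.linalg.eigvalsh(K)        # ascending order
print("top NTK eigenvalues:", eigvals.flip(0)[:5])
```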
arXiv Detail & Related papers (2025-05-31T16:15:04Z)
- Beyond Progress Measures: Theoretical Insights into the Mechanism of Grokking [50.465604300990904]
Grokking refers to the abrupt improvement in test accuracy after extended overfitting. In this work, we investigate the mechanism underlying grokking in Transformers trained on prime number operations.
arXiv Detail & Related papers (2025-04-04T04:42:38Z)
- Global Convergence of Continual Learning on Non-IID Data [51.99584235667152]
We provide a general and comprehensive theoretical analysis for continual learning of regression models. We establish almost sure convergence results for continual learning under a general data condition for the first time.
arXiv Detail & Related papers (2025-03-24T10:06:07Z)
- Revisiting Spurious Correlation in Domain Generalization [12.745076668687748]
We build a structural causal model (SCM) to describe the causality within the data generation process.
We further conduct a thorough analysis of the mechanisms underlying spurious correlation.
In this regard, we propose to control confounding bias in OOD generalization by introducing a propensity score weighted estimator.
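A standard way to realize such a propensity-score-weighted estimator is inverse propensity weighting (IPW). The sketch below is a generic IPW illustration on synthetic confounded data, not the paper's exact estimator; the variable names and the true effect of 2.0 are fabricated for the demo.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 2000
C = rng.normal(size=(n, 2))                       # confounders
p = 1 / (1 + np.exp(-C @ np.array([1.0, -1.0])))
T = rng.binomial(1, p)                            # confounded group indicator
Y = 2.0 * T + C.sum(axis=1) + rng.normal(size=n)  # true effect is 2.0

# Estimate propensity scores, then reweight each group by the inverse
# of its propensity to remove the confounding bias.
e = LogisticRegression().fit(C, T).predict_proba(C)[:, 1]
ate = (np.average(Y[T == 1], weights=1 / e[T == 1])
       - np.average(Y[T == 0], weights=1 / (1 - e[T == 0])))
print("IPW estimate (true value 2.0):", round(ate, 2))
```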
arXiv Detail & Related papers (2024-06-17T13:22:00Z)
- A PAC-Bayesian Perspective on the Interpolating Information Criterion [54.548058449535155]
We show how a PAC-Bayes bound is obtained for a general class of models, characterizing factors which influence performance in the interpolating regime.
We quantify how the test error of overparameterized models achieving effectively zero training error depends on the quality of the implicit regularization imposed by, e.g., the combination of model and parameter-initialization scheme.
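For a sense of the quantities involved, here is the classic McAllester-style PAC-Bayes bound in a few lines of Python. The paper's bound for the interpolating regime is more refined, but the role of the KL (implicit-regularization) term is the same in spirit; the numbers plugged in below are illustrative.

```python
import math

def mcallester_bound(emp_risk, kl, n, delta=0.05):
    # With probability >= 1 - delta over the sample:
    # risk(Q) <= emp_risk(Q) + sqrt((KL(Q||P) + ln(2*sqrt(n)/delta)) / (2n))
    return emp_risk + math.sqrt(
        (kl + math.log(2 * math.sqrt(n) / delta)) / (2 * n))

# An interpolating model: zero training error, so the bound is driven
# entirely by the KL term, which encodes the implicit regularization.
print(mcallester_bound(emp_risk=0.0, kl=50.0, n=10000))   # ~0.054
```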
arXiv Detail & Related papers (2023-11-13T01:48:08Z)
- Matrix Completion-Informed Deep Unfolded Equilibrium Models for Self-Supervised k-Space Interpolation in MRI [8.33626757808923]
Regularization model-driven deep learning (DL) has gained significant attention due to its ability to leverage the potent representational capabilities of DL.
We propose a self-supervised DL approach for accelerated MRI that is theoretically guaranteed and does not rely on fully sampled labels.
arXiv Detail & Related papers (2023-09-24T07:25:06Z)
- Modeling Multiple Views via Implicitly Preserving Global Consistency and Local Complementarity [61.05259660910437]
We propose a global consistency and complementarity network (CoCoNet) to learn representations from multiple views.
On the global stage, we reckon that the crucial knowledge is implicitly shared among views, and enhancing the encoder to capture such knowledge can improve the discriminability of the learned representations.
Lastly, on the local stage, we propose a complementarity factor that joins cross-view discriminative knowledge and guides the encoders to learn not only view-wise discriminability but also cross-view complementary information.
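As a rough illustration only, the sketch below pairs a cross-view consistency loss with a Barlow-Twins-inspired decorrelation penalty as a crude stand-in for a complementarity term; CoCoNet's actual global and local objectives differ, and the encoders and data here are toy placeholders.

```python
import torch
import torch.nn.functional as F

# Hypothetical encoders for two views of the same instances.
enc_a, enc_b = torch.nn.Linear(16, 8), torch.nn.Linear(16, 8)
xa, xb = torch.randn(32, 16), torch.randn(32, 16)
za = F.normalize(enc_a(xa), dim=1)
zb = F.normalize(enc_b(xb), dim=1)

# Global consistency: pull same-instance embeddings across views together.
consistency = (1 - (za * zb).sum(dim=1)).mean()

# Crude complementarity stand-in: penalize off-diagonal cross-correlation
# so each view retains information the other lacks.
c = (za.T @ zb) / za.shape[0]
off_diag = c - torch.diag(torch.diagonal(c))
complementarity = (off_diag ** 2).sum()

loss = consistency + 0.1 * complementarity
loss.backward()
```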
arXiv Detail & Related papers (2022-09-16T09:24:00Z)
- Provable Generalization of Overparameterized Meta-learning Trained with SGD [62.892930625034374]
We study the generalization of a widely used meta-learning approach, Model-Agnostic Meta-Learning (MAML).
We provide both upper and lower bounds for the excess risk of MAML, which captures how SGD dynamics affect these generalization bounds.
Our theoretical findings are further validated by experiments.
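For readers unfamiliar with the algorithm under analysis, here is a minimal MAML training loop on toy regression tasks: inner adaptation on a support set, then an outer SGD step differentiated through that adaptation on a query set. It illustrates the scheme whose excess risk the paper bounds, not the paper's experiments; the task distribution and learning rates are made up.

```python
import torch

model = torch.nn.Linear(1, 1)
outer_opt = torch.optim.SGD(model.parameters(), lr=1e-2)
inner_lr = 0.05

for step in range(1000):
    # Sample a toy task: y = a * x with a random slope.
    a = torch.randn(1)
    x_s, x_q = torch.randn(10, 1), torch.randn(10, 1)
    y_s, y_q = a * x_s, a * x_q

    # Inner adaptation step on the support set (create_graph=True keeps the
    # graph so the outer step can differentiate through the adaptation).
    loss_s = ((model(x_s) - y_s) ** 2).mean()
    grads = torch.autograd.grad(loss_s, model.parameters(), create_graph=True)
    w, b = [p - inner_lr * g for p, g in zip(model.parameters(), grads)]

    # Outer (meta) loss on the query set, using the adapted parameters.
    loss_q = ((x_q @ w.T + b - y_q) ** 2).mean()
    outer_opt.zero_grad()
    loss_q.backward()
    outer_opt.step()
```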
arXiv Detail & Related papers (2022-06-18T07:22:57Z)
- A Comprehensive Empirical Study of Vision-Language Pre-trained Model for Supervised Cross-Modal Retrieval [19.2650103482509]
Cross-Modal Retrieval (CMR) is an important research topic across multimodal computing and information retrieval.
We take CLIP as the current representative vision-language pre-trained model to conduct a comprehensive empirical study.
We propose a novel model, CLIP4CMR, that employs pre-trained CLIP as the backbone network to perform supervised CMR.
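As a baseline illustration of using pre-trained CLIP for cross-modal retrieval (zero-shot, without CLIP4CMR's supervised head), one can rank captions for an image by cosine similarity in the shared embedding space. The file name and captions below are placeholders.

```python
import torch
import clip  # OpenAI's CLIP: pip install git+https://github.com/openai/CLIP.git
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Encode one query image and several candidate captions into the shared space.
image = preprocess(Image.open("query.jpg")).unsqueeze(0).to(device)
texts = clip.tokenize(["a dog on grass", "a city skyline",
                       "a bowl of fruit"]).to(device)

with torch.no_grad():
    img_emb = model.encode_image(image)
    txt_emb = model.encode_text(texts)

# Rank captions by cosine similarity for image-to-text retrieval.
img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
txt_emb = txt_emb / txt_emb.norm(dim=-1, keepdim=True)
scores = (img_emb @ txt_emb.T).squeeze(0)
print(scores.argsort(descending=True))    # indices of best-matching captions
```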
arXiv Detail & Related papers (2022-01-08T06:00:22Z)
- Which Mutual-Information Representation Learning Objectives are Sufficient for Control? [80.2534918595143]
Mutual information provides an appealing formalism for learning representations of data.
This paper formalizes the sufficiency of a state representation for learning and representing the optimal policy.
Surprisingly, we find that two of the studied objectives can yield insufficient representations given mild and common assumptions on the structure of the MDP.
arXiv Detail & Related papers (2021-06-14T10:12:34Z)