Factor Decorrelation Enhanced Data Removal from Deep Predictive Models
- URL: http://arxiv.org/abs/2509.23443v1
- Date: Sat, 27 Sep 2025 18:23:21 GMT
- Title: Factor Decorrelation Enhanced Data Removal from Deep Predictive Models
- Authors: Wenhao Yang, Lin Li, Xiaohui Tao, Kaize Shi
- Abstract summary: The imperative of user privacy protection and regulatory compliance necessitates sensitive data removal in model training. We propose a novel data removal approach that enhances deep predictive models through factor decorrelation and loss perturbation.
- Score: 19.35464160096721
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The imperative of user privacy protection and regulatory compliance necessitates sensitive data removal in model training, yet this process often induces distributional shifts that undermine model performance, particularly in out-of-distribution (OOD) scenarios. We propose a novel data removal approach that enhances deep predictive models through factor decorrelation and loss perturbation. Our approach introduces: (1) a discriminative-preserving factor decorrelation module employing dynamic adaptive weight adjustment and iterative representation updating to reduce feature redundancy and minimize inter-feature correlations; (2) a smoothed data removal mechanism with loss perturbation that creates information-theoretic safeguards against data leakage during removal operations. Extensive experiments on five benchmark datasets show that our approach outperforms other baselines and consistently achieves high predictive accuracy and robustness even under significant distribution shifts. The results highlight its superior efficiency and adaptability in both in-distribution and out-of-distribution scenarios.
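The abstract names two ingredients, feature decorrelation and a perturbed removal objective, without giving formulas. Below is a minimal sketch of how such a step is often built, not the paper's method: the off-diagonal-covariance penalty, the gradient-ascent forget update, the Gaussian gradient noise standing in for loss perturbation, and every name and hyperparameter (`decorrelation_penalty`, `perturbed_removal_step`, `lam`, `sigma`) are illustrative assumptions.

```python
import torch

def decorrelation_penalty(z: torch.Tensor) -> torch.Tensor:
    """Mean-squared off-diagonal covariance of a batch of representations z (B, D)."""
    z = z - z.mean(dim=0, keepdim=True)
    cov = (z.T @ z) / max(z.shape[0] - 1, 1)       # (D, D) sample covariance
    off_diag = cov - torch.diag(torch.diag(cov))   # keep only cross-feature terms
    return (off_diag ** 2).mean()

def perturbed_removal_step(encoder, head, loss_fn, x_forget, y_forget,
                           opt, lam=1e-2, sigma=1e-3):
    """One hypothetical 'forget' update: ascend the loss on the data to be
    removed, keep features decorrelated, and noise the gradients as a
    stand-in for the paper's loss-perturbation safeguard."""
    opt.zero_grad()
    z = encoder(x_forget)
    loss = loss_fn(head(z), y_forget)
    objective = -loss + lam * decorrelation_penalty(z)  # ascent on forget loss
    objective.backward()
    for group in opt.param_groups:                      # gradient perturbation
        for p in group["params"]:
            if p.grad is not None:
                p.grad.add_(sigma * torch.randn_like(p.grad))
    opt.step()
```

Ascending the loss on the forget set is one common unlearning update; the decorrelation term discourages redundant, correlated features, and the injected noise plays the role of the information-theoretic safeguard the abstract alludes to.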
Related papers
- Multi-environment Invariance Learning with Missing Data [0.0]
In this work, we establish non-asymptotic guarantees on the variable selection property and $\ell$-error convergence rates. We evaluate the performance of the new estimator through extensive simulations and demonstrate its application using the UCI Bike Sharing dataset.
arXiv Detail & Related papers (2026-01-12T06:30:58Z)
- Nonparametric Data Attribution for Diffusion Models [57.820618036556084]
Data attribution for generative models seeks to quantify the influence of individual training examples on model outputs. We propose a nonparametric attribution method that operates entirely on data, measuring influence via patch-level similarity between generated and training images.
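A minimal sketch of patch-level similarity attribution as the summary describes it, assuming raw non-overlapping pixel patches and cosine similarity; the paper may instead compare patches in an encoder's feature space, and `patch_influence` is a hypothetical name:

```python
import torch
import torch.nn.functional as F

def to_patches(x: torch.Tensor, p: int) -> torch.Tensor:
    # x: (N, C, H, W) -> (N, num_patches, C*p*p); H and W must be divisible by p
    n, c, h, w = x.shape
    u = x.unfold(2, p, p).unfold(3, p, p)            # (N, C, H//p, W//p, p, p)
    u = u.permute(0, 2, 3, 1, 4, 5).reshape(n, -1, c * p * p)
    return F.normalize(u, dim=-1)                    # unit norm -> cosine sim

def patch_influence(generated: torch.Tensor, train_imgs: torch.Tensor,
                    p: int = 8) -> torch.Tensor:
    """Score each training image (N, C, H, W) by its best patch-level
    cosine match with one generated image (C, H, W)."""
    g = to_patches(generated.unsqueeze(0), p)[0]     # (P, D)
    t = to_patches(train_imgs, p)                    # (N, Q, D)
    sims = torch.einsum("pd,nqd->npq", g, t)         # all patch pairs
    return sims.amax(dim=(1, 2))                     # (N,) influence scores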
arXiv Detail & Related papers (2025-10-16T03:37:16Z)
- Filtering with Confidence: When Data Augmentation Meets Conformal Prediction [10.431187957711206]
We propose a principled data filtering framework for synthetic data augmentation. We show consistent performance improvements of up to 40% in F1 score over unaugmented baselines.
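The filtering rule in split conformal prediction is simple enough to sketch; the choice of nonconformity score is an assumption here (the abstract does not state it), with 1 - p̂(y|x) being a common one:

```python
import numpy as np

def conformal_filter(cal_scores: np.ndarray, syn_scores: np.ndarray,
                     alpha: float = 0.1) -> np.ndarray:
    """Keep synthetic points whose nonconformity score falls within the
    (1 - alpha) split-conformal quantile of real calibration scores."""
    n = len(cal_scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))      # finite-sample correction
    qhat = np.sort(cal_scores)[min(k, n) - 1]    # conformal quantile
    return syn_scores <= qhat                    # boolean keep-mask
```

Synthetic examples that conform no worse than real held-out data pass the filter; the rest are discarded before augmentation.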
arXiv Detail & Related papers (2025-09-25T19:42:02Z)
- Distributionally Robust Optimization with Adversarial Data Contamination [49.89480853499918]
We focus on optimizing Wasserstein-1 DRO objectives for generalized linear models with convex Lipschitz loss functions. Our primary contribution lies in a novel modeling framework that integrates robustness against training data contamination with robustness against distributional shifts. This work establishes the first rigorous guarantees, supported by efficient computation, for learning under the dual challenges of data contamination and distributional shifts.
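For context, a standard duality result for Wasserstein-1 DRO with Lipschitz losses (not a formula quoted from this paper): with a loss h that is L-Lipschitz in its first argument, a linear predictor, and transport cost given by a norm on the features,

```latex
\sup_{Q:\, W_1(Q,\widehat{P}_n)\le \varepsilon}\;
\mathbb{E}_{(x,y)\sim Q}\!\left[h(\langle\theta,x\rangle,y)\right]
\;=\; \frac{1}{n}\sum_{i=1}^{n} h(\langle\theta,x_i\rangle,y_i)
\;+\; \varepsilon\, L\, \lVert\theta\rVert_{*}
```

where the dual norm penalty on θ replaces the worst-case supremum, which is one reason convex Lipschitz GLM losses keep such objectives computationally tractable.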
arXiv Detail & Related papers (2025-07-14T18:34:10Z)
- Robust Molecular Property Prediction via Densifying Scarce Labeled Data [53.24886143129006]
In drug discovery, compounds most critical for advancing research often lie beyond the training set. We propose a novel bilevel optimization approach that leverages unlabeled data to interpolate between in-distribution (ID) and out-of-distribution (OOD) data.
arXiv Detail & Related papers (2025-06-13T15:27:40Z)
- A Conformal Approach to Feature-based Newsvendor under Model Misspecification [2.801095519296785]
We propose a model-free and distribution-free framework inspired by conformal prediction. We validate our framework using both simulated data and a real-world dataset from the Capital Bikeshare program in Washington, D.C.
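Background the entry presumably builds on, the classical critical-fractile solution (not taken from the paper): with per-unit underage cost c_u and overage cost c_o, the optimal order quantity is a conditional quantile of demand D given features x,

```latex
q^{*}(x) \;=\; F_{D \mid x}^{-1}(\tau),
\qquad \tau \;=\; \frac{c_u}{c_u + c_o}
```

so a distribution-free method needs only a calibrated estimate of this quantile, not a full demand model.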
arXiv Detail & Related papers (2024-12-17T18:34:43Z)
- Optimizing Sequential Recommendation Models with Scaling Laws and Approximate Entropy [104.48511402784763]
The Performance Law for SR models aims to theoretically investigate and model the relationship between model performance and data quality. We propose Approximate Entropy (ApEn) to assess data quality, presenting a more nuanced approach than traditional data-quantity metrics.
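Approximate Entropy itself is a standard regularity statistic (Pincus); a direct implementation, with the common default tolerance r = 0.2 · std(x) (how the paper applies it to recommendation data is not specified here):

```python
import numpy as np

def approximate_entropy(x, m: int = 2, r=None) -> float:
    """ApEn(m, r) of a 1-D series: lower values indicate more regularity."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    if r is None:
        r = 0.2 * x.std()

    def phi(mm):
        # All length-mm windows of the series, shape (n - mm + 1, mm)
        emb = np.array([x[i:i + mm] for i in range(n - mm + 1)])
        # Chebyshev distance between every pair of windows
        d = np.abs(emb[:, None, :] - emb[None, :, :]).max(axis=2)
        c = (d <= r).mean(axis=1)   # self-matches included, so c > 0
        return np.log(c).mean()

    return phi(m) - phi(m + 1)
```

As a sanity check, a periodic series such as `np.sin(np.linspace(0, 20, 500))` scores lower (more regular) than `np.random.randn(500)`.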
arXiv Detail & Related papers (2024-11-30T10:56:30Z)
- Low-rank finetuning for LLMs: A fairness perspective [54.13240282850982]
Low-rank approximation techniques have become the de facto standard for fine-tuning Large Language Models.
This paper investigates the effectiveness of these methods in capturing the shift of fine-tuning datasets from the initial pre-trained data distribution.
We show that low-rank fine-tuning inadvertently preserves undesirable biases and toxic behaviors.
arXiv Detail & Related papers (2024-05-28T20:43:53Z)
- DRoP: Distributionally Robust Data Pruning [11.930434318557156]
We conduct the first systematic study of the impact of data pruning on the classification bias of trained models. We propose DRoP, a distributionally robust approach to pruning, and empirically demonstrate its performance on standard computer vision benchmarks.
arXiv Detail & Related papers (2024-04-08T14:55:35Z)
- Causality-oriented robustness: exploiting general noise interventions [4.64479351797195]
In this paper, we focus on causality-oriented robustness and propose Distributional Robustness via Invariant Gradients (DRIG). DRIG exploits general noise interventions in training data for robust predictions against unseen interventions. We show that our framework includes anchor regression as a special case, and that it yields prediction models that protect against more diverse perturbations.
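The anchor regression objective that the summary says is recovered as a special case, in its standard form (with P_A the projection onto the column span of the anchor variables A; γ = 1 recovers OLS and γ → ∞ approaches the instrumental-variables limit):

```latex
\hat{b}_{\gamma} \;=\; \arg\min_{b}\;
\bigl\lVert (I - P_A)\,(Y - Xb) \bigr\rVert_2^2
\;+\; \gamma\,\bigl\lVert P_A\,(Y - Xb) \bigr\rVert_2^2
```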
arXiv Detail & Related papers (2023-07-18T16:22:50Z)
- DeepVol: Volatility Forecasting from High-Frequency Data with Dilated Causal Convolutions [53.37679435230207]
We propose DeepVol, a model based on Dilated Causal Convolutions that uses high-frequency data to forecast day-ahead volatility.
Our empirical results suggest that the proposed deep learning-based approach effectively learns global features from high-frequency data.
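A dilated causal convolution is a standard building block; a minimal PyTorch sketch (the class name and the dilation schedule 1, 2, 4 are illustrative; DeepVol's exact architecture is not given in the abstract):

```python
import torch.nn as nn
import torch.nn.functional as F

class CausalConv1d(nn.Module):
    """1-D convolution that never looks into the future: left-pad the input
    by (kernel_size - 1) * dilation so output[t] depends only on input[:t+1]."""
    def __init__(self, in_ch, out_ch, kernel_size, dilation=1):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation
        self.conv = nn.Conv1d(in_ch, out_ch, kernel_size, dilation=dilation)

    def forward(self, x):                    # x: (B, C, T)
        return self.conv(F.pad(x, (self.pad, 0)))

# Stacking blocks with dilations 1, 2, 4, ... grows the receptive field
# exponentially, which is what lets such models ingest long intraday series.
stack = nn.Sequential(
    CausalConv1d(1, 16, 3, dilation=1), nn.ReLU(),
    CausalConv1d(16, 16, 3, dilation=2), nn.ReLU(),
    CausalConv1d(16, 1, 3, dilation=4),
)
```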
arXiv Detail & Related papers (2022-09-23T16:13:47Z)
- Which Invariance Should We Transfer? A Causal Minimax Learning Approach [18.71316951734806]
We present a comprehensive minimax analysis from a causal perspective.
We propose an efficient algorithm to search for the subset with minimal worst-case risk.
The effectiveness and efficiency of our methods are demonstrated on synthetic data and the diagnosis of Alzheimer's disease.
arXiv Detail & Related papers (2021-07-05T09:07:29Z)
- Unlabelled Data Improves Bayesian Uncertainty Calibration under Covariate Shift [100.52588638477862]
We develop an approximate Bayesian inference scheme based on posterior regularisation.
We demonstrate the utility of our method in the context of transferring prognostic models of prostate cancer across globally diverse populations.
arXiv Detail & Related papers (2020-06-26T13:50:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.