Natural Geometry of Robust Data Attribution: From Convex Models to Deep Networks
- URL: http://arxiv.org/abs/2512.09103v1
- Date: Tue, 09 Dec 2025 20:40:27 GMT
- Title: Natural Geometry of Robust Data Attribution: From Convex Models to Deep Networks
- Authors: Shihao Li, Jiachen Li, Dongmei Chen
- Abstract summary: We present a unified framework for robust attribution that extends from convex models to deep networks. For convex settings, we derive Wasserstein-Robust Influence Functions (W-RIF) with provable coverage guarantees. For deep networks, we demonstrate that Euclidean certification is rendered vacuous by spectral amplification.
- Score: 9.553350856191743
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Data attribution methods identify which training examples are responsible for a model's predictions, but their sensitivity to distributional perturbations undermines practical reliability. We present a unified framework for certified robust attribution that extends from convex models to deep networks. For convex settings, we derive Wasserstein-Robust Influence Functions (W-RIF) with provable coverage guarantees. For deep networks, we demonstrate that Euclidean certification is rendered vacuous by spectral amplification -- a mechanism where the inherent ill-conditioning of deep representations inflates Lipschitz bounds by over $10{,}000\times$. This explains why standard TRAK scores, while accurate point estimates, are geometrically fragile: naive Euclidean robustness analysis yields 0\% certification. Our key contribution is the Natural Wasserstein metric, which measures perturbations in the geometry induced by the model's own feature covariance. This eliminates spectral amplification, reducing worst-case sensitivity by $76\times$ and stabilizing attribution estimates. On CIFAR-10 with ResNet-18, Natural W-TRAK certifies 68.7\% of ranking pairs compared to 0\% for Euclidean baselines -- to our knowledge, the first non-vacuous certified bounds for neural network attribution. Furthermore, we prove that the Self-Influence term arising from our analysis equals the Lipschitz constant governing attribution stability, providing theoretical grounding for leverage-based anomaly detection. Empirically, Self-Influence achieves 0.970 AUROC for label noise detection, identifying 94.1\% of corrupted labels by examining just the top 20\% of training data.
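To make the abstract's central quantities concrete, here is a minimal numpy sketch, not the authors' implementation. It computes a leverage-style Self-Influence score, the quantity the abstract equates with the Lipschitz constant governing attribution stability and ranks to surface corrupted labels, and contrasts a perturbation's Euclidean length with its length in a geometry derived from the feature covariance. The abstract does not define the Natural Wasserstein metric precisely, so the whitened norm below is an assumption, and `lam` plus all function names are illustrative.

```python
import numpy as np

def self_influence(X, lam=1e-3):
    """Leverage-style Self-Influence: s_i = x_i^T (X^T X + lam*I)^{-1} x_i.
    High-leverage points are candidates for label-noise / anomaly review."""
    n, d = X.shape
    G = X.T @ X + lam * np.eye(d)            # regularized Gram matrix
    GinvXT = np.linalg.solve(G, X.T)         # (d, n): G^{-1} x_i for every i
    return np.einsum("id,di->i", X, GinvXT)  # diagonal of X G^{-1} X^T

def euclidean_vs_natural(delta, X, lam=1e-3):
    """Length of a feature perturbation under the Euclidean metric vs. a
    covariance-whitened ('natural') metric, ||delta||_nat^2 = delta^T Sigma^{-1} delta.
    Assumed form -- the paper's Natural Wasserstein metric may differ."""
    Sigma = X.T @ X / X.shape[0] + lam * np.eye(X.shape[1])
    nat = float(np.sqrt(delta @ np.linalg.solve(Sigma, delta)))
    return float(np.linalg.norm(delta)), nat

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Ill-conditioned features: one direction 100x larger than the rest,
    # mimicking the spectral amplification the abstract describes.
    X = rng.standard_normal((500, 10)) * np.array([100.0] + [1.0] * 9)
    scores = self_influence(X)
    top20 = np.argsort(scores)[-100:]        # inspect the top 20% by leverage
    print("mean Self-Influence of top 20%:", scores[top20].mean())
    delta = np.array([1.0] + [0.0] * 9)      # unit push along the dominant axis
    print("euclidean vs natural norm:", euclidean_vs_natural(delta, X))
```

The point of the toy whitening is that a fixed Euclidean perturbation along an ill-conditioned direction no longer dominates the sensitivity bound, which is the abstract's explanation for why Natural W-TRAK certifies rankings where the Euclidean analysis certifies none.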
Related papers
- BadCLIP++: Stealthy and Persistent Backdoors in Multimodal Contrastive Learning [73.46118996284888]
Research on backdoor attacks against multimodal contrastive learning models faces two key challenges: stealthiness and persistence. We propose BadCLIP++, a unified framework that tackles both challenges. For stealthiness, we introduce a semantic-fusion QR micro-trigger that embeds imperceptible patterns near task-relevant regions. For persistence, we stabilize trigger embeddings via radius shrinkage and centroid alignment.
arXiv Detail & Related papers (2026-02-19T08:31:16Z)
- Behavioral Analytics for Continuous Insider Threat Detection in Zero-Trust Architectures [0.0]
This framework uses the CERT Insider Threat dataset, applying data cleaning, normalization, and class balancing. It also employs Principal Component Analysis (PCA) for dimensionality reduction. Compared to SVM (90.1%), ANN (94.7%), and Bayes Net (94.9%), AdaBoost achieved higher performance, with 98.0% ACC, 98.3% PRE, 98.0% REC, and F1-score (F1). A scikit-learn sketch of this pipeline follows this entry's link.
arXiv Detail & Related papers (2026-01-10T22:30:19Z)
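The entry above describes a standard pipeline shape: clean and balance the data, reduce dimensionality with PCA, then classify with AdaBoost. Below is a hedged scikit-learn sketch of that shape; it substitutes a synthetic dataset for the CERT Insider Threat data, and the component count and hyperparameters are placeholders rather than the paper's settings.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the cleaned, normalized, class-balanced CERT data.
X, y = make_classification(n_samples=4000, n_features=40, n_informative=12,
                           weights=[0.5, 0.5], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# Normalize, reduce dimensionality with PCA, then boost weak learners.
clf = make_pipeline(StandardScaler(),
                    PCA(n_components=10),                       # placeholder
                    AdaBoostClassifier(n_estimators=200, random_state=0))
clf.fit(X_tr, y_tr)
pred = clf.predict(X_te)

print(f"ACC={accuracy_score(y_te, pred):.3f}  PRE={precision_score(y_te, pred):.3f}  "
      f"REC={recall_score(y_te, pred):.3f}  F1={f1_score(y_te, pred):.3f}")
```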
- Latent Sculpting for Zero-Shot Generalization: A Manifold Learning Approach to Out-of-Distribution Anomaly Detection [2.8547732086436306]
A fundamental limitation of supervised deep learning is "Generalization Collapse." We propose Latent Sculpting, a hierarchical two-stage representation learning framework. We report an 88.89% detection rate on "Infiltration" scenarios.
arXiv Detail & Related papers (2025-12-19T11:37:02Z)
- The Eminence in Shadow: Exploiting Feature Boundary Ambiguity for Robust Backdoor Attacks [51.468144272905135]
Deep neural networks (DNNs) underpin critical applications yet remain vulnerable to backdoor attacks. We provide a theoretical analysis of backdoor attacks, focusing on how sparse decision boundaries enable disproportionate model manipulation. We propose Eminence, an explainable and robust black-box backdoor framework with provable theoretical guarantees and inherent stealth properties.
arXiv Detail & Related papers (2025-12-11T08:09:07Z)
- Geometric Calibration and Neutral Zones for Uncertainty-Aware Multi-Class Classification [0.0]
This work bridges information geometry and statistical learning, offering formal guarantees for uncertainty-aware classification in applications requiring rigorous validation. Empirical validation on Adeno-Associated Virus classification demonstrates that the two-stage framework captures 72.5% of errors while deferring 34.5% of samples, reducing automated decision error rates from 16.8% to 6.9%.
arXiv Detail & Related papers (2025-11-26T01:29:49Z)
- CertDW: Towards Certified Dataset Ownership Verification via Conformal Prediction [48.82467166657901]
We propose the first certified dataset watermark (i.e., CertDW) and a CertDW-based certified dataset ownership verification method. Inspired by conformal prediction, we introduce two statistical measures: principal probability (PP) and watermark robustness (WR). We prove there exists a provable lower bound between PP and WR, enabling ownership verification when a suspicious model's WR value significantly exceeds the PP values of benign models trained on watermark-free datasets. A sketch of this decision rule follows this entry's link.
arXiv Detail & Related papers (2025-06-16T07:17:23Z)
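The summary gives the decision rule only at a high level: ownership is claimed when the suspect model's watermark robustness (WR) significantly exceeds the principal probability (PP) values of benign models. The sketch below encodes just that comparison as an empirical-quantile test; how PP and WR are actually computed is specific to the paper, so `pp_benign` and `wr_suspect` are taken as given, and the quantile rule is our reading, not CertDW's exact procedure.

```python
import numpy as np

def verify_ownership(wr_suspect: float, pp_benign: np.ndarray, alpha: float = 0.05) -> bool:
    """Claim ownership only if the suspect's WR exceeds the (1 - alpha)
    empirical quantile of PP values from benign, watermark-free models."""
    return wr_suspect > np.quantile(pp_benign, 1.0 - alpha)

# Toy usage with made-up numbers: 50 benign models vs. one suspect.
rng = np.random.default_rng(1)
pp_benign = rng.uniform(0.1, 0.4, size=50)
print(verify_ownership(0.90, pp_benign))  # True: far above the benign range
print(verify_ownership(0.20, pp_benign))  # False: within the benign range
```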
- TULiP: Test-time Uncertainty Estimation via Linearization and Weight Perturbation [11.334867025651233]
We propose TULiP, a theoretically driven uncertainty estimator for OOD detection. Our approach considers a hypothetical perturbation applied to the network before convergence. Our method exhibits state-of-the-art performance, particularly for near-distribution samples.
arXiv Detail & Related papers (2025-05-22T17:16:41Z)
- Fully Heteroscedastic Count Regression with Deep Double Poisson Networks [6.976150282812484]
Deep Double Poisson Network (DDPN) is a novel neural discrete count regression model. We show that DDPN exhibits robust regression properties similar to heteroscedastic Gaussian models. Experiments on diverse datasets demonstrate that DDPN outperforms current baselines in accuracy, calibration, and out-of-distribution detection.
arXiv Detail & Related papers (2024-06-13T16:02:03Z)
- Lazy Layers to Make Fine-Tuned Diffusion Models More Traceable [70.77600345240867]
A novel arbitrary-in-arbitrary-out (AIAO) strategy makes watermarks resilient to fine-tuning-based removal.
Unlike existing methods, which design a backdoor for the input/output space of diffusion models, our method embeds the backdoor into the feature space of sampled subpaths.
Our empirical studies on the MS-COCO, AFHQ, LSUN, CUB-200, and DreamBooth datasets confirm the robustness of AIAO.
arXiv Detail & Related papers (2024-05-01T12:03:39Z)
- The Lipschitz-Variance-Margin Tradeoff for Enhanced Randomized Smoothing [85.85160896547698]
Real-life applications of deep neural networks are hindered by their unsteady predictions when faced with noisy inputs and adversarial attacks.
We show how to design an efficient classifier with a certified radius by relying on noise injection into the inputs.
Our novel certification procedure allows us to use pre-trained models with randomized smoothing, effectively improving the current certification radius in a zero-shot manner (a generic Cohen-style smoothing sketch follows this entry's link).
arXiv Detail & Related papers (2023-09-28T22:41:47Z)
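For context, the generic randomized-smoothing certificate this line of work refines can be sketched compactly: estimate a lower confidence bound on the smoothed classifier's top-class probability by Monte Carlo, then convert it to an L2 radius via the Gaussian quantile function. This follows the standard Cohen et al. (2019) recipe, not the paper's Lipschitz-variance-margin refinement; `f` stands in for any pre-trained base classifier returning labels.

```python
import numpy as np
from scipy.stats import beta, norm

def certify(f, x, sigma=0.25, n=1000, alpha=0.001, num_classes=10):
    """Simplified smoothing certificate: returns (class, L2 radius),
    or (None, 0.0) when the confidence bound is too weak to certify."""
    noisy = x[None, :] + sigma * np.random.randn(n, x.size)
    counts = np.bincount([f(z) for z in noisy], minlength=num_classes)
    top = int(counts.argmax())
    # Clopper-Pearson lower confidence bound on the top-class probability.
    p_lower = beta.ppf(alpha, counts[top], n - counts[top] + 1)
    if p_lower <= 0.5:
        return None, 0.0
    return top, sigma * norm.ppf(p_lower)    # certified L2 radius

# Toy base classifier: thresholds the first coordinate.
f = lambda z: int(z[0] > 0.0)
print(certify(f, np.array([1.5, 0.0]), num_classes=2))
```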
- Getting a-Round Guarantees: Floating-Point Attacks on Certified Robustness [19.380453459873298]
Adversarial examples pose a security risk as they can alter the decisions of a machine learning classifier through slight input perturbations.
We show that these guarantees can be invalidated due to limitations of floating-point representation that cause rounding errors.
We show that the attack can be carried out against linear classifiers that have exact certifiable guarantees and against neural networks that have conservative certifications (a toy float-versus-exact comparison follows this entry's link).
arXiv Detail & Related papers (2022-05-20T13:07:36Z)
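The gap the entry above exploits, between a certificate derived in real arithmetic and what floating point actually computes, shows up even in a toy margin: evaluating the same dot product in float64 and in exact rational arithmetic over identical inputs yields different values, so a float-computed margin (and any radius derived from it) can misstate the true one. This demonstration is ours, not the paper's attack.

```python
from fractions import Fraction

# The same linear margin w . x evaluated two ways over identical float inputs.
w = [1.0] * 10
x = [0.1] * 10

margin_float = sum(wi * xi for wi, xi in zip(w, x))                      # rounded each step
margin_exact = sum(Fraction(wi) * Fraction(xi) for wi, xi in zip(w, x))  # exact rationals

print(margin_float)                  # 0.9999999999999999
print(float(margin_exact))           # 1.0
print(margin_float < margin_exact)   # True: rounding shrank the computed margin
```

A certificate computed from `margin_float` therefore differs from one computed from the exact margin; the paper turns exactly this kind of discrepancy into concrete attacks on certified classifiers.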
- Unlabelled Data Improves Bayesian Uncertainty Calibration under Covariate Shift [100.52588638477862]
We develop an approximate Bayesian inference scheme based on posterior regularisation.
We demonstrate the utility of our method in the context of transferring prognostic models of prostate cancer across globally diverse populations.
arXiv Detail & Related papers (2020-06-26T13:50:19Z)