CVPL: A Geometric Framework for Post-Hoc Linkage Risk Assessment in Protected Tabular Data
- URL: http://arxiv.org/abs/2602.11015v1
- Date: Wed, 11 Feb 2026 16:39:07 GMT
- Title: CVPL: A Geometric Framework for Post-Hoc Linkage Risk Assessment in Protected Tabular Data
- Authors: Valery Khvatov, Alexey Neyman,
- Abstract summary: Formal privacy metrics provide compliance-oriented guarantees but often fail to quantify actual linkability in released datasets.<n>CVPL represents linkage analysis as an operator pipeline comprising blocking, vectorization, latent projection, and similarity evaluation.<n> Empirical validation on 10,000 records across 19 configurations demonstrates that formal k-anonymity compliance may coexist with substantial empirical linkability.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Formal privacy metrics provide compliance-oriented guarantees but often fail to quantify actual linkability in released datasets. We introduce CVPL (Cluster-Vector-Projection Linkage), a geometric framework for post-hoc assessment of linkage risk between original and protected tabular data. CVPL represents linkage analysis as an operator pipeline comprising blocking, vectorization, latent projection, and similarity evaluation, yielding continuous, scenario-dependent risk estimates rather than binary compliance verdicts. We formally define CVPL under an explicit threat model and introduce threshold-aware risk surfaces, R(lambda, tau), that capture the joint effects of protection strength and attacker strictness. We establish a progressive blocking strategy with monotonicity guarantees, enabling anytime risk estimation with valid lower bounds. We demonstrate that the classical Fellegi-Sunter linkage emerges as a special case of CVPL under restrictive assumptions, and that violations of these assumptions can lead to systematic over-linking bias. Empirical validation on 10,000 records across 19 protection configurations demonstrates that formal k-anonymity compliance may coexist with substantial empirical linkability, with a significant portion arising from non-quasi-identifier behavioral patterns. CVPL provides interpretable diagnostics identifying which features drive linkage feasibility, supporting privacy impact assessment, protection mechanism comparison, and utility-risk trade-off analysis.
Related papers
- RAPID: Risk of Attribute Prediction-Induced Disclosure in Synthetic Microdata [0.0]
We introduce a disclosure risk measure that directly quantifies inferential vulnerability under a realistic attack model.<n>An adversary trains a predictive model solely on the released synthetic data and applies it to real individuals' quasi-identifiers.<n>For continuous sensitive attributes, we propose a baseline-normalized confidence score that measures how much more confident the attacker is about the true class than would be expected from class prevalence alone.
arXiv Detail & Related papers (2026-02-09T22:03:11Z) - Prediction-Powered Risk Monitoring of Deployed Models for Detecting Harmful Distribution Shifts [51.37000123503367]
We propose prediction-powered risk monitoring (PPRM), a semi-supervised risk-monitoring approach based on prediction-powered inference (PPI)<n>PPRM constructs anytime-valid lower bounds on the running risk by combining synthetic labels with a small set of true labels.<n>We demonstrate the effectiveness of PPRM through extensive experiments on image classification, large language model (LLM) and telecommunications monitoring tasks.
arXiv Detail & Related papers (2026-02-02T15:32:14Z) - Quantifying the Risk of Transferred Black Box Attacks [0.0]
Neural networks have become pervasive across various applications, including security-related products.<n>This paper investigates the complexities involved in resilience testing against transferred adversarial attacks.<n>We propose a targeted resilience testing framework that employs surrogate models strategically selected based on Centered Kernel Alignment (CKA) similarity.
arXiv Detail & Related papers (2025-11-07T09:34:43Z) - Unifying Re-Identification, Attribute Inference, and Data Reconstruction Risks in Differential Privacy [24.723577119566112]
We show that bounds on attack success can take the same unified form across re-identification, attribute inference, and data reconstruction risks.<n>Our results are tighter than prior methods using $varepsilon$-DP, R'enyi DP, and concentrated DP.
arXiv Detail & Related papers (2025-07-09T15:59:30Z) - COIN: Uncertainty-Guarding Selective Question Answering for Foundation Models with Provable Risk Guarantees [51.5976496056012]
COIN is an uncertainty-guarding selection framework that calibrates statistically valid thresholds to filter a single generated answer per question.<n>COIN estimates the empirical error rate on a calibration set and applies confidence interval methods to establish a high-probability upper bound on the true error rate.<n>We demonstrate COIN's robustness in risk control, strong test-time power in retaining admissible answers, and predictive efficiency under limited calibration data.
arXiv Detail & Related papers (2025-06-25T07:04:49Z) - Verification-Guided Falsification for Safe RL via Explainable Abstraction and Risk-Aware Exploration [8.246285288584625]
We propose a hybrid framework that integrates explainability, model checking, and risk-guided falsification to achieve both rigor and coverage.<n>Our approach begins by constructing a human-interpretable abstraction of the RL policy using Comprehensible Abstract Policy Summarization (CAPS)<n>If no violation is detected, we cannot conclude satisfaction due to potential limitation in the abstraction and coverage of the offline dataset.
arXiv Detail & Related papers (2025-06-04T00:54:01Z) - Conditional Conformal Risk Adaptation [9.559062601251464]
We develop a new score function for creating adaptive prediction sets that significantly improve conditional risk control for segmentation tasks.<n>We introduce a specialized probability calibration framework that enhances the reliability of pixel-wise inclusion estimates.<n>Our experiments on polyp segmentation demonstrate that all three methods provide valid marginal risk control and deliver more consistent conditional risk control.
arXiv Detail & Related papers (2025-04-10T10:01:06Z) - Risk and cross validation in ridge regression with correlated samples [72.59731158970894]
We provide training examples for the in- and out-of-sample risks of ridge regression when the data points have arbitrary correlations.<n>We demonstrate that in this setting, the generalized cross validation estimator (GCV) fails to correctly predict the out-of-sample risk.<n>We further extend our analysis to the case where the test point has nontrivial correlations with the training set, a setting often encountered in time series forecasting.
arXiv Detail & Related papers (2024-08-08T17:27:29Z) - Certifiably Byzantine-Robust Federated Conformal Prediction [49.23374238798428]
We introduce a novel framework Rob-FCP, which executes robust federated conformal prediction effectively countering malicious clients.
We empirically demonstrate the robustness of Rob-FCP against diverse proportions of malicious clients under a variety of Byzantine attacks.
arXiv Detail & Related papers (2024-06-04T04:43:30Z) - From Mean to Extreme: Formal Differential Privacy Bounds on the Success of Real-World Data Reconstruction Attacks [54.25638567385662]
Differential Privacy in machine learning is often interpreted as guarantees against membership inference.<n> translating DP budgets into quantitative protection against the more damaging threat of data reconstruction remains a challenging open problem.<n>This paper bridges the critical gap by deriving the first formal privacy bounds tailored to the mechanics of demonstrated "from-scratch" attacks.
arXiv Detail & Related papers (2024-02-20T09:52:30Z) - Self-Certifying Classification by Linearized Deep Assignment [65.0100925582087]
We propose a novel class of deep predictors for classifying metric data on graphs within PAC-Bayes risk certification paradigm.
Building on the recent PAC-Bayes literature and data-dependent priors, this approach enables learning posterior distributions on the hypothesis space.
arXiv Detail & Related papers (2022-01-26T19:59:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.