Related papers: CVPL: A Geometric Framework for Post-Hoc Linkage Risk Assessment in Protected Tabular Data


URL: http://arxiv.org/abs/2602.11015v1
Date: Wed, 11 Feb 2026 16:39:07 GMT
Title: CVPL: A Geometric Framework for Post-Hoc Linkage Risk Assessment in Protected Tabular Data
Authors: Valery Khvatov, Alexey Neyman
Abstract summary: Formal privacy metrics provide compliance-oriented guarantees but often fail to quantify actual linkability in released datasets. CVPL represents linkage analysis as an operator pipeline comprising blocking, vectorization, latent projection, and similarity evaluation. Empirical validation on 10,000 records across 19 configurations demonstrates that formal k-anonymity compliance may coexist with substantial empirical linkability.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Formal privacy metrics provide compliance-oriented guarantees but often fail to quantify actual linkability in released datasets. We introduce CVPL (Cluster-Vector-Projection Linkage), a geometric framework for post-hoc assessment of linkage risk between original and protected tabular data. CVPL represents linkage analysis as an operator pipeline comprising blocking, vectorization, latent projection, and similarity evaluation, yielding continuous, scenario-dependent risk estimates rather than binary compliance verdicts. We formally define CVPL under an explicit threat model and introduce threshold-aware risk surfaces, R(lambda, tau), that capture the joint effects of protection strength and attacker strictness. We establish a progressive blocking strategy with monotonicity guarantees, enabling anytime risk estimation with valid lower bounds. We demonstrate that the classical Fellegi-Sunter linkage emerges as a special case of CVPL under restrictive assumptions, and that violations of these assumptions can lead to systematic over-linking bias. Empirical validation on 10,000 records across 19 protection configurations demonstrates that formal k-anonymity compliance may coexist with substantial empirical linkability, with a significant portion arising from non-quasi-identifier behavioral patterns. CVPL provides interpretable diagnostics identifying which features drive linkage feasibility, supporting privacy impact assessment, protection mechanism comparison, and utility-risk trade-off analysis.
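
The operator pipeline and the risk surface R(lambda, tau) named in the abstract can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the paper: the synthetic data, additive Gaussian noise as the protection mechanism with strength `lam`, a block key that protection leaves unchanged, z-score vectorization, an SVD-based latent projection, a Gaussian-kernel similarity, and the definition of risk as the share of original records whose best in-block match is the true counterpart with similarity at least `tau`. The function names `linkage_risk` and `risk_surface` are hypothetical.

```python
import numpy as np


def linkage_risk(original, protected, block_keys, tau, n_components=2):
    """Toy estimate: share of original rows whose best in-block match is the
    true counterpart (same row index) with similarity >= tau."""
    n = len(original)
    # Vectorization (assumed): z-score using statistics of the pooled data.
    pooled = np.vstack([original, protected])
    z = (pooled - pooled.mean(axis=0)) / (pooled.std(axis=0) + 1e-12)
    # Latent projection (assumed): top right-singular vectors of the pooled data.
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    latent = z @ vt[:n_components].T
    x, y = latent[:n], latent[n:]

    linked = 0
    for key in np.unique(block_keys):  # Blocking: compare only within a block.
        idx = np.flatnonzero(block_keys == key)
        d2 = ((x[idx][:, None, :] - y[idx][None, :, :]) ** 2).sum(axis=-1)
        sims = np.exp(-d2)  # Similarity (assumed): Gaussian kernel in (0, 1].
        best = sims.argmax(axis=1)
        hits = (best == np.arange(len(idx))) & (sims.max(axis=1) >= tau)
        linked += int(hits.sum())
    return linked / n


def risk_surface(original, block_keys, lambdas, taus, seed=0):
    """Evaluate the toy risk on a grid of protection strengths and thresholds."""
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(original.shape)  # same noise draw for every lam
    surface = np.empty((len(lambdas), len(taus)))
    for i, lam in enumerate(lambdas):
        protected = original + lam * noise  # assumed protection mechanism
        for j, tau in enumerate(taus):
            surface[i, j] = linkage_risk(original, protected, block_keys, tau)
    return surface


# Synthetic stand-in data: 200 records, 4 numeric features, 4 blocks.
rng = np.random.default_rng(42)
records = rng.standard_normal((200, 4))
blocks = rng.integers(0, 4, size=200)
lambdas = [0.0, 0.5, 2.0]
taus = [0.5, 0.8, 0.95]
surface = risk_surface(records, blocks, lambdas, taus)
print(surface)
```

Each row of `surface` fixes a protection strength and each column fixes an attacker threshold. By construction, entries in this sketch cannot increase along a row as `tau` grows, since a stricter threshold only removes accepted links.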

Related papers

RAPID: Risk of Attribute Prediction-Induced Disclosure in Synthetic Microdata [0.0]
We introduce a disclosure risk measure that directly quantifies inferential vulnerability under a realistic attack model. An adversary trains a predictive model solely on the released synthetic data and applies it to real individuals' quasi-identifiers. For categorical sensitive attributes, we propose a baseline-normalized confidence score that measures how much more confident the attacker is about the true class than would be expected from class prevalence alone.
arXiv Detail & Related papers (2026-02-09T22:03:11Z)
Prediction-Powered Risk Monitoring of Deployed Models for Detecting Harmful Distribution Shifts [51.37000123503367]
We propose prediction-powered risk monitoring (PPRM), a semi-supervised risk-monitoring approach based on prediction-powered inference (PPI). PPRM constructs anytime-valid lower bounds on the running risk by combining synthetic labels with a small set of true labels. We demonstrate the effectiveness of PPRM through extensive experiments on image classification, large language model (LLM), and telecommunications monitoring tasks.
arXiv Detail & Related papers (2026-02-02T15:32:14Z)
Quantifying the Risk of Transferred Black Box Attacks [0.0]
Neural networks have become pervasive across various applications, including security-related products. This paper investigates the complexities involved in resilience testing against transferred adversarial attacks. We propose a targeted resilience testing framework that employs surrogate models strategically selected based on Centered Kernel Alignment (CKA) similarity.
arXiv Detail & Related papers (2025-11-07T09:34:43Z)
Unifying Re-Identification, Attribute Inference, and Data Reconstruction Risks in Differential Privacy [24.723577119566112]
We show that bounds on attack success can take the same unified form across re-identification, attribute inference, and data reconstruction risks. Our results are tighter than prior methods using $\varepsilon$-DP, Rényi DP, and concentrated DP.
arXiv Detail & Related papers (2025-07-09T15:59:30Z)
COIN: Uncertainty-Guarding Selective Question Answering for Foundation Models with Provable Risk Guarantees [51.5976496056012]
COIN is an uncertainty-guarding selection framework that calibrates statistically valid thresholds to filter a single generated answer per question. COIN estimates the empirical error rate on a calibration set and applies confidence interval methods to establish a high-probability upper bound on the true error rate. We demonstrate COIN's robustness in risk control, strong test-time power in retaining admissible answers, and predictive efficiency under limited calibration data.
arXiv Detail & Related papers (2025-06-25T07:04:49Z)
Verification-Guided Falsification for Safe RL via Explainable Abstraction and Risk-Aware Exploration [8.246285288584625]
We propose a hybrid framework that integrates explainability, model checking, and risk-guided falsification to achieve both rigor and coverage. Our approach begins by constructing a human-interpretable abstraction of the RL policy using Comprehensible Abstract Policy Summarization (CAPS). If no violation is detected, we cannot conclude satisfaction, owing to potential limitations in the abstraction and in the coverage of the offline dataset.
arXiv Detail & Related papers (2025-06-04T00:54:01Z)
Conditional Conformal Risk Adaptation [9.559062601251464]
We develop a new score function for creating adaptive prediction sets that significantly improve conditional risk control for segmentation tasks. We introduce a specialized probability calibration framework that enhances the reliability of pixel-wise inclusion estimates. Our experiments on polyp segmentation demonstrate that all three methods provide valid marginal risk control and deliver more consistent conditional risk control.
arXiv Detail & Related papers (2025-04-10T10:01:06Z)
Risk and cross validation in ridge regression with correlated samples [72.59731158970894]
We provide asymptotics for the in- and out-of-sample risks of ridge regression when the data points have arbitrary correlations. We demonstrate that in this setting, the generalized cross validation estimator (GCV) fails to correctly predict the out-of-sample risk. We further extend our analysis to the case where the test point has nontrivial correlations with the training set, a setting often encountered in time series forecasting.
arXiv Detail & Related papers (2024-08-08T17:27:29Z)
Certifiably Byzantine-Robust Federated Conformal Prediction [49.23374238798428]
We introduce a novel framework, Rob-FCP, which executes robust federated conformal prediction, effectively countering malicious clients. We empirically demonstrate the robustness of Rob-FCP against diverse proportions of malicious clients under a variety of Byzantine attacks.
arXiv Detail & Related papers (2024-06-04T04:43:30Z)
From Mean to Extreme: Formal Differential Privacy Bounds on the Success of Real-World Data Reconstruction Attacks [54.25638567385662]
Differential Privacy in machine learning is often interpreted as a guarantee against membership inference. Translating DP budgets into quantitative protection against the more damaging threat of data reconstruction remains a challenging open problem. This paper bridges the critical gap by deriving the first formal privacy bounds tailored to the mechanics of demonstrated "from-scratch" attacks.
arXiv Detail & Related papers (2024-02-20T09:52:30Z)
Self-Certifying Classification by Linearized Deep Assignment [65.0100925582087]
We propose a novel class of deep predictors for classifying metric data on graphs within the PAC-Bayes risk certification paradigm. Building on the recent PAC-Bayes literature and data-dependent priors, this approach enables learning posterior distributions on the hypothesis space.
arXiv Detail & Related papers (2022-01-26T19:59:14Z)
