DeepLeak: Privacy Enhancing Hardening of Model Explanations Against Membership Leakage
- URL: http://arxiv.org/abs/2601.03429v1
- Date: Tue, 06 Jan 2026 21:34:27 GMT
- Title: DeepLeak: Privacy Enhancing Hardening of Model Explanations Against Membership Leakage
- Authors: Firas Ben Hmida, Zain Sbeih, Philemon Hailemariam, Birhanu Eshete
- Abstract summary: DeepLeak is a system to audit and mitigate privacy risks in post-hoc explanation methods. We show that default settings can leak up to 74.9% more membership information than previously reported. Our mitigations cut leakage by up to 95% (minimum 46.5%) with only ≤3.3% utility loss on average.
- Score: 1.096626056612224
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Machine learning (ML) explainability is central to algorithmic transparency in high-stakes settings such as predictive diagnostics and loan approval. However, these same domains require rigorous privacy guarantees, creating tension between interpretability and privacy. Although prior work has shown that explanation methods can leak membership information, practitioners still lack systematic guidance on selecting or deploying explanation techniques that balance transparency with privacy. We present DeepLeak, a system to audit and mitigate privacy risks in post-hoc explanation methods. DeepLeak advances the state-of-the-art in three ways: (1) comprehensive leakage profiling: we develop a stronger explanation-aware membership inference attack (MIA) to quantify how much representative explanation methods leak membership information under default configurations; (2) lightweight hardening strategies: we introduce practical, model-agnostic mitigations, including sensitivity-calibrated noise, attribution clipping, and masking, that substantially reduce membership leakage while preserving explanation utility; and (3) root-cause analysis: through controlled experiments, we pinpoint algorithmic properties (e.g., attribution sparsity and sensitivity) that drive leakage. Evaluating 15 explanation techniques across four families on image benchmarks, DeepLeak shows that default settings can leak up to 74.9% more membership information than previously reported. Our mitigations cut leakage by up to 95% (minimum 46.5%) with only ≤3.3% utility loss on average. DeepLeak offers a systematic, reproducible path to safer explainability in privacy-sensitive ML.
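To make the hardening side concrete, below is a minimal sketch of the three mitigation families the abstract names: attribution clipping, masking, and sensitivity-calibrated noise. All function names, defaults, and thresholds are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch of explanation hardening: clip extreme attributions, mask the
# low-magnitude tail, and add noise calibrated to the map's own spread.
# Defaults (0.99 quantile, top-100 mask, 5% noise) are assumptions.
import numpy as np

def harden_attributions(attr, noise_scale=0.05, clip_quantile=0.99,
                        keep_top_k=100, rng=None):
    rng = rng or np.random.default_rng()
    a = attr.astype(np.float64)

    # 1) Attribution clipping: bound extreme magnitudes, which the paper's
    #    root-cause analysis ties (via sensitivity) to membership leakage.
    bound = np.quantile(np.abs(a), clip_quantile)
    a = np.clip(a, -bound, bound)

    # 2) Masking: keep only the top-k magnitudes, zeroing the fine-grained
    #    tail an attacker could otherwise exploit.
    flat = np.abs(a).ravel()
    if keep_top_k < flat.size:
        thresh = np.partition(flat, -keep_top_k)[-keep_top_k]
        a[np.abs(a) < thresh] = 0.0

    # 3) Sensitivity-calibrated noise: scale Gaussian noise to the clipped
    #    map's standard deviation so perturbation tracks its dynamic range.
    a += rng.normal(0.0, noise_scale * (a.std() + 1e-12), size=a.shape)
    return a
```

In practice the clip quantile, mask size, and noise scale would be tuned jointly against an explanation-utility metric and a membership-inference attacker, which is the trade-off the paper's leakage-versus-utility numbers quantify.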
Related papers
- Privacy-Preserving Explainable AIoT Application via SHAP Entropy Regularization [5.35811141279537]
We propose a privacy-preserving approach based on SHAP entropy regularization to mitigate privacy leakage in explainable AIoT applications.
We develop a suite of SHAP-based privacy attacks that strategically leverage model explanation outputs to infer sensitive information.
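A minimal sketch, under the assumption that flatter SHAP attribution distributions leak less, of how such an entropy regularizer could enter the training loss; the penalty form and the weight are illustrative, not the paper's exact objective.

```python
# Entropy of the normalized |SHAP| distribution per sample; subtracting it
# from the task loss pushes attributions toward uniformity. `lam` is assumed.
import torch

def shap_entropy(attr: torch.Tensor, eps: float = 1e-12) -> torch.Tensor:
    p = attr.abs() / (attr.abs().sum(dim=-1, keepdim=True) + eps)
    return (-(p * (p + eps).log()).sum(dim=-1)).mean()

# Inside a training step (criterion, model, shap_values are placeholders):
# loss = criterion(model(x), y) - lam * shap_entropy(shap_values)
```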
arXiv Detail & Related papers (2025-11-12T22:13:32Z)
- DeLeaker: Dynamic Inference-Time Reweighting For Semantic Leakage Mitigation in Text-to-Image Models [55.30555646945055]
Text-to-Image (T2I) models are vulnerable to semantic leakage.
We introduce DeLeaker, a lightweight approach that mitigates leakage by directly intervening on the model's attention maps.
SLIM is the first dataset dedicated to semantic leakage.
arXiv Detail & Related papers (2025-10-16T17:39:21Z)
- Conformal Prediction for Privacy-Preserving Machine Learning [83.88591755871734]
Using AES-encrypted variants of the MNIST dataset, we demonstrate that Conformal Prediction methods remain effective even when applied directly in the encrypted domain.
Our work sets a foundation for principled uncertainty quantification in secure, privacy-aware learning systems.
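For orientation, a standard split-conformal recipe (the generic method, not the paper's encrypted-domain code): calibrate a nonconformity threshold on held-out data, then form prediction sets at test time.

```python
# Split conformal prediction over softmax scores; alpha = 0.1 targets 90%
# marginal coverage. Array shapes and names here are assumptions.
import numpy as np

def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    n = len(cal_labels)
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]   # nonconformity
    q = np.ceil((n + 1) * (1 - alpha)) / n               # finite-sample correction
    return np.quantile(scores, min(q, 1.0), method="higher")

def prediction_set(test_probs, qhat):
    return (1.0 - test_probs) <= qhat   # boolean mask over classes
```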
arXiv Detail & Related papers (2025-07-13T15:29:14Z)
- A False Sense of Privacy: Evaluating Textual Data Sanitization Beyond Surface-level Privacy Leakage [77.83757117924995]
We propose a new framework that evaluates re-identification attacks to quantify individual privacy risks upon data release.
Our approach shows that seemingly innocuous auxiliary information can be used to infer sensitive attributes like age or substance use history from sanitized data.
arXiv Detail & Related papers (2025-04-28T01:16:27Z)
- EXPLICATE: Enhancing Phishing Detection through Explainable AI and LLM-Powered Interpretability [44.2907457629342]
EXPLICATE is a framework that enhances phishing detection through a three-component architecture.
It is on par with existing deep learning techniques but has better explainability.
It addresses the critical divide between automated AI and user trust in phishing detection systems.
arXiv Detail & Related papers (2025-03-22T23:37:35Z)
- PrivacyScalpel: Enhancing LLM Privacy via Interpretable Feature Intervention with Sparse Autoencoders [8.483679748399037]
Large Language Models (LLMs) have demonstrated remarkable capabilities in natural language processing but pose privacy risks by memorizing and leaking Personally Identifiable Information (PII).
Existing mitigation strategies, such as differential privacy and neuron-level interventions, often degrade model utility or fail to effectively prevent leakage.
We introduce PrivacyScalpel, a novel privacy-preserving framework that leverages interpretability techniques to identify and mitigate PII leakage while maintaining performance.
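A hedged sketch of the kind of feature-level intervention described: encode an activation with a sparse autoencoder, zero latents flagged as PII-bearing, and decode while preserving the SAE's reconstruction residual. The interfaces and feature indices are assumptions, not PrivacyScalpel's API.

```python
# Ablate PII-associated SAE latents in a residual-stream activation.
import torch

@torch.no_grad()
def ablate_pii_features(h, sae_encode, sae_decode, pii_latents):
    z = sae_encode(h)                # sparse latent code, (batch, d_sae)
    recon = sae_decode(z)
    z[..., pii_latents] = 0.0        # zero the PII-bearing features
    # Add back the reconstruction error so only the ablated features change.
    return sae_decode(z) + (h - recon)
```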
arXiv Detail & Related papers (2025-03-14T09:31:01Z)
- Con-ReCall: Detecting Pre-training Data in LLMs via Contrastive Decoding [118.75567341513897]
Existing methods typically analyze target text in isolation or solely with non-member contexts.
We propose Con-ReCall, a novel approach that leverages the asymmetric distributional shifts induced by member and non-member contexts.
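A hedged sketch of a contrastive membership score in this spirit: compare the target text's token-level negative log-likelihood when prefixed with member-like versus non-member context. Prompt construction and scoring details are assumptions, not Con-ReCall's specification.

```python
# Higher score suggests the target benefited more from member context,
# i.e. it is more likely to have appeared in pre-training data.
import torch

@torch.no_grad()
def contrastive_score(model, tokenizer, target, member_ctx, nonmember_ctx):
    def target_nll(prefix):
        prefix_len = tokenizer(prefix, return_tensors="pt").input_ids.shape[1]
        ids = tokenizer(prefix + target, return_tensors="pt").input_ids
        labels = ids.clone()
        labels[:, :prefix_len] = -100      # score only the target span
        return model(ids, labels=labels).loss.item()
    return target_nll(nonmember_ctx) - target_nll(member_ctx)
```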
arXiv Detail & Related papers (2024-09-05T09:10:38Z)
- Analyzing Leakage of Personally Identifiable Information in Language Models [13.467340359030855]
Language Models (LMs) have been shown to leak information about training data through sentence-level membership inference and reconstruction attacks.
Scrubbing techniques reduce but do not prevent the risk of PII leakage.
It is unclear to what extent algorithmic defenses such as differential privacy, designed to guarantee user-level privacy, prevent PII disclosure.
arXiv Detail & Related papers (2023-02-01T16:04:48Z)
- Is Vertical Logistic Regression Privacy-Preserving? A Comprehensive Privacy Analysis and Beyond [57.10914865054868]
We consider vertical logistic regression (VLR) trained with mini-batch gradient descent.
We provide a comprehensive and rigorous privacy analysis of VLR in a class of open-source Federated Learning frameworks.
arXiv Detail & Related papers (2022-07-19T05:47:30Z)
- On the amplification of security and privacy risks by post-hoc explanations in machine learning models [7.564511776742979]
Post-hoc explanation methods that highlight input dimensions according to their importance or relevance to the result also leak information that weakens security and privacy.
We propose novel explanation-guided black-box evasion attacks that achieve a tenfold reduction in query count at the same success rate.
We show that the adversarial advantage from explanations can be quantified as a reduction in the total variance of the estimated gradient.
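To illustrate why explanations help an attacker, here is a sketch of explanation-guided finite-difference gradient estimation: queries are spent only on the top-k dimensions the explanation flags, which both cuts query count and shrinks estimator variance. The interface and the value of k are assumptions, not the paper's attack code.

```python
# Estimate a black-box gradient on explanation-selected dimensions only:
# k + 1 queries instead of x.size + 1 for full finite differences.
import numpy as np

def guided_gradient_estimate(f, x, attribution, k=64, delta=1e-3):
    grad = np.zeros(x.size)
    top = np.argsort(np.abs(attribution).ravel())[-k:]  # most relevant dims
    base = f(x)
    for i in top:
        e = np.zeros(x.size)
        e[i] = delta
        grad[i] = (f(x + e.reshape(x.shape)) - base) / delta
    return grad.reshape(x.shape)
```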
arXiv Detail & Related papers (2022-06-28T13:46:06Z)
- Semantics-Preserved Distortion for Personal Privacy Protection in Information Management [65.08939490413037]
This paper suggests a linguistically-grounded approach to distort texts while maintaining semantic integrity.
We present two distinct frameworks for semantic-preserving distortion: a generative approach and a substitutive approach.
We also explore privacy protection in a specific medical information management scenario, showing our method effectively limits sensitive data memorization.
arXiv Detail & Related papers (2022-01-04T04:01:05Z)