Information Leakage Detection through Approximate Bayes-optimal Prediction
- URL: http://arxiv.org/abs/2401.14283v2
- Date: Mon, 29 Jul 2024 18:46:50 GMT
- Title: Information Leakage Detection through Approximate Bayes-optimal Prediction
- Authors: Pritha Gupta, Marcel Wever, Eyke Hüllermeier,
- Abstract summary: Information leakage (IL) involves unintentionally exposing sensitive information to unauthorized parties.
Conventional statistical approaches rely on estimating mutual information between observable and secret information for detecting ILs.
We establish a theoretical framework using statistical learning theory and information theory to quantify and detect IL accurately.
- Score: 22.04308347355652
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In today's data-driven world, the proliferation of publicly available information raises security concerns due to the information leakage (IL) problem. IL involves unintentionally exposing sensitive information to unauthorized parties via observable system information. Conventional statistical approaches rely on estimating mutual information (MI) between observable and secret information for detecting ILs, face challenges of the curse of dimensionality, convergence, computational complexity, and MI misestimation. Though effective, emerging supervised machine learning based approaches to detect ILs are limited to binary system sensitive information and lack a comprehensive framework. To address these limitations, we establish a theoretical framework using statistical learning theory and information theory to quantify and detect IL accurately. Using automated machine learning, we demonstrate that MI can be accurately estimated by approximating the typically unknown Bayes predictor's log-loss and accuracy. Based on this, we show how MI can effectively be estimated to detect ILs. Our method performs superior to state-of-the-art baselines in an empirical study considering synthetic and real-world OpenSSL TLS server datasets.
Related papers
- Detecting Training Data of Large Language Models via Expectation Maximization [62.28028046993391]
Membership inference attacks (MIAs) aim to determine whether a specific instance was part of a target model's training data.
Applying MIAs to large language models (LLMs) presents unique challenges due to the massive scale of pre-training data and the ambiguous nature of membership.
We introduce EM-MIA, a novel MIA method for LLMs that iteratively refines membership scores and prefix scores via an expectation-maximization algorithm.
arXiv Detail & Related papers (2024-10-10T03:31:16Z) - Mind the Interference: Retaining Pre-trained Knowledge in Parameter Efficient Continual Learning of Vision-Language Models [79.28821338925947]
Domain-Class Incremental Learning is a realistic but challenging continual learning scenario.
To handle these diverse tasks, pre-trained Vision-Language Models (VLMs) are introduced for their strong generalizability.
This incurs a new problem: the knowledge encoded in the pre-trained VLMs may be disturbed when adapting to new tasks, compromising their inherent zero-shot ability.
Existing methods tackle it by tuning VLMs with knowledge distillation on extra datasets, which demands heavy overhead.
We propose the Distribution-aware Interference-free Knowledge Integration (DIKI) framework, retaining pre-trained knowledge of
arXiv Detail & Related papers (2024-07-07T12:19:37Z) - Weakly Supervised Anomaly Detection via Knowledge-Data Alignment [24.125871437370357]
Anomaly detection plays a pivotal role in numerous web-based applications, including malware detection, anti-money laundering, device failure detection, and network fault analysis.
Weakly Supervised Anomaly Detection (WSAD) has been introduced with a limited number of labeled anomaly samples to enhance model performance.
We introduce a novel framework Knowledge-Data Alignment (KDAlign) to integrate rule knowledge, typically summarized by human experts, to supplement the limited labeled data.
arXiv Detail & Related papers (2024-02-06T07:57:13Z) - X-CBA: Explainability Aided CatBoosted Anomal-E for Intrusion Detection System [2.556190321164248]
Using machine learning (ML) and deep learning (DL) models in Intrusion Detection Systems has led to a trust deficit due to their non-transparent decision-making.
This paper introduces a novel Explainable IDS approach, called X-CBA, that leverages the structural advantages of Graph Neural Networks (GNNs) to effectively process network traffic data.
Our approach achieves high accuracy with 99.47% in threat detection and provides clear, actionable explanations of its analytical outcomes.
arXiv Detail & Related papers (2024-02-01T18:29:16Z) - DefectHunter: A Novel LLM-Driven Boosted-Conformer-based Code Vulnerability Detection Mechanism [3.9377491512285157]
DefectHunter is an innovative model for vulnerability identification that employs the Conformer mechanism.
This mechanism fuses self-attention with convolutional networks to capture both local, position-wise features and global, content-based interactions.
arXiv Detail & Related papers (2023-09-27T00:10:29Z) - Dynamic Model Agnostic Reliability Evaluation of Machine-Learning
Methods Integrated in Instrumentation & Control Systems [1.8978726202765634]
Trustworthiness of datadriven neural network-based machine learning algorithms is not adequately assessed.
In recent reports by the National Institute for Standards and Technology, trustworthiness in ML is a critical barrier to adoption.
We demonstrate a real-time model-agnostic method to evaluate the relative reliability of ML predictions by incorporating out-of-distribution detection on the training dataset.
arXiv Detail & Related papers (2023-08-08T18:25:42Z) - Prompt-driven efficient Open-set Semi-supervised Learning [52.30303262499391]
Open-set semi-supervised learning (OSSL) has attracted growing interest, which investigates a more practical scenario where out-of-distribution (OOD) samples are only contained in unlabeled data.
We propose a prompt-driven efficient OSSL framework, called OpenPrompt, which can propagate class information from labeled to unlabeled data with only a small number of trainable parameters.
arXiv Detail & Related papers (2022-09-28T16:25:08Z) - Robustness Evaluation of Deep Unsupervised Learning Algorithms for
Intrusion Detection Systems [0.0]
This paper evaluates the robustness of six recent deep learning algorithms for intrusion detection on contaminated data.
Our experiments suggest that the state-of-the-art algorithms used in this study are sensitive to data contamination and reveal the importance of self-defense against data perturbation.
arXiv Detail & Related papers (2022-06-25T02:28:39Z) - Incorporating Causal Graphical Prior Knowledge into Predictive Modeling
via Simple Data Augmentation [92.96204497841032]
Causal graphs (CGs) are compact representations of the knowledge of the data generating processes behind the data distributions.
We propose a model-agnostic data augmentation method that allows us to exploit the prior knowledge of the conditional independence (CI) relations.
We experimentally show that the proposed method is effective in improving the prediction accuracy, especially in the small-data regime.
arXiv Detail & Related papers (2021-02-27T06:13:59Z) - Provably Efficient Causal Reinforcement Learning with Confounded
Observational Data [135.64775986546505]
We study how to incorporate the dataset (observational data) collected offline, which is often abundantly available in practice, to improve the sample efficiency in the online setting.
We propose the deconfounded optimistic value iteration (DOVI) algorithm, which incorporates the confounded observational data in a provably efficient manner.
arXiv Detail & Related papers (2020-06-22T14:49:33Z) - Privacy-preserving Traffic Flow Prediction: A Federated Learning
Approach [61.64006416975458]
We propose a privacy-preserving machine learning technique named Federated Learning-based Gated Recurrent Unit neural network algorithm (FedGRU) for traffic flow prediction.
FedGRU differs from current centralized learning methods and updates universal learning models through a secure parameter aggregation mechanism.
It is shown that FedGRU's prediction accuracy is 90.96% higher than the advanced deep learning models.
arXiv Detail & Related papers (2020-03-19T13:07:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.