End-to-End triplet loss based fine-tuning for network embedding in effective PII detection
- URL: http://arxiv.org/abs/2502.09002v1
- Date: Thu, 13 Feb 2025 06:43:46 GMT
- Title: End-to-End triplet loss based fine-tuning for network embedding in effective PII detection
- Authors: Rishika Kohli, Shaifu Gupta, Manoj Singh Gaur,
- Abstract summary: We propose a novel deep learning based end-to-end learning framework for prediction of exposure of PII in mobile packets.
The framework employs a pre-trained large language model (LLM) and an autoencoder to generate embedding of network packets.
We compare our proposed detection framework with other state-of-the-art works in detecting PII leaks from user's device.
- Score: 0.12289361708127873
- License:
- Abstract: There are many approaches in mobile data ecosystem that inspect network traffic generated by applications running on user's device to detect personal data exfiltration from the user's device. State-of-the-art methods rely on features extracted from HTTP requests and in this context, machine learning involves training classifiers on these features and making predictions using labelled packet traces. However, most of these methods include external feature selection before model training. Deep learning, on the other hand, typically does not require such techniques, as it can autonomously learn and identify patterns in the data without external feature extraction or selection algorithms. In this article, we propose a novel deep learning based end-to-end learning framework for prediction of exposure of personally identifiable information (PII) in mobile packets. The framework employs a pre-trained large language model (LLM) and an autoencoder to generate embedding of network packets and then uses a triplet-loss based fine-tuning method to train the model, increasing detection effectiveness using two real-world datasets. We compare our proposed detection framework with other state-of-the-art works in detecting PII leaks from user's device.
Related papers
- Effective Intrusion Detection for UAV Communications using Autoencoder-based Feature Extraction and Machine Learning Approach [2.3845721581271206]
We propose an autoencoder-based machine learning intrusion detection method for UAVs using actual dataset.
Our experiment results show that the proposed method outperforms the baselines in both binary and multi-class classification tasks.
arXiv Detail & Related papers (2024-10-01T08:44:23Z) - Preliminary study on artificial intelligence methods for cybersecurity threat detection in computer networks based on raw data packets [34.82692226532414]
In this paper, we investigate deep learning methodologies capable of detecting attacks in real-time directly from raw packet data within network traffic.
We propose a novel approach where packets are stacked into windows and separately recognised, with a 2D image representation suitable for processing with computer vision models.
arXiv Detail & Related papers (2024-07-24T15:04:00Z) - IoTGeM: Generalizable Models for Behaviour-Based IoT Attack Detection [3.3772986620114387]
We present an approach for modelling IoT network attacks that focuses on generalizability, yet also leads to better detection and performance.
First, we present an improved rolling window approach for feature extraction, and introduce a multi-step feature selection process that reduces overfitting.
Second, we build and test models using isolated train and test datasets, thereby avoiding common data leaks.
Third, we rigorously evaluate our methodology using a diverse portfolio of machine learning models, evaluation metrics and datasets.
arXiv Detail & Related papers (2023-10-17T21:46:43Z) - Privacy Side Channels in Machine Learning Systems [87.53240071195168]
We introduce privacy side channels: attacks that exploit system-level components to extract private information.
For example, we show that deduplicating training data before applying differentially-private training creates a side-channel that completely invalidates any provable privacy guarantees.
We further show that systems which block language models from regenerating training data can be exploited to exfiltrate private keys contained in the training set.
arXiv Detail & Related papers (2023-09-11T16:49:05Z) - Panning for gold: Lessons learned from the platform-agnostic automated
detection of political content in textual data [48.7576911714538]
We discuss how these techniques can be used to detect political content across different platforms.
We compare the performance of three groups of detection techniques relying on dictionaries, supervised machine learning, or neural networks.
Our results show the limited impact of preprocessing on model performance, with the best results for less noisy data being achieved by neural network- and machine-learning-based models.
arXiv Detail & Related papers (2022-07-01T15:23:23Z) - Attentive Prototypes for Source-free Unsupervised Domain Adaptive 3D
Object Detection [85.11649974840758]
3D object detection networks tend to be biased towards the data they are trained on.
We propose a single-frame approach for source-free, unsupervised domain adaptation of lidar-based 3D object detectors.
arXiv Detail & Related papers (2021-11-30T18:42:42Z) - SOME/IP Intrusion Detection using Deep Learning-based Sequential Models
in Automotive Ethernet Networks [2.3204135551124407]
Intrusion Detection Systems are widely used to detect cyberattacks.
We present a deep learning-based sequential model for offline intrusion detection on SOME/IP protocol.
arXiv Detail & Related papers (2021-08-04T09:58:06Z) - Self-supervised Audiovisual Representation Learning for Remote Sensing Data [96.23611272637943]
We propose a self-supervised approach for pre-training deep neural networks in remote sensing.
By exploiting the correspondence between geo-tagged audio recordings and remote sensing, this is done in a completely label-free manner.
We show that our approach outperforms existing pre-training strategies for remote sensing imagery.
arXiv Detail & Related papers (2021-08-02T07:50:50Z) - Self-Supervised Person Detection in 2D Range Data using a Calibrated
Camera [83.31666463259849]
We propose a method to automatically generate training labels (called pseudo-labels) for 2D LiDAR-based person detectors.
We show that self-supervised detectors, trained or fine-tuned with pseudo-labels, outperform detectors trained using manual annotations.
Our method is an effective way to improve person detectors during deployment without any additional labeling effort.
arXiv Detail & Related papers (2020-12-16T12:10:04Z) - One-Shot Object Detection without Fine-Tuning [62.39210447209698]
We introduce a two-stage model consisting of a first stage Matching-FCOS network and a second stage Structure-Aware Relation Module.
We also propose novel training strategies that effectively improve detection performance.
Our method exceeds the state-of-the-art one-shot performance consistently on multiple datasets.
arXiv Detail & Related papers (2020-05-08T01:59:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.