HTTP2vec: Embedding of HTTP Requests for Detection of Anomalous Traffic
- URL: http://arxiv.org/abs/2108.01763v1
- Date: Tue, 3 Aug 2021 21:53:31 GMT
- Title: HTTP2vec: Embedding of HTTP Requests for Detection of Anomalous Traffic
- Authors: Mateusz Gniewkowski, Henryk Maciejewski, Tomasz R. Surmacz, Wiktor Walentynowicz
- Abstract summary: We propose an unsupervised language representation model for embedding HTTP requests and then using it to classify anomalies in the traffic.
The solution is motivated by methods used in Natural Language Processing (NLP) such as Doc2Vec.
To verify how the solution would work in real-world conditions, we train the model using only legitimate traffic.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Hypertext Transfer Protocol (HTTP) is one of the most widely used protocols
on the Internet. As a consequence, most attacks (e.g., SQL injection, XSS) use
HTTP as the transport mechanism. Therefore, it is crucial to develop an
intelligent solution that can effectively detect and filter out
anomalies in HTTP traffic. Currently, most anomaly detection systems are
either rule-based or trained using manually selected features. We propose
using a modern unsupervised language representation model to embed HTTP
requests and then using it to classify anomalies in the traffic. The solution
is motivated by methods used in Natural Language Processing (NLP) such as
Doc2Vec, which could potentially capture a true understanding of HTTP
messages and therefore improve the efficiency of intrusion detection systems.
In our work, we aim not only to generate a suitable embedding space but also
to keep the proposed model interpretable. We decided to use the current
state-of-the-art RoBERTa, which, as far as we know, has never been used for a
similar problem. To verify how the solution would work in real-world conditions,
we train the model using only legitimate traffic. We also try to explain the
results based on clusters that occur in the vectorized request space and a
simple logistic regression classifier. We compare our approach with
similar, previously proposed methods. We evaluate the feasibility of our method
on three different datasets: CSIC2010, CSE-CIC-IDS2018 and one that we prepared
ourselves. The results we show are comparable to or better than others and, most
importantly, interpretable.
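The idea of learning only from legitimate traffic and then flagging requests that fall far from it in the embedding space can be illustrated with a minimal sketch. This is not the authors' implementation: the `embed` function below is a hypothetical stand-in that hashes character trigrams into a fixed-size vector, whereas the paper uses a RoBERTa language model, and the centroid-distance score, the `DIM` value, and the sample requests are all assumptions made for illustration.

```python
import hashlib
import math

DIM = 256  # size of the toy embedding space (arbitrary, assumed value)


def embed(request, n=3):
    """Stand-in embedding: hashed character n-gram counts, L2-normalised.

    The paper learns embeddings with RoBERTa; this hashing trick only
    mimics the "request -> fixed-size vector" step.
    """
    vec = [0.0] * DIM
    for i in range(len(request) - n + 1):
        gram = request[i:i + n].encode("utf-8")
        idx = int.from_bytes(hashlib.md5(gram).digest()[:4], "big") % DIM
        vec[idx] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def anomaly_score(request, center):
    """1 - cosine similarity between the request and the legitimate centroid."""
    vec = embed(request)  # already unit length
    dot = sum(a * b for a, b in zip(vec, center))
    center_norm = math.sqrt(sum(c * c for c in center)) or 1.0
    return 1.0 - dot / center_norm


# "Training" uses legitimate requests only (made-up examples).
legitimate = [
    "GET /shop/index.jsp?id=1 HTTP/1.1",
    "GET /shop/index.jsp?id=2 HTTP/1.1",
    "GET /shop/cart.jsp?item=17 HTTP/1.1",
    "POST /shop/login.jsp HTTP/1.1",
]
center = centroid([embed(r) for r in legitimate])

normal_score = anomaly_score("GET /shop/index.jsp?id=3 HTTP/1.1", center)
attack_score = anomaly_score(
    "GET /shop/index.jsp?id=1' OR '1'='1'; DROP TABLE users-- HTTP/1.1", center
)
print(f"normal: {normal_score:.3f}  attack: {attack_score:.3f}")
```

In this toy setup the injected request should score higher than the ordinary one because most of its trigrams never occur in the legitimate sample. A learned embedding, as proposed in the paper, is meant to capture structure that such surface-level counts cannot.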
Related papers
- Deciphering Movement: Unified Trajectory Generation Model for Multi-Agent [53.637837706712794]
We propose a Unified Trajectory Generation model, UniTraj, that processes arbitrary trajectories as masked inputs.
Specifically, we introduce a Ghost Spatial Masking (GSM) module embedded within a Transformer encoder for spatial feature extraction.
We benchmark three practical sports game datasets, Basketball-U, Football-U, and Soccer-U, for evaluation.
arXiv Detail & Related papers (2024-05-27T22:15:23Z)
- Detecting unknown HTTP-based malicious communication behavior via generated adversarial flows and hierarchical traffic features [6.418271335117575]
Experienced adversaries often hide malicious information in HTTP traffic to evade detection.
We propose an HTTP-based Malicious Communication traffic Detection Model based on generated adversarial flows and hierarchical traffic features.
arXiv Detail & Related papers (2023-09-07T14:28:31Z)
- Classification and Explanation of Distributed Denial-of-Service (DDoS) Attack Detection using Machine Learning and Shapley Additive Explanation (SHAP) Methods [4.899818550820576]
Distinguishing between legitimate traffic and malicious traffic is a challenging task.
An inter-model explanation of why a traffic flow is classified as benign or malicious is an important investigation of the model's inner workings.
We propose a framework that can not only classify legitimate traffic and malicious traffic of DDoS attacks but also use SHAP to explain the decision-making of the model.
arXiv Detail & Related papers (2023-06-27T04:51:29Z)
- Many or Few Samples? Comparing Transfer, Contrastive and Meta-Learning in Encrypted Traffic Classification [68.19713459228369]
We compare transfer learning, meta-learning and contrastive learning against reference Machine Learning (ML) tree-based and monolithic DL models.
We show that (i) using large datasets we can obtain more general representations, (ii) contrastive learning is the best methodology.
While tree-based ML cannot handle large tasks but fits small tasks well, DL methods that reuse learned representations also reach the performance of tree-based models on small tasks.
arXiv Detail & Related papers (2023-05-21T11:20:49Z)
- Instance Attack: An Explanation-based Vulnerability Analysis Framework Against DNNs for Malware Detection [0.0]
We propose the notion of the instance-based attack.
Our scheme is interpretable and can work in a black-box environment.
Our method operates in black-box settings and the results can be validated with domain knowledge.
arXiv Detail & Related papers (2022-09-06T12:41:20Z)
- Verifying Learning-Based Robotic Navigation Systems [61.01217374879221]
We show how modern verification engines can be used for effective model selection.
Specifically, we use verification to detect and rule out policies that may demonstrate suboptimal behavior.
Our work is the first to demonstrate the use of verification backends for recognizing suboptimal DRL policies in real-world robots.
arXiv Detail & Related papers (2022-05-26T17:56:43Z)
- DATA: Domain-Aware and Task-Aware Pre-training [94.62676913928831]
We present DATA, a simple yet effective NAS approach specialized for self-supervised learning (SSL).
Our method achieves promising results across a wide range of computation costs on downstream tasks, including image classification, object detection and semantic segmentation.
arXiv Detail & Related papers (2022-03-17T02:38:49Z)
- A2Log: Attentive Augmented Log Anomaly Detection [53.06341151551106]
Anomaly detection becomes increasingly important for the dependability and serviceability of IT services.
Existing unsupervised methods need anomaly examples to obtain a suitable decision boundary.
We develop A2Log, which is an unsupervised anomaly detection method consisting of two steps: Anomaly scoring and anomaly decision.
arXiv Detail & Related papers (2021-09-20T13:40:21Z)
- The Devil Is in the Details: An Efficient Convolutional Neural Network for Transport Mode Detection [3.008051369744002]
Transport mode detection is a classification problem aiming to design an algorithm that can infer the transport mode of a user given multimodal signals.
We show that a small, optimized model can perform as well as a current deep model.
arXiv Detail & Related papers (2021-09-16T08:05:47Z)
- DoS and DDoS Mitigation Using Variational Autoencoders [15.23225419183423]
We explore the potential of Variational Autoencoders to serve as a component within an intelligent security solution.
Two methods based on the ability of Variational Autoencoders to learn latent representations from network traffic flows are proposed.
arXiv Detail & Related papers (2021-05-14T15:38:40Z)
- Contextual-Bandit Anomaly Detection for IoT Data in Distributed Hierarchical Edge Computing [65.78881372074983]
IoT devices can hardly afford complex deep neural network (DNN) models, and offloading anomaly detection tasks to the cloud incurs long delays.
We propose and build a demo for an adaptive anomaly detection approach for distributed hierarchical edge computing (HEC) systems.
We show that our proposed approach significantly reduces detection delay without sacrificing accuracy, as compared to offloading detection tasks to the cloud.
arXiv Detail & Related papers (2020-04-15T06:13:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.