Learning from Limited Heterogeneous Training Data: Meta-Learning for Unsupervised Zero-Day Web Attack Detection across Web Domains
- URL: http://arxiv.org/abs/2309.03660v1
- Date: Thu, 7 Sep 2023 11:58:20 GMT
- Title: Learning from Limited Heterogeneous Training Data: Meta-Learning for Unsupervised Zero-Day Web Attack Detection across Web Domains
- Authors: Peiyang Li, Ye Wang, Qi Li, Zhuotao Liu, Ke Xu, Ju Ren, Zhiying Liu, Ruilin Lin,
- Abstract summary: We propose RETSINA, a novel meta-learning based framework that enables zero-day Web attack detection across different domains.
We conduct experiments using four real-world datasets on different domains with a total of 293M Web requests.
RETSINA captures on average 126 and 218 zero-day attack requests per day in two domains, respectively, in one month.
- Score: 23.41494712616903
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, unsupervised machine-learning-based systems have been developed to detect zero-day Web attacks, which can effectively enhance existing Web Application Firewalls (WAFs). However, prior work only considers detecting attacks on specific domains by training a dedicated detection model for each domain. These systems require a large amount of training data, which makes model training and deployment slow. In this paper, we propose RETSINA, a novel meta-learning based framework that enables zero-day Web attack detection across different domains in an organization with limited training data. Specifically, it utilizes meta-learning to share knowledge across these domains, e.g., the relationship between HTTP requests in heterogeneous domains, to efficiently train detection models. Moreover, we develop an adaptive preprocessing module to facilitate semantic analysis of Web requests across different domains and design a multi-domain representation method to capture semantic correlations between different domains for cross-domain model training. We conduct experiments using four real-world datasets on different domains with a total of 293M Web requests. The experimental results demonstrate that RETSINA outperforms the existing unsupervised Web attack detection methods with limited training data, e.g., RETSINA needs only 5 minutes of training data to achieve detection performance comparable to that of existing methods that train separate models for different domains using 1 day of training data. We also deploy RETSINA in a real-world setting at an Internet company. RETSINA captures on average 126 and 218 zero-day attack requests per day in two domains, respectively, in one month.
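The abstract does not spell out RETSINA's training procedure, so the following is only a toy sketch of the general idea of sharing knowledge across domains through meta-learning. It swaps in a Reptile-style first-order update as a stand-in, represents each request as a hypothetical 2-D feature vector, and models "normal" traffic as a single centroid. All function names, data, and hyperparameters here are assumptions made for illustration, not the authors' method.

```python
import random

def adapt(theta, samples, lr=0.1, steps=5):
    """Inner loop: a few gradient steps on the mean squared distance to the samples."""
    theta = list(theta)
    n = len(samples)
    for _ in range(steps):
        grad = [2.0 * sum(theta[j] - x[j] for x in samples) / n
                for j in range(len(theta))]
        theta = [t - lr * g for t, g in zip(theta, grad)]
    return theta

def reptile(domains, dim, meta_lr=0.5, rounds=50, batch=8, seed=0):
    """Outer loop: nudge the shared initialization toward each domain's adapted profile."""
    rng = random.Random(seed)
    meta = [0.0] * dim
    for _ in range(rounds):
        samples = rng.sample(rng.choice(domains), batch)
        adapted = adapt(meta, samples)
        meta = [m + meta_lr * (a - m) for m, a in zip(meta, adapted)]
    return meta

def anomaly_score(theta, x):
    """Squared distance from the learned 'normal' profile; larger means more unusual."""
    return sum((t - v) ** 2 for t, v in zip(theta, x))

def make_domain(center, n, rng, sigma=0.2):
    """Synthetic benign traffic (hypothetical): Gaussian vectors around a domain center."""
    return [[rng.gauss(c, sigma) for c in center] for _ in range(n)]

rng = random.Random(1)
train_domains = [make_domain(c, 50, rng) for c in [(5, 4), (6, 5), (4, 6)]]
meta = reptile(train_domains, dim=2)

# A new domain with only three benign samples available for adaptation.
few_shot = make_domain((5.5, 4.5), 3, rng)
profile = adapt(meta, few_shot)
print(anomaly_score(profile, [5.4, 4.6]), anomaly_score(profile, [20.0, -10.0]))
```

In this toy setting, starting from the meta-learned initialization lets a handful of samples from a new domain yield a usable profile, which loosely mirrors the limited-training-data scenario the abstract describes.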
Related papers
- Understanding the Cross-Domain Capabilities of Video-Based Few-Shot Action Recognition Models [3.072340427031969]
Few-shot action recognition (FSAR) aims to learn a model capable of identifying novel actions in videos using only a few examples.
By assuming that the base dataset seen during meta-training and the novel dataset used for evaluation can come from different domains, cross-domain few-shot learning alleviates data collection and annotation costs.
We systematically evaluate existing state-of-the-art single-domain, transfer-based, and cross-domain FSAR methods on new cross-domain tasks.
arXiv Detail & Related papers (2024-06-03T07:48:18Z)
- Improving Domain Generalization with Domain Relations [77.63345406973097]
This paper focuses on domain shifts, which occur when the model is applied to new domains that are different from the ones it was trained on.
We propose a new approach called D$^3$G to learn domain-specific models.
Our results show that D$^3$G consistently outperforms state-of-the-art methods.
arXiv Detail & Related papers (2023-02-06T08:11:16Z)
- DI-NIDS: Domain Invariant Network Intrusion Detection System [9.481792073140204]
In various applications, such as computer vision, domain adaptation techniques have been successful.
In network intrusion detection, however, state-of-the-art domain adaptation approaches have had limited success.
We propose to extract domain invariant features using adversarial domain adaptation from multiple network domains.
arXiv Detail & Related papers (2022-10-15T10:26:22Z)
- Cross Domain Few-Shot Learning via Meta Adversarial Training [34.383449283927014]
Few-shot relation classification (RC) is one of the critical problems in machine learning.
We present a novel model that takes into consideration the aforementioned cross-domain situation.
A meta-based adversarial training framework is proposed to fine-tune the trained networks for adapting to data from the target domain.
arXiv Detail & Related papers (2022-02-11T15:52:29Z)
- Unsupervised Domain Adaptive Learning via Synthetic Data for Person Re-identification [101.1886788396803]
Person re-identification (re-ID) has gained more and more attention due to its widespread applications in video surveillance.
Unfortunately, the mainstream deep learning methods still need a large quantity of labeled data to train models.
In this paper, we develop a data collector to automatically generate synthetic re-ID samples in a computer game, and construct a data labeler to simultaneously annotate them.
arXiv Detail & Related papers (2021-09-12T15:51:41Z)
- DAVOS: Semi-Supervised Video Object Segmentation via Adversarial Domain Adaptation [2.9407987406005263]
Domain shift has always been one of the primary issues in video object segmentation (VOS).
We propose a novel method to tackle domain shift by first introducing adversarial domain adaptation to the VOS task.
Our model achieves state-of-the-art performance on DAVIS2016 with 82.6% mean IoU score after supervised training.
arXiv Detail & Related papers (2021-05-21T08:23:51Z)
- Inferring Latent Domains for Unsupervised Deep Domain Adaptation [54.963823285456925]
Unsupervised Domain Adaptation (UDA) refers to the problem of learning a model in a target domain where labeled data are not available.
This paper introduces a novel deep architecture which addresses the problem of UDA by automatically discovering latent domains in visual datasets.
We evaluate our approach on publicly available benchmarks, showing that it outperforms state-of-the-art domain adaptation methods.
arXiv Detail & Related papers (2021-03-25T14:33:33Z)
- A Review of Single-Source Deep Unsupervised Visual Domain Adaptation [81.07994783143533]
Large-scale labeled training datasets have enabled deep neural networks to excel across a wide range of benchmark vision tasks.
In many applications, it is prohibitively expensive and time-consuming to obtain large quantities of labeled data.
To cope with limited labeled training data, many have attempted to directly apply models trained on a large-scale labeled source domain to another sparsely labeled or unlabeled target domain.
arXiv Detail & Related papers (2020-09-01T00:06:50Z)
- Adaptive Risk Minimization: Learning to Adapt to Domain Shift [109.87561509436016]
A fundamental assumption of most machine learning algorithms is that the training and test data are drawn from the same underlying distribution.
In this work, we consider the problem setting of domain generalization, where the training data are structured into domains and there may be multiple test time shifts.
We introduce the framework of adaptive risk minimization (ARM), in which models are directly optimized for effective adaptation to shift by learning to adapt on the training domains.
arXiv Detail & Related papers (2020-07-06T17:59:30Z)
- Unsupervised Domain Adaptation with Multiple Domain Discriminators and Adaptive Self-Training [22.366638308792734]
Unsupervised Domain Adaptation (UDA) aims at improving the generalization capability of a model trained on a source domain to perform well on a target domain for which no labeled data is available.
We propose an approach to adapt a deep neural network trained on synthetic data to real scenes addressing the domain shift between the two different data distributions.
arXiv Detail & Related papers (2020-04-27T11:48:03Z)
- Dynamic Fusion Network for Multi-Domain End-to-end Task-Oriented Dialog [70.79442700890843]
We propose a novel Dynamic Fusion Network (DF-Net) which automatically exploits the relevance between the target domain and each domain.
With little training data, we show its transferability by outperforming the prior best model by 13.9% on average.
arXiv Detail & Related papers (2020-04-23T08:17:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.