Multi-Task Hierarchical Learning Based Network Traffic Analytics
- URL: http://arxiv.org/abs/2106.03850v1
- Date: Sat, 5 Jun 2021 02:25:59 GMT
- Title: Multi-Task Hierarchical Learning Based Network Traffic Analytics
- Authors: Onur Barut, Yan Luo, Tong Zhang, Weigang Li, Peilong Li
- Abstract summary: We present three open datasets containing nearly 1.3M labeled flows in total.
We focus on broad aspects in network traffic analysis, including both malware detection and application classification.
As we continue to grow them, we expect the datasets to serve as a common ground for AI driven, reproducible research on network flow analytics.
- Score: 18.04195092141071
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Classifying network traffic is the basis for important network applications.
Prior research in this area has faced challenges on the availability of
representative datasets, and many of the results cannot be readily reproduced.
Such a problem is exacerbated by emerging data-driven machine learning based
approaches. To address this issue, we present the (Net)^2 database with three
open datasets containing nearly 1.3M labeled flows in total, with a
comprehensive list of flow features, for the research community. We focus on broad aspects
in network traffic analysis, including both malware detection and application
classification. As we continue to grow them, we expect the datasets to serve as
a common ground for AI driven, reproducible research on network flow analytics.
We release the datasets publicly and also introduce a Multi-Task Hierarchical
Learning (MTHL) model to perform all tasks in a single model. Our results show
that MTHL is capable of accurately performing multiple tasks with hierarchical
labeling with a dramatic reduction in training time.
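The abstract describes MTHL as a single model that handles multiple traffic-analysis tasks (e.g., malware detection and application classification) over hierarchical labels. A minimal sketch of the shared-trunk, multi-head idea behind such a model, using NumPy; all dimensions, weight names, and the two-head split are illustrative assumptions, not the authors' actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: 32 flow features, 2 coarse classes
# (benign vs. malware), 10 fine-grained application classes.
N_FEATURES, N_COARSE, N_FINE, HIDDEN = 32, 2, 10, 16

# One shared encoder, plus one linear head per task.
W_shared = rng.normal(scale=0.1, size=(N_FEATURES, HIDDEN))
W_coarse = rng.normal(scale=0.1, size=(HIDDEN, N_COARSE))
W_fine = rng.normal(scale=0.1, size=(HIDDEN, N_FINE))

def softmax(z):
    # Subtract the row max for numerical stability.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def forward(x):
    """One shared representation feeds both task heads."""
    h = np.maximum(x @ W_shared, 0.0)            # ReLU trunk
    return softmax(h @ W_coarse), softmax(h @ W_fine)

# A batch of 4 synthetic flow-feature vectors.
flows = rng.normal(size=(4, N_FEATURES))
p_malware, p_app = forward(flows)
print(p_malware.shape, p_app.shape)  # (4, 2) (4, 10)
```

Training such a model would typically sum a per-task loss over both heads, which is what lets the shared trunk learn from all tasks at once and is one plausible source of the training-time reduction the abstract reports.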
Related papers
- Task-Augmented Cross-View Imputation Network for Partial Multi-View Incomplete Multi-Label Classification [25.764838008710615]
We present a task-augmented cross-view imputation network (TACVI-Net) for handling partial multi-view incomplete multi-label classification.
In the first stage, we leverage the information bottleneck theory to obtain a discriminative representation of each view.
In the second stage, an autoencoder based multi-view reconstruction network is utilized to extract high-level semantic representation.
arXiv Detail & Related papers (2024-09-12T10:56:11Z) - On Inter-dataset Code Duplication and Data Leakage in Large Language Models [4.148857672591562]
This paper explores the phenomenon of inter-dataset code duplication and its impact on evaluating large language models (LLMs)
Our findings reveal a potential threat to the evaluation of LLMs across multiple SE tasks, stemming from the inter-dataset code duplication phenomenon.
We provide evidence that open-source models could be affected by inter-dataset duplication.
arXiv Detail & Related papers (2024-01-15T19:46:40Z) - Distribution Matching for Multi-Task Learning of Classification Tasks: a
Large-Scale Study on Faces & Beyond [62.406687088097605]
Multi-Task Learning (MTL) is a framework, where multiple related tasks are learned jointly and benefit from a shared representation space.
We show that MTL can be successful with classification tasks with little, or non-overlapping annotations.
We propose a novel approach, where knowledge exchange is enabled between the tasks via distribution matching.
arXiv Detail & Related papers (2024-01-02T14:18:11Z) - Provable Multi-Task Representation Learning by Two-Layer ReLU Neural Networks [69.38572074372392]
We present the first results proving that feature learning occurs during training with a nonlinear model on multiple tasks.
Our key insight is that multi-task pretraining induces a pseudo-contrastive loss that favors representations that align points that typically have the same label across tasks.
arXiv Detail & Related papers (2023-07-13T16:39:08Z) - Factorized Contrastive Learning: Going Beyond Multi-view Redundancy [116.25342513407173]
This paper proposes FactorCL, a new multimodal representation learning method to go beyond multi-view redundancy.
On large-scale real-world datasets, FactorCL captures both shared and unique information and achieves state-of-the-art results.
arXiv Detail & Related papers (2023-06-08T15:17:04Z) - Diffusion Model is an Effective Planner and Data Synthesizer for
Multi-Task Reinforcement Learning [101.66860222415512]
Multi-Task Diffusion Model (MTDiff) is a diffusion-based method that incorporates Transformer backbones and prompt learning for generative planning and data synthesis.
For generative planning, we find MTDiff outperforms state-of-the-art algorithms across 50 tasks on Meta-World and 8 maps on Maze2D.
arXiv Detail & Related papers (2023-05-29T05:20:38Z) - Data Augmentation for Abstractive Query-Focused Multi-Document
Summarization [129.96147867496205]
We present two QMDS training datasets, which we construct using two data augmentation methods.
These two datasets have complementary properties, i.e., QMDSCNN has real summaries but queries are simulated, while QMDSIR has real queries but simulated summaries.
We build end-to-end neural network models on the combined datasets that yield new state-of-the-art transfer results on DUC datasets.
arXiv Detail & Related papers (2021-03-02T16:57:01Z) - Graph-Based Neural Network Models with Multiple Self-Supervised
Auxiliary Tasks [79.28094304325116]
Graph Convolutional Networks are among the most promising approaches for capturing relationships among structured data points.
We propose three novel self-supervised auxiliary tasks to train graph-based neural network models in a multi-task fashion.
arXiv Detail & Related papers (2020-11-14T11:09:51Z) - Multi-Task Learning with Deep Neural Networks: A Survey [0.0]
Multi-task learning (MTL) is a subfield of machine learning in which multiple tasks are simultaneously learned by a shared model.
We give an overview of multi-task learning methods for deep neural networks, with the aim of summarizing both the well-established and most recent directions within the field.
arXiv Detail & Related papers (2020-09-10T19:31:04Z) - NetML: A Challenge for Network Traffic Analytics [16.8001000840057]
We release three open datasets containing almost 1.3M labeled flows in total.
We focus on broad aspects in network traffic analysis, including both malware detection and application classification.
As we continue to grow NetML, we expect the datasets to serve as a common platform for AI driven, reproducible research on network flow analytics.
arXiv Detail & Related papers (2020-04-25T01:12:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.