MERLOT: A Distilled LLM-based Mixture-of-Experts Framework for Scalable Encrypted Traffic Classification
- URL: http://arxiv.org/abs/2411.13004v1
- Date: Wed, 20 Nov 2024 03:01:41 GMT
- Title: MERLOT: A Distilled LLM-based Mixture-of-Experts Framework for Scalable Encrypted Traffic Classification
- Authors: Yuxuan Chen, Rongpeng Li, Zhifeng Zhao, Honggang Zhang
- Abstract summary: We present a scalable mixture-of-experts (MoE) based refinement of distilled large language models optimized for encrypted traffic classification.
Experiments on 10 datasets show superior or competitive performance over state-of-the-art models.
- Score: 19.476061046309052
- License:
- Abstract: We present MERLOT, a scalable mixture-of-experts (MoE) based refinement of distilled large language models optimized for encrypted traffic classification. By applying model distillation techniques in a teacher-student paradigm, compact models derived from GPT-2-base retain high classification accuracy while minimizing computational costs. These models function as specialized experts in an MoE architecture, dynamically assigned via a gating network. Unlike generation-based methods, our approach directly classifies encrypted traffic using the final decoder token with contextual feature embedding as input. Experiments on 10 datasets show superior or competitive performance over state-of-the-art models while significantly reducing resource demands, underscoring its effectiveness and robustness.
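The abstract names two mechanisms: teacher-student distillation of compact GPT-2-based experts, and a gating network that routes each traffic sample to specialized experts, with the class read directly from the final decoder token. The sketch below illustrates how those pieces could fit together; it is not the authors' implementation, and the module sizes, the soft-gating choice, the mean-pooled gate input, and the `distillation_loss` helper are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Common KD objective: match softened teacher logits, mixed with cross-entropy."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard


class ExpertDecoder(nn.Module):
    """Stand-in for one distilled GPT-2-style student specialized on a traffic domain."""

    def __init__(self, d_model=256, n_layers=4, n_heads=4, n_classes=10):
        super().__init__()
        block = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True
        )
        self.decoder = nn.TransformerEncoder(block, n_layers)
        self.classifier = nn.Linear(d_model, n_classes)

    def forward(self, x):
        # x: (batch, seq_len, d_model) contextual feature embeddings of a flow.
        seq_len = x.size(1)
        causal_mask = torch.triu(
            torch.full((seq_len, seq_len), float("-inf"), device=x.device), diagonal=1
        )
        h = self.decoder(x, mask=causal_mask)    # causal, decoder-only behaviour
        return self.classifier(h[:, -1, :])      # classify from the final decoder token


class MoETrafficClassifier(nn.Module):
    """Gating network dynamically assigns each flow to specialized experts."""

    def __init__(self, n_experts=3, d_model=256, n_classes=10):
        super().__init__()
        self.experts = nn.ModuleList(
            [ExpertDecoder(d_model=d_model, n_classes=n_classes) for _ in range(n_experts)]
        )
        self.gate = nn.Linear(d_model, n_experts)

    def forward(self, x):
        # Gate on the mean of the flow's feature embeddings (an assumption).
        weights = F.softmax(self.gate(x.mean(dim=1)), dim=-1)        # (batch, n_experts)
        logits = torch.stack([e(x) for e in self.experts], dim=1)    # (batch, n_experts, n_classes)
        return (weights.unsqueeze(-1) * logits).sum(dim=1)           # mixture of expert logits
```

A hard top-1 assignment by the gate would also be consistent with the description; in the paper the experts are obtained by distilling GPT-2-base teachers, for which the temperature and mixing weight above are tunable assumptions rather than reported values.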
Related papers
- Self-DenseMobileNet: A Robust Framework for Lung Nodule Classification using Self-ONN and Stacking-based Meta-Classifier [1.2300841481611335]
Self-DenseMobileNet is designed to enhance the classification of nodules and non-nodules in chest radiographs (CXRs).
Our framework integrates advanced image standardization and enhancement techniques to optimize the input quality.
When tested on an external dataset, the framework maintained strong generalizability with an accuracy of 89.40%.
arXiv Detail & Related papers (2024-10-16T14:04:06Z) - Is Tokenization Needed for Masked Particle Modelling? [8.79008927474707]
Masked particle modeling (MPM) is a self-supervised learning scheme for constructing expressive representations of unordered sets.
We improve MPM by addressing inefficiencies in the implementation and incorporating a more powerful decoder.
We show that these new methods outperform the tokenized learning objective from the original MPM on a new test bed for foundation models for jets.
arXiv Detail & Related papers (2024-09-19T09:12:29Z) - High-Performance Few-Shot Segmentation with Foundation Models: An Empirical Study [64.06777376676513]
We develop a few-shot segmentation (FSS) framework based on foundation models.
To be specific, we propose a simple approach to extract implicit knowledge from foundation models to construct coarse correspondence.
Experiments on two widely used datasets demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2024-09-10T08:04:11Z) - Model Inversion Attacks Through Target-Specific Conditional Diffusion Models [54.69008212790426]
Model inversion attacks (MIAs) aim to reconstruct private images from a target classifier's training set, thereby raising privacy concerns in AI applications.
Previous GAN-based MIAs tend to suffer from inferior generative fidelity due to GAN's inherent flaws and biased optimization within latent space.
We propose Diffusion-based Model Inversion (Diff-MI) attacks to alleviate these issues.
arXiv Detail & Related papers (2024-07-16T06:38:49Z) - Scaling Pre-trained Language Models to Deeper via Parameter-efficient Architecture [68.13678918660872]
We design a more capable parameter-sharing architecture based on matrix product operator (MPO) decomposition.
MPO decomposition can reorganize and factorize the information of a parameter matrix into two parts.
Our architecture shares the central tensor across all layers for reducing the model size.
arXiv Detail & Related papers (2023-03-27T02:34:09Z) - Model-Agnostic Multitask Fine-tuning for Few-shot Vision-Language Transfer Learning [59.38343286807997]
We propose Model-Agnostic Multitask Fine-tuning (MAMF) for vision-language models on unseen tasks.
Compared with model-agnostic meta-learning (MAML), MAMF discards the bi-level optimization and uses only first-order gradients.
We show that MAMF consistently outperforms the classical fine-tuning method for few-shot transfer learning on five benchmark datasets.
arXiv Detail & Related papers (2022-03-09T17:26:53Z) - Deep Variational Models for Collaborative Filtering-based Recommender Systems [63.995130144110156]
Deep learning provides accurate collaborative filtering models to improve recommender system results.
Our proposed models apply the variational concept to inject stochasticity in the latent space of the deep architecture.
Results show the superiority of the proposed approach in scenarios where the variational enrichment exceeds the injected noise effect.
arXiv Detail & Related papers (2021-07-27T08:59:39Z) - Self-Feature Regularization: Self-Feature Distillation Without Teacher Models [0.0]
Self-Feature Regularization (SFR) is proposed, which uses features in the deep layers to supervise feature learning in the shallow layers.
We first use a generalization-L2 loss to match local features and a many-to-one approach to distill more intensively in the channel dimension.
arXiv Detail & Related papers (2021-03-12T15:29:00Z) - MetaDistiller: Network Self-Boosting via Meta-Learned Top-Down Distillation [153.56211546576978]
In this work, we propose that better soft targets with higher compatibility can be generated by using a label generator.
We can employ the meta-learning technique to optimize this label generator.
The experiments are conducted on two standard classification benchmarks, namely CIFAR-100 and ILSVRC2012.
arXiv Detail & Related papers (2020-08-27T13:04:27Z)