netFound: Foundation Model for Network Security
- URL: http://arxiv.org/abs/2310.17025v4
- Date: Wed, 29 Jan 2025 17:41:14 GMT
- Title: netFound: Foundation Model for Network Security
- Authors: Satyandra Guthula, Roman Beltiukov, Navya Battula, Wenbo Guo, Arpit Gupta, Inder Monga
- Abstract summary: This paper introduces a novel transformer-based network foundation model, netFound.
We employ self-supervised learning techniques on abundant, unlabeled network telemetry data for pre-training.
Our results demonstrate that netFound effectively captures the hidden networking context in production settings.
- Score: 10.84029318509573
- Abstract: Developing generalizable ML-based solutions for disparate learning problems in network security is highly desired. However, despite a rich history of applying ML to network security, most existing solutions lack generalizability. This lack of progress can be attributed to an overreliance on supervised learning techniques and the associated challenges of curating well-specified labeled training data. This paper addresses a fundamental gap by introducing a novel transformer-based network foundation model, netFound. We employ self-supervised learning techniques on abundant, unlabeled network telemetry data for pre-training. This pretrained model can subsequently be fine-tuned to create generalizable learning artifacts for disparate learning tasks, even when using commonly available but challenging labeled datasets that are sparse, noisy, and skewed. To realize this goal, netFound leverages various domain-specific attributes and constraints unique to network data (packet traces) by developing multi-modal embeddings, protocol-aware tokenization, data-driven token composition, and hierarchical transformers. Our results demonstrate that netFound's domain-specific design choices ensure that it (1) effectively captures the hidden networking context in production settings, (2) outperforms four different SOTA methods on five different learning tasks, and (3) is robust to both noisy labels and learning shortcuts -- critical for developing generalizable ML models in practical settings.
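The abstract's hierarchical design is concrete enough to sketch. Below is a minimal, illustrative PyTorch sketch of the hierarchical-transformer idea: a packet-level encoder attends over tokens within each packet, and a flow-level encoder attends over the resulting per-packet summaries. All names, dimensions, and the two-level packet-to-flow hierarchy are assumptions for illustration, not netFound's actual architecture or tokenizer.

```python
# Illustrative sketch of a hierarchical transformer over network tokens.
# Assumptions (not from the paper): a flat token vocabulary, mean pooling
# between levels, and a two-level packet -> flow hierarchy.
import torch
import torch.nn as nn

class HierarchicalFlowEncoder(nn.Module):
    def __init__(self, vocab_size=1024, d_model=128, nhead=4, depth=2):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, d_model)
        pkt_layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        # Level 1: attend over tokens within a single packet.
        self.packet_encoder = nn.TransformerEncoder(pkt_layer, num_layers=depth)
        flow_layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        # Level 2: attend over per-packet summaries within a flow.
        self.flow_encoder = nn.TransformerEncoder(flow_layer, num_layers=depth)

    def forward(self, tokens):
        # tokens: (flows, packets_per_flow, tokens_per_packet) integer ids
        f, p, t = tokens.shape
        x = self.token_emb(tokens.reshape(f * p, t))  # embed tokens per packet
        x = self.packet_encoder(x).mean(dim=1)        # one summary per packet
        x = self.flow_encoder(x.reshape(f, p, -1))    # cross-packet context
        return x.mean(dim=1)                          # one embedding per flow

flows = torch.randint(0, 1024, (2, 8, 16))  # 2 flows x 8 packets x 16 tokens
print(HierarchicalFlowEncoder()(flows).shape)  # -> torch.Size([2, 128])
```

The hierarchy keeps attention windows short (tokens within a packet, packets within a flow) rather than flattening an entire trace into one long sequence, which is the general motivation for hierarchical designs on long structured inputs.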
Related papers
- tn4ml: Tensor Network Training and Customization for Machine Learning [0.8799686507544172]
tn4ml is a novel library designed to seamlessly integrate Tensor Networks into Machine Learning tasks.
Inspired by existing Machine Learning frameworks, the library offers a user-friendly structure with modules for data embedding, objective function definition, and model training.
arXiv Detail & Related papers (2025-02-18T17:57:29Z)
- Meta-Statistical Learning: Supervised Learning of Statistical Inference [59.463430294611626]
This work demonstrates that the tools and principles driving the success of large language models (LLMs) can be repurposed to tackle distribution-level tasks.
We propose meta-statistical learning, a framework inspired by multi-instance learning that reformulates statistical inference tasks as supervised learning problems (a toy sketch of this idea appears after this list).
arXiv Detail & Related papers (2025-02-17T18:04:39Z)
- A Survey of Machine Learning-based Physical-Layer Authentication in Wireless Communications [17.707450193500698]
Physical-Layer Authentication (PLA) is emerging as a promising complement due to its exploitation of unique properties in wireless environments.
This paper presents a comprehensive survey of characteristics and technologies that can be used in the ML-based PLA.
arXiv Detail & Related papers (2024-11-15T03:01:23Z)
- Multi-Stage Knowledge Integration of Vision-Language Models for Continual Learning [79.46570165281084]
We propose a Multi-Stage Knowledge Integration network (MulKI) to emulate the human learning process in distillation methods.
MulKI achieves this through four stages, including Eliciting Ideas, Adding New Ideas, Distinguishing Ideas, and Making Connections.
Our method demonstrates significant improvements in maintaining zero-shot capabilities while supporting continual learning across diverse downstream tasks.
arXiv Detail & Related papers (2024-11-11T07:36:19Z)
- Task-Distributionally Robust Data-Free Meta-Learning [99.56612787882334]
Data-Free Meta-Learning (DFML) aims to efficiently learn new tasks by leveraging multiple pre-trained models without requiring their original training data.
For the first time, we reveal two major challenges hindering their practical deployment: Task-Distribution Shift (TDS) and Task-Distribution Corruption (TDC).
arXiv Detail & Related papers (2023-11-23T15:46:54Z)
- Federated Learning and Meta Learning: Approaches, Applications, and Directions [94.68423258028285]
In this tutorial, we present a comprehensive review of FL, meta learning, and federated meta learning (FedMeta).
Unlike other tutorial papers, our objective is to explore how FL, meta learning, and FedMeta methodologies can be designed, optimized, and evolved, and their applications over wireless networks.
arXiv Detail & Related papers (2022-10-24T10:59:29Z)
- Transfer Learning with Pre-trained Conditional Generative Models [37.70988230858316]
Current transfer learning methods assume at least one of (i) source and target task label spaces overlap, (ii) source datasets are available, and (iii) target network architectures are consistent with source ones.
We propose a transfer learning method that uses deep generative models and is composed of the following two stages: pseudo pre-training and pseudo semi-supervised learning.
Our experimental results indicate that our method can outperform the baselines of scratch training and knowledge distillation.
arXiv Detail & Related papers (2022-04-27T10:36:32Z)
- Training Deep Networks from Zero to Hero: avoiding pitfalls and going beyond [59.94347858883343]
This tutorial covers the basic steps as well as more recent options to improve models.
It can be particularly useful for datasets that are not as well prepared as those in challenges.
arXiv Detail & Related papers (2021-09-06T21:31:42Z)
- Applying Graph-based Deep Learning To Realistic Network Scenarios [5.453745629140304]
This paper presents a new graph-based deep learning model able to accurately estimate the per-path mean delay in networks.
The proposed model can generalize successfully over topologies, routing configurations, queue scheduling policies and traffic matrices unseen during the training phase.
arXiv Detail & Related papers (2020-10-13T20:58:59Z)
- Deep Learning for Ultra-Reliable and Low-Latency Communications in 6G Networks [84.2155885234293]
We first summarize how to apply data-driven supervised deep learning and deep reinforcement learning in URLLC.
To address the open problems in this area, we develop a multi-level architecture that enables device intelligence, edge intelligence, and cloud intelligence for URLLC.
arXiv Detail & Related papers (2020-02-22T14:38:11Z)
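As a concrete illustration of the meta-statistical entry above, the sketch below trains a small network to estimate a distribution-level quantity (the standard deviation of a Gaussian) directly from raw samples, treating statistical inference as supervised learning. The task setup, architecture, and hyperparameters are my own illustrative assumptions, not the paper's actual benchmark or model.

```python
# Toy "statistical inference as supervised learning" sketch.
# Assumed setup: each training example is a whole dataset of 32 draws
# from N(0, sigma); the label is the true sigma.
import torch
import torch.nn as nn

def make_batch(batch=256, n=32):
    sigma = torch.rand(batch, 1) * 2 + 0.1       # true parameter per example
    samples = torch.randn(batch, n) * sigma      # the observed dataset
    return samples, sigma

# A plain MLP over the raw sample vector; a real meta-statistical learner
# would likely use a permutation-invariant set encoder instead.
model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for step in range(2000):
    x, y = make_batch()
    loss = nn.functional.mse_loss(model(x), y)
    opt.zero_grad(); loss.backward(); opt.step()

x, y = make_batch()  # fresh held-out draws
print(float(nn.functional.mse_loss(model(x), y)))  # small residual error
```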
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.