Generalizable IoT Traffic Representations for Cross-Network Device Identification
- URL: http://arxiv.org/abs/2601.19315v1
- Date: Tue, 27 Jan 2026 07:56:31 GMT
- Title: Generalizable IoT Traffic Representations for Cross-Network Device Identification
- Authors: Arunan Sivanathan, David Warren, Deepak Mishra, Sushmita Ruj, Natasha Fernandes, Quan Z. Sheng, Minh Tran, Ben Luo, Daniel Coscia, Gustavo Batista, Hassan Habibi Gharakaheili
- Abstract summary: We study the problem of learning generalizable traffic representations for IoT device identification. We design compact encoder architectures that learn per-flow embeddings from unlabeled IoT traffic. We show that these learned representations can be used effectively for IoT device-type classification.
- Score: 15.867734233278568
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Machine learning models have demonstrated strong performance in classifying network traffic and identifying Internet-of-Things (IoT) devices, enabling operators to discover and manage IoT assets at scale. However, many existing approaches rely on end-to-end supervised pipelines or task-specific fine-tuning, resulting in traffic representations that are tightly coupled to labeled datasets and deployment environments, which can limit generalizability. In this paper, we study the problem of learning generalizable traffic representations for IoT device identification. We design compact encoder architectures that learn per-flow embeddings from unlabeled IoT traffic and evaluate them using a frozen-encoder protocol with a simple supervised classifier. Our specific contributions are threefold. (1) We develop unsupervised encoder-decoder models that learn compact traffic representations from unlabeled IoT network flows and assess their quality through reconstruction-based analysis. (2) We show that these learned representations can be used effectively for IoT device-type classification using simple, lightweight classifiers trained on frozen embeddings. (3) We provide a systematic benchmarking study against state-of-the-art pretrained traffic encoders, showing that larger models do not necessarily yield more robust representations for IoT traffic. Using more than 18 million real IoT traffic flows collected across multiple years and deployment environments, we learn traffic representations from unlabeled data and evaluate device-type classification on disjoint labeled subsets, achieving macro F1-scores exceeding 0.9 and demonstrating robustness under cross-environment deployment.
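The frozen-encoder protocol mentioned in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the authors' method: `frozen_encoder` is a hypothetical hand-written stand-in for a pretrained encoder, the flows are synthetic, and a nearest-centroid rule plays the role of the "simple supervised classifier". The point is only the shape of the evaluation: the encoder is never updated, a lightweight classifier is fit on its embeddings, and the result is scored with macro F1 (the unweighted mean of per-class F1).

```python
import random
from collections import defaultdict


def frozen_encoder(flow):
    """Hypothetical stand-in for a pretrained encoder; it is never trained here."""
    mean = sum(flow) / len(flow)
    spread = max(flow) - min(flow)
    return (mean, spread)  # 2-dimensional per-flow embedding


def fit_centroids(embeddings, labels):
    """Lightweight classifier: store one mean embedding per device type."""
    sums = defaultdict(lambda: [0.0, 0.0])
    counts = defaultdict(int)
    for (a, b), y in zip(embeddings, labels):
        sums[y][0] += a
        sums[y][1] += b
        counts[y] += 1
    return {y: (s[0] / counts[y], s[1] / counts[y]) for y, s in sums.items()}


def predict(centroids, emb):
    """Assign the label of the closest centroid (squared Euclidean distance)."""
    return min(
        centroids,
        key=lambda y: (emb[0] - centroids[y][0]) ** 2 + (emb[1] - centroids[y][1]) ** 2,
    )


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all classes seen in either list."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def evaluate_frozen(seed=0, n_per_class=50):
    """Synthetic demo: two made-up device types with different feature levels."""
    rng = random.Random(seed)
    levels = {"camera": 10.0, "plug": 1.0}  # assumed, purely illustrative
    data = [
        ([rng.gauss(level, 1.0) for _ in range(8)], device)
        for device, level in levels.items()
        for _ in range(n_per_class)
    ]
    rng.shuffle(data)
    split = len(data) // 2  # disjoint train / test subsets
    train, test = data[:split], data[split:]
    # The encoder stays fixed; only the centroid classifier is fit on labels.
    centroids = fit_centroids([frozen_encoder(f) for f, _ in train], [y for _, y in train])
    preds = [predict(centroids, frozen_encoder(f)) for f, _ in test]
    return macro_f1([y for _, y in test], preds)


if __name__ == "__main__":
    print(f"macro F1 on held-out synthetic flows: {evaluate_frozen():.3f}")
```

Macro F1 weights every device type equally, so a rare device type that is misclassified lowers the score as much as a common one would; this is one reason it is a common choice when class sizes are imbalanced.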
Related papers
- Multi-Agent Collaborative Intrusion Detection for Low-Altitude Economy IoT: An LLM-Enhanced Agentic AI Framework [60.72591149679355]
The rapid expansion of low-altitude economy Internet of Things (LAE-IoT) networks has created unprecedented security challenges.
Traditional intrusion detection systems fail to tackle the unique characteristics of aerial IoT environments.
We introduce a large language model (LLM)-enabled agentic AI framework for enhancing intrusion detection in LAE-IoT networks.
arXiv Detail & Related papers (2026-01-25T12:47:25Z)
- From Flows to Functions: Macroscopic Behavioral Fingerprinting of IoT Devices via Network Services [2.3037558470292185]
Identifying devices such as cameras, printers, voice assistants, or health monitoring sensors, collectively known as the Internet of Things (IoT), within a network is a critical operational task.
Most existing approaches rely on machine learning (ML) techniques applied to fine-grained features of short-lived traffic units (packets and/or flows).
We propose a macroscopic, lightweight, and explainable alternative to behavioral fingerprinting focusing on the network services that IoT devices use to perform their intended functions.
arXiv Detail & Related papers (2025-12-18T09:37:50Z)
- FlowXpert: Context-Aware Flow Embedding for Enhanced Traffic Detection in IoT Network [7.30584204219718]
In the Internet of Things (IoT) environment, continuous interaction among a large number of devices generates complex and dynamic network traffic.
Machine learning (ML)-based traffic detection technology serves as a critical component in ensuring network security.
arXiv Detail & Related papers (2025-09-25T07:52:58Z)
- IoT-AMLHP: Aligned Multimodal Learning of Header-Payload Representations for Resource-Efficient Malicious IoT Traffic Classification [10.900679661892932]
Traffic classification is crucial for securing Internet of Things (IoT) networks.
Deep learning-based methods can autonomously extract latent patterns from massive network traffic.
Existing methods rely heavily on either flow-level features or raw packet byte features.
This paper proposes IoT-AMLHP, an aligned multimodal learning framework for resource-efficient malicious IoT traffic classification.
arXiv Detail & Related papers (2025-04-21T03:24:14Z)
- NetFlowGen: Leveraging Generative Pre-training for Network Traffic Dynamics [72.95483148058378]
We propose to pre-train a general-purpose machine learning model to capture traffic dynamics with only traffic data from NetFlow records.
We address challenges such as unifying network feature representations, learning from large unlabeled traffic data volume, and testing on real downstream tasks in DDoS attack detection.
arXiv Detail & Related papers (2024-12-30T00:47:49Z)
- MIETT: Multi-Instance Encrypted Traffic Transformer for Encrypted Traffic Classification [59.96233305733875]
Classifying traffic is essential for detecting security threats and optimizing network management.
We propose a Multi-Instance Encrypted Traffic Transformer (MIETT) to capture both token-level and packet-level relationships.
MIETT achieves results across five datasets, demonstrating its effectiveness in classifying encrypted traffic and understanding complex network behaviors.
arXiv Detail & Related papers (2024-12-19T12:52:53Z)
- Towards a Transformer-Based Pre-trained Model for IoT Traffic Classification [0.6060461053918144]
State-of-the-art classification methods are based on Deep Learning.
In real-life situations, where IoT traffic data is scarce, these models would not perform as well.
We propose IoT Traffic Classification Transformer (ITCT), a transformer-based model pre-trained on a large labeled IoT traffic dataset.
Experiments demonstrated that the ITCT model significantly outperforms existing models, achieving an overall accuracy of 82%.
arXiv Detail & Related papers (2024-07-26T19:13:11Z)
- Lens: A Foundation Model for Network Traffic [19.3652490585798]
Lens is a foundation model for network traffic that leverages the T5 architecture to learn pre-trained representations from large-scale unlabeled data.
We design a novel loss that combines three distinct tasks: Masked Span Prediction (MSP), Packet Order Prediction (POP), and Homologous Traffic Prediction (HTP).
arXiv Detail & Related papers (2024-02-06T02:45:13Z)
- Effective Intrusion Detection in Heterogeneous Internet-of-Things Networks via Ensemble Knowledge Distillation-based Federated Learning [52.6706505729803]
We introduce Federated Learning (FL) to collaboratively train a decentralized shared model for Intrusion Detection Systems (IDS).
FLEKD enables a more flexible aggregation method than conventional model fusion techniques.
Experimental results show that the proposed approach outperforms local training and traditional FL in terms of both speed and performance.
arXiv Detail & Related papers (2024-01-22T14:16:37Z)
- SIM-Trans: Structure Information Modeling Transformer for Fine-grained Visual Categorization [59.732036564862796]
We propose the Structure Information Modeling Transformer (SIM-Trans) to incorporate object structure information into the transformer for enhancing discriminative representation learning.
The two proposed modules are lightweight and can be plugged into any transformer network and trained end-to-end easily.
Experiments and analyses demonstrate that the proposed SIM-Trans achieves state-of-the-art performance on fine-grained visual categorization benchmarks.
arXiv Detail & Related papers (2022-08-31T03:00:07Z)
- A Lightweight, Efficient and Explainable-by-Design Convolutional Neural Network for Internet Traffic Classification [9.365794791156972]
This paper introduces a new Lightweight, Efficient and eXplainable-by-design convolutional neural network (LEXNet) for Internet traffic classification.
LEXNet relies on a new residual block (for lightweight and efficiency purposes) and prototype layer (for explainability).
Based on a commercial-grade dataset, our evaluation shows that LEXNet succeeds in maintaining the same accuracy as the best-performing state-of-the-art neural network.
arXiv Detail & Related papers (2022-02-11T10:21:34Z)
- FENXI: Deep-learning Traffic Analytics at the Edge [69.34903175081284]
We present FENXI, a system to run complex analytics by leveraging TPU.
FENXI decouples operations and traffic analytics, which operate at different granularities.
Our analysis shows that FENXI can sustain line-rate forwarding and traffic processing while requiring only limited resources.
arXiv Detail & Related papers (2021-05-25T08:02:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.