ReGAIN: Retrieval-Grounded AI Framework for Network Traffic Analysis
- URL: http://arxiv.org/abs/2512.22223v1
- Date: Tue, 23 Dec 2025 00:16:14 GMT
- Title: ReGAIN: Retrieval-Grounded AI Framework for Network Traffic Analysis
- Authors: Shaghayegh Shajarian, Kennedy Marsh, James Benson, Sajad Khorsandroo, Mahmoud Abdelsalam,
- Abstract summary: ReGAIN is a framework that combines traffic summarization, retrieval-augmented generation (RAG), and Large Language Model (LLM) reasoning for transparent and accurate network traffic analysis.<n> evaluated on ICMP ping flood and TCP SYN flood traces from the real-world traffic dataset.
- Score: 5.887997322139195
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modern networks generate vast, heterogeneous traffic that must be continuously analyzed for security and performance. Traditional network traffic analysis systems, whether rule-based or machine learning-driven, often suffer from high false positives and lack interpretability, limiting analyst trust. In this paper, we present ReGAIN, a multi-stage framework that combines traffic summarization, retrieval-augmented generation (RAG), and Large Language Model (LLM) reasoning for transparent and accurate network traffic analysis. ReGAIN creates natural-language summaries from network traffic, embeds them into a multi-collection vector database, and utilizes a hierarchical retrieval pipeline to ground LLM responses with evidence citations. The pipeline features metadata-based filtering, MMR sampling, a two-stage cross-encoder reranking mechanism, and an abstention mechanism to reduce hallucinations and ensure grounded reasoning. Evaluated on ICMP ping flood and TCP SYN flood traces from the real-world traffic dataset, it demonstrates robust performance, achieving accuracy between 95.95% and 98.82% across different attack types and evaluation benchmarks. These results are validated against two complementary sources: dataset ground truth and human expert assessments. ReGAIN also outperforms rule-based, classical ML, and deep learning baselines while providing unique explainability through trustworthy, verifiable responses.
Related papers
- TopoCurate:Modeling Interaction Topology for Tool-Use Agent Training [53.93696896939915]
Training tool-use agents typically rely on Supervised Fine-Tuning (SFT) on successful trajectories and Reinforcement Learning (RL) on pass-rate-selected tasks.<n>We propose TopoCurate, an interaction-aware framework that projects multi-trial rollouts from the same task into a unified semantic quotient topology.<n>TopoCurate achieves consistent gains of 4.2% (SFT) and 6.9% (RL) over state-of-the-art baselines.
arXiv Detail & Related papers (2026-03-02T10:38:54Z) - ANCRe: Adaptive Neural Connection Reassignment for Efficient Depth Scaling [57.91760520589592]
Scaling network depth has been a central driver behind the success of modern foundation models.<n>This paper revisits the default mechanism for deepening neural networks, namely residual connections.<n>We introduce adaptive neural connection reassignment (ANCRe), a principled and lightweight framework that parameterizes and learns residual connectivities from the data.
arXiv Detail & Related papers (2026-02-09T18:54:18Z) - Comparative Evaluation of VAE, GAN, and SMOTE for Tor Detection in Encrypted Network Traffic [0.0]
Encrypted network traffic poses significant challenges for intrusion detection.<n>Traditional data augmentation methods struggle to preserve the complex temporal and statistical characteristics of real network traffic.<n>This work explores the use of Generative AI (GAI) models to synthesize realistic and diverse encrypted traffic traces.
arXiv Detail & Related papers (2026-01-03T13:31:53Z) - An LLM-Powered AI Agent Framework for Holistic IoT Traffic Interpretation [1.3832473221638348]
This work presents an LLM-powered AI agent framework that converts raw packet captures into structured and semantically enriched representations for interactive analysis.<n>An AI agent guided by a large language model performs reasoning over the indexed traffic artifacts, assembling evidence to produce accurate and human-readable interpretations.
arXiv Detail & Related papers (2025-10-15T12:30:01Z) - FlowXpert: Context-Aware Flow Embedding for Enhanced Traffic Detection in IoT Network [7.30584204219718]
In the Internet of Things (IoT) environment, continuous interaction among a large number of devices generates complex and dynamic network traffic.<n>Machine learning (ML)-based traffic detection technology serves as a critical component in ensuring network security.
arXiv Detail & Related papers (2025-09-25T07:52:58Z) - MAWIFlow Benchmark: Realistic Flow-Based Evaluation for Network Intrusion Detection [47.86433139298671]
This paper introduces MAWIFlow, a flow-based benchmark derived from the MAWILAB v1.1 dataset.<n>The resulting datasets comprise temporally distinct samples from January 2011, 2016, and 2021, drawn from trans-Pacific backbone traffic.<n>Traditional machine learning methods, including Decision Trees, Random Forests, XGBoost, and Logistic Regression, are compared to a deep learning model based on a CNN-BiLSTM architecture.
arXiv Detail & Related papers (2025-06-20T14:51:35Z) - Diffusion Models Meet Network Management: Improving Traffic Matrix Analysis with Diffusion-based Approach [12.549916064729313]
This paper proposes a diffusion-based traffic matrix analysis framework named Diffusion-TM.<n>We show that our framework can obtain promising results even with $5%$ known values left in datasets.
arXiv Detail & Related papers (2024-11-29T06:20:34Z) - RACH Traffic Prediction in Massive Machine Type Communications [5.416701003120508]
This paper presents a machine learning-based framework tailored for forecasting bursty traffic in ALOHA networks.<n>We develop a new low-complexity online prediction algorithm that updates the states of the LSTM network by leveraging frequently collected data from the mMTC network.<n>We evaluate the performance of the proposed framework in a network with a single base station and thousands of devices organized into groups with distinct traffic-generating characteristics.
arXiv Detail & Related papers (2024-05-08T17:28:07Z) - Inferring Dynamic Networks from Marginals with Iterative Proportional Fitting [57.487936697747024]
A common network inference problem, arising from real-world data constraints, is how to infer a dynamic network from its time-aggregated adjacency matrix.
We introduce a principled algorithm that guarantees IPF converges under minimal changes to the network structure.
arXiv Detail & Related papers (2024-02-28T20:24:56Z) - From Environmental Sound Representation to Robustness of 2D CNN Models
Against Adversarial Attacks [82.21746840893658]
This paper investigates the impact of different standard environmental sound representations (spectrograms) on the recognition performance and adversarial attack robustness of a victim residual convolutional neural network.
We show that while the ResNet-18 model trained on DWT spectrograms achieves a high recognition accuracy, attacking this model is relatively more costly for the adversary.
arXiv Detail & Related papers (2022-04-14T15:14:08Z) - Self-Ensembling GAN for Cross-Domain Semantic Segmentation [107.27377745720243]
This paper proposes a self-ensembling generative adversarial network (SE-GAN) exploiting cross-domain data for semantic segmentation.
In SE-GAN, a teacher network and a student network constitute a self-ensembling model for generating semantic segmentation maps, which together with a discriminator, forms a GAN.
Despite its simplicity, we find SE-GAN can significantly boost the performance of adversarial training and enhance the stability of the model.
arXiv Detail & Related papers (2021-12-15T09:50:25Z) - MD-CSDNetwork: Multi-Domain Cross Stitched Network for Deepfake
Detection [80.83725644958633]
Current deepfake generation methods leave discriminative artifacts in the frequency spectrum of fake images and videos.
We present a novel approach, termed as MD-CSDNetwork, for combining the features in the spatial and frequency domains to mine a shared discriminative representation.
arXiv Detail & Related papers (2021-09-15T14:11:53Z) - Understanding Self-supervised Learning with Dual Deep Networks [74.92916579635336]
We propose a novel framework to understand contrastive self-supervised learning (SSL) methods that employ dual pairs of deep ReLU networks.
We prove that in each SGD update of SimCLR with various loss functions, the weights at each layer are updated by a emphcovariance operator.
To further study what role the covariance operator plays and which features are learned in such a process, we model data generation and augmentation processes through a emphhierarchical latent tree model (HLTM)
arXiv Detail & Related papers (2020-10-01T17:51:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.