Related papers: Towards Characterizing Cyber Networks with Large Language Models

Towards Characterizing Cyber Networks with Large Language Models

URL: http://arxiv.org/abs/2411.07089v1
Date: Mon, 11 Nov 2024 16:09:13 GMT
Title: Towards Characterizing Cyber Networks with Large Language Models
Authors: Alaric Hartsock, Luiz Manella Pereira, Glenn Fink,
Abstract summary: We employ latent features of cyber data to find anomalies via a prototype tool called Cyber Log Embeddings Model (CLEM) CLEM was trained on Zeek network traffic logs from both a real-world production network and an from Internet of Things (IoT) cybersecurity testbed.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Threat hunting analyzes large, noisy, high-dimensional data to find sparse adversarial behavior. We believe adversarial activities, however they are disguised, are extremely difficult to completely obscure in high dimensional space. In this paper, we employ these latent features of cyber data to find anomalies via a prototype tool called Cyber Log Embeddings Model (CLEM). CLEM was trained on Zeek network traffic logs from both a real-world production network and an from Internet of Things (IoT) cybersecurity testbed. The model is deliberately overtrained on a sliding window of data to characterize each window closely. We use the Adjusted Rand Index (ARI) to comparing the k-means clustering of CLEM output to expert labeling of the embeddings. Our approach demonstrates that there is promise in using natural language modeling to understand cyber data.

Related papers

DreamMask: Boosting Open-vocabulary Panoptic Segmentation with Synthetic Data [61.62554324594797]
We propose DreamMask, which explores how to generate training data in the open-vocabulary setting, and how to train the model with both real and synthetic data. In general, DreamMask significantly simplifies the collection of large-scale training data, serving as a plug-and-play enhancement for existing methods. For instance, when trained on COCO and tested on ADE20K, the model equipped with DreamMask outperforms the previous state-of-the-art by a substantial margin of 2.1% mIoU.
arXiv Detail & Related papers (2025-01-03T19:00:00Z)
Explainable AI for Comparative Analysis of Intrusion Detection Models [20.683181384051395]
This research analyzes various machine learning models to the tasks of binary and multi-class classification for intrusion detection from network traffic. We trained all models to the accuracy of 90% on the UNSW-NB15 dataset. We also discover that Random Forest provides the best performance in terms of accuracy, time efficiency and robustness.
arXiv Detail & Related papers (2024-06-14T03:11:01Z)
Twin Auto-Encoder Model for Learning Separable Representation in Cyberattack Detection [21.581155557707632]
We propose a novel mod called Twin Auto-Encoder (TAE) for cyberattack detection. Experiment results show the superior accuracy of TAE over state-of-the-art RL models and well-known machine learning algorithms.
arXiv Detail & Related papers (2024-03-22T03:39:40Z)
Identifying and Mitigating Model Failures through Few-shot CLIP-aided Diffusion Generation [65.268245109828]
We propose an end-to-end framework to generate text descriptions of failure modes associated with spurious correlations. These descriptions can be used to generate synthetic data using generative models, such as diffusion models. Our experiments have shown remarkable textbfimprovements in accuracy ($sim textbf21%$) on hard sub-populations.
arXiv Detail & Related papers (2023-12-09T04:43:49Z)
A Geometrical Approach to Evaluate the Adversarial Robustness of Deep Neural Networks [52.09243852066406]
Adversarial Converging Time Score (ACTS) measures the converging time as an adversarial robustness metric. We validate the effectiveness and generalization of the proposed ACTS metric against different adversarial attacks on the large-scale ImageNet dataset.
arXiv Detail & Related papers (2023-10-10T09:39:38Z)
From Zero to Hero: Detecting Leaked Data through Synthetic Data Injection and Model Querying [10.919336198760808]
We introduce a novel methodology to detect leaked data that are used to train classification models. textscLDSS involves injecting a small volume of synthetic data--characterized by local shifts in class distribution--into the owner's dataset. This enables the effective identification of models trained on leaked data through model querying alone.
arXiv Detail & Related papers (2023-10-06T10:36:28Z)
A Trustable LSTM-Autoencoder Network for Cyberbullying Detection on Social Media Using Synthetic Data [2.378735224874938]
This paper proposes a trustable LSTM-Autoencoder Network for cyberbullying detection on social media. We have demonstrated a cutting-edge method to address data availability difficulties by producing machine-translated data. We carried out experimental identification of aggressive comments on Hindi, Bangla, and English datasets.
arXiv Detail & Related papers (2023-08-15T17:20:05Z)
Unleashing Mask: Explore the Intrinsic Out-of-Distribution Detection Capability [70.72426887518517]
Out-of-distribution (OOD) detection is an indispensable aspect of secure AI when deploying machine learning models in real-world applications. We propose a novel method, Unleashing Mask, which aims to restore the OOD discriminative capabilities of the well-trained model with ID data. Our method utilizes a mask to figure out the memorized atypical samples, and then finetune the model or prune it with the introduced mask to forget them.
arXiv Detail & Related papers (2023-06-06T14:23:34Z)
Zero Day Threat Detection Using Metric Learning Autoencoders [3.1965908200266173]
The proliferation of zero-day threats (ZDTs) to companies' networks has been immensely costly. Deep learning methods are an attractive option for their ability to capture highly-nonlinear behavior patterns. The models presented here are also trained and evaluated with two more datasets, and continue to show promising results even when generalizing to new network topologies.
arXiv Detail & Related papers (2022-11-01T13:12:20Z)
Detection of Malicious Websites Using Machine Learning Techniques [0.0]
K-Nearest Neighbor is the only model that performs consistently high across datasets. Other models such as Random Forest, Decision Trees, Logistic Regression, and Support Vector Machines also consistently outperform a baseline model of predicting every link as malicious.
arXiv Detail & Related papers (2022-09-13T13:48:31Z)
CARLA-GeAR: a Dataset Generator for a Systematic Evaluation of Adversarial Robustness of Vision Models [61.68061613161187]
This paper presents CARLA-GeAR, a tool for the automatic generation of synthetic datasets for evaluating the robustness of neural models against physical adversarial patches. The tool is built on the CARLA simulator, using its Python API, and allows the generation of datasets for several vision tasks in the context of autonomous driving. The paper presents an experimental study to evaluate the performance of some defense methods against such attacks, showing how the datasets generated with CARLA-GeAR might be used in future work as a benchmark for adversarial defense in the real world.
arXiv Detail & Related papers (2022-06-09T09:17:38Z)
Learning to Detect: A Data-driven Approach for Network Intrusion Detection [17.288512506016612]
We perform a comprehensive study on NSL-KDD, a network traffic dataset, by visualizing patterns and employing different learning-based models to detect cyber attacks. Unlike previous shallow learning and deep learning models that use the single learning model approach for intrusion detection, we adopt a hierarchy strategy. We demonstrate the advantage of the unsupervised representation learning model in binary intrusion detection tasks.
arXiv Detail & Related papers (2021-08-18T21:19:26Z)
Explainable Adversarial Attacks in Deep Neural Networks Using Activation Profiles [69.9674326582747]
This paper presents a visual framework to investigate neural network models subjected to adversarial examples. We show how observing these elements can quickly pinpoint exploited areas in a model.
arXiv Detail & Related papers (2021-03-18T13:04:21Z)
Adversarial Self-Supervised Contrastive Learning [62.17538130778111]
Existing adversarial learning approaches mostly use class labels to generate adversarial samples that lead to incorrect predictions. We propose a novel adversarial attack for unlabeled data, which makes the model confuse the instance-level identities of the perturbed data samples. We present a self-supervised contrastive learning framework to adversarially train a robust neural network without labeled data.
arXiv Detail & Related papers (2020-06-13T08:24:33Z)

This list is automatically generated from the titles and abstracts of the papers in this site.