LLMs for Domain Generation Algorithm Detection
- URL: http://arxiv.org/abs/2411.03307v1
- Date: Tue, 05 Nov 2024 18:01:12 GMT
- Title: LLMs for Domain Generation Algorithm Detection
- Authors: Reynier Leyva La O, Carlos A. Catania, Tatiana Parlanti
- Abstract summary: This work analyzes the use of large language models (LLMs) for detecting domain generation algorithms (DGAs).
We show how In-Context Learning (ICL) and Supervised Fine-Tuning (SFT) can improve detection.
In particular, the SFT-based LLM DGA detector outperforms state-of-the-art models using attention layers, achieving 94% accuracy with a 4% false positive rate (FPR).
- Abstract: This work analyzes the use of large language models (LLMs) for detecting domain generation algorithms (DGAs). We perform a detailed evaluation of two important techniques: In-Context Learning (ICL) and Supervised Fine-Tuning (SFT), showing how they can improve detection. SFT increases performance by using domain-specific data, whereas ICL helps the detection model to quickly adapt to new threats without requiring much retraining. We use Meta's Llama3 8B model on a custom dataset with 68 malware families and normal domains, covering several hard-to-detect schemes, including recent word-based DGAs. Results show that LLM-based methods can achieve competitive results in DGA detection. In particular, the SFT-based LLM DGA detector outperforms state-of-the-art models using attention layers, achieving 94% accuracy with a 4% false positive rate (FPR) and excelling at detecting word-based DGA domains.
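As an illustrative sketch of the ICL approach described above, a few-shot classification prompt for an LLM can be assembled as follows. The example domains, labels, and prompt wording are assumptions for illustration, not the authors' actual prompt format:

```python
# Illustrative In-Context Learning (ICL) prompt construction for DGA
# detection. The few-shot examples and label wording below are hypothetical;
# the paper does not publish its exact prompt template here.

FEW_SHOT = [
    ("google.com", "benign"),
    ("xjwqpzkfvb.net", "dga"),     # random-looking, character-based DGA style
    ("sadlivingbear.com", "dga"),  # word-based DGA style
    ("wikipedia.org", "benign"),
]

def build_icl_prompt(domain: str) -> str:
    """Assemble a few-shot domain-classification prompt for an LLM."""
    lines = ["Classify each domain as 'benign' or 'dga'.", ""]
    for d, label in FEW_SHOT:
        lines.append(f"Domain: {d}\nLabel: {label}\n")
    # The model is expected to complete the final label token.
    lines.append(f"Domain: {domain}\nLabel:")
    return "\n".join(lines)

prompt = build_icl_prompt("qzxrtplmnb.info")
```

The resulting string would then be passed to the LLM (e.g. Llama3 8B) for completion; adapting to a new DGA family only requires swapping the in-context examples, which is the rapid-adaptation property the abstract attributes to ICL.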
Related papers
- Fine-tuning Large Language Models for DGA and DNS Exfiltration Detection [1.350128573715538]
Large Language Models (LLMs) have demonstrated their proficiency in real-time detection tasks.
Our work validates the effectiveness of fine-tuned LLMs for detecting DGAs and DNS exfiltration attacks.
arXiv Detail & Related papers (2024-10-29T04:22:28Z)
- Auto-GDA: Automatic Domain Adaptation for Efficient Grounding Verification in Retrieval Augmented Generation [13.120801609024147]
Retrieval augmented generation (RAG) has been shown to enhance the factuality of large language model (LLM) outputs.
RAG inputs are more complex than most datasets used for training NLI models.
We introduce Automatic Generative Domain Adaptation (Auto-GDA) to enable unsupervised domain adaptation.
arXiv Detail & Related papers (2024-10-04T14:21:27Z)
- Search for Efficient Large Language Models [52.98684997131108]
Large Language Models (LLMs) have long held sway in the realms of artificial intelligence research.
Weight pruning, quantization, and distillation have been embraced to compress LLMs, targeting memory reduction and inference acceleration.
Most model compression techniques concentrate on weight optimization, overlooking the exploration of optimal architectures.
arXiv Detail & Related papers (2024-09-25T21:32:12Z)
- Anomaly Detection of Tabular Data Using LLMs [54.470648484612866]
We show that pre-trained large language models (LLMs) are zero-shot batch-level anomaly detectors.
We propose an end-to-end fine-tuning strategy to bring out the potential of LLMs in detecting real anomalies.
arXiv Detail & Related papers (2024-06-24T04:17:03Z)
- DALD: Improving Logits-based Detector without Logits from Black-box LLMs [56.234109491884126]
Large Language Models (LLMs) have revolutionized text generation, producing outputs that closely mimic human writing.
We present Distribution-Aligned LLMs Detection (DALD), an innovative framework that redefines the state-of-the-art performance in black-box text detection.
DALD is designed to align the surrogate model's distribution with that of unknown target LLMs, ensuring enhanced detection capability and resilience against rapid model iterations.
arXiv Detail & Related papers (2024-06-07T19:38:05Z)
- Self-supervised Feature Adaptation for 3D Industrial Anomaly Detection [59.41026558455904]
We focus on multi-modal anomaly detection. Specifically, we investigate early multi-modal approaches that attempted to utilize models pre-trained on large-scale visual datasets.
We propose a Local-to-global Self-supervised Feature Adaptation (LSFA) method to finetune the adaptors and learn task-oriented representation toward anomaly detection.
arXiv Detail & Related papers (2024-01-06T07:30:41Z)
- Generalized Semantic Segmentation by Self-Supervised Source Domain Projection and Multi-Level Contrastive Learning [79.0660895390689]
Deep networks trained on the source domain show degraded performance when tested on unseen target domain data.
We propose a Domain Projection and Contrastive Learning (DPCL) approach for generalized semantic segmentation.
arXiv Detail & Related papers (2023-03-03T13:07:14Z)
- Detecting Algorithmically Generated Domains Using a GCNN-LSTM Hybrid Neural Network [10.617124610646488]
Domain generation algorithm (DGA) is used by botnets to build a stealthy command and control (C&C) communication channel.
Algorithmically generated domain (AGD) detection algorithms provide a lightweight, promising solution in response to existing DGA techniques.
In this paper, a GCNN (gated convolutional neural network)-LSTM (long short-term memory) Hybrid Neural Network (GLHNN) for AGD detection is proposed.
arXiv Detail & Related papers (2022-08-06T05:15:45Z)
- Learning Robust Feature Representations for Scene Text Detection [0.0]
We present a network architecture derived from the loss to maximize conditional log-likelihood.
By extending the layer of latent variables to multiple layers, the network is able to learn robust features on scale.
In experiments, the proposed algorithm significantly outperforms state-of-the-art methods in terms of both recall and precision.
arXiv Detail & Related papers (2020-05-26T01:06:47Z)
- Self-Guided Adaptation: Progressive Representation Alignment for Domain Adaptive Object Detection [86.69077525494106]
Unsupervised domain adaptation (UDA) has achieved unprecedented success in improving the cross-domain robustness of object detection models.
Existing UDA methods largely ignore the instantaneous data distribution during model learning, which could deteriorate the feature representation given large domain shift.
We propose a Self-Guided Adaptation (SGA) model, targeted at aligning feature representations and transferring object detection models across domains.
arXiv Detail & Related papers (2020-03-19T13:30:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site.