Related papers: Command-line Obfuscation Detection using Small Language Models

Command-line Obfuscation Detection using Small Language Models

URL: http://arxiv.org/abs/2408.02637v1
Date: Mon, 5 Aug 2024 17:01:33 GMT
Title: Command-line Obfuscation Detection using Small Language Models
Authors: Vojtech Outrata, Michael Adam Polak, Martin Kopp,
Abstract summary: adversaries often use command-line obfuscation to avoid detection. We have implemented a scalable NLP-based detection method that leverages a custom-trained, small transformer language model. We show the model's superiority to signatures on established malware and showcase previously unseen obfuscated samples detected by our model.
Score: 0.7373617024876725
License: http://creativecommons.org/licenses/by/4.0/
Abstract: To avoid detection, adversaries often use command-line obfuscation. There are numerous techniques of the command-line obfuscation, all designed to alter the command-line syntax without affecting its original functionality. This variability forces most security solutions to create an exhaustive enumeration of signatures for even a single pattern. In contrast to using signatures, we have implemented a scalable NLP-based detection method that leverages a custom-trained, small transformer language model that can be applied to any source of execution logs. The evaluation on top of real-world telemetry demonstrates that our approach yields high-precision detections even on high-volume telemetry from a diverse set of environments spanning from universities and businesses to healthcare or finance. The practical value is demonstrated in a case study of real-world samples detected by our model. We show the model's superiority to signatures on established malware known to employ obfuscation and showcase previously unseen obfuscated samples detected by our model.

Related papers

Detecting Hard-Coded Credentials in Software Repositories via LLMs [0.0]
Software developers frequently hard-code credentials such as passwords, generic secrets, private keys, and generic tokens in software repositories.<n>These credentials create attack surfaces exploitable by a potential adversary to conduct malicious exploits such as backdoor attacks.<n>Recent detection efforts utilize embedding models to vectorize textual credentials before passing them to classifiers for predictions.<n>Our model outperforms the current state-of-the-art by 13% in F1 measure on the benchmark dataset.
arXiv Detail & Related papers (2025-06-16T04:33:48Z)
Exploring Gradient-Guided Masked Language Model to Detect Textual Adversarial Attacks [50.53590930588431]
adversarial examples pose serious threats to natural language processing systems. Recent studies suggest that adversarial texts deviate from the underlying manifold of normal texts, whereas masked language models can approximate the manifold of normal data. We first introduce Masked Language Model-based Detection (MLMD), leveraging mask unmask operations of the masked language modeling (MLM) objective.
arXiv Detail & Related papers (2025-04-08T14:10:57Z)
Zero-Shot Detection of LLM-Generated Text using Token Cohesiveness [6.229124658686219]
We develop a generic dual-channel detection paradigm that uses token cohesiveness as a plug-and-play module to improve existing zero-shot detectors. To calculate token cohesiveness, we use a few rounds of random token deletion and semantic difference measurement. Experiments with four state-of-the-art base detectors on various datasets, source models, and evaluation settings demonstrate the effectiveness and generality of the proposed approach.
arXiv Detail & Related papers (2024-09-25T13:18:57Z)
Open-Set Deepfake Detection: A Parameter-Efficient Adaptation Method with Forgery Style Mixture [58.60915132222421]
We introduce an approach that is both general and parameter-efficient for face forgery detection. We design a forgery-style mixture formulation that augments the diversity of forgery source domains. We show that the designed model achieves state-of-the-art generalizability with significantly reduced trainable parameters.
arXiv Detail & Related papers (2024-08-23T01:53:36Z)
A Lean Transformer Model for Dynamic Malware Analysis and Detection [0.0]
Malware is a fast-growing threat to the modern computing world and existing lines of defense are not efficient enough to address this issue. Previous works have shown some success leveraging Neural Networks and API calls sequences extracted from execution reports. In this paper, we design an emulation-Only model, based on the Transformers architecture, to detect malicious files.
arXiv Detail & Related papers (2024-08-05T08:46:46Z)
Token-Level Adversarial Prompt Detection Based on Perplexity Measures and Contextual Information [67.78183175605761]
Large Language Models are susceptible to adversarial prompt attacks. This vulnerability underscores a significant concern regarding the robustness and reliability of LLMs. We introduce a novel approach to detecting adversarial prompts at a token level.
arXiv Detail & Related papers (2023-11-20T03:17:21Z)
Towards a Practical Defense against Adversarial Attacks on Deep Learning-based Malware Detectors via Randomized Smoothing [3.736916304884177]
We propose a practical defense against adversarial malware examples inspired by randomized smoothing. In our work, instead of employing Gaussian or Laplace noise when randomizing inputs, we propose a randomized ablation-based smoothing scheme. We have empirically evaluated the proposed ablation-based model against various state-of-the-art evasion attacks on the BODMAS dataset.
arXiv Detail & Related papers (2023-08-17T10:30:25Z)
Towards General Visual-Linguistic Face Forgery Detection [95.73987327101143]
Deepfakes are realistic face manipulations that can pose serious threats to security, privacy, and trust. Existing methods mostly treat this task as binary classification, which uses digital labels or mask signals to train the detection model. We propose a novel paradigm named Visual-Linguistic Face Forgery Detection(VLFFD), which uses fine-grained sentence-level prompts as the annotation.
arXiv Detail & Related papers (2023-07-31T10:22:33Z)
Securing Deep Generative Models with Universal Adversarial Signature [69.51685424016055]
Deep generative models pose threats to society due to their potential misuse. In this paper, we propose to inject a universal adversarial signature into an arbitrary pre-trained generative model. The proposed method is validated on the FFHQ and ImageNet datasets with various state-of-the-art generative models.
arXiv Detail & Related papers (2023-05-25T17:59:01Z)
Masked Language Model Based Textual Adversarial Example Detection [14.734863175424797]
Adrial attacks are a serious threat to reliable deployment of machine learning models in safety-critical applications. We propose a novel textual adversarial example detection method, namely Masked Model-based Detection (MLMD)
arXiv Detail & Related papers (2023-04-18T06:52:14Z)
Hard Nominal Example-aware Template Mutual Matching for Industrial Anomaly Detection [74.9262846410559]
textbfHard Nominal textbfExample-aware textbfTemplate textbfMutual textbfMatching (HETMM) textitHETMM aims to construct a robust prototype-based decision boundary, which can precisely distinguish between hard-nominal examples and anomalies.
arXiv Detail & Related papers (2023-03-28T17:54:56Z)
Scalable Backdoor Detection in Neural Networks [61.39635364047679]
Deep learning models are vulnerable to Trojan attacks, where an attacker can install a backdoor during training time to make the resultant model misidentify samples contaminated with a small trigger patch. We propose a novel trigger reverse-engineering based approach whose computational complexity does not scale with the number of labels, and is based on a measure that is both interpretable and universal across different network and patch types. In experiments, we observe that our method achieves a perfect score in separating Trojaned models from pure models, which is an improvement over the current state-of-the art method.
arXiv Detail & Related papers (2020-06-10T04:12:53Z)

This list is automatically generated from the titles and abstracts of the papers in this site.