SIGL: Securing Software Installations Through Deep Graph Learning
- URL: http://arxiv.org/abs/2008.11533v2
- Date: Tue, 22 Jun 2021 23:29:44 GMT
- Title: SIGL: Securing Software Installations Through Deep Graph Learning
- Authors: Xueyuan Han, Xiao Yu, Thomas Pasquier, Ding Li, Junghwan Rhee, James
Mickens, Margo Seltzer, Haifeng Chen
- Abstract summary: Recent supply-chain attacks demonstrate that application integrity must be ensured during installation itself.
We introduce SIGL, a new tool for detecting malicious behavior during software installation.
We demonstrate that SIGL has a detection accuracy of 96%, outperforming similar systems from industry and academia.
- Score: 25.178164770390712
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many users implicitly assume that software can only be exploited after it is
installed. However, recent supply-chain attacks demonstrate that application
integrity must be ensured during installation itself. We introduce SIGL, a new
tool for detecting malicious behavior during software installation. SIGL
collects traces of system call activity, building a data provenance graph that
it analyzes using a novel autoencoder architecture with a graph long short-term
memory network (graph LSTM) for the encoder and a standard multilayer
perceptron for the decoder. SIGL flags suspicious installations as well as the
specific installation-time processes that are likely to be malicious. Using a
test corpus of 625 malicious installers containing real-world malware, we
demonstrate that SIGL has a detection accuracy of 96%, outperforming similar
systems from industry and academia by up to 87% in precision and recall and 45%
in accuracy. We also demonstrate that SIGL can pinpoint the processes most
likely to have triggered malicious behavior, works on different audit platforms
and operating systems, and is robust to training data contamination and
adversarial attack. It can be used with application-specific models, even in
the presence of new software versions, as well as application-agnostic
meta-models that encompass a wide range of applications and installers.
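The core detection idea in the abstract is reconstruction-error anomaly scoring: an autoencoder is trained on benign installation behavior, and inputs it reconstructs poorly are flagged as suspicious. The sketch below illustrates only that principle and is not SIGL's implementation: the graph LSTM encoder and MLP decoder are replaced by a toy linear autoencoder, and the "process feature vectors" are synthetic stand-ins for provenance-graph features.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "benign installation" feature vectors (one per process),
# clustered around a fixed benign pattern. Purely illustrative data.
pattern = np.array([1, 0, 1, 0, 1, 0, 1, 0], dtype=float)
benign = pattern + rng.normal(scale=0.1, size=(200, 8))

# Toy linear autoencoder (8 -> 2 -> 8), trained by gradient descent on
# mean squared reconstruction error over the benign corpus.
W_enc = rng.normal(scale=0.1, size=(8, 2))
W_dec = rng.normal(scale=0.1, size=(2, 8))
lr = 0.05
for _ in range(500):
    z = benign @ W_enc            # encode into the latent space
    err = z @ W_dec - benign      # reconstruction residual
    g_dec = (z.T @ err) / len(benign)
    g_enc = (benign.T @ (err @ W_dec.T)) / len(benign)
    W_dec -= lr * g_dec
    W_enc -= lr * g_enc

def anomaly_score(x: np.ndarray) -> float:
    """Mean squared reconstruction error; a high score means the
    sample looks unlike the benign training data."""
    recon = (x @ W_enc) @ W_dec
    return float(np.mean((recon - x) ** 2))

benign_sample = pattern.copy()
malicious_sample = 1.0 - pattern  # inverts every feature

print(f"benign score:    {anomaly_score(benign_sample):.4f}")
print(f"malicious score: {anomaly_score(malicious_sample):.4f}")
```

Because the autoencoder only learns to compress benign behavior, the inverted sample reconstructs poorly and receives a much higher score; thresholding that score gives the flagging behavior the abstract describes, and per-process scores give the localization.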
Related papers
- Trace Gadgets: Minimizing Code Context for Machine Learning-Based Vulnerability Prediction [8.056137513320065]
This work introduces Trace Gadgets, a novel code representation that minimizes code context by removing unrelated code.
As input for ML models, Trace Gadgets provide a minimal but complete context, thereby improving the detection performance.
Our results show that state-of-the-art machine learning models perform best when using Trace Gadgets compared to previous code representations.
arXiv Detail & Related papers (2025-04-18T13:13:39Z)
- Enhanced LLM-Based Framework for Predicting Null Pointer Dereference in Source Code [2.2020053359163305]
We propose a novel approach using a fine-tuned Large Language Model (LLM) termed "DeLLNeuN".
Our model showed 87% accuracy with 88% precision using the Draper VDISC dataset.
arXiv Detail & Related papers (2024-11-29T19:24:08Z)
- OSPtrack: A Labeled Dataset Targeting Simulated Execution of Open-Source Software [0.0]
This dataset includes 9,461 package reports, of which 1,962 are identified as malicious.
The dataset includes both static and dynamic features such as files, sockets, commands, and DNS records.
This dataset supports runtime detection, enhances detection model training, and enables efficient comparative analysis across ecosystems.
arXiv Detail & Related papers (2024-11-22T10:07:42Z)
- GNN-Based Code Annotation Logic for Establishing Security Boundaries in C Code [41.10157750103835]
Securing sensitive operations in today's interconnected software landscape is crucial yet challenging.
Modern platforms rely on Trusted Execution Environments (TEEs) to isolate security-sensitive code from the main system.
Code Annotation Logic (CAL) is a pioneering tool that automatically identifies security-sensitive components for TEE isolation.
arXiv Detail & Related papers (2024-11-18T13:40:03Z)
- LLM-Assisted Static Analysis for Detecting Security Vulnerabilities [14.188864624736938]
Large language models (LLMs) have shown impressive code generation capabilities, but they cannot perform the complex reasoning over code needed to detect security vulnerabilities.
We propose IRIS, a neuro-symbolic approach that systematically combines LLMs with static analysis to perform whole-repository reasoning for security vulnerability detection.
arXiv Detail & Related papers (2024-05-27T14:53:35Z)
- Towards Efficient Verification of Constant-Time Cryptographic Implementations [5.433710892250037]
Constant-time programming discipline is an effective software-based countermeasure against timing side-channel attacks.
We put forward practical verification approaches based on a novel synergy of taint analysis and safety verification of self-composed programs.
Our approach is implemented as a cross-platform and fully automated tool CT-Prover.
arXiv Detail & Related papers (2024-02-21T03:39:14Z)
- The Vulnerability Is in the Details: Locating Fine-grained Information of Vulnerable Code Identified by Graph-based Detectors [33.395068754566935]
VULEXPLAINER is a tool for locating vulnerability-critical code lines from coarse-level vulnerable code snippets.
It can flag the vulnerability-triggering code statements with an accuracy of around 90% against eight common C/C++ vulnerabilities.
arXiv Detail & Related papers (2024-01-05T10:15:04Z)
- Discovering Malicious Signatures in Software from Structural Interactions [7.06449725392051]
We propose a novel malware detection approach that leverages deep learning, mathematical techniques, and network science.
Our approach focuses on static and dynamic analysis and utilizes the Low-Level Virtual Machine (LLVM) to profile applications within a complex network.
Our approach marks a substantial improvement in malware detection, providing a notably more accurate and efficient solution.
arXiv Detail & Related papers (2023-12-19T23:42:20Z)
- SecureFalcon: Are We There Yet in Automated Software Vulnerability Detection with LLMs? [3.566250952750758]
We introduce SecureFalcon, an innovative model architecture with only 121 million parameters derived from the Falcon-40B model.
SecureFalcon achieves 94% accuracy in binary classification and up to 92% in multiclassification, with instant CPU inference times.
arXiv Detail & Related papers (2023-07-13T08:34:09Z)
- On the Security Blind Spots of Software Composition Analysis [46.1389163921338]
We present a novel approach to detect vulnerable clones in the Maven repository.
We retrieve over 53k potential vulnerable clones from Maven Central.
We detect 727 confirmed vulnerable clones and synthesize a testable proof-of-vulnerability project for each of those.
arXiv Detail & Related papers (2023-06-08T20:14:46Z)
- Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection [64.67495502772866]
Large Language Models (LLMs) are increasingly being integrated into various applications.
We show how attackers can override original instructions and employed controls using Prompt Injection attacks.
We derive a comprehensive taxonomy from a computer security perspective to systematically investigate impacts and vulnerabilities.
arXiv Detail & Related papers (2023-02-23T17:14:38Z)
- VELVET: a noVel Ensemble Learning approach to automatically locate VulnErable sTatements [62.93814803258067]
This paper presents VELVET, a novel ensemble learning approach to locate vulnerable statements in source code.
Our model combines graph-based and sequence-based neural networks to successfully capture the local and global context of a program graph.
VELVET achieves 99.6% and 43.6% top-1 accuracy over synthetic data and real-world data, respectively.
arXiv Detail & Related papers (2021-12-20T22:45:27Z)
- Software Vulnerability Detection via Deep Learning over Disaggregated Code Graph Representation [57.92972327649165]
This work explores a deep learning approach to automatically learn the insecure patterns from code corpora.
Because code naturally admits graph structures with parsing, we develop a novel graph neural network (GNN) to exploit both the semantic context and structural regularity of a program.
arXiv Detail & Related papers (2021-09-07T21:24:36Z)
- OutlierNets: Highly Compact Deep Autoencoder Network Architectures for On-Device Acoustic Anomaly Detection [77.23388080452987]
Human operators often diagnose industrial machinery via anomalous sounds.
Deep learning-driven anomaly detection methods often require an extensive amount of computational resources which prohibits their deployment in factories.
Here we explore a machine-driven design exploration strategy to create OutlierNets, a family of highly compact deep convolutional autoencoder network architectures.
arXiv Detail & Related papers (2021-03-31T04:09:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.