LASHED: LLMs And Static Hardware Analysis for Early Detection of RTL Bugs
- URL: http://arxiv.org/abs/2504.21770v1
- Date: Wed, 30 Apr 2025 16:15:53 GMT
- Title: LASHED: LLMs And Static Hardware Analysis for Early Detection of RTL Bugs
- Authors: Baleegh Ahmad, Hammond Pearce, Ramesh Karri, Benjamin Tan
- Abstract summary: LASHED combines the two approaches (LLMs and Static Analysis) to overcome each other's limitations for hardware security bug detection. We find that 87.5% of instances flagged by our recommended scheme are plausible Common Weakness Enumerations (CWEs). In-context learning and asking the model to 'think again' improve LASHED's precision.
- Score: 24.821587679661974
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While static analysis is useful in detecting early-stage hardware security bugs, its efficacy is limited because it requires information to form checks and is often unable to explain the security impact of a detected vulnerability. Large Language Models can be useful in filling these gaps by identifying relevant assets, removing false violations flagged by static analysis tools, and explaining the reported violations. LASHED combines the two approaches (LLMs and Static Analysis) to overcome each other's limitations for hardware security bug detection. We investigate our approach on four open-source SoCs for five Common Weakness Enumerations (CWEs) and present strategies for improvement with better prompt engineering. We find that 87.5% of instances flagged by our recommended scheme are plausible CWEs. In-context learning and asking the model to 'think again' improve LASHED's precision.
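The abstract describes a pipeline in which a static checker flags candidate violations in RTL and an LLM then triages them: identifying relevant assets, discarding spurious reports, and explaining the remainder, with a follow-up 'think again' prompt reported to improve precision. The sketch below illustrates one way such a triage loop could be wired up; it is not the authors' implementation, and the data shapes, prompts, and the run_static_analysis/query_llm stubs are assumptions for illustration only.

```python
# Minimal sketch (assumed, not LASHED's actual code) of LLM-based triage of
# static-analysis violations, including a 'think again' re-prompt.
from dataclasses import dataclass
from typing import List

@dataclass
class Violation:
    file: str
    line: int
    cwe: str       # e.g. "CWE-1271"
    snippet: str   # RTL excerpt flagged by the static checker

def run_static_analysis(rtl_sources: List[str]) -> List[Violation]:
    """Placeholder for a static RTL checker; a real tool returns flagged violations."""
    return []

def query_llm(prompt: str) -> str:
    """Placeholder for an LLM call; swap in a real API client."""
    return "plausible"  # stubbed answer so the sketch runs end to end

def triage(violations: List[Violation]) -> List[Violation]:
    plausible = []
    for v in violations:
        # First pass: ask whether the flagged code is a real instance of the CWE.
        prompt = (
            f"RTL snippet:\n{v.snippet}\n"
            f"A static checker flags this as {v.cwe}. "
            "Is this a plausible security weakness? Answer plausible or spurious."
        )
        first = query_llm(prompt)
        # Second pass: ask the model to 'think again', which the abstract
        # reports improves precision.
        second = query_llm(prompt + f"\nYour previous answer was '{first}'. Think again.")
        if "plausible" in second.lower():
            plausible.append(v)
    return plausible

if __name__ == "__main__":
    flagged = run_static_analysis(["soc_top.v"])
    kept = triage(flagged)
    print(f"{len(kept)} of {len(flagged)} violations kept after LLM triage")
```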
Related papers
- The Hitchhiker's Guide to Program Analysis, Part II: Deep Thoughts by LLMs [17.497629884237647]
BugLens is a post-refinement framework that significantly improves static analysis precision. It raises precision from 0.10 (raw) and 0.50 (semi-automated refinement) to 0.72, substantially reducing false positives. Our results suggest that a structured LLM-based workflow can meaningfully enhance the effectiveness of static analysis tools.
arXiv Detail & Related papers (2025-04-16T02:17:06Z) - SlowFastVAD: Video Anomaly Detection via Integrating Simple Detector and RAG-Enhanced Vision-Language Model [52.47816604709358]
Video anomaly detection (VAD) aims to identify unexpected events in videos and has wide applications in safety-critical domains. Vision-language models (VLMs) have demonstrated strong multimodal reasoning capabilities, offering new opportunities for anomaly detection. We propose SlowFastVAD, a hybrid framework that integrates a fast anomaly detector with a slow anomaly detector.
arXiv Detail & Related papers (2025-04-14T15:30:03Z) - CASTLE: Benchmarking Dataset for Static Code Analyzers and LLMs towards CWE Detection [2.5228276786940182]
This paper introduces CASTLE, a benchmarking framework for evaluating the vulnerability detection capabilities of different methods. We assess 13 static analysis tools, 10 LLMs, and 2 formal verification tools using a hand-crafted dataset of 250 micro-benchmark programs covering 25 common CWEs.
arXiv Detail & Related papers (2025-03-12T14:30:05Z) - LLM-Safety Evaluations Lack Robustness [58.334290876531036]
We argue that current safety alignment research efforts for large language models are hindered by many intertwined sources of noise.
We propose a set of guidelines for reducing noise and bias in evaluations of future attack and defense papers.
arXiv Detail & Related papers (2025-03-04T12:55:07Z) - Code Change Intention, Development Artifact and History Vulnerability: Putting Them Together for Vulnerability Fix Detection by LLM [13.278153690972243]
VulFixMiner and CoLeFunDa focus solely on code changes, neglecting essential context from development artifacts.
We propose LLM4VFD, a novel framework that leverages Large Language Models (LLMs) enhanced with Chain-of-Thought reasoning and In-Context Learning.
arXiv Detail & Related papers (2025-01-24T23:40:03Z) - Boosting Cybersecurity Vulnerability Scanning based on LLM-supported Static Application Security Testing [5.644999288757871]
Large Language Models (LLMs) have demonstrated powerful code analysis capabilities, but their static training data and privacy risks limit their effectiveness.
We propose LSAST, a novel approach that integrates LLMs with SAST scanners to enhance vulnerability detection.
We set a new benchmark for static vulnerability analysis, offering a robust, privacy-conscious solution.
arXiv Detail & Related papers (2024-09-24T04:42:43Z) - Exploring Automatic Cryptographic API Misuse Detection in the Era of LLMs [60.32717556756674]
This paper introduces a systematic evaluation framework to assess Large Language Models in detecting cryptographic misuses.
Our in-depth analysis of 11,940 LLM-generated reports highlights that the inherent instabilities in LLMs can lead to over half of the reports being false positives.
The optimized approach achieves a remarkable detection rate of nearly 90%, surpassing traditional methods and uncovering previously unknown misuses in established benchmarks.
arXiv Detail & Related papers (2024-07-23T15:31:26Z) - Harnessing Large Language Models for Software Vulnerability Detection: A Comprehensive Benchmarking Study [1.03590082373586]
We propose using large language models (LLMs) to assist in finding vulnerabilities in source code.
The aim is to test multiple state-of-the-art LLMs and identify the best prompting strategies.
We find that LLMs can pinpoint many more issues than traditional static analysis tools, outperforming traditional tools in terms of recall and F1 scores.
arXiv Detail & Related papers (2024-05-24T14:59:19Z) - Analyzing Adversarial Inputs in Deep Reinforcement Learning [53.3760591018817]
We present a comprehensive analysis of the characterization of adversarial inputs, through the lens of formal verification.
We introduce a novel metric, the Adversarial Rate, to classify models based on their susceptibility to such perturbations.
Our analysis empirically demonstrates how adversarial inputs can affect the safety of a given DRL system with respect to such perturbations.
arXiv Detail & Related papers (2024-02-07T21:58:40Z) - Understanding the Effectiveness of Large Language Models in Detecting Security Vulnerabilities [12.82645410161464]
We evaluate the effectiveness of 16 pre-trained Large Language Models on 5,000 code samples from five diverse security datasets.
Overall, LLMs show modest effectiveness in detecting vulnerabilities, obtaining an average accuracy of 62.8% and F1 score of 0.71 across datasets.
We find that advanced prompting strategies that involve step-by-step analysis significantly improve the performance of LLMs on real-world datasets in terms of F1 score (by up to 0.18 on average).
arXiv Detail & Related papers (2023-11-16T13:17:20Z) - Log Barriers for Safe Black-box Optimization with Application to Safe Reinforcement Learning [72.97229770329214]
We introduce a general approach for solving high-dimensional non-linear optimization problems in which maintaining safety during learning is crucial.
Our approach, called LBSGD, is based on applying a logarithmic barrier approximation with a carefully chosen step size.
We demonstrate the effectiveness of our approach on minimizing violations in policy optimization tasks in safe reinforcement learning.
arXiv Detail & Related papers (2022-07-21T11:14:47Z) - D2A: A Dataset Built for AI-Based Vulnerability Detection Methods Using Differential Analysis [55.15995704119158]
We propose D2A, a differential analysis based approach to label issues reported by static analysis tools.
We use D2A to generate a large labeled dataset to train models for vulnerability identification.
arXiv Detail & Related papers (2021-02-16T07:46:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.