An Initial Exploration of Fine-tuning Small Language Models for Smart Contract Reentrancy Vulnerability Detection
- URL: http://arxiv.org/abs/2505.19059v1
- Date: Sun, 25 May 2025 09:28:33 GMT
- Title: An Initial Exploration of Fine-tuning Small Language Models for Smart Contract Reentrancy Vulnerability Detection
- Authors: Ignacio Mariano Andreozzi Pofcher, Joshua Ellul
- Abstract summary: Large Language Models (LLMs) are increasingly used for various coding tasks. We evaluate whether smaller language models can be fine-tuned to achieve reasonable results for a niche area: vulnerability detection.
- Score: 1.1049608786515839
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Large Language Models (LLMs) are increasingly used for various coding tasks, including helping coders identify bugs, and are a promising avenue for supporting vulnerability detection, particularly given the flexibility of such generative AI models and tools. Yet for many tasks LLMs may not be suitable, and smaller language models that can fit, run, and be trained on a developer's own computer may be a better choice. In this paper we explore and evaluate whether smaller language models can be fine-tuned to achieve reasonable results in a niche area of vulnerability detection: detecting the reentrancy bug in Solidity smart contracts.
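The abstract does not disclose which model, dataset, or hyperparameters the authors used, but the general recipe it describes, fine-tuning a compact pretrained code model as a binary classifier over Solidity source, can be sketched roughly as follows. Everything concrete in this sketch (the microsoft/codebert-base checkpoint, the toy labelled snippets, and the training settings) is an illustrative assumption rather than the paper's configuration.

```python
# A minimal sketch of fine-tuning a small pretrained code model as a binary
# classifier for reentrancy-prone Solidity code. The model choice
# (microsoft/codebert-base), hyperparameters, and the toy labelled snippets
# are illustrative assumptions, not the authors' actual setup.
import torch
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          DataCollatorWithPadding, Trainer, TrainingArguments)

MODEL_NAME = "microsoft/codebert-base"  # ~125M parameters, laptop-friendly

# Hypothetical labelled corpus: Solidity snippets, 1 = reentrancy-prone, 0 = safe.
examples = {
    "code": [
        # Balance is zeroed only after the external call: classic reentrancy pattern.
        'function withdraw() public { (bool ok,) = msg.sender.call{value: balances[msg.sender]}(""); require(ok); balances[msg.sender] = 0; }',
        # Checks-effects-interactions: state is updated before sending funds.
        "function withdraw() public { uint amount = balances[msg.sender]; balances[msg.sender] = 0; payable(msg.sender).transfer(amount); }",
    ],
    "label": [1, 0],
}

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)

def tokenize(batch):
    return tokenizer(batch["code"], truncation=True, max_length=512)

train_dataset = Dataset.from_dict(examples).map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="reentrancy-detector",
        num_train_epochs=3,
        per_device_train_batch_size=8,
        learning_rate=2e-5,
    ),
    train_dataset=train_dataset,
    data_collator=DataCollatorWithPadding(tokenizer=tokenizer),
)
trainer.train()

# Quick check on an unseen (hypothetical) snippet.
snippet = 'function payout() public { msg.sender.call{value: 1 ether}(""); }'
inputs = tokenizer(snippet, return_tensors="pt", truncation=True, max_length=512)
inputs = {k: v.to(model.device) for k, v in inputs.items()}
with torch.no_grad():
    prediction = model(**inputs).logits.argmax(dim=-1).item()
print("flagged as reentrancy-prone:", bool(prediction))
```

A model of this size (on the order of 125M parameters) is what makes training and inference on a single developer machine plausible, which is the trade-off the paper investigates against much larger LLMs.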
Related papers
- OMNIGUARD: An Efficient Approach for AI Safety Moderation Across Modalities [54.152681077418805]
Current detection approaches are fallible, and are particularly susceptible to attacks that exploit mismatched generalizations of model capabilities. We propose OMNIGUARD, an approach for detecting harmful prompts across languages and modalities. Our approach improves harmful prompt classification accuracy by 11.57% over the strongest baseline in a multilingual setting.
arXiv Detail & Related papers (2025-05-29T05:25:27Z) - Generative Large Language Model usage in Smart Contract Vulnerability Detection [8.720242549772154]
This paper presents a systematic review of the current LLM-based smart contract vulnerability detection tools. We compare them against the traditional static and dynamic analysis tools Slither and Mythril. Our analysis highlights key areas where each performs better and shows that while these tools show promise, the LLM-based tools available for testing are not ready to replace more traditional tools.
arXiv Detail & Related papers (2025-04-07T02:33:40Z) - Leveraging Large Language Models and Machine Learning for Smart Contract Vulnerability Detection [0.0]
We train and test machine learning algorithms to classify smart contract codes according to type in order to compare model performance. Our research combines machine learning and large language models to provide a rich and interpretable framework for detecting different smart contract vulnerabilities.
arXiv Detail & Related papers (2025-01-04T08:32:53Z) - Darkit: A User-Friendly Software Toolkit for Spiking Large Language Model [50.37090759139591]
Large language models (LLMs) have been widely applied in various practical applications, typically comprising billions of parameters. The human brain, employing bio-plausible spiking mechanisms, can accomplish the same tasks while significantly reducing energy consumption. We are releasing a software toolkit named DarwinKit (Darkit) to accelerate the adoption of brain-inspired large language models.
arXiv Detail & Related papers (2024-12-20T07:50:08Z) - VulDetectBench: Evaluating the Deep Capability of Vulnerability Detection with Large Language Models [12.465060623389151]
This study introduces a new benchmark, VulDetectBench, to assess the vulnerability detection capabilities of Large Language Models (LLMs).
The benchmark comprehensively evaluates LLM's ability to identify, classify, and locate vulnerabilities through five tasks of increasing difficulty.
Our benchmark effectively evaluates the capabilities of various LLMs at different levels in the specific task of vulnerability detection, providing a foundation for future research and improvements in this critical area of code security.
arXiv Detail & Related papers (2024-06-11T13:42:57Z) - M2CVD: Enhancing Vulnerability Semantic through Multi-Model Collaboration for Code Vulnerability Detection [52.4455893010468]
Large Language Models (LLMs) have strong capabilities in code comprehension, but fine-tuning costs and semantic alignment issues limit their project-specific optimization.
Code models such as CodeBERT are easy to fine-tune, but it is often difficult for them to learn vulnerability semantics from complex code languages.
This paper introduces the Multi-Model Collaborative Vulnerability Detection approach (M2CVD) to improve the detection accuracy of code models.
arXiv Detail & Related papers (2024-06-10T00:05:49Z) - An Empirical Study of Automated Vulnerability Localization with Large Language Models [21.84971967029474]
Large Language Models (LLMs) have shown potential in various domains, yet their effectiveness in vulnerability localization remains underexplored.
Our investigation encompasses 10+ leading LLMs suitable for code analysis, including ChatGPT and various open-source models.
We explore the efficacy of these LLMs using 4 distinct paradigms: zero-shot learning, one-shot learning, discriminative fine-tuning, and generative fine-tuning.
arXiv Detail & Related papers (2024-03-30T08:42:10Z) - To Err is Machine: Vulnerability Detection Challenges LLM Reasoning [8.602355712876815]
We present a challenging code reasoning task: vulnerability detection. State-of-the-art (SOTA) models reported only 54.5% Balanced Accuracy in our vulnerability detection evaluation. New models, new training methods, or more execution-specific pretraining data may be needed to conquer vulnerability detection.
arXiv Detail & Related papers (2024-03-25T21:47:36Z) - Detectors for Safe and Reliable LLMs: Implementations, Uses, and Limitations [76.19419888353586]
Large language models (LLMs) are susceptible to a variety of risks, from non-faithful output to biased and toxic generations.
We present our efforts to create and deploy a library of detectors: compact and easy-to-build classification models that provide labels for various harms.
arXiv Detail & Related papers (2024-03-09T21:07:16Z) - CodeLMSec Benchmark: Systematically Evaluating and Finding Security Vulnerabilities in Black-Box Code Language Models [58.27254444280376]
Large language models (LLMs) for automatic code generation have achieved breakthroughs in several programming tasks.
Training data for these models is usually collected from the Internet (e.g., from open-source repositories) and is likely to contain faults and security vulnerabilities.
This unsanitized training data can cause the language models to learn these vulnerabilities and propagate them during the code generation procedure.
arXiv Detail & Related papers (2023-02-08T11:54:07Z) - Adversarial GLUE: A Multi-Task Benchmark for Robustness Evaluation of Language Models [86.02610674750345]
Adversarial GLUE (AdvGLUE) is a new multi-task benchmark to explore and evaluate the vulnerabilities of modern large-scale language models under various types of adversarial attacks.
We apply 14 adversarial attack methods to GLUE tasks to construct AdvGLUE, which is further validated by humans for reliable annotations.
All the language models and robust training methods we tested perform poorly on AdvGLUE, with scores lagging far behind the benign accuracy.
arXiv Detail & Related papers (2021-11-04T12:59:55Z)