Leveraging multi-task learning to improve the detection of SATD and vulnerability
- URL: http://arxiv.org/abs/2501.15934v1
- Date: Mon, 27 Jan 2025 10:31:07 GMT
- Title: Leveraging multi-task learning to improve the detection of SATD and vulnerability
- Authors: Barbara Russo, Jorge Melegati, Moritz Mock,
- Abstract summary: Self-Admitted Technical Debt (SATD) are comments in the code that indicate not-quite-right code introduced for short-term needs.
VulSATD is a deep learner that detects vulnerable and SATD code based on CodeBERT.
- Score: 2.5385600700122737
- License:
- Abstract: Multi-task learning is a paradigm that leverages information from related tasks to improve the performance of machine learning. Self-Admitted Technical Debt (SATD) are comments in the code that indicate not-quite-right code introduced for short-term needs, i.e., technical debt (TD). Previous research has provided evidence of a possible relationship between SATD and the existence of vulnerabilities in the code. In this work, we investigate if multi-task learning could leverage the information shared between SATD and vulnerabilities to improve the automatic detection of these issues. To this aim, we implemented VulSATD, a deep learner that detects vulnerable and SATD code based on CodeBERT, a pre-trained transformers model. We evaluated VulSATD on MADE-WIC, a fused dataset of functions annotated for TD (through SATD) and vulnerability. We compared the results using single and multi-task approaches, obtaining no significant differences even after employing a weighted loss. Our findings indicate the need for further investigation into the relationship between these two aspects of low-quality code. Specifically, it is possible that only a subset of technical debt is directly associated with security concerns. Therefore, the relationship between different types of technical debt and software vulnerabilities deserves future exploration and a deeper understanding.
Related papers
- Evidence is All We Need: Do Self-Admitted Technical Debts Impact Method-Level Maintenance? [1.0377683220196874]
Self-Admitted Technical Debt (SATD) refers to the phenomenon where developers explicitly acknowledge technical debt through comments in the source code.
This paper aims to empirically investigate the influence of SATD on various facets of software maintenance at the method level.
arXiv Detail & Related papers (2024-11-21T01:21:35Z) - Improving the detection of technical debt in Java source code with an enriched dataset [12.07607688189035]
Technical debt (TD) is the additional work and costs that emerge when developers opt for a quick and easy solution to a problem.
Recent research has focused on detecting Self-Admitted Technical Debts (SATDs) by analyzing comments embedded in source code.
We curated the first ever dataset of TD identified by code comments, coupled with its associated source code.
arXiv Detail & Related papers (2024-11-08T10:12:33Z) - Binary Code Similarity Detection via Graph Contrastive Learning on Intermediate Representations [52.34030226129628]
Binary Code Similarity Detection (BCSD) plays a crucial role in numerous fields, including vulnerability detection, malware analysis, and code reuse identification.
In this paper, we propose IRBinDiff, which mitigates compilation differences by leveraging LLVM-IR with higher-level semantic abstraction.
Our extensive experiments, conducted under varied compilation settings, demonstrate that IRBinDiff outperforms other leading BCSD methods in both One-to-one comparison and One-to-many search scenarios.
arXiv Detail & Related papers (2024-10-24T09:09:20Z) - Security Vulnerability Detection with Multitask Self-Instructed Fine-Tuning of Large Language Models [8.167614500821223]
We introduce MSIVD, multitask self-instructed fine-tuning for vulnerability detection, inspired by chain-of-thought prompting and LLM self-instruction.
Our experiments demonstrate that MSIVD achieves superior performance, outperforming the highest LLM-based vulnerability detector baseline (LineVul) with a F1 score of 0.92 on the BigVul dataset, and 0.48 on the PreciseBugs dataset.
arXiv Detail & Related papers (2024-06-09T19:18:05Z) - Utilization of machine learning for the detection of self-admitted
vulnerabilities [0.0]
Technical debt is a metaphor that describes not-quite-right code introduced for short-term needs.
Developers are aware of it and admit it in source code comments, which is called Self- Admitted Technical Debt (SATD)
arXiv Detail & Related papers (2023-09-27T12:38:12Z) - Multitask Learning with No Regret: from Improved Confidence Bounds to
Active Learning [79.07658065326592]
Quantifying uncertainty in the estimated tasks is of pivotal importance for many downstream applications, such as online or active learning.
We provide novel multitask confidence intervals in the challenging setting when neither the similarity between tasks nor the tasks' features are available to the learner.
We propose a novel online learning algorithm that achieves such improved regret without knowing this parameter in advance.
arXiv Detail & Related papers (2023-08-03T13:08:09Z) - Enhancing Multiple Reliability Measures via Nuisance-extended
Information Bottleneck [77.37409441129995]
In practical scenarios where training data is limited, many predictive signals in the data can be rather from some biases in data acquisition.
We consider an adversarial threat model under a mutual information constraint to cover a wider class of perturbations in training.
We propose an autoencoder-based training to implement the objective, as well as practical encoder designs to facilitate the proposed hybrid discriminative-generative training.
arXiv Detail & Related papers (2023-03-24T16:03:21Z) - ReAct: Temporal Action Detection with Relational Queries [84.76646044604055]
This work aims at advancing temporal action detection (TAD) using an encoder-decoder framework with action queries.
We first propose a relational attention mechanism in the decoder, which guides the attention among queries based on their relations.
Lastly, we propose to predict the localization quality of each action query at inference in order to distinguish high-quality queries.
arXiv Detail & Related papers (2022-07-14T17:46:37Z) - VulBERTa: Simplified Source Code Pre-Training for Vulnerability
Detection [1.256413718364189]
VulBERTa is a deep learning approach to detect security vulnerabilities in source code.
Our approach pre-trains a RoBERTa model with a custom tokenisation pipeline on real-world code from open-source C/C++ projects.
We evaluate our approach on binary and multi-class vulnerability detection tasks across several datasets.
arXiv Detail & Related papers (2022-05-25T00:56:43Z) - Decoupled and Memory-Reinforced Networks: Towards Effective Feature
Learning for One-Step Person Search [65.51181219410763]
One-step methods have been developed to handle pedestrian detection and identification sub-tasks using a single network.
There are two major challenges in the current one-step approaches.
We propose a decoupled and memory-reinforced network (DMRNet) to overcome these problems.
arXiv Detail & Related papers (2021-02-22T06:19:45Z) - Stance Detection Benchmark: How Robust Is Your Stance Detection? [65.91772010586605]
Stance Detection (StD) aims to detect an author's stance towards a certain topic or claim.
We introduce a StD benchmark that learns from ten StD datasets of various domains in a multi-dataset learning setting.
Within this benchmark setup, we are able to present new state-of-the-art results on five of the datasets.
arXiv Detail & Related papers (2020-01-06T13:37:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.