Related papers: CHRONOS: Time-Aware Zero-Shot Identification of Libraries from Vulnerability Reports

URL: http://arxiv.org/abs/2301.03944v4
Date: Sat, 29 Jul 2023 04:33:44 GMT
Title: CHRONOS: Time-Aware Zero-Shot Identification of Libraries from Vulnerability Reports
Authors: Yunbo Lyu, Thanh Le-Cong, Hong Jin Kang, Ratnadira Widyasari, Zhipeng Zhao, Xuan-Bach D. Le, Ming Li, David Lo
Abstract summary: We propose a practical library identification approach, namely CHRONOS, based on zero-shot learning. The novelty of CHRONOS is three-fold. First, CHRONOS fits into the practical pipeline by considering the chronological order of vulnerability reports.
Score: 12.257538059511424
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Tools that alert developers about library vulnerabilities depend on accurate, up-to-date vulnerability databases which are maintained by security researchers. These databases record the libraries related to each vulnerability. However, the vulnerability reports may not explicitly list every library and human analysis is required to determine all the relevant libraries. Human analysis may be slow and expensive, which motivates the need for automated approaches. Researchers and practitioners have proposed to automatically identify libraries from vulnerability reports using extreme multi-label learning (XML). While state-of-the-art XML techniques showed promising performance, their experiment settings do not practically fit what happens in reality. Previous studies randomly split the vulnerability reports data for training and testing their models without considering the chronological order of the reports. This may unduly train the models on chronologically newer reports while testing the models on chronologically older ones. However, in practice, one often receives chronologically new reports, which may be related to previously unseen libraries. Under this practical setting, we observe that the performance of current XML techniques declines substantially, e.g., F1 decreased from 0.7 to 0.28 under experiments without and with consideration of chronological order of vulnerability reports. We propose a practical library identification approach, namely CHRONOS, based on zero-shot learning. The novelty of CHRONOS is three-fold. First, CHRONOS fits into the practical pipeline by considering the chronological order of vulnerability reports. Second, CHRONOS enriches the data of the vulnerability descriptions and labels using a carefully designed data enhancement step. Third, CHRONOS exploits the temporal ordering of the vulnerability reports using a cache to prioritize prediction of...
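Illustration (not from the paper): the difference between a random split and a chronological split can be sketched in a few lines of Python. The toy reports, library names, cutoff date, and the helper functions `chronological_split` and `unseen_labels` are all hypothetical and are not taken from CHRONOS or its dataset. The sketch only shows why a time-ordered split can expose test-time libraries that never appear in training, which is the zero-shot situation the abstract describes.

```python
from datetime import date

# Hypothetical toy reports: (publication date, report id, affected libraries).
reports = [
    (date(2019, 3, 1), "R1", {"libA"}),
    (date(2020, 6, 15), "R2", {"libA", "libB"}),
    (date(2021, 1, 10), "R3", {"libB"}),
    (date(2022, 8, 5), "R4", {"libC"}),
    (date(2023, 2, 20), "R5", {"libA", "libD"}),
]

def chronological_split(items, cutoff):
    """Train on reports published before the cutoff, test on the rest."""
    ordered = sorted(items, key=lambda r: r[0])
    train = [r for r in ordered if r[0] < cutoff]
    test = [r for r in ordered if r[0] >= cutoff]
    return train, test

def unseen_labels(train, test):
    """Libraries that occur in the test reports but never in training."""
    seen = set().union(*(labels for _, _, labels in train))
    tested = set().union(*(labels for _, _, labels in test))
    return tested - seen

train, test = chronological_split(reports, cutoff=date(2022, 1, 1))
zero_shot = unseen_labels(train, test)
print(sorted(zero_shot))
```

With a random split, reports such as R4 or R5 could land in the training set, so libC and libD would look like ordinary seen labels. Under the date-based split they occur only in the test reports, and a model limited to labels seen during training cannot predict them.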

Related papers

Chasing the Clock: How Fast Are Vulnerabilities Fixed in the Maven Ecosystem? [1.5499426028105905]
The study focuses on the influence of CVE severity, library popularity as measured by the number of dependents, and version release frequency. The results suggest that critical vulnerabilities are addressed slightly faster compared to lower-severity ones.
arXiv Detail & Related papers (2025-03-28T21:48:22Z)
PriRoAgg: Achieving Robust Model Aggregation with Minimum Privacy Leakage for Federated Learning [49.916365792036636]
Federated learning (FL) has recently gained significant momentum due to its potential to leverage large-scale distributed user data. The transmitted model updates can potentially leak sensitive user information, and the lack of central control of the local training process leaves the global model susceptible to malicious manipulations on model updates. We develop a general framework PriRoAgg, utilizing Lagrange coded computing and distributed zero-knowledge proof, to execute a wide range of robust aggregation algorithms while satisfying aggregated privacy.
arXiv Detail & Related papers (2024-07-12T03:18:08Z)
TESSERACT: Eliminating Experimental Bias in Malware Classification across Space and Time (Extended Version) [18.146377453918724]
Malware detectors often experience performance decay due to constantly evolving operating systems and attack methods. This paper argues that commonly reported results are inflated due to two pervasive sources of experimental bias in the detection task.
arXiv Detail & Related papers (2024-02-02T12:27:32Z)
VULNERLIZER: Cross-analysis Between Vulnerabilities and Software Libraries [4.2755847332268235]
VULNERLIZER is a novel framework for cross-analysis between vulnerabilities and software libraries. It uses CVE and software library data together with clustering algorithms to generate links between vulnerabilities and libraries. The trained model reaches a prediction accuracy of 75% or higher.
arXiv Detail & Related papers (2023-09-18T10:34:47Z)
ESRO: Experience Assisted Service Reliability against Outages [2.647000585570866]
We build a diagnostic service called ESRO that recommends root causes and remediation for failures. We evaluate our model on several cloud service outages of a large enterprise over the course of 2 years.
arXiv Detail & Related papers (2023-09-13T18:04:52Z)
Identifying Vulnerable Third-Party Java Libraries from Textual Descriptions of Vulnerabilities and Libraries [15.573551625937556]
VulLibMiner is the first to identify vulnerable libraries from textual descriptions of both vulnerabilities and libraries. We evaluate VulLibMiner using four state-of-the-art/practice approaches of identifying vulnerable libraries on both their dataset named VeraJava and our VulLib dataset.
arXiv Detail & Related papers (2023-07-17T02:54:07Z)
Automated Labeling of German Chest X-Ray Radiology Reports using Deep Learning [50.591267188664666]
We propose a deep learning-based CheXpert label prediction model, pre-trained on reports labeled by a rule-based German CheXpert model. Our results demonstrate the effectiveness of our approach, which significantly outperformed the rule-based model on all three tasks.
arXiv Detail & Related papers (2023-06-09T16:08:35Z)
Queried Unlabeled Data Improves and Robustifies Class-Incremental Learning [133.39254981496146]
Class-incremental learning (CIL) suffers from the notorious dilemma between learning newly added classes and preserving previously learned class knowledge. We propose to leverage "free" external unlabeled data querying in continual learning. We show queried unlabeled data can continue to benefit, and seamlessly extend CIL-QUD into its robustified versions.
arXiv Detail & Related papers (2022-06-15T22:53:23Z)
Annotation Error Detection: Analyzing the Past and Present for a More Coherent Future [63.99570204416711]
We reimplement 18 methods for detecting potential annotation errors and evaluate them on 9 English datasets. We define a uniform evaluation setup including a new formalization of the annotation error detection task. We release our datasets and implementations in an easy-to-use and open source software package.
arXiv Detail & Related papers (2022-06-05T22:31:45Z)
Autoregressive Search Engines: Generating Substrings as Document Identifiers [53.0729058170278]
Autoregressive language models are emerging as the de-facto standard for generating answers. Previous work has explored ways to partition the search space into hierarchical structures. In this work we propose an alternative that doesn't force any structure in the search space: using all ngrams in a passage as its possible identifiers.
arXiv Detail & Related papers (2022-04-22T10:45:01Z)
Early Detection of Security-Relevant Bug Reports using Machine Learning: How Far Are We? [6.438136820117887]
In a typical maintenance scenario, security-relevant bug reports are prioritised by the development team when preparing corrective patches. Open security-relevant bug reports can become a critical leak of sensitive information that attackers can leverage to perform zero-day attacks. In recent years, approaches for the detection of security-relevant bug reports based on machine learning have been reported with promising performance.
arXiv Detail & Related papers (2021-12-19T11:30:29Z)
D2A: A Dataset Built for AI-Based Vulnerability Detection Methods Using Differential Analysis [55.15995704119158]
We propose D2A, a differential analysis based approach to label issues reported by static analysis tools. We use D2A to generate a large labeled dataset to train models for vulnerability identification.
arXiv Detail & Related papers (2021-02-16T07:46:53Z)

This list is automatically generated from the titles and abstracts of the papers on this site.