CHRONOS: Time-Aware Zero-Shot Identification of Libraries from
Vulnerability Reports
- URL: http://arxiv.org/abs/2301.03944v4
- Date: Sat, 29 Jul 2023 04:33:44 GMT
- Title: CHRONOS: Time-Aware Zero-Shot Identification of Libraries from
Vulnerability Reports
- Authors: Yunbo Lyu, Thanh Le-Cong, Hong Jin Kang, Ratnadira Widyasari, Zhipeng
Zhao, Xuan-Bach D. Le, Ming Li, David Lo
- Abstract summary: We propose a practical library identification approach, namely CHRONOS, based on zero-shot learning.
The novelty of CHRONOS is three-fold. First, CHRONOS fits into the practical pipeline by considering the chronological order of vulnerability reports.
- Score: 12.257538059511424
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Tools that alert developers about library vulnerabilities depend on accurate,
up-to-date vulnerability databases which are maintained by security
researchers. These databases record the libraries related to each
vulnerability. However, the vulnerability reports may not explicitly list every
library and human analysis is required to determine all the relevant libraries.
Human analysis may be slow and expensive, which motivates the need for
automated approaches. Researchers and practitioners have proposed
identifying libraries from vulnerability reports automatically using extreme
multi-label learning (XML).
While state-of-the-art XML techniques showed promising performance, their
experimental settings do not reflect how reports arrive in practice. Previous
studies randomly split the vulnerability reports data for training and testing
their models without considering the chronological order of the reports. This
may unduly train the models on chronologically newer reports while testing the
models on chronologically older ones. However, in practice, one often receives
chronologically new reports, which may be related to previously unseen
libraries. Under this practical setting, we observe that the performance of
current XML techniques declines substantially, e.g., F1 drops from 0.7 when
the chronological order of vulnerability reports is ignored to 0.28 when it is
taken into account.
We propose a practical library identification approach, namely CHRONOS, based
on zero-shot learning. The novelty of CHRONOS is three-fold. First, CHRONOS
fits into the practical pipeline by considering the chronological order of
vulnerability reports. Second, CHRONOS enriches the data of the vulnerability
descriptions and labels using a carefully designed data enhancement step.
Third, CHRONOS exploits the temporal ordering of the vulnerability reports
using a cache to prioritize prediction of...
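The evaluation issue raised in the abstract (random versus chronological splitting) can be illustrated with a minimal sketch. It is not the authors' code: the toy reports, library names, and the helper functions `chronological_split`, `random_split`, and `unseen_labels` are all hypothetical and only show why a time-ordered split can expose libraries that never appear in the training data.

```python
import random
from datetime import date

# Hypothetical toy reports: (publication date, report id, affected libraries).
reports = [
    (date(2021, 6, 1), "R4", {"libC"}),
    (date(2019, 3, 1), "R1", {"libA"}),
    (date(2022, 1, 1), "R5", {"libD"}),
    (date(2019, 9, 1), "R2", {"libA", "libB"}),
    (date(2020, 2, 1), "R3", {"libB"}),
]


def chronological_split(items, train_size):
    """Train on the oldest reports and test on the newest ones."""
    ordered = sorted(items, key=lambda r: r[0])
    return ordered[:train_size], ordered[train_size:]


def random_split(items, train_size, seed=0):
    """Shuffle reports regardless of date, as in a conventional random split."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:train_size], shuffled[train_size:]


def unseen_labels(train, test):
    """Libraries that occur in the test reports but in no training report."""
    seen = set().union(*(libs for _, _, libs in train))
    tested = set().union(*(libs for _, _, libs in test))
    return tested - seen


train, test = chronological_split(reports, train_size=3)
# Under the time-ordered split, every test library here is new to the model.
print(sorted(unseen_labels(train, test)))  # ['libC', 'libD']
```

A model evaluated this way has to handle labels it has never been trained on, which is the setting that motivates a zero-shot approach.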
Related papers
- PriRoAgg: Achieving Robust Model Aggregation with Minimum Privacy Leakage for Federated Learning [49.916365792036636]
Federated learning (FL) has recently gained significant momentum due to its potential to leverage large-scale distributed user data.
The transmitted model updates can potentially leak sensitive user information, and the lack of central control of the local training process leaves the global model susceptible to malicious manipulations on model updates.
We develop a general framework PriRoAgg, utilizing Lagrange coded computing and distributed zero-knowledge proof, to execute a wide range of robust aggregation algorithms while satisfying aggregated privacy.
arXiv Detail & Related papers (2024-07-12T03:18:08Z)
- TESSERACT: Eliminating Experimental Bias in Malware Classification across Space and Time (Extended Version) [18.146377453918724]
Malware detectors often experience performance decay due to constantly evolving operating systems and attack methods.
This paper argues that commonly reported results are inflated due to two pervasive sources of experimental bias in the detection task.
arXiv Detail & Related papers (2024-02-02T12:27:32Z)
- VULNERLIZER: Cross-analysis Between Vulnerabilities and Software Libraries [4.2755847332268235]
VULNERLIZER is a novel framework for cross-analysis between vulnerabilities and software libraries.
It uses CVE and software library data together with clustering algorithms to generate links between vulnerabilities and libraries.
The trained model reaches a prediction accuracy of 75% or higher.
arXiv Detail & Related papers (2023-09-18T10:34:47Z)
- ESRO: Experience Assisted Service Reliability against Outages [2.647000585570866]
We build a diagnostic service called ESRO that recommends root causes and remediation for failures.
We evaluate our model on several cloud service outages of a large enterprise over the course of 2 years.
arXiv Detail & Related papers (2023-09-13T18:04:52Z)
- Identifying Vulnerable Third-Party Java Libraries from Textual Descriptions of Vulnerabilities and Libraries [15.573551625937556]
VulLibMiner is the first to identify vulnerable libraries from textual descriptions of both vulnerabilities and libraries.
We evaluate VulLibMiner against four state-of-the-art/practice approaches for identifying vulnerable libraries, on both their dataset named VeraJava and our VulLib dataset.
arXiv Detail & Related papers (2023-07-17T02:54:07Z)
- Automated Labeling of German Chest X-Ray Radiology Reports using Deep Learning [50.591267188664666]
We propose a deep learning-based CheXpert label prediction model, pre-trained on reports labeled by a rule-based German CheXpert model.
Our results demonstrate the effectiveness of our approach, which significantly outperformed the rule-based model on all three tasks.
arXiv Detail & Related papers (2023-06-09T16:08:35Z)
- Queried Unlabeled Data Improves and Robustifies Class-Incremental Learning [133.39254981496146]
Class-incremental learning (CIL) suffers from the notorious dilemma between learning newly added classes and preserving previously learned class knowledge.
We propose to leverage "free" external unlabeled data querying in continual learning.
We show queried unlabeled data can continue to benefit, and seamlessly extend CIL-QUD into its robustified versions.
arXiv Detail & Related papers (2022-06-15T22:53:23Z)
- Annotation Error Detection: Analyzing the Past and Present for a More Coherent Future [63.99570204416711]
We reimplement 18 methods for detecting potential annotation errors and evaluate them on 9 English datasets.
We define a uniform evaluation setup including a new formalization of the annotation error detection task.
We release our datasets and implementations in an easy-to-use and open source software package.
arXiv Detail & Related papers (2022-06-05T22:31:45Z)
- Autoregressive Search Engines: Generating Substrings as Document Identifiers [53.0729058170278]
Autoregressive language models are emerging as the de-facto standard for generating answers.
Previous work has explored ways to partition the search space into hierarchical structures.
In this work we propose an alternative that doesn't force any structure in the search space: using all ngrams in a passage as its possible identifiers.
arXiv Detail & Related papers (2022-04-22T10:45:01Z)
- Early Detection of Security-Relevant Bug Reports using Machine Learning: How Far Are We? [6.438136820117887]
In a typical maintenance scenario, security-relevant bug reports are prioritised by the development team when preparing corrective patches.
Open security-relevant bug reports can become a critical leak of sensitive information that attackers can leverage to perform zero-day attacks.
In recent years, approaches for the detection of security-relevant bug reports based on machine learning have been reported with promising performance.
arXiv Detail & Related papers (2021-12-19T11:30:29Z)
- D2A: A Dataset Built for AI-Based Vulnerability Detection Methods Using Differential Analysis [55.15995704119158]
We propose D2A, a differential analysis based approach to label issues reported by static analysis tools.
We use D2A to generate a large labeled dataset to train models for vulnerability identification.
arXiv Detail & Related papers (2021-02-16T07:46:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.