Related papers: Selecting Initial Seeds for Better JVM Fuzzing

URL: http://arxiv.org/abs/2408.08515v1
Date: Fri, 16 Aug 2024 04:10:59 GMT
Title: Selecting Initial Seeds for Better JVM Fuzzing
Authors: Tianchang Gao, Junjie Chen, Dong Wang, Yile Guo, Yingquan Zhao, Zan Wang
Abstract summary: JVM fuzzing presents unique characteristics, including large-scale and intricate code, and programs with both syntactic and semantic features. It remains unclear whether existing seed selection methods are suitable for JVM fuzzing and whether utilizing program features can enhance effectiveness. This work takes the first look at initial seed selection in JVM fuzzing, confirming its importance in fuzzing effectiveness and efficiency.
Score: 10.676082981363702
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Literature in traditional program fuzzing has confirmed that effectiveness is largely impacted by redundancy among initial seeds, thereby proposing a series of seed selection methods. JVM fuzzing, compared to traditional fuzzing, presents unique characteristics, including large-scale and intricate code, and programs with both syntactic and semantic features. However, it remains unclear whether the existing seed selection methods are suitable for JVM fuzzing and whether utilizing program features can enhance effectiveness. To address this, we devise a total of 10 initial seed selection methods, comprising coverage-based, prefuzz-based, and program-feature-based methods. We then conduct an empirical study on three JVM implementations to extensively evaluate the performance of the seed selection methods within two SOTA fuzzing techniques (JavaTailor and VECT). Specifically, we examine performance from three aspects: (i) effectiveness and efficiency using widely studied initial seeds, (ii) effectiveness using programs in the wild, and (iii) the ability to detect new bugs. Evaluation results first show that the program-feature-based method that utilizes the control flow graph not only has a significantly lower time overhead (i.e., 30s), but also outperforms other methods, achieving 142% to 269% improvement compared to the full set of initial seeds. Second, results reveal that initial seed selection greatly improves the quality of wild programs and exhibits complementary effectiveness by detecting new behaviors. Third, results demonstrate that given the same testing period, initial seed selection improves the JVM fuzzing techniques by detecting more unknown bugs. Particularly, 21 out of the 25 detected bugs have been confirmed or fixed by developers. This work takes the first look at initial seed selection in JVM fuzzing, confirming its importance in fuzzing effectiveness and efficiency.
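Illustration: the abstract names a coverage-based family of seed selection methods but does not spell out an algorithm. The sketch below is one generic way to remove redundancy among initial seeds, a greedy set-cover heuristic, and is not claimed to be any of the paper's 10 methods. The function name `select_seeds`, the seed file names, and the branch IDs are all hypothetical.

```python
from typing import Dict, List, Set


def select_seeds(coverage: Dict[str, Set[int]]) -> List[str]:
    """Greedily pick seeds until no remaining seed adds new coverage."""
    covered: Set[int] = set()
    selected: List[str] = []
    remaining = dict(coverage)
    while remaining:
        # Pick the seed contributing the most uncovered items; ties go to
        # the lexicographically smallest name so the result is deterministic.
        best = min(remaining, key=lambda s: (-len(remaining[s] - covered), s))
        gain = remaining.pop(best) - covered
        if not gain:
            break  # every remaining seed is redundant
        covered |= gain
        selected.append(best)
    return selected


# Hypothetical coverage data: seed name -> IDs of covered branches.
example = {
    "A.class": {1, 2, 3},
    "B.class": {2, 3},
    "C.class": {3, 4},
    "D.class": {1, 4},
}
print(select_seeds(example))  # ['A.class', 'C.class']
```

Here two of the four seeds already reach every covered branch, so the other two would be dropped as redundant. A real setup would have to obtain the coverage sets from an instrumented JVM run, which this sketch leaves out.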

Related papers

Less is More: Efficient Black-box Attribution via Minimal Interpretable Subset Selection [52.716143424856185]
We propose LiMA (Less input is More faithful for Attribution), which reformulates the attribution of important regions as an optimization problem for submodular subset selection. LiMA identifies both the most and least important samples while ensuring an optimal attribution boundary that minimizes errors. Our method also outperforms the greedy search in attribution efficiency, being 1.6 times faster.
arXiv Detail & Related papers (2025-04-01T06:58:15Z)
Graspness Discovery in Clutters for Fast and Accurate Grasp Detection [57.81325062171676]
"Graspness" is a quality based on geometric cues that distinguishes graspable areas in cluttered scenes. We develop a neural network named cascaded graspness model to approximate the searching process. Experiments on a large-scale benchmark, GraspNet-1Billion, show that our method outperforms previous methods by a large margin.
arXiv Detail & Related papers (2024-06-17T02:06:47Z)
Class-Imbalanced Semi-Supervised Learning for Large-Scale Point Cloud Semantic Segmentation via Decoupling Optimization [64.36097398869774]
Semi-supervised learning (SSL) has been an active research topic for large-scale 3D scene understanding. Existing SSL-based methods suffer from severe training bias due to class imbalance and long-tail distributions of the point cloud data. We introduce a new decoupling optimization framework, which disentangles feature representation learning and the classifier in an alternating optimization manner to shift the biased decision boundary effectively.
arXiv Detail & Related papers (2024-01-13T04:16:40Z)
You Never Get a Second Chance To Make a Good First Impression: Seeding Active Learning for 3D Semantic Segmentation [29.54515277318063]
We propose SeedAL, a method to seed active learning for efficient annotation of 3D point clouds for semantic segmentation. Our experiments demonstrate the effectiveness of our approach compared to random seeding and existing methods.
arXiv Detail & Related papers (2023-04-23T22:38:25Z)
A GA-like Dynamic Probability Method With Mutual Information for Feature Selection [1.290382979353427]
We propose a GA-like dynamic probability (GADP) method with mutual information. As each gene's probability is independent, the chromosome variety in GADP is more notable than in traditional GA. To verify our method's superiority, we evaluate our method under multiple conditions on 15 datasets.
arXiv Detail & Related papers (2022-10-21T13:30:01Z)
Improving Pre-trained Language Model Fine-tuning with Noise Stability Regularization [94.4409074435894]
We propose a novel and effective fine-tuning framework, named Layerwise Noise Stability Regularization (LNSR). Specifically, we propose to inject standard Gaussian noise and regularize the hidden representations of the fine-tuned model. We demonstrate the advantages of the proposed method over other state-of-the-art algorithms including L2-SP, Mixout and SMART.
arXiv Detail & Related papers (2022-06-12T04:42:49Z)
Explanation-Guided Fairness Testing through Genetic Algorithm [18.642243829461158]
This work proposes ExpGA, an explanation-guided fairness testing approach through a genetic algorithm (GA). ExpGA employs the explanation results generated by interpretable methods to collect high-quality initial seeds. It then adopts GA to search discriminatory sample candidates by optimizing a fitness value.
arXiv Detail & Related papers (2022-05-16T02:40:48Z)
Efficient Few-Shot Object Detection via Knowledge Inheritance [62.36414544915032]
Few-shot object detection (FSOD) aims at learning a generic detector that can adapt to unseen tasks with scarce training samples. We present an efficient pretrain-transfer framework (PTF) baseline with no computational increment. We also propose an adaptive length re-scaling (ALR) strategy to alleviate the vector length inconsistency between the predicted novel weights and the pretrained base weights.
arXiv Detail & Related papers (2022-03-23T06:24:31Z)
Compactness Score: A Fast Filter Method for Unsupervised Feature Selection [66.84571085643928]
We propose a fast unsupervised feature selection method, named Compactness Score (CSUFS), to select desired features. Our proposed algorithm seems to be more accurate and efficient compared with existing algorithms.
arXiv Detail & Related papers (2022-01-31T13:01:37Z)
SparseDet: Improving Sparsely Annotated Object Detection with Pseudo-positive Mining [76.95808270536318]
We propose an end-to-end system that learns to separate proposals into labeled and unlabeled regions using pseudo-positive mining. While the labeled regions are processed as usual, self-supervised learning is used to process the unlabeled regions. We conduct exhaustive experiments on five splits of the PASCAL-VOC and COCO datasets, achieving state-of-the-art performance.
arXiv Detail & Related papers (2022-01-12T18:57:04Z)
Multi-Objective Optimisation of Multi-Output Neural Trees [1.000779758350696]
We propose a multi-output neural tree (MONT) algorithm, which is an evolutionary learning algorithm trained by the non-dominated sorting genetic algorithm (NSGA-III). We use nine benchmark classification learning problems to evaluate the performance of MONT. MONT performed better than a set of well-known classifiers on the problems tackled in this study.
arXiv Detail & Related papers (2020-10-09T12:21:59Z)
MEUZZ: Smart Seed Scheduling for Hybrid Fuzzing [21.318110758739675]
MEUZZ is a Machine learning-Enhanced hybrid fUZZing system. It determines which new seeds are expected to produce better fuzzing yields based on the knowledge learned from past seed scheduling decisions. MEUZZ significantly outperforms the state-of-the-art grey-box and hybrid fuzzers.
arXiv Detail & Related papers (2020-02-20T05:02:25Z)

This list is automatically generated from the titles and abstracts of the papers on this site.