PerfGen: Automated Performance Benchmark Generation for Big Data Analytics
- URL: http://arxiv.org/abs/2412.04687v1
- Date: Fri, 06 Dec 2024 00:58:20 GMT
- Title: PerfGen: Automated Performance Benchmark Generation for Big Data Analytics
- Authors: Jiyuan Wang, Jason Teoh, Muhammad Ali Gulzar, Qian Zhang, Miryung Kim
- Abstract summary: Many symptoms of poor performance in big data analytics, such as computational skews, data skews, and memory skews, are input-dependent.
PerfGen is designed to automatically generate inputs for the purpose of performance testing.
PerfGen achieves at least 11x speedup compared to a traditional fuzzing approach when generating inputs to trigger performance symptoms.
- Score: 6.4905318866478625
- Abstract: Many symptoms of poor performance in big data analytics, such as computational skews, data skews, and memory skews, are input-dependent. However, due to the lack of inputs that can trigger such performance symptoms, it is hard to debug and test big data analytics. We design PerfGen to automatically generate inputs for the purpose of performance testing. PerfGen overcomes three challenges that arise when naively applying automated fuzz testing to performance testing. First, typical greybox fuzzing relies on coverage as a guidance signal and thus is unlikely to trigger interesting performance behavior. Therefore, PerfGen provides performance monitor templates that a user can extend to serve as a set of guidance metrics for greybox fuzzing. Second, performance symptoms may occur at an intermediate or later stage of a big data analytics pipeline. Thus, PerfGen uses a phased fuzzing approach: it first identifies symptom-causing intermediate inputs at an intermediate stage, then converts them to inputs at the beginning of the program with a pseudo-inverse function generated by a large language model. Third, PerfGen defines sets of skew-inspired input mutations, which increase the chance of inducing performance problems. We evaluate PerfGen using four case studies. PerfGen achieves at least an 11x speedup compared to a traditional fuzzing approach when generating inputs that trigger performance symptoms. Additionally, identifying intermediate inputs first and then converting them to original inputs enables PerfGen to generate such workloads in less than 0.004% of the iterations required by a baseline approach.
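To make the guidance idea concrete, below is a minimal sketch of the loop the abstract describes: greybox fuzzing driven by a performance metric instead of coverage, paired with a skew-inspired mutation. This is an illustration under stated assumptions, not PerfGen's actual implementation; the toy pipeline, the `partition_skew` monitor, and the `mutate_skew` operator are hypothetical stand-ins.

```python
# Sketch only: a performance-metric-guided greybox fuzzing loop with a
# skew-inspired mutation. All names here are illustrative, not PerfGen's API.
import random

def toy_pipeline(records):
    """Stand-in analytics stage: group (key, value) records by key."""
    groups = {}
    for key, value in records:
        groups.setdefault(key, []).append(value)
    return groups

def partition_skew(groups):
    """Guidance metric: max group size / mean group size (1.0 = balanced)."""
    sizes = [len(values) for values in groups.values()] or [1]
    return max(sizes) / (sum(sizes) / len(sizes))

def mutate_skew(records):
    """Skew-inspired mutation: copy one 'hot' key into a quarter of records."""
    mutated = list(records)
    hot_key = random.choice(mutated)[0]
    for i in random.sample(range(len(mutated)), k=len(mutated) // 4):
        mutated[i] = (hot_key, mutated[i][1])
    return mutated

def fuzz_for_skew(seed, iterations=200, target=5.0):
    """Greybox loop: keep a mutant only if it increases the skew metric."""
    best, best_score = seed, partition_skew(toy_pipeline(seed))
    for _ in range(iterations):
        candidate = mutate_skew(best)
        score = partition_skew(toy_pipeline(candidate))
        if score > best_score:
            best, best_score = candidate, score
        if best_score >= target:
            break
    return best, best_score

if __name__ == "__main__":
    seed = [(k, k * 2) for k in range(100)]  # initially balanced keys
    workload, skew = fuzz_for_skew(seed)
    print(f"generated input with partition skew {skew:.1f}")
```

In PerfGen itself, user-extensible monitor templates replace this hard-coded metric, and phased fuzzing with an LLM-generated pseudo-inverse maps symptom-causing intermediate inputs back to inputs at the start of the pipeline.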
Related papers
- Generator-Based Fuzzers with Type-Based Targeted Mutation [1.4507298892594764]
In previous work, coverage-guided fuzzers used a mix of static analysis, taint analysis, and constraint-solving approaches to address this problem.
In this paper, we introduce a type-based mutation, along with constant string lookup, for Java GBF.
Results compared to a baseline GBF tool show an almost 20% average improvement in application coverage, and larger improvements when third-party code is included.
arXiv Detail & Related papers (2024-06-04T07:20:13Z)
- Dynamic Adapter Meets Prompt Tuning: Parameter-Efficient Transfer Learning for Point Cloud Analysis [51.14136878142034]
Point cloud analysis has achieved outstanding performance by transferring point cloud pre-trained models.
Existing methods for model adaptation usually update all model parameters, which is inefficient as it relies on high computational costs.
In this paper, we aim to study parameter-efficient transfer learning for point cloud analysis with an ideal trade-off between task performance and parameter efficiency.
arXiv Detail & Related papers (2024-03-03T08:25:04Z)
- Validity-Preserving Delta Debugging via Generator Trace Reduction [14.24086822861706]
GReduce searches for other executions of the generator that yield reduced, valid test inputs.
GReduce substantially outperforms state-of-the-art syntax-based reducers including Perses and T-PDD.
arXiv Detail & Related papers (2024-02-07T07:12:27Z)
- Generative Input: Towards Next-Generation Input Methods Paradigm [49.98958865125018]
We propose a novel Generative Input paradigm named GeneInput.
It uses prompts to handle all input scenarios and other intelligent auxiliary input functions, optimizing the model with user feedback to deliver personalized results.
The results demonstrate that we have achieved state-of-the-art performance for the first time on the Full-mode Key-sequence to Characters (FK2C) task.
arXiv Detail & Related papers (2023-11-02T12:01:29Z)
- A hybrid feature learning approach based on convolutional kernels for ATM fault prediction using event-log data [5.859431341476405]
We present a predictive model based on convolutional kernels (MiniROCKET and HYDRA) to extract features from event-log data.
The proposed methodology is applied to a significant real-world dataset.
The model was integrated into a container-based decision support system to support operators in the timely maintenance of ATMs.
arXiv Detail & Related papers (2023-05-17T08:55:53Z)
- High-level Feature Guided Decoding for Semantic Segmentation [54.424062794490254]
We propose to use powerful pre-trained high-level features as guidance (HFG) for the upsampler to produce robust results.
Specifically, the high-level features from the backbone are used to train the class tokens, which are then reused by the upsampler for classification.
To push the upper limit of HFG, we introduce a context augmentation encoder (CAE) that can efficiently and effectively operate on the low-resolution high-level feature.
arXiv Detail & Related papers (2023-03-15T14:23:07Z)
- Decoder Tuning: Efficient Language Understanding as Decoding [84.68266271483022]
We present Decoder Tuning (DecT), which in contrast optimizes task-specific decoder networks on the output side.
By gradient-based optimization, DecT can be trained within several seconds and requires only one PTM query per sample.
We conduct extensive natural language understanding experiments and show that DecT significantly outperforms state-of-the-art algorithms with a $200\times$ speed-up.
arXiv Detail & Related papers (2022-12-16T11:15:39Z)
- Confident Adaptive Language Modeling [95.45272377648773]
CALM is a framework for dynamically allocating different amounts of compute per input and generation timestep.
We demonstrate the efficacy of our framework in reducing compute -- potential speedup of up to $\times 3$ -- while provably maintaining high performance.
arXiv Detail & Related papers (2022-07-14T17:00:19Z)
- Fast Test Input Generation for Finding Deviated Behaviors in Compressed Deep Neural Network [18.205951607889556]
We propose TriggerFinder to automatically identify inputs to trigger deviated behaviors in compressed models.
We evaluate TriggerFinder on 18 compressed models with two datasets.
arXiv Detail & Related papers (2021-12-06T07:12:49Z)
- FastIF: Scalable Influence Functions for Efficient Model Interpretation and Debugging [112.19994766375231]
Influence functions approximate the 'influences' of training data points for test predictions.
We present FastIF, a set of simple modifications to influence functions that significantly improves their run-time.
Our experiments demonstrate the potential of influence functions in model interpretation and correcting model errors.
arXiv Detail & Related papers (2020-12-31T18:02:34Z)
This list is automatically generated from the titles and abstracts of the papers on this site.