Related papers: Comparative Code Structure Analysis using Deep Learning for Performance Prediction

Comparative Code Structure Analysis using Deep Learning for Performance Prediction

URL: http://arxiv.org/abs/2102.07660v1
Date: Fri, 12 Feb 2021 16:59:12 GMT
Title: Comparative Code Structure Analysis using Deep Learning for Performance Prediction
Authors: Nathan Pinnow, Tarek Ramadan, Tanzima Z. Islam, Chase Phelps, Jayaraman J. Thiagarajan
Abstract summary: This paper aims to assess the feasibility of using purely static information (e.g., abstract syntax tree or AST) of applications to predict performance change based on the change in code structure. Our evaluations of several deep embedding learning methods demonstrate that tree-based Long Short-Term Memory (LSTM) models can leverage the hierarchical structure of source-code to discover latent representations and achieve up to 84% (individual problem) and 73% (combined dataset with multiple of problems) accuracy in predicting the change in performance.
Score: 18.226950022938954
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Performance analysis has always been an afterthought during the application development process, focusing on application correctness first. The learning curve of the existing static and dynamic analysis tools are steep, which requires understanding low-level details to interpret the findings for actionable optimizations. Additionally, application performance is a function of an infinite number of unknowns stemming from the application-, runtime-, and interactions between the OS and underlying hardware, making it difficult, if not impossible, to model using any deep learning technique, especially without a large labeled dataset. In this paper, we address both of these problems by presenting a large corpus of a labeled dataset for the community and take a comparative analysis approach to mitigate all unknowns except their source code differences between different correct implementations of the same problem. We put the power of deep learning to the test for automatically extracting information from the hierarchical structure of abstract syntax trees to represent source code. This paper aims to assess the feasibility of using purely static information (e.g., abstract syntax tree or AST) of applications to predict performance change based on the change in code structure. This research will enable performance-aware application development since every version of the application will continue to contribute to the corpora, which will enhance the performance of the model. Our evaluations of several deep embedding learning methods demonstrate that tree-based Long Short-Term Memory (LSTM) models can leverage the hierarchical structure of source-code to discover latent representations and achieve up to 84% (individual problem) and 73% (combined dataset with multiple of problems) accuracy in predicting the change in performance.

Related papers

Leveraging Neural Graph Compilers in Machine Learning Research for Edge-Cloud Systems [5.241450170761232]
This work presents a comprehensive evaluation of neural network graph compilers across heterogeneous hardware platforms. Our systematic analysis reveals that graph compilers exhibit performance patterns highly dependent on both neural architecture and batch sizes. We introduce novel metrics to quantify a compiler's ability to mitigate performance friction as batch size increases.
arXiv Detail & Related papers (2025-04-28T19:02:16Z)
HuixiangDou2: A Robustly Optimized GraphRAG Approach [11.91228019623924]
Graph-based Retrieval-Augmented Generation (GraphRAG) addresses this by structuring it as a graph for dynamic retrieval. We introduce HuixiangDou2, a robustly optimized GraphRAG framework. Specifically, we leverage the effectiveness of dual-level retrieval and optimize its performance in a 32k context.
arXiv Detail & Related papers (2025-03-09T06:20:24Z)
Optimized Feature Generation for Tabular Data via LLMs with Decision Tree Reasoning [53.241569810013836]
We propose a novel framework that utilizes large language models (LLMs) to identify effective feature generation rules. We use decision trees to convey this reasoning information, as they can be easily represented in natural language. OCTree consistently enhances the performance of various prediction models across diverse benchmarks.
arXiv Detail & Related papers (2024-06-12T08:31:34Z)
Leveraging Reinforcement Learning and Large Language Models for Code Optimization [14.602997316032706]
This paper introduces a new framework to decrease the complexity of code optimization. The proposed framework builds on large language models (LLMs) and reinforcement learning (RL) We run several experiments on the PIE dataset using a CodeT5 language model and RRHF, a new reinforcement learning algorithm.
arXiv Detail & Related papers (2023-12-09T19:50:23Z)
A Generic Performance Model for Deep Learning in a Distributed Environment [0.7829352305480285]
We propose a generic performance model of an application in a distributed environment with a generic expression of the application execution time. We have evaluated the proposed model on three deep learning frameworks (i.e., MXnet, and Pytorch)
arXiv Detail & Related papers (2023-05-19T13:30:34Z)
A Unified Active Learning Framework for Annotating Graph Data with Application to Software Source Code Performance Prediction [4.572330678291241]
We develop a unified active learning framework specializing in software performance prediction. We investigate the impact of using different levels of information for active and passive learning. Our approach aims to improve the investment in AI models for different software performance predictions.
arXiv Detail & Related papers (2023-04-06T14:00:48Z)
HyperImpute: Generalized Iterative Imputation with Automatic Model Selection [77.86861638371926]
We propose a generalized iterative imputation framework for adaptively and automatically configuring column-wise models. We provide a concrete implementation with out-of-the-box learners, simulators, and interfaces.
arXiv Detail & Related papers (2022-06-15T19:10:35Z)
An Empirical Investigation of Commonsense Self-Supervision with Knowledge Graphs [67.23285413610243]
Self-supervision based on the information extracted from large knowledge graphs has been shown to improve the generalization of language models. We study the effect of knowledge sampling strategies and sizes that can be used to generate synthetic data for adapting language models.
arXiv Detail & Related papers (2022-05-21T19:49:04Z)
Autoregressive Search Engines: Generating Substrings as Document Identifiers [53.0729058170278]
Autoregressive language models are emerging as the de-facto standard for generating answers. Previous work has explored ways to partition the search space into hierarchical structures. In this work we propose an alternative that doesn't force any structure in the search space: using all ngrams in a passage as its possible identifiers.
arXiv Detail & Related papers (2022-04-22T10:45:01Z)
Visualizing the Relationship Between Encoded Linguistic Information and Task Performance [53.223789395577796]
We study the dynamic relationship between the encoded linguistic information and task performance from the viewpoint of Pareto Optimality. We conduct experiments on two popular NLP tasks, i.e., machine translation and language modeling, and investigate the relationship between several kinds of linguistic information and task performances. Our empirical findings suggest that some syntactic information is helpful for NLP tasks whereas encoding more syntactic information does not necessarily lead to better performance.
arXiv Detail & Related papers (2022-03-29T19:03:10Z)
Precise Learning of Source Code Contextual Semantics via Hierarchical Dependence Structure and Graph Attention Networks [28.212889828892664]
We propose a novel source code model embedded with hierarchical dependencies. We introduce the syntactic structural of the basic block, i.e., its corresponding AST, in source code model to provide sufficient information. The results show that our model reduces the scale of parameters by 50% and achieves 4% improvement on accuracy on program classification task.
arXiv Detail & Related papers (2021-11-20T04:03:42Z)
Fast Few-Shot Classification by Few-Iteration Meta-Learning [173.32497326674775]
We introduce a fast optimization-based meta-learning method for few-shot classification. Our strategy enables important aspects of the base learner objective to be learned during meta-training. We perform a comprehensive experimental analysis, demonstrating the speed and effectiveness of our approach.
arXiv Detail & Related papers (2020-10-01T15:59:31Z)
StackGenVis: Alignment of Data, Algorithms, and Models for Stacking Ensemble Learning Using Performance Metrics [4.237343083490243]
In machine learning (ML), ensemble methods such as bagging, boosting, and stacking are widely-established approaches. StackGenVis is a visual analytics system for stacked generalization.
arXiv Detail & Related papers (2020-05-04T15:43:55Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.