A Unified Active Learning Framework for Annotating Graph Data with
Application to Software Source Code Performance Prediction
- URL: http://arxiv.org/abs/2304.13032v2
- Date: Wed, 20 Sep 2023 11:18:08 GMT
- Title: A Unified Active Learning Framework for Annotating Graph Data with
Application to Software Source Code Performance Prediction
- Authors: Peter Samoaa, Linus Aronsson, Antonio Longa, Philipp Leitner, Morteza
Haghir Chehreghani
- Abstract summary: We develop a unified active learning framework specializing in software performance prediction.
We investigate the impact of using different levels of information for active and passive learning.
Our approach aims to improve the return on investment in AI models for software performance prediction.
- Score: 4.572330678291241
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Most machine learning and data analytics applications, including
performance engineering in software systems, require a large number of
annotations and labelled data, which might not be available in advance.
Acquiring annotations often requires significant time, effort, and
computational resources, which makes the annotation process challenging. To
address this, we develop a unified active learning framework specializing in
software performance prediction. We begin by parsing the source code into an
Abstract Syntax Tree (AST) and augmenting it with data and control flow
edges, which converts the tree representation of the source code into a Flow
Augmented-AST (FA-AST) graph representation. Based on this graph
representation, we construct various graph embeddings (unsupervised and
supervised) that map each program into a latent space. Given such an
embedding, the framework becomes task agnostic, since active learning can be
performed with any regression method and any query strategy suited for
regression. Within this framework, we investigate the impact of using
different levels of information for active and passive learning, e.g.,
partially available labels and unlabeled test data. Our approach aims to
improve the return on investment in AI models that predict software
performance (execution time) from the structure of the source code. Our
real-world experiments reveal that respectable performance can be achieved
by querying labels for only a small subset of all the data.
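As a rough illustration of the graph-construction step, the sketch below builds an AST graph and adds simplified sequential-flow edges. The paper parses Java sources and builds FA-AST with data and control flow edges using its own tooling; this stand-in uses Python's standard `ast` module and `networkx`, and only adds naive next-statement edges, so treat every name here as our illustration rather than the authors' code.

```python
import ast
import networkx as nx

def fa_ast_graph(source: str) -> nx.DiGraph:
    tree = ast.parse(source)
    g = nx.DiGraph()
    # AST edges: parent -> child for every node in the tree.
    for parent in ast.walk(tree):
        g.add_node(id(parent), label=type(parent).__name__)
        for child in ast.iter_child_nodes(parent):
            g.add_edge(id(parent), id(child), type="ast")
    # Flow augmentation (greatly simplified): link consecutive statements in
    # every block to approximate sequential control flow.
    for node in ast.walk(tree):
        body = getattr(node, "body", None)
        if isinstance(body, list):
            for a, b in zip(body, body[1:]):
                g.add_edge(id(a), id(b), type="next_stmt")
    return g

g = fa_ast_graph("x = 1\nif x:\n    y = x + 1\n    print(y)")
print(g.number_of_nodes(), g.number_of_edges())
```

Once each program is embedded, the framework is regressor- and strategy-agnostic. Below is a minimal pool-based loop under assumed choices: a random forest regressor, with per-tree prediction variance as an uncertainty-style query strategy. Both are our illustrative picks, one option among many rather than the authors' specific configuration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def active_learning_loop(embeddings, oracle_labels, n_init=10, n_queries=50):
    """Pool-based active learning for regression over graph embeddings.

    embeddings:    (n, d) array, one latent vector per program's FA-AST graph
    oracle_labels: (n,) measured execution times, revealed only when queried
    """
    n = len(embeddings)
    rng = np.random.default_rng(0)
    labeled = list(rng.choice(n, size=n_init, replace=False))
    pool = [i for i in range(n) if i not in labeled]

    model = RandomForestRegressor(n_estimators=100, random_state=0)
    for _ in range(n_queries):
        model.fit(embeddings[labeled], oracle_labels[labeled])
        # Query strategy: pick the pool point whose prediction is most
        # uncertain, estimated by the variance across the forest's trees.
        per_tree = np.stack([t.predict(embeddings[pool])
                             for t in model.estimators_])
        query = pool.pop(int(per_tree.var(axis=0).argmax()))
        labeled.append(query)  # "annotate": reveal the true execution time
    return model, labeled
```

Querying the most uncertain programs first is what lets such a loop reach respectable accuracy while measuring execution time for only a small subset of the pool.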
Related papers
- CONCORD: Towards a DSL for Configurable Graph Code Representation [3.756550107432323]
We introduce CONCORD, a domain-specific language to build customizable graph representations.
We demonstrate its effectiveness in code smell detection as an illustrative use case.
CONCORD will help researchers create and experiment with customizable graph-based code representations.
arXiv Detail & Related papers (2024-01-31T16:16:48Z)
- SimTeG: A Frustratingly Simple Approach Improves Textual Graph Learning [131.04781590452308]
We present SimTeG, a frustratingly simple approach to textual graph learning.
We first perform supervised parameter-efficient fine-tuning (PEFT) of a pre-trained LM on the downstream task.
We then generate node embeddings from the last hidden states of the fine-tuned LM.
arXiv Detail & Related papers (2023-08-03T07:00:04Z)
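A minimal sketch of the SimTeG recipe summarized above, assuming a small encoder and LoRA as the PEFT method; the model name, LoRA hyperparameters, and mean-pooling are our illustrative choices, and the fine-tuning loop itself is omitted:

```python
import torch
from transformers import AutoModel, AutoTokenizer
from peft import LoraConfig, get_peft_model

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
lm = AutoModel.from_pretrained("distilbert-base-uncased")

# Step 1: supervised PEFT on the downstream task (LoRA shown; r/alpha and
# target modules are our guesses for this particular encoder).
lm = get_peft_model(lm, LoraConfig(r=8, lora_alpha=16,
                                   target_modules=["q_lin", "v_lin"]))
# ... fine-tune `lm` with a task head on the labelled node texts ...

# Step 2: node features = pooled last hidden states of the fine-tuned LM.
@torch.no_grad()
def embed(texts):
    batch = tokenizer(texts, padding=True, truncation=True,
                      return_tensors="pt")
    out = lm(**batch).last_hidden_state           # (batch, seq_len, hidden)
    mask = batch["attention_mask"].unsqueeze(-1)  # ignore padding in the mean
    return (out * mask).sum(1) / mask.sum(1)

x = embed(["node text 1", "node text 2"])  # then train any GNN on (graph, x)
```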
- Harnessing Explanations: LLM-to-LM Interpreter for Enhanced Text-Attributed Graph Representation Learning [51.90524745663737]
A key innovation is our use of explanations as features, which can be used to boost GNN performance on downstream tasks.
Our method achieves state-of-the-art results on well-established TAG datasets.
Our method significantly speeds up training, achieving a 2.88 times improvement over the closest baseline on ogbn-arxiv.
arXiv Detail & Related papers (2023-05-31T03:18:03Z)
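The explanations-as-features mechanism in the entry above, reduced to a toy sketch. This is our own illustration: `llm_explain` is a hypothetical placeholder for whatever LLM client produces the explanation text, and TF-IDF stands in for the paper's LM encoder.

```python
# Encode each node's text AND an LLM-written explanation of it, then
# concatenate the two into richer node features for a downstream GNN.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

def llm_explain(text: str) -> str:
    # Hypothetical stand-in: a real system would prompt an LLM for a
    # free-text explanation of the node's likely label.
    return "cites graph learning work and benchmarks node classification"

texts = ["Paper about GNN pretraining", "Paper about LLM prompting"]
explanations = [llm_explain(t) for t in texts]

text_vec = TfidfVectorizer().fit(texts)
expl_vec = TfidfVectorizer().fit(explanations)
features = np.hstack([text_vec.transform(texts).toarray(),
                      expl_vec.transform(explanations).toarray()])
# `features` would then serve as node features in any downstream GNN.
```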
- Active Learning for Abstractive Text Summarization [50.79416783266641]
We propose the first effective query strategy for Active Learning in abstractive text summarization.
We show that using our strategy in AL annotation helps to improve the model performance in terms of ROUGE and consistency scores.
arXiv Detail & Related papers (2023-01-09T10:33:14Z)
- Schema-aware Reference as Prompt Improves Data-Efficient Knowledge Graph Construction [57.854498238624366]
We propose a retrieval-augmented approach, which retrieves schema-aware Reference As Prompt (RAP) for data-efficient knowledge graph construction.
RAP can dynamically leverage schema and knowledge inherited from human-annotated and weak-supervised data as a prompt for each sample.
arXiv Detail & Related papers (2022-10-19T16:40:28Z)
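A toy rendering of the RAP idea above, i.e., retrieve the most relevant schema-aware reference and prepend it as a prompt for each sample. The reference strings and the TF-IDF retrieval scheme are our illustrative stand-ins, not the authors' implementation:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Schema snippets paired with annotated examples (toy data).
references = [
    "schema: (person, works_for, company) | 'Ada joined Intel' -> (Ada, works_for, Intel)",
    "schema: (company, located_in, city) | 'Intel is based in Santa Clara' -> (Intel, located_in, Santa Clara)",
]

def rap_prompt(sample: str) -> str:
    # Retrieve the reference most similar to the input sample.
    vec = TfidfVectorizer().fit(references + [sample])
    sims = cosine_similarity(vec.transform([sample]),
                             vec.transform(references))
    best = references[sims.argmax()]            # schema-aware reference
    return f"{best}\ninput: {sample}\noutput:"  # prompt for the extractor

print(rap_prompt("Grace works for IBM"))
```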
- Leveraging Structural Properties of Source Code Graphs for Just-In-Time Bug Prediction [6.467090475885797]
A graph is one of the most commonly used representations for understanding relational data.
In this study, we propose a methodology to utilize the relational properties of source code in the form of a graph.
arXiv Detail & Related papers (2022-01-25T07:20:47Z)
- Precise Learning of Source Code Contextual Semantics via Hierarchical Dependence Structure and Graph Attention Networks [28.212889828892664]
We propose a novel source code model embedded with hierarchical dependencies.
We introduce the syntactic structure of the basic block, i.e., its corresponding AST, into the source code model to provide sufficient information.
The results show that our model reduces the number of parameters by 50% and achieves a 4% accuracy improvement on the program classification task.
arXiv Detail & Related papers (2021-11-20T04:03:42Z)
- Graph Contrastive Learning Automated [94.41860307845812]
Graph contrastive learning (GraphCL) has emerged with promising representation learning performance.
The effectiveness of GraphCL hinges on ad-hoc data augmentations, which have to be manually picked per dataset.
This paper proposes a unified bi-level optimization framework to automatically, adaptively and dynamically select data augmentations when performing GraphCL on specific graph data.
arXiv Detail & Related papers (2021-06-10T16:35:27Z)
- Comparative Code Structure Analysis using Deep Learning for Performance Prediction [18.226950022938954]
This paper aims to assess the feasibility of using purely static information (e.g., abstract syntax tree or AST) of applications to predict performance change based on the change in code structure.
Our evaluations of several deep embedding learning methods demonstrate that tree-based Long Short-Term Memory (LSTM) models can leverage the hierarchical structure of source code to discover latent representations, achieving up to 84% accuracy on an individual problem and 73% on a combined dataset with multiple problems when predicting the change in performance.
arXiv Detail & Related papers (2021-02-12T16:59:12Z)
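For reference, one standard tree-based LSTM of the kind the entry above applies to ASTs is the Child-Sum Tree-LSTM (Tai et al., 2015). A minimal PyTorch sketch of the cell, ours and not that paper's code:

```python
import torch
import torch.nn as nn

class ChildSumTreeLSTMCell(nn.Module):
    def __init__(self, x_dim: int, h_dim: int):
        super().__init__()
        self.iou = nn.Linear(x_dim + h_dim, 3 * h_dim)  # input/output/update
        self.f_x = nn.Linear(x_dim, h_dim)   # forget gate, input part
        self.f_h = nn.Linear(h_dim, h_dim)   # forget gate, per-child part

    def forward(self, x, child_h, child_c):
        # x: (x_dim,); child_h, child_c: (num_children, h_dim)
        h_tilde = child_h.sum(dim=0)  # child-sum of hidden states
        i, o, u = self.iou(torch.cat([x, h_tilde])).chunk(3)
        i, o, u = torch.sigmoid(i), torch.sigmoid(o), torch.tanh(u)
        # One forget gate per child, so each subtree is gated separately.
        f = torch.sigmoid(self.f_x(x) + self.f_h(child_h))
        c = i * u + (f * child_c).sum(dim=0)
        h = o * torch.tanh(c)
        return h, c  # applied bottom-up from the AST leaves to the root
```

Running the cell bottom-up over an AST yields a root state `h` that a small regression head could map to a predicted performance change.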
- How Useful is Self-Supervised Pretraining for Visual Tasks? [133.1984299177874]
We evaluate various self-supervised algorithms across a comprehensive array of synthetic datasets and downstream tasks.
Our experiments offer insights into how the utility of self-supervision changes as the number of available labels grows.
arXiv Detail & Related papers (2020-03-31T16:03:22Z)
- ProGraML: Graph-based Deep Learning for Program Optimization and Analysis [16.520971531754018]
We introduce ProGraML, a graph-based program representation for machine learning.
ProGraML achieves an average 94.0 F1 score, significantly outperforming the state-of-the-art approaches.
We then apply our approach to two high-level tasks - heterogeneous device mapping and program classification - setting new state-of-the-art performance in both.
arXiv Detail & Related papers (2020-03-23T20:27:00Z)