CodeFlow: Program Behavior Prediction with Dynamic Dependencies Learning
- URL: http://arxiv.org/abs/2408.02816v3
- Date: Sun, 09 Feb 2025 17:50:30 GMT
- Title: CodeFlow: Program Behavior Prediction with Dynamic Dependencies Learning
- Authors: Cuong Chi Le, Hoang Nhat Phan, Huy Nhat Phan, Tien N. Nguyen, Nghi D. Q. Bui
- Abstract summary: We present CodeFlow, a novel machine learning-based approach that predicts code coverage and detects runtime errors.
CodeFlow effectively represents all possible execution paths and the statistical relations between different statements.
Our empirical evaluation demonstrates that CodeFlow significantly improves code coverage prediction accuracy and effectively localizes runtime errors.
- Score: 11.347234752942684
- Abstract: Predicting program behavior without execution is a critical task in software engineering. Existing models often fall short in capturing the dynamic dependencies among program elements. To address this, we present CodeFlow, a novel machine learning-based approach that predicts code coverage and detects runtime errors by learning both static and dynamic dependencies within the code. By using control flow graphs (CFGs), CodeFlow effectively represents all possible execution paths and the statistical relations between different statements, providing a more comprehensive understanding of program behaviors. CodeFlow constructs CFGs to represent possible execution paths and learns vector representations (embeddings) for CFG nodes, capturing static control-flow dependencies. Additionally, it learns dynamic dependencies by leveraging execution traces, which reflect the impacts among statements during execution. This combination enables CodeFlow to accurately predict code coverage and identify runtime errors. Our empirical evaluation demonstrates that CodeFlow significantly improves code coverage prediction accuracy and effectively localizes runtime errors, outperforming state-of-the-art models.
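To make the abstract's vocabulary concrete, here is a minimal sketch of a control flow graph, its possible execution paths, and the per-statement coverage implied by one execution trace. It is an illustration only, not CodeFlow's implementation: the toy program, the `CFG` dictionary, and the helpers `all_paths` and `coverage` are all hypothetical, and coverage is computed here from a known trace, whereas CodeFlow predicts it with a learned model.

```python
# Hypothetical illustration; not taken from the CodeFlow paper.
# Hand-written CFG for a toy five-statement program:
#   1: x = read_input()
#   2: if x > 0:
#   3:     y = 10 // x
#   4: else: y = 0
#   5: print(y)
# Each key is a statement (CFG node); each value lists its successors.
CFG = {
    1: [2],
    2: [3, 4],  # branch: the true edge goes to 3, the false edge to 4
    3: [5],
    4: [5],
    5: [],      # exit node
}


def all_paths(cfg, node, end):
    """Enumerate every path from `node` to `end` in an acyclic CFG."""
    if node == end:
        return [[node]]
    return [
        [node] + rest
        for successor in cfg[node]
        for rest in all_paths(cfg, successor, end)
    ]


def coverage(cfg, trace):
    """Map each CFG node to True if it appears in the execution trace."""
    executed = set(trace)
    return {node: node in executed for node in cfg}


# The static view: every path the program could take.
paths = all_paths(CFG, 1, 5)

# The dynamic view: one concrete run (x > 0) and the coverage it induces.
cov = coverage(CFG, [1, 2, 3, 5])

print(paths)
print(cov)
```

In this sketch, the path list stands in for the static control-flow structure, and the trace-derived coverage map is the kind of label a coverage-prediction model would be trained to output without running the program.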
Related papers
- VisualCoder: Guiding Large Language Models in Code Execution with Fine-grained Multimodal Chain-of-Thought Reasoning [10.70881967278009]
We introduce VisualCoder, a simple yet effective approach that enhances code reasoning by integrating multimodal Chain-of-Thought (CoT) reasoning with a visual Control Flow Graph (CFG).
We address challenges in multimodal CoT integration through a reference mechanism, ensuring consistency between code and its execution path, thereby improving performance in program behavior prediction, error detection, and output generation.
arXiv Detail & Related papers (2024-10-30T19:07:01Z)
- Towards Safe Automated Refactoring of Imperative Deep Learning Programs to Graph Execution [4.786072763033669]
More natural, less error-prone imperative DL frameworks encouraging eager execution have emerged at the expense of run-time performance.
We present our ongoing work on an automated approach that assists developers in specifying whether and how their otherwise imperative DL code could be reliably and efficiently executed as graphs.
The approach is being implemented as a PyDev Eclipse plug-in and uses the WALA Ariadne analysis framework.
arXiv Detail & Related papers (2023-08-22T20:50:19Z)
- TRACED: Execution-aware Pre-training for Source Code [24.101763959136058]
We introduce TRACED, an execution-aware pre-training strategy for source code.
Our goal is to teach code models the complicated execution logic during the pre-training, enabling the model to statically estimate the dynamic code properties.
TRACED relatively improves the statically pre-trained code models by 12.4% for complete execution path prediction and by 25.2% for runtime variable value predictions.
arXiv Detail & Related papers (2023-06-13T01:30:14Z)
- CONCORD: Clone-aware Contrastive Learning for Source Code [64.51161487524436]
Self-supervised pre-training has gained traction for learning generic code representations valuable for many downstream SE tasks.
We argue that it is also essential to factor in how developers code day-to-day for general-purpose representation learning.
In particular, we propose CONCORD, a self-supervised, contrastive learning strategy to place benign clones closer in the representation space while moving deviants further apart.
arXiv Detail & Related papers (2023-06-05T20:39:08Z)
- Towards Understanding and Improving GFlowNet Training [71.85707593318297]
We introduce an efficient evaluation strategy to compare the learned sampling distribution to the target reward distribution.
We propose prioritized replay training of high-reward $x$, relative edge flow policy parametrization, and a novel guided trajectory balance objective.
arXiv Detail & Related papers (2023-05-11T22:50:41Z)
- CFlowNets: Continuous Control with Generative Flow Networks [23.093316128475564]
Generative flow networks (GFlowNets) can be used as an alternative to reinforcement learning for exploratory control tasks.
We propose generative continuous flow networks (CFlowNets) that can be applied to continuous control tasks.
arXiv Detail & Related papers (2023-03-04T14:37:47Z)
- Distributional GFlowNets with Quantile Flows [73.73721901056662]
Generative Flow Networks (GFlowNets) are a new family of probabilistic samplers where an agent learns a policy for generating complex structure through a series of decision-making steps.
In this work, we adopt a distributional paradigm for GFlowNets, turning each flow function into a distribution, thus providing more informative learning signals during training.
Our proposed quantile matching GFlowNet learning algorithm is able to learn a risk-sensitive policy, an essential component for handling scenarios with risk uncertainty.
arXiv Detail & Related papers (2023-02-11T22:06:17Z)
- D$^3$FlowSLAM: Self-Supervised Dynamic SLAM with Flow Motion Decomposition and DINO Guidance [61.14088096348959]
We introduce a self-supervised deep SLAM method that robustly operates in dynamic scenes while accurately identifying dynamic components.
We propose a dynamic update module based on this representation and develop a dense SLAM system that excels in dynamic scenarios.
arXiv Detail & Related papers (2022-07-18T17:47:39Z)
- ReACC: A Retrieval-Augmented Code Completion Framework [53.49707123661763]
We propose a retrieval-augmented code completion framework, leveraging both lexical copying and referring to code with similar semantics by retrieval.
We evaluate our approach on the code completion task in the Python and Java programming languages, achieving state-of-the-art performance on the CodeXGLUE benchmark.
arXiv Detail & Related papers (2022-03-15T08:25:08Z)
- GraphCodeBERT: Pre-training Code Representations with Data Flow [97.00641522327699]
We present GraphCodeBERT, a pre-trained model for programming language that considers the inherent structure of code.
We use data flow in the pre-training stage, which is a semantic-level structure of code that encodes the relation of "where-the-value-comes-from" between variables.
We evaluate our model on four tasks, including code search, clone detection, code translation, and code refinement.
arXiv Detail & Related papers (2020-09-17T15:25:56Z)
- TF-Coder: Program Synthesis for Tensor Manipulations [29.46838583290554]
We present a tool called TF-Coder for programming by example in TensorFlow.
We train models to predict operations from features of the input and output tensors and natural language descriptions of tasks.
TF-Coder solves 63 of 70 real-world tasks within 5 minutes, sometimes finding simpler solutions in less time compared to experienced human programmers.
arXiv Detail & Related papers (2020-03-19T22:53:47Z)
This list is automatically generated from the titles and abstracts of the papers on this site.