InvAASTCluster: On Applying Invariant-Based Program Clustering to
Introductory Programming Assignments
- URL: http://arxiv.org/abs/2206.14175v2
- Date: Wed, 29 Jun 2022 13:44:27 GMT
- Title: InvAASTCluster: On Applying Invariant-Based Program Clustering to
Introductory Programming Assignments
- Authors: Pedro Orvalho and Mikol\'a\v{s} Janota and Vasco Manquinho
- Abstract summary: InvAASTCluster is a novel approach for program clustering.
It takes advantage of dynamically generated program invariants observed over several program executions to cluster semantically equivalent IPAs.
Our results show that InvAASTCluster advances the current state-of-the-art when used by clustering-based program repair tools.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Due to the vast number of students enrolled in Massive Open Online Courses
(MOOCs), there has been an increasing number of automated program repair
techniques focused on introductory programming assignments (IPAs). Such
state-of-the-art techniques use program clustering to take advantage of
previous correct student implementations to repair a given new incorrect
submission. Usually, these repair techniques use clustering methods since
analyzing all available correct student submissions to repair a program is not
feasible. The clustering methods use program representations based on several
features such as abstract syntax tree (AST), syntax, control flow, and data
flow. However, these features are sometimes brittle when representing
semantically similar programs.
This paper proposes InvAASTCluster, a novel approach for program clustering
that takes advantage of dynamically generated program invariants observed over
several program executions to cluster semantically equivalent IPAs. Our main
objective is to find a more suitable representation of programs using a
combination of the program's semantics, through its invariants, and its
structure, through its anonymized abstract syntax tree. The evaluation of
InvAASTCluster shows that the proposed program representation outperforms
syntax-based representations when clustering a set of different correct IPAs.
Furthermore, we integrate InvAASTCluster into a state-of-the-art
clustering-based program repair tool and evaluate it on a set of IPAs. Our
results show that InvAASTCluster advances the current state-of-the-art when
used by clustering-based program repair tools by repairing a larger number of
students' programs in a shorter amount of time.
Related papers
- Semantic Equitable Clustering: A Simple and Effective Strategy for Clustering Vision Tokens [57.37893387775829]
We introduce a fast and balanced clustering method, named textbfSemantic textbfEquitable textbfClustering (SEC)
SEC clusters tokens based on their global semantic relevance in an efficient, straightforward manner.
We propose a versatile vision backbone, SECViT, to serve as a vision language connector.
arXiv Detail & Related papers (2024-05-22T04:49:00Z) - Peer-aided Repairer: Empowering Large Language Models to Repair Advanced Student Assignments [26.236420215606238]
We develop a framework called PaR that is powered by the Large Language Model.
PaR works in three phases: Peer Solution Selection, Multi-Source Prompt Generation, and Program Repair.
The evaluation on Defects4DS and another well-investigated ITSP dataset reveals that PaR achieves a new state-of-the-art performance.
arXiv Detail & Related papers (2024-04-02T09:12:21Z) - Weakly Supervised Semantic Parsing with Execution-based Spurious Program
Filtering [19.96076749160955]
We propose a domain-agnostic filtering mechanism based on program execution results.
We run a majority vote on these representations to identify and filter out programs with significantly different semantics from the other programs.
arXiv Detail & Related papers (2023-11-02T11:45:40Z) - Towards Realistic Zero-Shot Classification via Self Structural Semantic
Alignment [53.2701026843921]
Large-scale pre-trained Vision Language Models (VLMs) have proven effective for zero-shot classification.
In this paper, we aim at a more challenging setting, Realistic Zero-Shot Classification, which assumes no annotation but instead a broad vocabulary.
We propose the Self Structural Semantic Alignment (S3A) framework, which extracts structural semantic information from unlabeled data while simultaneously self-learning.
arXiv Detail & Related papers (2023-08-24T17:56:46Z) - CEIL: A General Classification-Enhanced Iterative Learning Framework for
Text Clustering [16.08402937918212]
We propose a novel Classification-Enhanced Iterative Learning framework for short text clustering.
In each iteration, we first adopt a language model to retrieve the initial text representations.
After strict data filtering and aggregation processes, samples with clean category labels are retrieved, which serve as supervision information.
Finally, the updated language model with improved representation ability is used to enhance clustering in the next iteration.
arXiv Detail & Related papers (2023-04-20T14:04:31Z) - Hierarchical Programmatic Reinforcement Learning via Learning to Compose
Programs [58.94569213396991]
We propose a hierarchical programmatic reinforcement learning framework to produce program policies.
By learning to compose programs, our proposed framework can produce program policies that describe out-of-distributionally complex behaviors.
The experimental results in the Karel domain show that our proposed framework outperforms baselines.
arXiv Detail & Related papers (2023-01-30T14:50:46Z) - Learning from Self-Sampled Correct and Partially-Correct Programs [96.66452896657991]
We propose to let the model perform sampling during training and learn from both self-sampled fully-correct programs and partially-correct programs.
We show that our use of self-sampled correct and partially-correct programs can benefit learning and help guide the sampling process.
Our proposed method improves the pass@k performance by 3.1% to 12.3% compared to learning from a single reference program with MLE.
arXiv Detail & Related papers (2022-05-28T03:31:07Z) - You Never Cluster Alone [150.94921340034688]
We extend the mainstream contrastive learning paradigm to a cluster-level scheme, where all the data subjected to the same cluster contribute to a unified representation.
We define a set of categorical variables as clustering assignment confidence, which links the instance-level learning track with the cluster-level one.
By reparametrizing the assignment variables, TCC is trained end-to-end, requiring no alternating steps.
arXiv Detail & Related papers (2021-06-03T14:59:59Z) - Graph Contrastive Clustering [131.67881457114316]
We propose a novel graph contrastive learning framework, which is then applied to the clustering task and we come up with the Graph Constrastive Clustering(GCC) method.
Specifically, on the one hand, the graph Laplacian based contrastive loss is proposed to learn more discriminative and clustering-friendly features.
On the other hand, a novel graph-based contrastive learning strategy is proposed to learn more compact clustering assignments.
arXiv Detail & Related papers (2021-04-03T15:32:49Z) - Learning Differentiable Programs with Admissible Neural Heuristics [43.54820901841979]
We study the problem of learning differentiable functions expressed as programs in a domain-specific language.
We frame this optimization problem as a search in a weighted graph whose paths encode top-down derivations of program syntax.
Our key innovation is to view various classes of neural networks as continuous relaxations over the space of programs.
arXiv Detail & Related papers (2020-07-23T16:07:39Z) - ProGraML: Graph-based Deep Learning for Program Optimization and
Analysis [16.520971531754018]
We introduce ProGraML, a graph-based program representation for machine learning.
ProGraML achieves an average 94.0 F1 score, significantly outperforming the state-of-the-art approaches.
We then apply our approach to two high-level tasks - heterogeneous device mapping and program classification - setting new state-of-the-art performance in both.
arXiv Detail & Related papers (2020-03-23T20:27:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.