InvAASTCluster: On Applying Invariant-Based Program Clustering to Introductory Programming Assignments
- URL: http://arxiv.org/abs/2206.14175v3
- Date: Wed, 30 Apr 2025 15:39:53 GMT
- Title: InvAASTCluster: On Applying Invariant-Based Program Clustering to Introductory Programming Assignments
- Authors: Pedro Orvalho, Mikoláš Janota, Vasco Manquinho
- Abstract summary: This paper proposes InvAASTCluster, a novel approach for program clustering. InvAASTCluster's program representation uses a combination of the program's semantics, through its invariants, and its structure. Our results show that InvAASTCluster advances the current state-of-the-art when used by clustering-based repair tools.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Due to the vast number of students enrolled in programming courses, there has been an increasing number of automated program repair techniques focused on introductory programming assignments (IPAs). Typically, such techniques use program clustering to take advantage of previous correct student implementations to repair a new incorrect submission. These repair techniques use clustering methods since analyzing all available correct submissions to repair a program is not feasible. However, conventional clustering methods rely on program representations based on features such as abstract syntax trees (ASTs), syntax, control flow, and data flow. This paper proposes InvAASTCluster, a novel approach for program clustering that uses dynamically generated program invariants to cluster semantically equivalent IPAs. InvAASTCluster's program representation uses a combination of the program's semantics, through its invariants, and its structure through its anonymized abstract syntax tree (AASTs). Invariants denote conditions that must remain true during program execution, while AASTs are ASTs devoid of variable and function names, retaining only their types. Our experiments show that the proposed program representation outperforms syntax-based representations when clustering a set of correct IPAs. Furthermore, we integrate InvAASTCluster into a state-of-the-art clustering-based program repair tool. Our results show that InvAASTCluster advances the current state-of-the-art when used by clustering-based repair tools by repairing around 13% more students' programs, in a shorter amount of time.
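The abstract describes AASTs as ASTs stripped of variable and function names. The following is a minimal sketch of that anonymization idea only, not the authors' implementation: it uses Python's standard `ast` module, whereas the tool's own front end and target language are not given here. Python has no static types, so every identifier is replaced by a fixed placeholder rather than by its type. The class `Anonymizer`, the function `anonymized_dump`, and the placeholders `FUNC` and `VAR` are hypothetical names chosen for illustration. Invariant generation is not covered.

```python
import ast


class Anonymizer(ast.NodeTransformer):
    """Replace identifiers with fixed placeholders (illustrative scheme only)."""

    def visit_FunctionDef(self, node):
        # Anonymize the body and parameters first, then the function name.
        self.generic_visit(node)
        node.name = "FUNC"
        return node

    def visit_arg(self, node):
        # Function parameters become anonymous variables.
        node.arg = "VAR"
        return node

    def visit_Name(self, node):
        # Every name reference is anonymized, including builtins such as `sum`.
        node.id = "VAR"
        return node


def anonymized_dump(source: str) -> str:
    """Return a textual dump of the anonymized AST of `source`."""
    tree = Anonymizer().visit(ast.parse(source))
    # ast.dump omits line and column attributes by default, so only
    # structure (and the placeholders) remain in the result.
    return ast.dump(tree)


program_a = "def total(xs):\n    s = 0\n    for x in xs:\n        s += x\n    return s\n"
program_b = "def acc(values):\n    r = 0\n    for v in values:\n        r += v\n    return r\n"
program_c = "def total(xs):\n    return sum(xs)\n"

# Same structure, different identifiers: the anonymized dumps coincide.
same_structure = anonymized_dump(program_a) == anonymized_dump(program_b)
# Different structure: the anonymized dumps differ.
different_structure = anonymized_dump(program_a) != anonymized_dump(program_c)
```

In this sketch, `program_a` and `program_b` would fall into the same structural group while `program_c` would not, even though `program_c` computes the same result. That gap is the kind of case the abstract says a purely syntactic representation cannot capture and that the invariant-based part of the representation is meant to address.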
Related papers
- Class-Aware Prototype Learning with Negative Contrast for Test-Time Adaptation of Vision-Language Models [48.61795272482598]
Vision-Language Models (VLMs) demonstrate impressive zero-shot generalization through large-scale image-text pretraining. But their performance can drop once the deployment distribution diverges from the training distribution. Test-Time Adaptation (TTA) methods update models using unlabeled target data. We propose Class-Aware Prototype Learning with Negative Contrast (CPL-NC), a lightweight TTA framework.
arXiv Detail & Related papers (2025-10-22T17:38:35Z)
- GRACE: Globally-Seeded Representation-Aware Cluster-Specific Evolution for Compiler Auto-Tuning [10.225578019039506]
This paper introduces GRACE, a novel framework for compiler auto-tuning, demonstrated for LLVM IR instruction count optimization. GRACE effectively curtails the search space by leveraging pass synergies and a weighted scoring method to generate initial high-quality candidate sequences and a pass pool. It then employs contrastive learning, using pass sequence-based data augmentation, to create program embeddings that facilitate similarity-aware clustering.
arXiv Detail & Related papers (2025-10-15T06:01:19Z)
- Training-Free Class Purification for Open-Vocabulary Semantic Segmentation [72.87707878910896]
FreeCP is a training-free class purification framework for semantic segmentation. We conduct experiments across eight benchmarks to validate FreeCP's effectiveness. Results demonstrate that FreeCP, as a plug-and-play module, significantly boosts segmentation performance when combined with other OVSS methods.
arXiv Detail & Related papers (2025-08-01T11:55:12Z)
- Object-Spatial Programming [2.8374498376407877]
We introduce Object-Spatial Programming (OSP), a programming model that extends Object-Oriented Programming. OSP fundamentally inverts the traditional relationship between data and computation, enabling computation to move to data through four specialized archetypes. This semantic enhancement enables runtime systems to make informed decisions about data locality, parallel execution, and distribution strategies.
arXiv Detail & Related papers (2025-03-20T02:55:40Z)
- Cluster Specific Representation Learning [1.6727186769396276]
Despite its widespread application, there is no established definition of a "good" representation.
We propose a downstream-agnostic formulation: when inherent clusters exist in the data, the representations should be specific to each cluster.
Under this idea, we develop a meta-algorithm that jointly learns cluster-specific representations and cluster assignments.
arXiv Detail & Related papers (2024-12-04T16:59:37Z)
- Words Matter: Leveraging Individual Text Embeddings for Code Generation in CLIP Test-Time Adaptation [21.20806568508201]
We show how to leverage class text information to mitigate distribution drifts encountered by vision-language models (VLMs) during test-time inference.
We propose to generate pseudo-labels for the test-time samples by exploiting generic class text embeddings as fixed centroids of a label assignment problem.
Experiments on multiple popular test-time adaptation benchmarks presenting diverse complexity empirically show the superiority of CLIP-OT.
arXiv Detail & Related papers (2024-11-26T00:15:37Z)
- Semantic Equitable Clustering: A Simple and Effective Strategy for Clustering Vision Tokens [57.37893387775829]
We introduce a fast and balanced clustering method, named Semantic Equitable Clustering (SEC).
SEC clusters tokens based on their global semantic relevance in an efficient, straightforward manner.
We propose a versatile vision backbone, SECViT, to serve as a vision language connector.
arXiv Detail & Related papers (2024-05-22T04:49:00Z)
- Peer-aided Repairer: Empowering Large Language Models to Repair Advanced Student Assignments [26.236420215606238]
We develop a framework called PaR that is powered by the Large Language Model.
PaR works in three phases: Peer Solution Selection, Multi-Source Prompt Generation, and Program Repair.
The evaluation on Defects4DS and another well-investigated ITSP dataset reveals that PaR achieves a new state-of-the-art performance.
arXiv Detail & Related papers (2024-04-02T09:12:21Z)
- Open-Vocabulary Segmentation with Semantic-Assisted Calibration [68.41025728960176]
We study open-vocabulary segmentation (OVS) through calibrating in-vocabulary and domain-biased embedding space with contextual prior of CLIP.
We present a Semantic-assisted CAlibration Network (SCAN) to achieve state-of-the-art performance on open-vocabulary segmentation benchmarks.
arXiv Detail & Related papers (2023-12-07T07:00:09Z)
- Weakly Supervised Semantic Parsing with Execution-based Spurious Program Filtering [19.96076749160955]
We propose a domain-agnostic filtering mechanism based on program execution results.
We run a majority vote on these representations to identify and filter out programs with significantly different semantics from the other programs.
arXiv Detail & Related papers (2023-11-02T11:45:40Z)
- Towards Realistic Zero-Shot Classification via Self Structural Semantic Alignment [53.2701026843921]
Large-scale pre-trained Vision Language Models (VLMs) have proven effective for zero-shot classification.
In this paper, we aim at a more challenging setting, Realistic Zero-Shot Classification, which assumes no annotation but instead a broad vocabulary.
We propose the Self Structural Semantic Alignment (S3A) framework, which extracts structural semantic information from unlabeled data while simultaneously self-learning.
arXiv Detail & Related papers (2023-08-24T17:56:46Z)
- CEIL: A General Classification-Enhanced Iterative Learning Framework for Text Clustering [16.08402937918212]
We propose a novel Classification-Enhanced Iterative Learning framework for short text clustering.
In each iteration, we first adopt a language model to retrieve the initial text representations.
After strict data filtering and aggregation processes, samples with clean category labels are retrieved, which serve as supervision information.
Finally, the updated language model with improved representation ability is used to enhance clustering in the next iteration.
arXiv Detail & Related papers (2023-04-20T14:04:31Z)
- Harmonizing Base and Novel Classes: A Class-Contrastive Approach for Generalized Few-Shot Segmentation [78.74340676536441]
We propose a class contrastive loss and a class relationship loss to regulate prototype updates and encourage a large distance between prototypes.
Our proposed approach achieves new state-of-the-art performance for the generalized few-shot segmentation task on PASCAL VOC and MS COCO datasets.
arXiv Detail & Related papers (2023-03-24T00:30:25Z)
- Global Knowledge Calibration for Fast Open-Vocabulary Segmentation [124.74256749281625]
We introduce a text diversification strategy that generates a set of synonyms for each training category.
We also employ a text-guided knowledge distillation method to preserve the generalizable knowledge of CLIP.
Our proposed model achieves robust generalization performance across various datasets.
arXiv Detail & Related papers (2023-03-16T09:51:41Z)
- Hierarchical Programmatic Reinforcement Learning via Learning to Compose Programs [58.94569213396991]
We propose a hierarchical programmatic reinforcement learning framework to produce program policies.
By learning to compose programs, our proposed framework can produce program policies that describe out-of-distributionally complex behaviors.
The experimental results in the Karel domain show that our proposed framework outperforms baselines.
arXiv Detail & Related papers (2023-01-30T14:50:46Z)
- Learning from Self-Sampled Correct and Partially-Correct Programs [96.66452896657991]
We propose to let the model perform sampling during training and learn from both self-sampled fully-correct programs and partially-correct programs.
We show that our use of self-sampled correct and partially-correct programs can benefit learning and help guide the sampling process.
Our proposed method improves the pass@k performance by 3.1% to 12.3% compared to learning from a single reference program with MLE.
arXiv Detail & Related papers (2022-05-28T03:31:07Z)
- You Never Cluster Alone [150.94921340034688]
We extend the mainstream contrastive learning paradigm to a cluster-level scheme, where all the data subjected to the same cluster contribute to a unified representation.
We define a set of categorical variables as clustering assignment confidence, which links the instance-level learning track with the cluster-level one.
By reparametrizing the assignment variables, TCC is trained end-to-end, requiring no alternating steps.
arXiv Detail & Related papers (2021-06-03T14:59:59Z)
- Graph Contrastive Clustering [131.67881457114316]
We propose a novel graph contrastive learning framework and apply it to the clustering task, yielding the Graph Contrastive Clustering (GCC) method.
Specifically, on the one hand, the graph Laplacian based contrastive loss is proposed to learn more discriminative and clustering-friendly features.
On the other hand, a novel graph-based contrastive learning strategy is proposed to learn more compact clustering assignments.
arXiv Detail & Related papers (2021-04-03T15:32:49Z)
- Learning Differentiable Programs with Admissible Neural Heuristics [43.54820901841979]
We study the problem of learning differentiable functions expressed as programs in a domain-specific language.
We frame this optimization problem as a search in a weighted graph whose paths encode top-down derivations of program syntax.
Our key innovation is to view various classes of neural networks as continuous relaxations over the space of programs.
arXiv Detail & Related papers (2020-07-23T16:07:39Z)
- ProGraML: Graph-based Deep Learning for Program Optimization and Analysis [16.520971531754018]
We introduce ProGraML, a graph-based program representation for machine learning.
ProGraML achieves an average 94.0 F1 score, significantly outperforming the state-of-the-art approaches.
We then apply our approach to two high-level tasks - heterogeneous device mapping and program classification - setting new state-of-the-art performance in both.
arXiv Detail & Related papers (2020-03-23T20:27:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.