How Do Code Smells Affect Skill Growth in Scratch Novice Programmers?
- URL: http://arxiv.org/abs/2507.17314v1
- Date: Wed, 23 Jul 2025 08:30:06 GMT
- Title: How Do Code Smells Affect Skill Growth in Scratch Novice Programmers?
- Authors: Ricardo Hidalgo Aragón, Jesús M. González-Barahona, Gregorio Robles
- Abstract summary: The study will deliver the first large-scale, fine-grained map linking specific CT competencies to concrete design flaws and antipatterns. By clarifying how programming habits influence early skill acquisition, the work advances both computing-education theory and practical tooling for sustainable software maintenance and evolution.
- Score: 3.8506666685467343
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Context. Code smells, which are recurring anomalies in design or style, have been extensively researched in professional code. However, their significance in block-based projects created by novices is still largely unknown. Block-based environments such as Scratch offer a unique, data-rich setting to examine how emergent design problems intersect with the cultivation of computational-thinking (CT) skills. Objective. This research explores the connection between CT proficiency and design-level code smells (issues that may hinder software maintenance and evolution) in programs created by Scratch developers. We seek to identify which CT dimensions align most strongly with which code smells and whether task context moderates those associations. Method. A random sample of approximately 2 million public Scratch projects is mined. Using open-source linters, we extract nine CT scores and 40 code smell indicators from these projects. After rigorous pre-processing, we apply descriptive analytics, robust correlation tests, stratified cross-validation, and exploratory machine-learning models; qualitative spot-checks contextualize quantitative patterns. Impact. The study will deliver the first large-scale, fine-grained map linking specific CT competencies to concrete design flaws and antipatterns. Results are poised to (i) inform evidence-based curricula and automated feedback systems, (ii) provide effect-size benchmarks for future educational interventions, and (iii) supply an open, pseudonymized dataset and reproducible analysis pipeline for the research community. By clarifying how programming habits influence early skill acquisition, the work advances both computing-education theory and practical tooling for sustainable software maintenance and evolution.
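The analysis the Method paragraph describes can be pictured with a small sketch: robust (Spearman) rank correlations between per-project CT scores and smell counts, with multiple-testing correction across the many CT x smell pairs. The file name, column names, and specific statistical choices below are assumptions for illustration, not the authors' actual pipeline.

```python
# Hedged sketch: correlate CT dimension scores with code-smell counts
# across mined Scratch projects. File path and column names are assumed.
import pandas as pd
from scipy.stats import spearmanr
from statsmodels.stats.multitest import multipletests

df = pd.read_csv("scratch_projects.csv")  # hypothetical: one row per project

ct_cols = ["abstraction", "parallelism", "logic"]              # 3 of the 9 CT scores
smell_cols = ["duplicate_blocks", "dead_code", "long_script"]  # 3 of the 40 smells

results = []
for ct in ct_cols:
    for smell in smell_cols:
        rho, p = spearmanr(df[ct], df[smell])  # rank-based, robust to outliers
        results.append((ct, smell, rho, p))

# Control the false-discovery rate across all CT x smell tests.
reject, p_adj, _, _ = multipletests([r[3] for r in results], method="fdr_bh")

for (ct, smell, rho, _), ok, q in zip(results, reject, p_adj):
    print(f"{ct:>12} vs {smell:<16} rho={rho:+.2f} q={q:.3f}{' *' if ok else ''}")
```

Spearman's rho is one defensible reading of "robust correlation tests"; Kendall's tau or a winsorized Pearson correlation would fit the abstract's wording equally well.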
Related papers
- Is Compression Really Linear with Code Intelligence? [60.123628177110206]
Format Annealing is a lightweight, transparent training methodology designed to assess the intrinsic capabilities of pre-trained models equitably. Our empirical results reveal a fundamental logarithmic relationship between measured code intelligence and bits-per-character (BPC). Our work provides a more nuanced understanding of compression's role in developing code intelligence and contributes a robust evaluation framework in the code domain.
arXiv Detail & Related papers (2025-05-16T16:59:14Z)
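If the relationship is logarithmic rather than linear, it can be recovered with a one-line curve fit. The data points below are fabricated purely to show the functional form score = a * ln(BPC) + b implied by the summary.

```python
# Hedged sketch: fit a log-linear model relating a benchmark score to
# bits-per-character (BPC). All data points are synthetic.
import numpy as np
from scipy.optimize import curve_fit

bpc = np.array([0.95, 0.80, 0.70, 0.62, 0.55, 0.50])    # lower = better compression
score = np.array([12.0, 21.0, 27.5, 33.0, 38.5, 42.0])  # invented "intelligence"

def log_model(x, a, b):
    return a * np.log(x) + b  # the logarithmic form the summary reports

(a, b), _ = curve_fit(log_model, bpc, score)
print(f"score = {a:.1f} * ln(BPC) + {b:.1f}")  # a comes out negative here
```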
- Bridging LLM-Generated Code and Requirements: Reverse Generation technique and SBC Metric for Developer Insights [0.0]
This paper introduces a novel scoring mechanism called the SBC score. It is based on a reverse generation technique that leverages the natural language generation capabilities of Large Language Models. Unlike direct code analysis, our approach reconstructs system requirements from AI-generated code and compares them with the original specifications.
arXiv Detail & Related papers (2025-02-11T01:12:11Z)
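The reverse-generation idea can be sketched as: ask an LLM to describe what generated code does, then compare that description with the original requirement. Below, `llm_describe` is a stub standing in for any LLM call, and TF-IDF cosine similarity is a stand-in metric; the paper's actual SBC score is not reproduced here.

```python
# Hedged sketch of the reverse-generation comparison step. The stub and
# the similarity metric are placeholders, not the paper's SBC score.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def llm_describe(code: str) -> str:
    # Stub: a real implementation would prompt an LLM with the code and
    # ask for the requirement it satisfies.
    return "Reject any password that has fewer than 8 characters."

original_req = "The system shall reject passwords shorter than 8 characters."
generated_code = "def valid(pw): return len(pw) >= 8"

reconstructed = llm_describe(generated_code)
vecs = TfidfVectorizer().fit_transform([original_req, reconstructed])
print("similarity:", cosine_similarity(vecs[0], vecs[1])[0][0])
```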
- SnipGen: A Mining Repository Framework for Evaluating LLMs for Code [51.07471575337676]
Large Language Models (LLMs) are trained on extensive datasets that include code repositories. Evaluating their effectiveness poses significant challenges due to the potential overlap between the datasets used for training and those employed for evaluation. We introduce SnipGen, a comprehensive repository mining framework designed to leverage prompt engineering across various downstream tasks for code generation.
arXiv Detail & Related papers (2025-02-10T21:28:15Z)
- EnseSmells: Deep ensemble and programming language models for automated code smells detection [3.974095344344234]
A smell in software source code denotes an indication of suboptimal design and implementation decisions. This paper proposes a novel approach to code smell detection, constructing a deep learning architecture that places importance on the fusion of structural features and statistical semantics.
arXiv Detail & Related papers (2025-02-07T15:35:19Z)
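The fusion of structural features and statistical semantics can be pictured as a two-branch network: one branch for hand-crafted metrics, one for a code-model embedding, concatenated before a classification head. Layer sizes, dimensions, and the fusion point below are assumptions, not the EnseSmells architecture.

```python
# Hedged sketch of a structural + semantic fusion smell detector.
# All dimensions and layers are illustrative.
import torch
import torch.nn as nn

class FusionSmellDetector(nn.Module):
    def __init__(self, n_metrics=20, emb_dim=768, hidden=128):
        super().__init__()
        self.metrics_branch = nn.Sequential(   # structural features
            nn.Linear(n_metrics, hidden), nn.ReLU())
        self.semantic_branch = nn.Sequential(  # code-model embedding
            nn.Linear(emb_dim, hidden), nn.ReLU())
        self.head = nn.Linear(2 * hidden, 1)   # smelly / clean

    def forward(self, metrics, embedding):
        fused = torch.cat([self.metrics_branch(metrics),
                           self.semantic_branch(embedding)], dim=-1)
        return torch.sigmoid(self.head(fused))

model = FusionSmellDetector()
probs = model(torch.randn(4, 20), torch.randn(4, 768))  # batch of 4 samples
print(probs.shape)  # torch.Size([4, 1])
```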
- Should Code Models Learn Pedagogically? A Preliminary Evaluation of Curriculum Learning for Real-World Software Engineering Tasks [2.0072624123275533]
Recent work shows that Curriculum Learning can improve performance on code-related tasks through incremental learning based on the difficulty of synthetic code. We investigate how the pre-trained code model (CodeT5) learns under CL, through the tasks of code clone detection and code summarization. Our empirical study on the CodeXGLUE benchmark showed contrasting results to prior studies, where the model exhibited signs of catastrophic forgetting and shortcut learning.
arXiv Detail & Related papers (2025-02-06T06:33:08Z)
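Curriculum Learning here amounts to ordering training data from easy to hard before fine-tuning. The sketch below uses token count as a toy difficulty proxy; real curricula use richer difficulty measures, and the single-pass ordering shown is only one common scheduling variant.

```python
# Hedged sketch of an easy-to-hard curriculum over code samples.
# The difficulty proxy and the samples themselves are illustrative.
samples = [
    "def fib(n): return n if n < 2 else fib(n - 1) + fib(n - 2)",
    "x = 1",
    "def add(a, b): return a + b",
]

def difficulty(code: str) -> int:
    return len(code.split())  # toy proxy; real work might use complexity

for step, code in enumerate(sorted(samples, key=difficulty)):
    # A real setup would fine-tune the model (e.g. CodeT5) on
    # progressively harder buckets rather than individual samples.
    print(f"step {step}: difficulty={difficulty(code)} :: {code}")
```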
- A Computational Method for Measuring "Open Codes" in Qualitative Analysis [47.358809793796624]
Open coding is an inductive qualitative process that identifies and interprets "open codes" from datasets.
We present a computational method to measure and identify potential biases from "open codes" systematically.
arXiv Detail & Related papers (2024-11-19T00:44:56Z)
- Creating a Trajectory for Code Writing: Algorithmic Reasoning Tasks [0.923607423080658]
This paper describes the instruments and the machine learning models used to validate them.
We use data collected in an introductory programming course during the penultimate week of the semester.
Preliminary research suggests that ART-type instruments can be combined with specific machine learning models to act as an effective learning trajectory.
arXiv Detail & Related papers (2024-04-03T05:07:01Z)
- Using Machine Learning To Identify Software Weaknesses From Software Requirement Specifications [49.1574468325115]
This research focuses on finding an efficient machine learning algorithm to identify software weaknesses from requirement specifications.
Keywords extracted using latent semantic analysis help map the CWE categories to PROMISE_exp. Naive Bayes, support vector machine (SVM), decision trees, neural network, and convolutional neural network (CNN) algorithms were tested.
arXiv Detail & Related papers (2023-08-10T13:19:10Z)
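The keyword-extraction step can be approximated with a standard LSA pipeline: TF-IDF vectors reduced by truncated SVD, feeding a simple classifier. The requirement texts and CWE labels below are invented, and logistic regression stands in for the several classifiers the abstract lists.

```python
# Hedged sketch: LSA features from requirement text feeding a classifier.
# Texts, labels, and component counts are made up for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

requirements = [
    "The system shall sanitize all user input before database queries.",
    "Passwords must be stored using a salted hash.",
    "The application shall log every failed login attempt.",
    "User sessions shall expire after 15 minutes of inactivity.",
]
cwe_labels = ["CWE-89", "CWE-256", "CWE-778", "CWE-613"]  # illustrative only

pipeline = make_pipeline(
    TfidfVectorizer(stop_words="english"),
    TruncatedSVD(n_components=2),       # the LSA step; tiny for a toy corpus
    LogisticRegression(max_iter=1000),  # stand-in for NB, SVM, trees, CNN
)
pipeline.fit(requirements, cwe_labels)
print(pipeline.predict(["Escape special characters in SQL statements."]))
```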
- PEOPL: Characterizing Privately Encoded Open Datasets with Public Labels [59.66777287810985]
We introduce information-theoretic scores for privacy and utility, which quantify the average performance of an unfaithful user.
We then theoretically characterize primitives in building families of encoding schemes that motivate the use of random deep neural networks.
arXiv Detail & Related papers (2023-03-31T18:03:53Z)
- NEVIS'22: A Stream of 100 Tasks Sampled from 30 Years of Computer Vision Research [96.53307645791179]
We introduce the Never-Ending VIsual-classification Stream (NEVIS'22), a benchmark consisting of a stream of over 100 visual classification tasks.
Despite being limited to classification, the resulting stream has a rich diversity of tasks from OCR, to texture analysis, scene recognition, and so forth.
Overall, NEVIS'22 poses an unprecedented challenge for current sequential learning approaches due to the scale and diversity of tasks.
arXiv Detail & Related papers (2022-11-15T18:57:46Z)
- What Makes Good Contrastive Learning on Small-Scale Wearable-based Tasks? [59.51457877578138]
We study contrastive learning on the wearable-based activity recognition task.
This paper presents an open-source PyTorch library, `CL-HAR`, which can serve as a practical tool for researchers.
arXiv Detail & Related papers (2022-02-12T06:10:15Z)
- An Empirical Study on Predictability of Software Code Smell Using Deep Learning Models [3.2973778921083357]
A code smell is a surface indication of something tainted in terms of software writing practices.
Recent studies have observed that codes having code smells are often prone to a higher probability of change in the software development cycle.
We developed code smell prediction models using features extracted from source code to predict eight types of code smell.
arXiv Detail & Related papers (2021-08-08T12:36:23Z)
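The setup this entry describes can be sketched as a multi-label classifier over numeric source-code features, since the eight smell types can co-occur in one module. Feature count, network shape, and labels below are assumptions, not the study's actual design.

```python
# Hedged sketch: multi-label prediction of eight smell types from numeric
# source-code features. Shapes, layers, and data are illustrative.
import torch
import torch.nn as nn

N_FEATURES, N_SMELLS = 30, 8  # e.g. size/complexity metrics -> 8 smell types

model = nn.Sequential(
    nn.Linear(N_FEATURES, 64), nn.ReLU(),
    nn.Linear(64, N_SMELLS),  # one logit per smell type
)
loss_fn = nn.BCEWithLogitsLoss()  # multi-label: smells can co-occur

x = torch.randn(16, N_FEATURES)                  # fake batch of 16 modules
y = torch.randint(0, 2, (16, N_SMELLS)).float()  # fake smell labels
loss = loss_fn(model(x), y)
loss.backward()
print(f"toy training loss: {loss.item():.3f}")
```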