Adoption and Evolution of Code Style and Best Programming Practices in Open-Source Projects
- URL: http://arxiv.org/abs/2601.09832v1
- Date: Wed, 14 Jan 2026 19:48:47 GMT
- Title: Adoption and Evolution of Code Style and Best Programming Practices in Open-Source Projects
- Authors: Alvari Kupari, Nasser Giacaman, Valerio Terragni
- Abstract summary: This paper analyzes 1,036 popular open-source Java projects on GitHub to study how code style and programming practices are adopted and evolve over time. We found widespread violations across repositories, with Javadoc and Naming violations being the most common. We also found a significant number of violations of the Google Java Style Guide in categories often missed by modern static analysis tools.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Following code style conventions in software projects is essential for maintaining overall code quality. Adhering to these conventions improves maintainability, understandability, and extensibility. Additionally, following best practices during software development enhances performance and reduces the likelihood of errors. This paper analyzes 1,036 popular open-source Java projects on GitHub to study how code style and programming practices are adopted and evolve over time, examining their prevalence and the most common violations. Additionally, we study a subset of active repositories on a monthly basis to track changes in adherence to coding standards over time. We found widespread violations across repositories, with Javadoc and Naming violations being the most common. We also found a significant number of violations of the Google Java Style Guide in categories often missed by modern static analysis tools. Furthermore, repositories claiming to follow code-style practices exhibited slightly higher overall adherence to code style and best practices. The results provide valuable insights into the adoption of code style and programming practices, highlighting key areas for improvement in the open-source development community. Finally, the paper identifies important lessons learned and suggests future directions for improving code quality in Java projects.
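For illustration, a minimal sketch of the kind of Naming check a static style analyzer performs. This is not the paper's tooling: the class and helper below are hypothetical, and the regular expression is only an approximation of the Google Java Style Guide rule that method names be written in lowerCamelCase.

```java
import java.util.List;
import java.util.regex.Pattern;

public class NamingCheck {
    // Approximates Google Java Style 5.2.3: method names are lowerCamelCase,
    // i.e. a lowercase letter followed by letters and digits (no underscores).
    private static final Pattern LOWER_CAMEL = Pattern.compile("[a-z][a-zA-Z0-9]*");

    static boolean isValidMethodName(String name) {
        return LOWER_CAMEL.matcher(name).matches();
    }

    public static void main(String[] args) {
        // "Compute_Total" violates the rule (uppercase start, underscore).
        for (String name : List.of("computeTotal", "Compute_Total")) {
            System.out.println(name + " -> "
                    + (isValidMethodName(name) ? "ok" : "violation"));
        }
    }
}
```

Real checkers such as Checkstyle operate on the parsed syntax tree rather than raw strings, which is one reason the paper can compare tool-detected categories against guide categories the tools miss.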
Related papers
- Guidelines to Prompt Large Language Models for Code Generation: An Empirical Characterization [82.29178197694819]
We derive and evaluate development-specific prompt optimization guidelines. We use an iterative, test-driven approach to automatically refine code generation prompts. We conduct an assessment with 50 practitioners, who report their usage of the elicited prompt improvement patterns.
arXiv Detail & Related papers (2026-01-19T15:01:42Z) - From Code Foundation Models to Agents and Applications: A Practical Guide to Code Intelligence [150.3696990310269]
Large language models (LLMs) have transformed automated software development by enabling direct translation of natural language descriptions into functional code. We provide a comprehensive synthesis and practical guide (a series of analytic and probing experiments) about code LLMs. We analyze the code capability of general LLMs (GPT-4, Claude, LLaMA) and code-specialized LLMs (StarCoder, Code LLaMA, DeepSeek-Coder, and QwenCoder).
arXiv Detail & Related papers (2025-11-23T17:09:34Z) - An Empirical Study of Java Code Improvements Based on Stack Overflow Answer Edits [0.22166578153935793]
Suboptimal code is prevalent in software systems. Developers often write low-quality code due to factors like technical knowledge gaps, insufficient experience, time pressure, management decisions, or personal factors. We present an empirical study of Stack Overflow Java answer edits and their application to improving code in open-source projects.
arXiv Detail & Related papers (2025-11-08T03:01:55Z) - Knowledge Graph Based Repository-Level Code Generation [0.0]
This paper introduces a novel knowledge graph-based approach to improve code search and retrieval. The proposed approach represents code repositories as graphs, capturing structural and relational information for enhanced context-aware code generation. We benchmark the proposed approach on the Evolutionary Code Benchmark dataset, a repository-level code generation benchmark, and demonstrate that our method significantly outperforms the baseline approach.
arXiv Detail & Related papers (2025-05-20T14:13:59Z) - Code Improvement Practices at Meta [11.3591598115242]
We investigate Meta's practices by collaborating with engineers on code quality. We analyze rich source code change history to reveal a range of practices used for continual improvement. Our analysis of the impact of reengineering activities revealed substantial improvements in quality and speed.
arXiv Detail & Related papers (2025-04-16T22:30:54Z) - Codev-Bench: How Do LLMs Understand Developer-Centric Code Completion? [60.84912551069379]
We present the Code-Development Benchmark (Codev-Bench), a fine-grained, real-world, repository-level, and developer-centric evaluation framework.
Codev-Agent is an agent-based system that automates repository crawling, constructs execution environments, extracts dynamic calling chains from existing unit tests, and generates new test samples to avoid data leakage.
arXiv Detail & Related papers (2024-10-02T09:11:10Z) - What's Wrong with Your Code Generated by Large Language Models? An Extensive Study [92.62952504133926]
This study evaluated the performance of three leading closed-source LLMs and six popular open-source LLMs on three commonly used benchmarks. We developed a taxonomy of bugs for incorrect code and analyzed the root causes of common bug types. We propose a novel training-free iterative method that introduces self-critique, enabling LLMs to critique and correct their generated code.
arXiv Detail & Related papers (2024-07-08T17:27:17Z) - Refactoring Deep Learning Code: A Study of Practices and Unsatisfied Tool Needs [10.440289439181756]
Deep learning software has become progressively complex as the software evolves. Insight into code refactoring in the context of deep learning is still unclear. Research and the development of related tools are crucial for improving project maintainability and code quality.
arXiv Detail & Related papers (2024-05-08T07:35:14Z) - CoderUJB: An Executable and Unified Java Benchmark for Practical Programming Scenarios [25.085449990951034]
We introduce CoderUJB, a new benchmark designed to evaluate large language models (LLMs) across diverse Java programming tasks.
Our empirical study on this benchmark investigates the coding abilities of various open-source and closed-source LLMs.
The findings indicate that while LLMs exhibit strong potential, challenges remain, particularly in non-functional code generation.
arXiv Detail & Related papers (2024-03-28T10:19:18Z) - CONCORD: Clone-aware Contrastive Learning for Source Code [64.51161487524436]
Self-supervised pre-training has gained traction for learning generic code representations valuable for many downstream SE tasks.
We argue that it is also essential to factor in how developers code day-to-day for general-purpose representation learning.
In particular, we propose CONCORD, a self-supervised, contrastive learning strategy to place benign clones closer in the representation space while moving deviants further apart.
arXiv Detail & Related papers (2023-06-05T20:39:08Z) - ReACC: A Retrieval-Augmented Code Completion Framework [53.49707123661763]
We propose a retrieval-augmented code completion framework, leveraging both lexical copying and referring to code with similar semantics by retrieval.
We evaluate our approach in the code completion task in Python and Java programming languages, achieving a state-of-the-art performance on CodeXGLUE benchmark.
arXiv Detail & Related papers (2022-03-15T08:25:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences.