Reassessing Java Code Readability Models with a Human-Centered Approach
- URL: http://arxiv.org/abs/2401.14936v1
- Date: Fri, 26 Jan 2024 15:18:22 GMT
- Title: Reassessing Java Code Readability Models with a Human-Centered Approach
- Authors: Agnia Sergeyuk, Olga Lvova, Sergey Titov, Anastasiia Serova, Farid
Bagirov, Evgeniia Kirillova, Timofey Bryksin
- Abstract summary: This research assesses existing Java Code Readability (CR) models as guides for adjusting Large Language Models (LLMs).
We identify 12 key code aspects influencing CR that were assessed by 390 programmers when labeling 120 AI-generated snippets.
Our findings indicate that when AI generates concise and executable code, it is often considered readable by CR models and developers.
- Score: 3.798885293742468
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: To ensure that Large Language Models (LLMs) effectively support user
productivity, they need to be adjusted. Existing Code Readability (CR) models
can guide this alignment. However, there are concerns about their relevance in
modern software engineering since they often miss the developers' notion of
readability and rely on outdated code. This research assesses existing Java CR
models as guides for LLM adjustment, measuring the correlation between the
models' and developers' evaluations of AI-generated Java code. Using the
Repertory Grid Technique with 15 developers, we identified 12 key code aspects
influencing CR that were subsequently assessed by 390 programmers when labeling 120
AI-generated snippets. Our findings indicate that when AI generates concise and
executable code, it is often considered readable by CR models and developers.
However, a limited correlation between these evaluations underscores the
importance of future research on learning objectives for adjusting LLMs and on
the aspects influencing CR evaluations included in predictive models.
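As a minimal illustration of the kind of analysis the abstract describes, the sketch below correlates hypothetical CR-model scores with developer readability ratings for the same snippets using Spearman's rank correlation. The data values are invented placeholders, not the study's results.

```python
# Sketch: correlating CR-model scores with developer readability ratings.
# The numbers below are invented placeholders, not data from the paper.
from scipy.stats import spearmanr

# One entry per code snippet: a CR model's predicted readability score
# and the mean readability rating assigned by developers.
model_scores = [0.82, 0.45, 0.91, 0.33, 0.67, 0.74]
developer_ratings = [4.1, 3.8, 4.5, 2.2, 3.0, 4.0]

rho, p_value = spearmanr(model_scores, developer_ratings)
print(f"Spearman rho = {rho:.2f} (p = {p_value:.3f})")
```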
Related papers
- Can We Further Elicit Reasoning in LLMs? Critic-Guided Planning with Retrieval-Augmentation for Solving Challenging Tasks [68.49251303172674]
State-of-the-art large language models (LLMs) exhibit impressive problem-solving capabilities but may struggle with complex reasoning and factual correctness.
Existing methods harness the strengths of chain-of-thought and retrieval-augmented generation (RAG) to decompose a complex problem into simpler steps and apply retrieval to improve factual correctness.
We introduce Critic-guided planning with Retrieval-augmentation, CR-Planner, a novel framework that leverages fine-tuned critic models to guide both reasoning and retrieval processes through planning.
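A highly simplified sketch of critic-guided action selection in the spirit of this idea (not CR-Planner's actual implementation): a critic scores candidate actions such as "reason" and "retrieve", and the planner greedily executes the best-scoring one. All functions here are hypothetical stubs.

```python
# Toy sketch of critic-guided planning: a critic scores candidate actions
# and the planner greedily executes the best one. All functions are
# hypothetical stubs, not CR-Planner's actual implementation.

def critic_score(state: str, action: str) -> float:
    # Stand-in for a fine-tuned critic model; here, a trivial heuristic.
    return 1.0 if action == "retrieve" and "?" in state else 0.5

def reason(state: str) -> str:
    return state + " -> reasoning step"

def retrieve(state: str) -> str:
    return state + " -> retrieved evidence"

ACTIONS = {"reason": reason, "retrieve": retrieve}

state = "What year was the compiler released?"
for _ in range(3):  # fixed-depth plan for illustration
    best = max(ACTIONS, key=lambda a: critic_score(state, a))
    state = ACTIONS[best](state)
print(state)
```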
arXiv Detail & Related papers (2024-10-02T11:26:02Z)
- SIaM: Self-Improving Code-Assisted Mathematical Reasoning of Large Language Models [54.78329741186446]
We propose a novel paradigm that uses a code-based critic model to guide steps including question-code data construction, quality control, and complementary evaluation.
Experiments across both in-domain and out-of-domain benchmarks in English and Chinese demonstrate the effectiveness of the proposed paradigm.
arXiv Detail & Related papers (2024-08-28T06:33:03Z)
- What's Wrong with Your Code Generated by Large Language Models? An Extensive Study [80.18342600996601]
Large language models (LLMs) produce code that is shorter yet more complicated than canonical solutions.
We develop a taxonomy of bugs for incorrect codes that includes three categories and 12 sub-categories, and analyze the root cause for common bug types.
We propose a novel training-free iterative method that introduces self-critique, enabling LLMs to critique and correct their generated code based on bug types and compiler feedback.
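The iterative self-critique idea lends itself to a simple repair loop: compile the candidate, and on failure feed the compiler diagnostics back to the model for another attempt. The sketch below is illustrative rather than the paper's method; it uses Python's built-in `compile` as a stand-in for a real compiler and a hypothetical `llm_fix` callable in place of an LLM.

```python
# Sketch of a training-free self-critique loop: compile the candidate code,
# and on failure feed the compiler diagnostics back for another attempt.
# `llm_fix` is a hypothetical stand-in for an LLM call.

def llm_fix(code: str, error: str) -> str:
    # Placeholder: a real system would prompt an LLM with `code` and `error`.
    return code.replace("retrun", "return")

def self_critique_repair(code: str, max_rounds: int = 3) -> str:
    for _ in range(max_rounds):
        try:
            compile(code, "<candidate>", "exec")  # compiler feedback signal
            return code                           # compiles: accept
        except SyntaxError as err:
            code = llm_fix(code, str(err))
    return code

print(self_critique_repair("def add(a, b):\n    retrun a + b\n"))
```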
arXiv Detail & Related papers (2024-07-08T17:27:17Z)
- Automating Patch Set Generation from Code Review Comments Using Large Language Models [2.045040820541428]
We provide code contexts to five popular Large Language Models (LLMs).
We obtain the suggested code changes (patch sets) derived from real-world code-review comments.
The performance of each model is assessed by comparing its generated patch sets against the historical data of human-generated patch sets.
arXiv Detail & Related papers (2024-04-10T02:46:08Z)
- DeAL: Decoding-time Alignment for Large Language Models [59.63643988872571]
Large Language Models (LLMs) are nowadays expected to generate content aligned with human preferences.
We propose DeAL, a framework that allows the user to customize reward functions and enables Decoding-time Alignment of LLMs.
Our experiments show that we can DeAL with fine-grained trade-offs, improve adherence to alignment objectives, and address residual gaps in LLMs.
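Decoding-time alignment can be caricatured as reranking candidate continuations by a base-model score plus a weighted custom reward. The sketch below is a toy illustration of that general idea, with invented scores and a trivial reward function; it is not DeAL's actual algorithm.

```python
# Toy sketch of decoding-time alignment: rerank candidate continuations by
# base-model log-probability plus a user-defined reward, weighted by lambda.
# Scores and reward are invented for illustration; this is not DeAL itself.

def reward(text: str) -> float:
    # Hypothetical alignment objective: prefer polite phrasing.
    return 1.0 if "please" in text.lower() else 0.0

candidates = {
    "Do it now.": -1.2,            # candidate -> base log-probability
    "Please try again.": -1.5,
    "Whatever.": -1.1,
}
lam = 2.0  # weight on the alignment reward

best = max(candidates, key=lambda c: candidates[c] + lam * reward(c))
print(best)  # -> "Please try again."
```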
arXiv Detail & Related papers (2024-02-05T06:12:29Z)
- Improving the Learning of Code Review Successive Tasks with Cross-Task Knowledge Distillation [1.0878040851638]
We introduce a novel deep-learning architecture, named DISCOREV, which employs cross-task knowledge distillation to address these tasks simultaneously.
We show that our approach generates better review comments, as measured by the BLEU score, as well as more accurate code refinement according to the CodeBLEU score.
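BLEU-style scoring of generated review comments against references can be reproduced with off-the-shelf tooling; the sketch below uses the sacrebleu package (CodeBLEU, which additionally weighs AST and data-flow matches, is omitted here). The example strings are invented.

```python
# Sketch: scoring generated review comments against references with BLEU.
# Requires `pip install sacrebleu`; the example strings are invented.
import sacrebleu

generated = ["Rename this variable for clarity.", "Add a null check here."]
references = [["Rename the variable for clarity.", "Add a null check before use."]]

bleu = sacrebleu.corpus_bleu(generated, references)
print(f"BLEU = {bleu.score:.1f}")
```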
arXiv Detail & Related papers (2024-02-03T07:02:22Z)
- CLOMO: Counterfactual Logical Modification with Large Language Models [109.60793869938534]
We introduce a novel task, Counterfactual Logical Modification (CLOMO), and a high-quality human-annotated benchmark.
In this task, LLMs must adeptly alter a given argumentative text to uphold a predetermined logical relationship.
We propose an innovative evaluation metric, the Self-Evaluation Score (SES), to directly evaluate the natural language output of LLMs.
arXiv Detail & Related papers (2023-11-29T08:29:54Z)
- Towards Automated Classification of Code Review Feedback to Support Analytics [4.423428708304586]
This study aims to develop an automated code review comment classification system.
We trained and evaluated supervised learning-based DNN models leveraging code context, comment text, and a set of code metrics.
Our approach outperforms Fregnan et al.'s approach by achieving 18.7% higher accuracy.
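As a rough illustration of supervised comment classification (not the study's actual architecture, features, or data), the sketch below trains a small neural classifier on TF-IDF features of comment text; a real system would also encode code context and code metrics.

```python
# Rough sketch of supervised code-review-comment classification using
# TF-IDF text features and a small neural network. The comments, labels,
# and model choice are illustrative, not the study's actual setup.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline

comments = [
    "Please rename this method.",       # invented training examples
    "This loop can be simplified.",
    "Looks good to me.",
    "Why not reuse the helper above?",
]
labels = ["naming", "refactoring", "approval", "refactoring"]

clf = make_pipeline(TfidfVectorizer(), MLPClassifier(max_iter=500, random_state=0))
clf.fit(comments, labels)
print(clf.predict(["Consider renaming this field."]))
```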
arXiv Detail & Related papers (2023-07-07T21:53:20Z)
- CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing [139.77117915309023]
CRITIC allows large language models to validate and amend their own outputs in a manner similar to human interaction with tools.
Comprehensive evaluations involving free-form question answering, mathematical program synthesis, and toxicity reduction demonstrate that CRITIC consistently enhances the performance of LLMs.
arXiv Detail & Related papers (2023-05-19T15:19:44Z)
- What Makes a Code Review Useful to OpenDev Developers? An Empirical Investigation [4.061135251278187]
Even a minor improvement in the effectiveness of Code Reviews can yield significant savings for a software development organization.
This study aims to develop a finer grain understanding of what makes a code review comment useful to OSS developers.
arXiv Detail & Related papers (2023-02-22T22:48:27Z)
- Aligning Offline Metrics and Human Judgments of Value for Code Generation Models [25.726216146776054]
We show that while correctness captures high-value generations, programmers still rate code that fails unit tests as valuable if it reduces the overall effort needed to complete a coding task.
We propose a hybrid metric that combines functional correctness and syntactic similarity and show that it achieves a 14% stronger correlation with value.
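A hybrid of functional correctness and syntactic similarity can be sketched as a weighted combination of unit-test pass rate and a string-similarity score. The weighting and the choice of similarity measure below are illustrative, not the paper's exact metric.

```python
# Sketch of a hybrid value metric: weighted mix of unit-test pass rate and
# syntactic similarity to a reference solution. The weight and the use of
# difflib are illustrative choices, not the paper's exact formulation.
from difflib import SequenceMatcher

def hybrid_value(generated: str, reference: str,
                 tests_passed: int, tests_total: int, alpha: float = 0.5) -> float:
    functional = tests_passed / tests_total
    syntactic = SequenceMatcher(None, generated, reference).ratio()
    return alpha * functional + (1 - alpha) * syntactic

gen = "def inc(x):\n    return x + 1\n"
ref = "def inc(value):\n    return value + 1\n"
print(f"{hybrid_value(gen, ref, tests_passed=1, tests_total=2):.2f}")
```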
arXiv Detail & Related papers (2022-10-29T05:03:28Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.