Predicting long time contributors with knowledge units of programming languages: an empirical study
- URL: http://arxiv.org/abs/2405.13852v1
- Date: Wed, 22 May 2024 17:28:06 GMT
- Title: Predicting long time contributors with knowledge units of programming languages: an empirical study
- Authors: Md Ahasanuzzaman, Gustavo A. Oliva, Ahmed E. Hassan
- Abstract summary: This paper reports an empirical study on the usage of knowledge units (KUs) of the Java programming language to predict long-time contributors (LTCs).
A KU is a cohesive set of key capabilities that are offered by one or more building blocks of a given programming language.
We build a prediction model called KULTC, which leverages KU-based features along five different dimensions.
- Score: 3.6840775431698893
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Predicting potential long-time contributors (LTCs) early allows project maintainers to effectively allocate resources and mentoring to enhance their development and retention. Mapping programming language expertise to developers and characterizing projects in terms of how they use programming languages can help identify developers who are more likely to become LTCs. However, prior studies on predicting LTCs do not consider programming language skills. This paper reports an empirical study on the usage of knowledge units (KUs) of the Java programming language to predict LTCs. A KU is a cohesive set of key capabilities that are offered by one or more building blocks of a given programming language. We build a prediction model called KULTC, which leverages KU-based features along five different dimensions. We detect and analyze KUs from the studied 75 Java projects (353K commits and 168K pull requests) as well as 4,219 other Java projects in which the studied developers previously worked (1.7M commits). We compare the performance of KULTC with the state-of-the-art model, which we call BAOLTC. Even though KULTC focuses exclusively on the programming language perspective, KULTC achieves a median AUC of at least 0.75 and significantly outperforms BAOLTC. Combining the features of KULTC with the features of BAOLTC results in an enhanced model (KULTC+BAOLTC) that significantly outperforms BAOLTC with a normalized AUC improvement of 16.5%. Our feature importance analysis with SHAP reveals that developer expertise in the studied project is the most influential feature dimension for predicting LTCs. Finally, we develop a cost-effective model (KULTC_DEV_EXP+BAOLTC) that significantly outperforms BAOLTC. These encouraging results can be helpful to researchers who wish to further study developers' engagement with and retention in FLOSS projects, or to build models for predicting LTCs.
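As a rough sketch of the modeling setup described above (not the authors' implementation; the feature columns and labels below are hypothetical stand-ins), one can train a classifier on KU-based developer features and evaluate it with AUC:

```python
# Minimal sketch of a KU-feature-based LTC classifier; the five feature
# columns and the labels are synthetic, not the paper's data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 500  # hypothetical developers

# One column per assumed feature dimension (the paper uses five).
X = rng.random((n, 5))
# Hypothetical label: 1 = developer became a long-time contributor (LTC).
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 0.3, n) > 1.0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

print(f"AUC: {roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]):.3f}")
# A SHAP analysis, as in the paper, would rank the feature dimensions;
# impurity-based importances are a crude built-in proxy:
print(model.feature_importances_)
```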
Related papers
- ToolRL: Reward is All Tool Learning Needs [54.16305891389931]
Large Language Models (LLMs) often undergo supervised fine-tuning (SFT) to acquire tool use capabilities.
Recent advancements in reinforcement learning (RL) have demonstrated promising reasoning and generalization abilities.
We present the first comprehensive study on reward design for tool selection and application tasks within the RL paradigm.
arXiv Detail & Related papers (2025-04-16T21:45:32Z)
- OpenCodeReasoning: Advancing Data Distillation for Competitive Coding [61.15402517835137]
We build a supervised fine-tuning (SFT) dataset to achieve state-of-the-art coding capability results in models of various sizes.
Our models use only SFT to achieve 61.8% on LiveCodeBench and 24.6% on CodeContests, surpassing alternatives trained with reinforcement learning.
arXiv Detail & Related papers (2025-04-02T17:50:31Z)
- Not-Just-Scaling Laws: Towards a Better Understanding of the Downstream Impact of Language Model Design Decisions [65.89403417819764]
We quantify the impact of design choices on language model capabilities.
By incorporating features besides model size and number of training tokens, we can achieve a relative 3-28% increase in ability to predict downstream performance.
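A toy version of that idea (synthetic data, assumed setup): regress downstream scores on model size and token count alone, then again with extra design features, and compare the cross-validated fit:

```python
# Toy sketch: do design features beyond size/tokens improve prediction of
# downstream performance? (Synthetic data; illustrative only.)
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
n = 200  # hypothetical models

log_params = rng.uniform(7, 11, n)   # log10 parameter count
log_tokens = rng.uniform(9, 13, n)   # log10 training tokens
extra = rng.random((n, 3))           # e.g., data mix, architecture flags

score = (0.4 * log_params + 0.3 * log_tokens
         + extra @ np.array([1.0, 0.5, 0.2]) + rng.normal(0, 0.2, n))

X_base = np.column_stack([log_params, log_tokens])
X_full = np.column_stack([X_base, extra])

for name, X in [("size+tokens only", X_base), ("with design features", X_full)]:
    r2 = cross_val_score(LinearRegression(), X, score, cv=5, scoring="r2").mean()
    print(f"{name}: mean R^2 = {r2:.3f}")
```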
arXiv Detail & Related papers (2025-03-05T19:46:04Z)
- ProBench: Benchmarking Large Language Models in Competitive Programming [44.09445715541973]
We propose ProBench to benchmark large language models (LLMs) in competitive programming.
ProBench collects a comprehensive set of competitive programming problems from Codeforces, Luogu, and Nowcoder platforms.
We assess 9 latest LLMs in competitive programming across multiple dimensions, including thought chain analysis, error type diagnosis, and reasoning depth evaluation.
arXiv Detail & Related papers (2025-02-28T09:12:42Z)
- Continuous Integration Practices in Machine Learning Projects: The Practitioners' Perspective [1.4165457606269516]
This study surveys 155 practitioners from 47 Machine Learning (ML) projects.
Practitioners highlighted eight key differences, including test complexity, infrastructure requirements, and build duration and stability.
Common challenges mentioned by practitioners include higher project complexity, model training demands, extensive data handling, increased computational resource needs, and dependency management.
arXiv Detail & Related papers (2025-02-24T18:01:50Z)
- CITYWALK: Enhancing LLM-Based C++ Unit Test Generation via Project-Dependency Awareness and Language-Specific Knowledge [13.592814106490724]
CITYWALK is a novel framework for C++ unit test generation.
It provides a comprehensive understanding of the dependency relationships within the project under test via program analysis.
It incorporates language-specific knowledge about C++ derived from project documentation and empirical observations.
arXiv Detail & Related papers (2025-01-27T15:49:24Z)
- Predicting post-release defects with knowledge units (KUs) of programming languages: an empirical study [25.96111422428881]
Defect prediction plays a crucial role in software engineering, enabling developers to identify defect-prone code and improve software quality.
To address this gap, we introduce Knowledge Units (KUs) of programming languages as a novel feature set for analyzing software systems and defect prediction.
A KU is a cohesive set of key capabilities that are offered by one or more building blocks of a given programming language.
arXiv Detail & Related papers (2024-12-03T23:22:06Z)
- How to Train Long-Context Language Models (Effectively) [75.5418485597276]
We study continued training and supervised fine-tuning (SFT) of a language model (LM) to make effective use of long-context information.
ProLong-8B, which is initialized from Llama-3 and trained on 40B tokens, demonstrates state-of-the-art long-context performance among similarly sized models at a length of 128K.
arXiv Detail & Related papers (2024-10-03T16:46:52Z)
- Optimized Feature Generation for Tabular Data via LLMs with Decision Tree Reasoning [53.241569810013836]
We propose a novel framework that utilizes large language models (LLMs) to identify effective feature generation rules.
We use decision trees to convey this reasoning information, as they can be easily represented in natural language.
OCTree consistently enhances the performance of various prediction models across diverse benchmarks.
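A hedged sketch of the "decision trees as natural language" idea (using scikit-learn's built-in Iris data rather than OCTree's pipeline): fit a small tree and render its rules as readable text, the kind of reasoning string one could hand to an LLM.

```python
# Fit a shallow decision tree and export its decision rules as text,
# an indented if/else listing over the feature names.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_iris()
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(data.data, data.target)

print(export_text(tree, feature_names=list(data.feature_names)))
```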
arXiv Detail & Related papers (2024-06-12T08:31:34Z)
- Can ChatGPT Support Developers? An Empirical Evaluation of Large Language Models for Code Generation [2.93322471069531]
We conduct an empirical analysis of conversations in DevGPT, a dataset collected from developers' conversations with ChatGPT.
Our findings indicate that the current practice of using LLM-generated code is typically limited to either demonstrating high-level concepts or providing examples in documentation.
arXiv Detail & Related papers (2024-02-18T20:48:09Z)
- An Empirical Study on Low Code Programming using Traditional vs Large Language Model Support [34.74300707132544]
Low-code programming (LCP) refers to programming using models at higher levels of abstraction.
The technical principles and application scenarios of traditional approaches to LCP and LLM-based LCP are significantly different.
arXiv Detail & Related papers (2024-02-02T05:52:32Z)
- DevEval: Evaluating Code Generation in Practical Software Projects [52.16841274646796]
We propose a new benchmark named DevEval, aligned with Developers' experiences in practical projects.
DevEval is collected through a rigorous pipeline, containing 2,690 samples from 119 practical projects.
We assess five popular LLMs on DevEval and reveal their actual abilities in code generation.
arXiv Detail & Related papers (2024-01-12T06:51:30Z)
- Mini-GPTs: Efficient Large Language Models through Contextual Pruning [0.0]
This paper introduces a novel approach in developing Mini-GPTs via contextual pruning.
We employ the technique across diverse and complex datasets, including US law, Medical Q&A, Skyrim dialogue, English-Taiwanese translation, and Economics articles.
arXiv Detail & Related papers (2023-12-20T00:48:13Z)
- Evaluating and Explaining Large Language Models for Code Using Syntactic Structures [74.93762031957883]
This paper introduces ASTxplainer, an explainability method specific to Large Language Models for code.
At its core, ASTxplainer provides an automated method for aligning token predictions with AST nodes.
We perform an empirical evaluation on 12 popular LLMs for code using a curated dataset of the most popular GitHub projects.
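A simplified stand-in for that alignment step (Python's ast module rather than ASTxplainer's own machinery): map a token's source position to the AST nodes whose spans cover it.

```python
# Hedged sketch: align a token's column position with the AST nodes that
# cover it, a toy version of ASTxplainer-style token/AST alignment.
import ast

source = "total = price * quantity + tax"
tree = ast.parse(source)

def nodes_at(tree, col):
    """Return AST node type names whose source span covers the column."""
    hits = []
    for node in ast.walk(tree):
        start = getattr(node, "col_offset", None)
        end = getattr(node, "end_col_offset", None)
        if start is not None and end is not None and start <= col < end:
            hits.append(type(node).__name__)
    return hits

# Which nodes cover the token "price" (column 8)?
print(nodes_at(tree, 8))  # ['Assign', 'BinOp', 'BinOp', 'Name']
```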
arXiv Detail & Related papers (2023-08-07T18:50:57Z)
- L-Eval: Instituting Standardized Evaluation for Long Context Language Models [91.05820785008527]
We propose L-Eval to institute a more standardized evaluation for long context language models (LCLMs).
We build a new evaluation suite containing 20 sub-tasks, 508 long documents, and over 2,000 human-labeled query-response pairs.
Results show that popular n-gram matching metrics generally can not correlate well with human judgment.
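That correlation check can be reproduced in miniature (synthetic ratings; a crude unigram-overlap metric standing in for the n-gram metrics the suite tests):

```python
# Toy check of how well an n-gram overlap metric tracks human judgment.
# Scores are synthetic; this illustrates the method, not L-Eval's data.
from collections import Counter
from scipy.stats import spearmanr

def unigram_f1(candidate, reference):
    """A crude n-gram matching metric: unigram overlap F1."""
    c, r = Counter(candidate.split()), Counter(reference.split())
    overlap = sum((c & r).values())
    if overlap == 0:
        return 0.0
    p, rec = overlap / sum(c.values()), overlap / sum(r.values())
    return 2 * p * rec / (p + rec)

reference = "the model retrieves the answer from the long document"
candidates = [
    "the model retrieves the answer from the long document",
    "it finds the correct answer in the document",
    "the the the model model answer answer document",
]
human_scores = [5, 4, 1]  # hypothetical human ratings

metric_scores = [unigram_f1(c, reference) for c in candidates]
rho, _ = spearmanr(metric_scores, human_scores)
print(f"Spearman rho: {rho:.2f}")  # a low rho illustrates the mismatch
```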
arXiv Detail & Related papers (2023-07-20T17:59:41Z)
- Exploring and Characterizing Large Language Models For Embedded System Development and Debugging [10.967443876391611]
Large language models (LLMs) have shown remarkable abilities to generate code; however, their ability to develop software for embedded systems has not been studied.
We develop an open source framework to evaluate leading LLMs to assess their capabilities and limitations for embedded system development.
We leverage this finding to study how human programmers interact with these tools, and develop a human-AI based software engineering workflow for building embedded systems.
arXiv Detail & Related papers (2023-07-07T20:14:22Z)
- Improving Massively Multilingual ASR With Auxiliary CTC Objectives [40.10307386370194]
We introduce our work on improving performance on FLEURS, a 102-language open ASR benchmark.
We investigate techniques inspired by recent Connectionist Temporal Classification (CTC) studies to help the model handle the large number of languages.
Our state-of-the-art systems using self-supervised models with the Conformer architecture improve over the results of prior work on FLEURS by a relative 28.4% CER.
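The auxiliary-objective idea could look roughly like this in PyTorch (shapes and the language-ID head are assumptions for illustration, not the paper's system): a main CTC loss over characters plus a weighted CTC term over per-utterance language tags.

```python
# Hedged sketch of combining a main CTC loss with an auxiliary CTC
# objective over language IDs (illustrative shapes only).
import torch
import torch.nn as nn

T, N, C_char, C_lang = 50, 4, 32, 103  # time steps, batch, charset, languages
ctc = nn.CTCLoss(blank=0, zero_infinity=True)

# Encoder outputs projected to two heads: characters and language IDs.
log_probs_char = torch.randn(T, N, C_char).log_softmax(-1)
log_probs_lang = torch.randn(T, N, C_lang).log_softmax(-1)

targets_char = torch.randint(1, C_char, (N, 12))  # padded transcripts
targets_lang = torch.randint(1, C_lang, (N, 1))   # one language tag each
in_lens = torch.full((N,), T, dtype=torch.long)
tgt_lens_char = torch.full((N,), 12, dtype=torch.long)
tgt_lens_lang = torch.full((N,), 1, dtype=torch.long)

aux_weight = 0.3  # hypothetical weighting of the auxiliary objective
loss = (ctc(log_probs_char, targets_char, in_lens, tgt_lens_char)
        + aux_weight * ctc(log_probs_lang, targets_lang, in_lens, tgt_lens_lang))
print(loss.item())
```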
arXiv Detail & Related papers (2023-02-24T18:59:51Z)
- Incentive Mechanism Design for Resource Sharing in Collaborative Edge Learning [106.51930957941433]
In 5G and Beyond networks, Artificial Intelligence applications are expected to be increasingly ubiquitous.
This necessitates a paradigm shift from the current cloud-centric model training approach to the Edge Computing based collaborative learning scheme known as edge learning.
arXiv Detail & Related papers (2020-05-31T12:45:06Z)