Related papers: Code Recommendation for Open Source Software Developers

URL: http://arxiv.org/abs/2210.08332v3
Date: Tue, 25 Apr 2023 11:53:04 GMT
Title: Code Recommendation for Open Source Software Developers
Authors: Yiqiao Jin, Yunsheng Bai, Yanqiao Zhu, Yizhou Sun, Wei Wang
Abstract summary: CODER is a novel graph-based code recommendation framework for open source software developers. Our framework achieves superior performance under various experimental settings, including intra-project, cross-project, and cold-start recommendation.
Score: 32.181023933552694
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Open Source Software (OSS) is forming the spines of technology infrastructures, attracting millions of talents to contribute. Notably, it is challenging and critical to consider both the developers' interests and the semantic features of the project code to recommend appropriate development tasks to OSS developers. In this paper, we formulate the novel problem of code recommendation, whose purpose is to predict the future contribution behaviors of developers given their interaction history, the semantic features of source code, and the hierarchical file structures of projects. Considering the complex interactions among multiple parties within the system, we propose CODER, a novel graph-based code recommendation framework for open source software developers. CODER jointly models microscopic user-code interactions and macroscopic user-project interactions via a heterogeneous graph and further bridges the two levels of information through aggregation on file-structure graphs that reflect the project hierarchy. Moreover, due to the lack of reliable benchmarks, we construct three large-scale datasets to facilitate future research in this direction. Extensive experiments show that our CODER framework achieves superior performance under various experimental settings, including intra-project, cross-project, and cold-start recommendation. We will release all the datasets, code, and utilities for data retrieval upon the acceptance of this work.

Related papers

IssueCourier: Multi-Relational Heterogeneous Temporal Graph Neural Network for Open-Source Issue Assignment [5.1987901165589]
Issue assignment plays a critical role in open-source software (OSS) maintenance. We propose IssueCourier, a novel Multi-Relational Heterogeneous Temporal Graph Neural Network approach for issue assignment. We show that IssueCourier can improve over the best baseline by up to 45.49% in top-1 and 31.97% in MRR.
arXiv Detail & Related papers (2025-05-16T13:03:26Z)
What is a Feature, Really? Toward a Unified Understanding Across SE Disciplines [0.7125007887148752]
In software engineering, the concept of a "feature" is inconsistently defined across disciplines such as requirements engineering (RE) and software product lines (SPL). This paper proposes an empirical, data-driven approach to explore how features are described, implemented, and managed across real-world projects.
arXiv Detail & Related papers (2025-02-14T09:08:53Z)
Chain-of-Programming (CoP) : Empowering Large Language Models for Geospatial Code Generation [2.6026969939746705]
This paper proposes a Chain of Programming framework to decompose the code generation process into five steps. The framework incorporates a shared information pool, knowledge base retrieval, and user feedback mechanisms. It significantly improves the logical clarity, syntactical correctness, and executability of the generated code.
arXiv Detail & Related papers (2024-11-16T09:20:35Z)
OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models [70.72097493954067]
Large language models (LLMs) for code have become indispensable in various domains, including code generation, reasoning tasks and agent systems. While open-access code LLMs are increasingly approaching the performance levels of proprietary models, high-quality code LLMs remain limited. We introduce OpenCoder, a top-tier code LLM that not only achieves performance comparable to leading models but also serves as an "open cookbook" for the research community.
arXiv Detail & Related papers (2024-11-07T17:47:25Z)
Codev-Bench: How Do LLMs Understand Developer-Centric Code Completion? [60.84912551069379]
We present the Code-Development Benchmark (Codev-Bench), a fine-grained, real-world, repository-level, and developer-centric evaluation framework. Codev-Agent is an agent-based system that automates repository crawling, constructs execution environments, extracts dynamic calling chains from existing unit tests, and generates new test samples to avoid data leakage.
arXiv Detail & Related papers (2024-10-02T09:11:10Z)
Enhancing Graph Contrastive Learning with Reliable and Informative Augmentation for Recommendation [84.45144851024257]
CoGCL aims to enhance graph contrastive learning by constructing contrastive views with stronger collaborative information via discrete codes. We introduce a multi-level vector quantizer in an end-to-end manner to quantize user and item representations into discrete codes. For neighborhood structure, we propose virtual neighbor augmentation by treating discrete codes as virtual neighbors. Regarding semantic relevance, we identify similar users/items based on shared discrete codes and interaction targets to generate the semantically relevant view.
arXiv Detail & Related papers (2024-09-09T14:04:17Z)
Data-Juicer Sandbox: A Feedback-Driven Suite for Multimodal Data-Model Co-development [67.55944651679864]
We present a new sandbox suite tailored for integrated data-model co-development. This sandbox provides a feedback-driven experimental platform, enabling cost-effective and guided refinement of both data and models.
arXiv Detail & Related papers (2024-07-16T14:40:07Z)
Multi-Agent Software Development through Cross-Team Collaboration [30.88149502999973]
We introduce Cross-Team Collaboration (CTC), a scalable multi-team framework for software development. CTC enables orchestrated teams to jointly propose various decisions and communicate with their insights. Results show a notable increase in quality compared to state-of-the-art baselines.
arXiv Detail & Related papers (2024-06-13T10:18:36Z)
Collaborative, Code-Proximal Dynamic Software Visualization within Code Editors [55.57032418885258]
This paper introduces the design and proof-of-concept implementation for a software visualization approach that can be embedded into code editors. Our contribution differs from related work in that we use dynamic analysis of a software system's runtime behavior. Our visualization approach enhances common remote pair programming tools and is collaboratively usable by employing shared code cities.
arXiv Detail & Related papers (2023-08-30T06:35:40Z)
Dataflow graphs as complete causal graphs [17.15640410609126]
We consider an alternative approach to software design, flow-based programming (FBP). We show how this connection can be leveraged to improve day-to-day tasks in software projects.
arXiv Detail & Related papers (2023-03-16T17:59:13Z)
Nemo: Guiding and Contextualizing Weak Supervision for Interactive Data Programming [77.38174112525168]
We present Nemo, an end-to-end interactive system that improves the overall productivity of the weak supervision (WS) learning pipeline by an average of 20% (and up to 47% in one task) compared to the prevailing WS approach.
arXiv Detail & Related papers (2022-03-02T19:57:32Z)
Enabling collaborative data science development with the Ballet framework [9.424574945499844]
We present a novel conceptual framework and ML programming model to address challenges to scaling data science collaborations. We instantiate these ideas in Ballet, a lightweight software framework for collaborative open-source data science.
arXiv Detail & Related papers (2020-12-14T18:51:23Z)
Representation of Developer Expertise in Open Source Software [12.583969739954526]
We use the World of Code infrastructure to extract the complete set of APIs in the files changed by open source developers. We then employ Doc2Vec embeddings for vector representations of APIs, developers, and projects. We evaluate if these embeddings reflect the postulated topology of the Skill Space.
arXiv Detail & Related papers (2020-05-20T16:36:07Z)

This list is automatically generated from the titles and abstracts of the papers on this site.