Code Recommendation for Open Source Software Developers
- URL: http://arxiv.org/abs/2210.08332v3
- Date: Tue, 25 Apr 2023 11:53:04 GMT
- Title: Code Recommendation for Open Source Software Developers
- Authors: Yiqiao Jin, Yunsheng Bai, Yanqiao Zhu, Yizhou Sun, Wei Wang
- Abstract summary: CODER is a novel graph-based code recommendation framework for open source software developers.
Our framework achieves superior performance under various experimental settings, including intra-project, cross-project, and cold-start recommendation.
- Score: 32.181023933552694
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Open Source Software (OSS) is forming the backbone of technology
infrastructures, attracting millions of developers to contribute. Notably, it is
challenging and critical to consider both the developers' interests and the
semantic features of the project code to recommend appropriate development
tasks to OSS developers. In this paper, we formulate the novel problem of code
recommendation, whose purpose is to predict the future contribution behaviors
of developers given their interaction history, the semantic features of source
code, and the hierarchical file structures of projects. Considering the complex
interactions among multiple parties within the system, we propose CODER, a
novel graph-based code recommendation framework for open source software
developers. CODER jointly models microscopic user-code interactions and
macroscopic user-project interactions via a heterogeneous graph and further
bridges the two levels of information through aggregation on file-structure
graphs that reflect the project hierarchy. Moreover, due to the lack of
reliable benchmarks, we construct three large-scale datasets to facilitate
future research in this direction. Extensive experiments show that our CODER
framework achieves superior performance under various experimental settings,
including intra-project, cross-project, and cold-start recommendation. We will
release all the datasets, code, and utilities for data retrieval upon the
acceptance of this work.
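
The two-level idea in the abstract (file-level signals bridged to project-level signals through the file hierarchy) can be illustrated with a small sketch. This is not the CODER model: the mean pooling, the dot-product scoring, the `alpha` blend weight, and every function and variable name below are assumptions chosen for illustration, and the paper's learned graph components are not reproduced here.

```python
def mean(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def aggregate_tree(children, file_emb, node):
    """Bottom-up mean pooling over a file-structure tree (assumed pooling).

    Files return their own embedding; a directory returns the mean of its
    children's aggregated embeddings.
    """
    if node in file_emb:
        return file_emb[node]
    return mean([aggregate_tree(children, file_emb, c) for c in children[node]])


def rank_files(user_emb, projects, alpha=0.5):
    """Rank (project, file) pairs by a blend of two affinities.

    projects maps a project name to (children, file_emb, root).
    alpha weights the file-level score against the project-level score.
    """
    scores = {}
    for name, (children, file_emb, root) in projects.items():
        project_emb = aggregate_tree(children, file_emb, root)
        project_score = dot(user_emb, project_emb)
        for path, emb in file_emb.items():
            file_score = dot(user_emb, emb)
            scores[(name, path)] = alpha * file_score + (1 - alpha) * project_score
    return sorted(scores, key=scores.get, reverse=True)


# Toy data: two hypothetical projects with hand-written 2-d embeddings.
children_a = {
    "a/": ["a/src/", "a/README"],
    "a/src/": ["a/src/model.py", "a/src/utils.py"],
}
files_a = {
    "a/README": [0.0, 1.0],
    "a/src/model.py": [1.0, 0.0],
    "a/src/utils.py": [0.5, 1.0],
}
children_b = {"b/": ["b/main.py"]}
files_b = {"b/main.py": [0.0, 1.0]}

projects = {
    "a": (children_a, files_a, "a/"),
    "b": (children_b, files_b, "b/"),
}
user = [1.0, 0.0]
ranking = rank_files(user, projects)
```

In this toy setup the project-level term only changes the ordering across projects, since every file in a project shares the same project score; that is the cross-project case the abstract mentions.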
Related papers
- What is a Feature, Really? Toward a Unified Understanding Across SE Disciplines [0.7125007887148752]
In software engineering, the concept of a "feature" is inconsistently defined across disciplines such as requirements engineering (RE) and software product lines (SPL).
This paper proposes an empirical, data-driven approach to explore how features are described, implemented, and managed across real-world projects.
arXiv Detail & Related papers (2025-02-14T09:08:53Z)
- Chain-of-Programming (CoP): Empowering Large Language Models for Geospatial Code Generation [2.6026969939746705]
This paper proposes a Chain of Programming framework to decompose the code generation process into five steps.
The framework incorporates a shared information pool, knowledge base retrieval, and user feedback mechanisms.
It significantly improves the logical clarity, syntactical correctness, and executability of the generated code.
arXiv Detail & Related papers (2024-11-16T09:20:35Z)
- OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models [70.72097493954067]
Large language models (LLMs) for code have become indispensable in various domains, including code generation, reasoning tasks and agent systems.
While open-access code LLMs are increasingly approaching the performance levels of proprietary models, high-quality code LLMs remain limited.
We introduce OpenCoder, a top-tier code LLM that not only achieves performance comparable to leading models but also serves as an "open cookbook" for the research community.
arXiv Detail & Related papers (2024-11-07T17:47:25Z)
- Codev-Bench: How Do LLMs Understand Developer-Centric Code Completion? [60.84912551069379]
We present the Code-Development Benchmark (Codev-Bench), a fine-grained, real-world, repository-level, and developer-centric evaluation framework.
Codev-Agent is an agent-based system that automates repository crawling, constructs execution environments, extracts dynamic calling chains from existing unit tests, and generates new test samples to avoid data leakage.
arXiv Detail & Related papers (2024-10-02T09:11:10Z)
- Enhancing Graph Contrastive Learning with Reliable and Informative Augmentation for Recommendation [84.45144851024257]
We propose a novel framework that aims to enhance graph contrastive learning by constructing contrastive views with stronger collaborative information via discrete codes.
The core idea is to map users and items into discrete codes rich in collaborative information for reliable and informative contrastive view generation.
arXiv Detail & Related papers (2024-09-09T14:04:17Z)
- Data-Juicer Sandbox: A Feedback-Driven Suite for Multimodal Data-Model Co-development [67.55944651679864]
We present a new sandbox suite tailored for integrated data-model co-development.
This sandbox provides a feedback-driven experimental platform, enabling cost-effective and guided refinement of both data and models.
arXiv Detail & Related papers (2024-07-16T14:40:07Z)
- Collaborative, Code-Proximal Dynamic Software Visualization within Code Editors [55.57032418885258]
This paper introduces the design and proof-of-concept implementation for a software visualization approach that can be embedded into code editors.
Our contribution differs from related work in that we use dynamic analysis of a software system's runtime behavior.
Our visualization approach enhances common remote pair programming tools and is collaboratively usable by employing shared code cities.
arXiv Detail & Related papers (2023-08-30T06:35:40Z)
- Dataflow graphs as complete causal graphs [17.15640410609126]
We consider an alternative approach to software design, flow-based programming (FBP).
We show how this connection can be leveraged to improve day-to-day tasks in software projects.
arXiv Detail & Related papers (2023-03-16T17:59:13Z)
- Nemo: Guiding and Contextualizing Weak Supervision for Interactive Data Programming [77.38174112525168]
We present Nemo, an end-to-end interactive supervision system that improves the overall productivity of the weak supervision (WS) learning pipeline by an average of 20% (and up to 47% in one task) compared to the prevailing WS approach.
arXiv Detail & Related papers (2022-03-02T19:57:32Z)
- Enabling collaborative data science development with the Ballet framework [9.424574945499844]
We present a novel conceptual framework and ML programming model to address challenges to scaling data science collaborations.
We instantiate these ideas in Ballet, a lightweight software framework for collaborative open-source data science.
arXiv Detail & Related papers (2020-12-14T18:51:23Z)
- Representation of Developer Expertise in Open Source Software [12.583969739954526]
We use the World of Code infrastructure to extract the complete set of APIs in the files changed by open source developers.
We then employ Doc2Vec embeddings for vector representations of APIs, developers, and projects.
We evaluate whether these embeddings reflect the postulated topology of the Skill Space.
arXiv Detail & Related papers (2020-05-20T16:36:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.