Code Recommendation for Open Source Software Developers
- URL: http://arxiv.org/abs/2210.08332v3
- Date: Tue, 25 Apr 2023 11:53:04 GMT
- Title: Code Recommendation for Open Source Software Developers
- Authors: Yiqiao Jin, Yunsheng Bai, Yanqiao Zhu, Yizhou Sun, Wei Wang
- Abstract summary: CODER is a novel graph-based code recommendation framework for open source software developers.
Our framework achieves superior performance under various experimental settings, including intra-project, cross-project, and cold-start recommendation.
- Score: 32.181023933552694
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Open Source Software (OSS) is forming the spines of technology
infrastructures, attracting millions of talents to contribute. Notably, it is
challenging and critical to consider both the developers' interests and the
semantic features of the project code to recommend appropriate development
tasks to OSS developers. In this paper, we formulate the novel problem of code
recommendation, whose purpose is to predict the future contribution behaviors
of developers given their interaction history, the semantic features of source
code, and the hierarchical file structures of projects. Considering the complex
interactions among multiple parties within the system, we propose CODER, a
novel graph-based code recommendation framework for open source software
developers. CODER jointly models microscopic user-code interactions and
macroscopic user-project interactions via a heterogeneous graph and further
bridges the two levels of information through aggregation on file-structure
graphs that reflect the project hierarchy. Moreover, due to the lack of
reliable benchmarks, we construct three large-scale datasets to facilitate
future research in this direction. Extensive experiments show that our CODER
framework achieves superior performance under various experimental settings,
including intra-project, cross-project, and cold-start recommendation. We will
release all the datasets, code, and utilities for data retrieval upon the
acceptance of this work.
Related papers
- Chain-of-Programming (CoP) : Empowering Large Language Models for Geospatial Code Generation [2.6026969939746705]
This paper proposes a Chain of Programming framework to decompose the code generation process into five steps.
The framework incorporates a shared information pool, knowledge base retrieval, and user feedback mechanisms.
It significantly improves the logical clarity, syntactical correctness, and executability of the generated code.
arXiv Detail & Related papers (2024-11-16T09:20:35Z) - OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models [70.72097493954067]
Large language models (LLMs) for code have become indispensable in various domains, including code generation, reasoning tasks and agent systems.
While open-access code LLMs are increasingly approaching the performance levels of proprietary models, high-quality code LLMs remain limited.
We introduce OpenCoder, a top-tier code LLM that not only achieves performance comparable to leading models but also serves as an "open cookbook" for the research community.
arXiv Detail & Related papers (2024-11-07T17:47:25Z) - Codev-Bench: How Do LLMs Understand Developer-Centric Code Completion? [60.84912551069379]
We present the Code-Development Benchmark (Codev-Bench), a fine-grained, real-world, repository-level, and developer-centric evaluation framework.
Codev-Agent is an agent-based system that automates repository crawling, constructs execution environments, extracts dynamic calling chains from existing unit tests, and generates new test samples to avoid data leakage.
arXiv Detail & Related papers (2024-10-02T09:11:10Z) - Enhancing Graph Contrastive Learning with Reliable and Informative Augmentation for Recommendation [84.45144851024257]
CoGCL aims to enhance graph contrastive learning by constructing contrastive views with stronger collaborative information via discrete codes.
We introduce a multi-level vector quantizer in an end-to-end manner to quantize user and item representations into discrete codes.
For neighborhood structure, we propose virtual neighbor augmentation by treating discrete codes as virtual neighbors.
Regarding semantic relevance, we identify similar users/items based on shared discrete codes and interaction targets to generate the semantically relevant view.
arXiv Detail & Related papers (2024-09-09T14:04:17Z) - Multi-Agent Software Development through Cross-Team Collaboration [30.88149502999973]
We introduce Cross-Team Collaboration (CTC), a scalable multi-team framework for software development.
CTC enables orchestrated teams to jointly propose various decisions and communicate with their insights.
Results show a notable increase in quality compared to state-of-the-art baselines.
arXiv Detail & Related papers (2024-06-13T10:18:36Z) - Collaborative, Code-Proximal Dynamic Software Visualization within Code
Editors [55.57032418885258]
This paper introduces the design and proof-of-concept implementation for a software visualization approach that can be embedded into code editors.
Our contribution differs from related work in that we use dynamic analysis of a software system's runtime behavior.
Our visualization approach enhances common remote pair programming tools and is collaboratively usable by employing shared code cities.
arXiv Detail & Related papers (2023-08-30T06:35:40Z) - Dataflow graphs as complete causal graphs [17.15640410609126]
We consider an alternative approach to software design, flow-based programming (FBP)
We show how this connection can be leveraged to improve day-to-day tasks in software projects.
arXiv Detail & Related papers (2023-03-16T17:59:13Z) - Nemo: Guiding and Contextualizing Weak Supervision for Interactive Data
Programming [77.38174112525168]
We present Nemo, an end-to-end interactive Supervision system that improves overall productivity of WS learning pipeline by an average 20% (and up to 47% in one task) compared to the prevailing WS supervision approach.
arXiv Detail & Related papers (2022-03-02T19:57:32Z) - Enabling collaborative data science development with the Ballet
framework [9.424574945499844]
We present a novel conceptual framework and ML programming model to address challenges to scaling data science collaborations.
We instantiate these ideas in Ballet, a lightweight software framework for collaborative open-source data science.
arXiv Detail & Related papers (2020-12-14T18:51:23Z) - Representation of Developer Expertise in Open Source Software [12.583969739954526]
We use the World of Code infrastructure to extract the complete set of APIs in the files changed by open source developers.
We then employ Doc2Vec embeddings for vector representations of APIs, developers, and projects.
We evaluate if these embeddings reflect the postulated topology of the Skill Space.
arXiv Detail & Related papers (2020-05-20T16:36:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.