Code Recommendation for Open Source Software Developers
- URL: http://arxiv.org/abs/2210.08332v3
- Date: Tue, 25 Apr 2023 11:53:04 GMT
- Title: Code Recommendation for Open Source Software Developers
- Authors: Yiqiao Jin, Yunsheng Bai, Yanqiao Zhu, Yizhou Sun, Wei Wang
- Abstract summary: CODER is a novel graph-based code recommendation framework for open source software developers.
Our framework achieves superior performance under various experimental settings, including intra-project, cross-project, and cold-start recommendation.
- Score: 32.181023933552694
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Open Source Software (OSS) is forming the spines of technology
infrastructures, attracting millions of talents to contribute. Notably, it is
challenging and critical to consider both the developers' interests and the
semantic features of the project code to recommend appropriate development
tasks to OSS developers. In this paper, we formulate the novel problem of code
recommendation, whose purpose is to predict the future contribution behaviors
of developers given their interaction history, the semantic features of source
code, and the hierarchical file structures of projects. Considering the complex
interactions among multiple parties within the system, we propose CODER, a
novel graph-based code recommendation framework for open source software
developers. CODER jointly models microscopic user-code interactions and
macroscopic user-project interactions via a heterogeneous graph and further
bridges the two levels of information through aggregation on file-structure
graphs that reflect the project hierarchy. Moreover, due to the lack of
reliable benchmarks, we construct three large-scale datasets to facilitate
future research in this direction. Extensive experiments show that our CODER
framework achieves superior performance under various experimental settings,
including intra-project, cross-project, and cold-start recommendation. We will
release all the datasets, code, and utilities for data retrieval upon the
acceptance of this work.
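The abstract's idea of bridging file-level and project-level information through the file hierarchy can be illustrated with a minimal sketch. This is not CODER's actual model, which the abstract describes only as graph-based aggregation on file-structure graphs. It assumes plain mean pooling as a stand-in aggregator and a dot product as the relevance score. The function names `aggregate_file_tree` and `score`, the toy paths, and the two-dimensional vectors are all hypothetical.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def aggregate_file_tree(file_vecs):
    """Mean-pool file vectors bottom-up into directory vectors.

    file_vecs maps relative file paths to equal-length lists of floats.
    Returns a dict that also holds one vector per directory, with "."
    standing for the project root. Mean pooling is an assumption made
    for illustration, not the aggregator used in the paper.
    """
    children = defaultdict(set)
    for path in file_vecs:
        node = PurePosixPath(path)
        # Walk up to the root, recording each parent/child link once.
        while node.parent != node:
            children[str(node.parent)].add(str(node))
            node = node.parent

    vecs = dict(file_vecs)
    # Process the deepest directories first so that every child vector
    # exists before its parent is computed.
    by_depth = sorted(
        children, key=lambda p: len(PurePosixPath(p).parts), reverse=True
    )
    for directory in by_depth:
        kids = [vecs[child] for child in sorted(children[directory])]
        vecs[directory] = [sum(col) / len(kids) for col in zip(*kids)]
    return vecs


def score(user_vec, item_vec):
    """Dot-product relevance between a developer and a file or project."""
    return sum(u * v for u, v in zip(user_vec, item_vec))


# Toy file embeddings (hypothetical values).
files = {
    "src/a.py": [1.0, 0.0],
    "src/b.py": [0.0, 1.0],
    "README.md": [1.0, 1.0],
}
tree_vecs = aggregate_file_tree(files)
developer = [2.0, 1.0]

# Microscopic level: rank individual files for the developer.
ranking = sorted(files, key=lambda f: score(developer, files[f]), reverse=True)
# Macroscopic level: score the whole project via its root vector.
project_score = score(developer, tree_vecs["."])
```

Here `ranking` orders files by predicted interest for one developer, while `project_score` uses the root vector pooled from the whole hierarchy, so the same developer vector can be compared against both levels.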
Related papers
- OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models [70.72097493954067]
Large language models (LLMs) for code have become indispensable in various domains, including code generation, reasoning tasks, and agent systems.
We introduce OpenCoder, a top-tier code LLM that not only achieves performance comparable to leading models but also serves as an "open cookbook" for the research community.
arXiv Detail & Related papers (2024-11-07T17:47:25Z)
- Codev-Bench: How Do LLMs Understand Developer-Centric Code Completion? [60.84912551069379]
We present the Code-Development Benchmark (Codev-Bench), a fine-grained, real-world, repository-level, and developer-centric evaluation framework.
Codev-Agent is an agent-based system that automates repository crawling, constructs execution environments, extracts dynamic calling chains from existing unit tests, and generates new test samples to avoid data leakage.
arXiv Detail & Related papers (2024-10-02T09:11:10Z)
- Multi-Agent Software Development through Cross-Team Collaboration [30.88149502999973]
We introduce Cross-Team Collaboration (CTC), a scalable multi-team framework for software development.
CTC enables orchestrated teams to jointly propose various decisions and communicate with their insights.
Results show a notable increase in quality compared to state-of-the-art baselines.
arXiv Detail & Related papers (2024-06-13T10:18:36Z)
- A^3-CodGen: A Repository-Level Code Generation Framework for Code Reuse with Local-Aware, Global-Aware, and Third-Party-Library-Aware [13.27883339389175]
We propose a novel code generation framework, dubbed A3-CodGen, to harness information within the code repository to generate code with fewer potential logical errors.
Results demonstrate that by adopting the A3-CodGen framework, we successfully extract, fuse, and feed code repository information into the LLM, generating more accurate, efficient, and highly reusable code.
arXiv Detail & Related papers (2023-12-10T05:36:06Z)
- Collaborative, Code-Proximal Dynamic Software Visualization within Code Editors [55.57032418885258]
This paper introduces the design and proof-of-concept implementation for a software visualization approach that can be embedded into code editors.
Our contribution differs from related work in that we use dynamic analysis of a software system's runtime behavior.
Our visualization approach enhances common remote pair programming tools and is collaboratively usable by employing shared code cities.
arXiv Detail & Related papers (2023-08-30T06:35:40Z)
- CodeTF: One-stop Transformer Library for State-of-the-art Code LLM [72.1638273937025]
We present CodeTF, an open-source Transformer-based library for state-of-the-art Code LLMs and code intelligence.
Our library supports a collection of pretrained Code LLM models and popular code benchmarks.
We hope CodeTF is able to bridge the gap between machine learning/generative AI and software engineering.
arXiv Detail & Related papers (2023-05-31T05:24:48Z)
- Dataflow graphs as complete causal graphs [17.15640410609126]
We consider an alternative approach to software design, flow-based programming (FBP).
We show how this connection can be leveraged to improve day-to-day tasks in software projects.
arXiv Detail & Related papers (2023-03-16T17:59:13Z)
- Nemo: Guiding and Contextualizing Weak Supervision for Interactive Data Programming [77.38174112525168]
We present Nemo, an end-to-end interactive weak supervision (WS) system that improves the overall productivity of the WS learning pipeline by an average of 20% (and up to 47% in one task) compared to the prevailing WS approach.
arXiv Detail & Related papers (2022-03-02T19:57:32Z)
- Enabling collaborative data science development with the Ballet framework [9.424574945499844]
We present a novel conceptual framework and ML programming model to address challenges to scaling data science collaborations.
We instantiate these ideas in Ballet, a lightweight software framework for collaborative open-source data science.
arXiv Detail & Related papers (2020-12-14T18:51:23Z)
- Representation of Developer Expertise in Open Source Software [12.583969739954526]
We use the World of Code infrastructure to extract the complete set of APIs in the files changed by open source developers.
We then employ Doc2Vec embeddings for vector representations of APIs, developers, and projects.
We evaluate whether these embeddings reflect the postulated topology of the Skill Space.
arXiv Detail & Related papers (2020-05-20T16:36:07Z)