Looking for related discussions on GitHub Discussions
- URL: http://arxiv.org/abs/2206.11971v1
- Date: Thu, 23 Jun 2022 20:41:33 GMT
- Title: Looking for related discussions on GitHub Discussions
- Authors: Marcia Lima, Igor Steinmacher, Denae Ford, Evangeline Liu, Grace Vorreuter, Tayana Conte, Bruno Gadelha
- Abstract summary: GitHub Discussions is a native forum to facilitate collaborative discussions between users and members of communities hosted on the platform.
As GitHub Discussions resembles PCQA forums, it faces challenges similar to those faced by such environments.
While duplicate posts have the same content - and may be exact copies - near-duplicates share similar topics and information.
We propose an approach based on a Sentence-BERT pre-trained model: the RD-Detector.
- Score: 18.688096673390586
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Software teams are increasingly adopting different tools and communication
channels to aid the software collaborative development model and coordinate
tasks. Among such resources, Programming Community-based Question Answering
(PCQA) forums have become widely used by developers. Such environments enable
developers to get and share technical information. Interested in supporting the
development and management of Open Source Software (OSS) projects, GitHub
announced GitHub Discussions - a native forum to facilitate collaborative
discussions between users and members of communities hosted on the platform. As
GitHub Discussions resembles PCQA forums, it faces challenges similar to those
faced by such environments, which include the occurrence of related discussions
(duplicates or near-duplicated posts). While duplicate posts have the same
content - and may be exact copies - near-duplicates share similar topics and
information. Both can introduce noise to the platform and compromise project
knowledge sharing. In this paper, we address the problem of detecting related
posts in GitHub Discussions. To do so, we propose an approach based on a
Sentence-BERT pre-trained model: the RD-Detector. We evaluated RD-Detector
using data from different OSS communities. OSS maintainers and Software
Engineering (SE) researchers manually evaluated the RD-Detector results, which
achieved 75% to 100% in terms of precision. In addition, maintainers pointed
out practical applications of the approach, such as merging the discussions'
threads and making discussions as comments on one another. OSS maintainers can
benefit from RD-Detector to address the labor-intensive task of manually
detecting related discussions and answering the same question multiple times.
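The abstract describes detecting related posts by comparing sentence-level representations of discussions. A minimal sketch of that general idea follows; it is not the authors' RD-Detector. It assumes a bag-of-words vector as a stand-in for the Sentence-BERT encoder so that it runs without external dependencies, and the names `embed`, `related_pairs`, `precision`, the sample posts, and the 0.5 threshold are all hypothetical choices made for illustration.

```python
import math
import re
from collections import Counter
from itertools import combinations


def embed(text):
    """Stand-in embedding: a bag-of-words count vector.

    Assumption: a real pipeline would call a pre-trained Sentence-BERT
    encoder here instead of counting tokens.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def related_pairs(posts, threshold=0.5):
    """Return (i, j, score) for post pairs scoring at or above the threshold.

    The threshold value is a hypothetical default, not one from the paper.
    Results are sorted from most to least similar.
    """
    vectors = [embed(p) for p in posts]
    pairs = []
    for i, j in combinations(range(len(posts)), 2):
        score = cosine(vectors[i], vectors[j])
        if score >= threshold:
            pairs.append((i, j, score))
    return sorted(pairs, key=lambda p: p[2], reverse=True)


def precision(flagged, confirmed):
    """Share of flagged pairs that human reviewers confirmed as related."""
    flagged = set(flagged)
    if not flagged:
        return 0.0
    return len(flagged & set(confirmed)) / len(flagged)


# Hypothetical sample posts, invented for this example.
posts = [
    "How do I configure the cache directory?",
    "How can I configure the cache directory path?",
    "Release schedule for version 2",
]

for i, j, score in related_pairs(posts):
    print(f"posts {i} and {j} look related (similarity {score:.2f})")
```

Swapping `embed` for a real sentence encoder would leave the pairwise comparison and the precision check unchanged, which is why the sketch keeps the embedding step behind a single function.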
Related papers
- CodeRAG-Bench: Can Retrieval Augment Code Generation? [78.37076502395699]
We conduct a systematic, large-scale analysis of code generation using retrieval-augmented generation.
We first curate a comprehensive evaluation benchmark, CodeRAG-Bench, encompassing three categories of code generation tasks.
We examine top-performing models on CodeRAG-Bench by providing contexts retrieved from one or multiple sources.
arXiv Detail & Related papers (2024-06-20T16:59:52Z)
- Impact of the Availability of ChatGPT on Software Development: A Synthetic Difference in Differences Estimation using GitHub Data [49.1574468325115]
ChatGPT is an AI tool that enhances software production efficiency.
We estimate ChatGPT's effects on the number of git pushes, repositories, and unique developers per 100,000 people.
These results suggest that AI tools like ChatGPT can substantially boost developer productivity, though further analysis is needed to address potential downsides such as low quality code and privacy concerns.
arXiv Detail & Related papers (2024-06-16T19:11:15Z)
- How to Understand Whole Software Repository? [64.19431011897515]
An excellent understanding of the whole repository will be the critical path to Automatic Software Engineering (ASE).
We develop a novel method named RepoUnderstander by guiding agents to comprehensively understand the whole repositories.
To better utilize the repository-level knowledge, we guide the agents to summarize, analyze, and plan.
arXiv Detail & Related papers (2024-06-03T15:20:06Z)
- An Empirical Study on Developers Shared Conversations with ChatGPT in GitHub Pull Requests and Issues [20.121332699827633]
ChatGPT has significantly impacted software development practices.
Despite its widespread adoption, the impact of ChatGPT as an assistant in collaborative coding remains largely unexplored.
We analyze a dataset of 210 and 370 developers' shared conversations with ChatGPT in GitHub pull requests (PRs) and issues.
arXiv Detail & Related papers (2024-03-15T16:58:37Z)
- Chronicles of CI/CD: A Deep Dive into its Usage Over Time [0.5705775078773656]
This paper examines the technologies developers use for CI/CD by analyzing GitHub repositories.
Using a list of the state-of-the-art CI/CD technologies, we use the GitHub search API to find repositories using each of these technologies.
We provide an overview of how CI/CD technologies are used today and how that usage has changed over the last 12 years.
arXiv Detail & Related papers (2024-02-27T15:20:11Z)
- SWE-bench: Can Language Models Resolve Real-World GitHub Issues? [80.52201658231895]
SWE-bench is an evaluation framework consisting of 2,294 software engineering problems drawn from real GitHub issues and corresponding pull requests across 12 popular Python repositories.
We show that both state-of-the-art proprietary models and our fine-tuned model SWE-Llama can resolve only the simplest issues.
arXiv Detail & Related papers (2023-10-10T16:47:29Z)
- How Do Java Developers Reuse StackOverflow Answers in Their GitHub Projects? [5.064338135936606]
StackOverflow (SO) is a widely used question-and-answer (Q&A) website for software developers and computer scientists.
GitHub is an online development platform used for storing, tracking, and collaborating on software projects.
We did an empirical study by mining the SO answers reused by Java projects available on GitHub.
arXiv Detail & Related papers (2023-08-18T14:04:59Z)
- ChatDev: Communicative Agents for Software Development [84.90400377131962]
ChatDev is a chat-powered software development framework in which specialized agents are guided in what to communicate.
These agents actively contribute to the design, coding, and testing phases through unified language-based communication.
arXiv Detail & Related papers (2023-07-16T02:11:34Z)
- The GitHub Development Workflow Automation Ecosystems [47.818229204130596]
Large-scale software development has become a highly collaborative endeavour.
This chapter explores the ecosystems of development bots and GitHub Actions.
It provides an extensive survey of the state-of-the-art in this domain.
arXiv Detail & Related papers (2023-05-08T15:24:23Z)
- GitHub Discussions: An Exploratory Study of Early Adoption [23.844242004415406]
We conducted a mixed-methods study of early adopters of GitHub Discussions from January until July 2020.
We found that: (1) errors, unexpected behavior, and code reviews are prevalent discussion categories; (2) there is a positive relationship between project member involvement and discussion frequency; (3) developers consider GitHub Discussions useful but face the problem of topic duplication between Discussions and Issues.
Our findings are a first step towards data-informed guidance for using GitHub Discussions, opening up avenues for future work on this novel communication channel.
arXiv Detail & Related papers (2021-02-10T02:49:03Z)
- A Transfer Learning Approach for Dialogue Act Classification of GitHub Issue Comments [1.370633147306388]
This paper presents a transfer learning approach for performing dialogue act classification on issue comments on GitHub.
Since no large labeled corpus of GitHub issue comments exists, employing transfer learning enables us to leverage standard dialogue act datasets.
Being able to map the issue comments to dialogue acts is a useful stepping stone towards understanding cognitive team processes.
arXiv Detail & Related papers (2020-11-10T02:56:18Z)