Related papers: Looking for related discussions on GitHub Discussions

URL: http://arxiv.org/abs/2206.11971v1
Date: Thu, 23 Jun 2022 20:41:33 GMT
Title: Looking for related discussions on GitHub Discussions
Authors: Marcia Lima, Igor Steinmacher, Denae Ford, Evangeline Liu, Grace Vorreuter, Tayana Conte, Bruno Gadelha
Abstract summary: GitHub Discussions is a native forum to facilitate collaborative discussions between users and members of communities hosted on the platform. As GitHub Discussions resembles PCQA forums, it faces challenges similar to those faced by such environments. While duplicate posts have the same content - and may be exact copies - near-duplicates share similar topics and information. We propose an approach based on a Sentence-BERT pre-trained model: the RD-Detector.
Score: 18.688096673390586
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Software teams are increasingly adopting different tools and communication channels to support collaborative software development and coordinate tasks. Among such resources, Programming Community-based Question Answering (PCQA) forums have become widely used by developers. Such environments enable developers to get and share technical information. Interested in supporting the development and management of Open Source Software (OSS) projects, GitHub announced GitHub Discussions - a native forum to facilitate collaborative discussions between users and members of communities hosted on the platform. As GitHub Discussions resembles PCQA forums, it faces challenges similar to those faced by such environments, including the occurrence of related discussions (duplicate or near-duplicate posts). While duplicate posts have the same content - and may be exact copies - near-duplicates share similar topics and information. Both can introduce noise to the platform and compromise project knowledge sharing. In this paper, we address the problem of detecting related posts in GitHub Discussions. To do so, we propose an approach based on a pre-trained Sentence-BERT model: the RD-Detector. We evaluated RD-Detector using data from different OSS communities. OSS maintainers and Software Engineering (SE) researchers manually evaluated the RD-Detector results, which achieved a precision of 75% to 100%. In addition, maintainers pointed out practical applications of the approach, such as merging the discussions' threads and turning discussions into comments on one another. OSS maintainers can benefit from RD-Detector to reduce the labor-intensive work of manually detecting related discussions and answering the same question multiple times.
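Below is a minimal sketch of the general idea. It assumes that RD-Detector follows the common embed-and-compare pattern:

- encode each discussion as a vector;
- score every pair of discussions by cosine similarity;
- surface the pairs above a threshold for maintainers to review.

The `embed` function is a bag-of-words stand-in, not Sentence-BERT, so that the sketch runs without downloading a model. The names `related_pairs` and `embed`, the 0.5 threshold and the sample posts are all hypothetical, chosen only for illustration.

```python
import math
import re
from collections import Counter
from itertools import combinations


def embed(text: str) -> Counter:
    """Hypothetical stand-in for a Sentence-BERT encoder: a term-count vector.
    RD-Detector itself uses a pre-trained Sentence-BERT model, not word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def related_pairs(posts: dict, threshold: float = 0.5) -> list:
    """Return (id_a, id_b, score) for every pair of discussions whose similarity
    meets the (hypothetical) threshold, most similar first, for manual review."""
    vectors = {pid: embed(text) for pid, text in posts.items()}
    pairs = []
    for a, b in combinations(sorted(posts), 2):
        score = cosine(vectors[a], vectors[b])
        if score >= threshold:
            pairs.append((a, b, round(score, 3)))
    return sorted(pairs, key=lambda p: p[2], reverse=True)


# Usage with made-up discussion titles.
posts = {
    "D1": "How do I configure the cache directory?",
    "D2": "How can I configure the cache directory location?",
    "D3": "Release notes for version 2.0",
}
print(related_pairs(posts))  # [('D1', 'D2', 0.802)]
```

Word counts only catch near-duplicates that share vocabulary. A Sentence-BERT encoder also scores paraphrases with little word overlap as similar. To use one, swap `embed` and `cosine` for a real encoder, for example `SentenceTransformer.encode` from the `sentence-transformers` library, together with a dense-vector cosine.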

Related papers

SocialED: A Python Library for Social Event Detection [53.928241775629566]
SocialED is a comprehensive, open-source Python library designed to support social event detection (SED) tasks. It provides a unified API with detailed documentation, offering researchers and practitioners a complete solution for event detection in social media. SocialED supports a wide range of preprocessing techniques, such as graph construction and tokenization, and includes standardized interfaces for training models and making predictions.
arXiv Detail & Related papers (2024-12-18T03:37:47Z)
CodeRAG-Bench: Can Retrieval Augment Code Generation? [78.37076502395699]
We conduct a systematic, large-scale analysis of code generation using retrieval-augmented generation. We first curate a comprehensive evaluation benchmark, CodeRAG-Bench, encompassing three categories of code generation tasks. We examine top-performing models on CodeRAG-Bench by providing contexts retrieved from one or multiple sources.
arXiv Detail & Related papers (2024-06-20T16:59:52Z)
Impact of the Availability of ChatGPT on Software Development: A Synthetic Difference in Differences Estimation using GitHub Data [49.1574468325115]
ChatGPT is an AI tool that enhances software production efficiency. We estimate ChatGPT's effects on the number of git pushes, repositories, and unique developers per 100,000 people. These results suggest that AI tools like ChatGPT can substantially boost developer productivity, though further analysis is needed to address potential downsides such as low-quality code and privacy concerns.
arXiv Detail & Related papers (2024-06-16T19:11:15Z)
How to Understand Whole Software Repository? [64.19431011897515]
An excellent understanding of the whole repository will be the critical path to Automatic Software Engineering (ASE). We develop a novel method named RepoUnderstander by guiding agents to comprehensively understand whole repositories. To better utilize the repository-level knowledge, we guide the agents to summarize, analyze, and plan.
arXiv Detail & Related papers (2024-06-03T15:20:06Z)
An Empirical Study on Developers Shared Conversations with ChatGPT in GitHub Pull Requests and Issues [20.121332699827633]
ChatGPT has significantly impacted software development practices. Despite its widespread adoption, the impact of ChatGPT as an assistant in collaborative coding remains largely unexplored. We analyze a dataset of 210 and 370 developers' shared conversations with ChatGPT in GitHub pull requests (PRs) and issues, respectively.
arXiv Detail & Related papers (2024-03-15T16:58:37Z)
Chronicles of CI/CD: A Deep Dive into its Usage Over Time [0.5705775078773656]
This paper analyzes the technologies developers use for CI/CD by analyzing GitHub repositories. Using a list of the state-of-the-art CI/CD technologies, we use the GitHub search API to find repositories using each of these technologies. We provide an overview of how CI/CD technologies are used today, as well as how their usage has evolved over the last 12 years.
arXiv Detail & Related papers (2024-02-27T15:20:11Z)
SWE-bench: Can Language Models Resolve Real-World GitHub Issues? [80.52201658231895]
SWE-bench is an evaluation framework consisting of 2,294 software engineering problems drawn from real GitHub issues and corresponding pull requests across 12 popular Python repositories. We show that both state-of-the-art proprietary models and our fine-tuned model SWE-Llama can resolve only the simplest issues.
arXiv Detail & Related papers (2023-10-10T16:47:29Z)
How Do Java Developers Reuse StackOverflow Answers in Their GitHub Projects? [5.064338135936606]
StackOverflow (SO) is a widely used question-and-answer (Q&A) website for software developers and computer scientists. GitHub is an online development platform used for storing, tracking, and collaborating on software projects. We did an empirical study by mining the SO answers reused by Java projects available on GitHub.
arXiv Detail & Related papers (2023-08-18T14:04:59Z)
ChatDev: Communicative Agents for Software Development [84.90400377131962]
ChatDev is a chat-powered software development framework in which specialized agents are guided in what to communicate. These agents actively contribute to the design, coding, and testing phases through unified language-based communication.
arXiv Detail & Related papers (2023-07-16T02:11:34Z)
The GitHub Development Workflow Automation Ecosystems [47.818229204130596]
Large-scale software development has become a highly collaborative endeavour. This chapter explores the ecosystems of development bots and GitHub Actions. It provides an extensive survey of the state-of-the-art in this domain.
arXiv Detail & Related papers (2023-05-08T15:24:23Z)
GitHub Discussions: An Exploratory Study of Early Adoption [23.844242004415406]
We conducted a mixed-methods study based on early adopters of GitHub Discussions from January until July 2020. We found that: (1) errors, unexpected behavior, and code reviews are prevalent discussion categories; (2) there is a positive relationship between project member involvement and discussion frequency; (3) developers consider GitHub Discussions useful but face the problem of topic duplication between Discussions and Issues. Our findings are a first step towards data-informed guidance for using GitHub Discussions, opening up avenues for future work on this novel communication channel.
arXiv Detail & Related papers (2021-02-10T02:49:03Z)
A Transfer Learning Approach for Dialogue Act Classification of GitHub Issue Comments [1.370633147306388]
This paper presents a transfer learning approach for performing dialogue act classification on issue comments on GitHub. Since no large labeled corpus of GitHub issue comments exists, employing transfer learning enables us to leverage standard dialogue act datasets. Being able to map the issue comments to dialogue acts is a useful stepping stone towards understanding cognitive team processes.
arXiv Detail & Related papers (2020-11-10T02:56:18Z)

This list is automatically generated from the titles and abstracts of the papers on this site.