Employing Deep Learning and Structured Information Retrieval to Answer
Clarification Questions on Bug Reports
- URL: http://arxiv.org/abs/2304.12494v3
- Date: Sat, 8 Jul 2023 12:16:53 GMT
- Title: Employing Deep Learning and Structured Information Retrieval to Answer
Clarification Questions on Bug Reports
- Authors: Usmi Mukherjee and Mohammad Masudur Rahman
- Abstract summary: We propose a novel approach that uses CodeT5 in combination with Lucene to recommend answers to follow-up questions.
We evaluate the recommended answers against manually annotated answers using similarity metrics like Normalized Smooth BLEU Score, METEOR, Word Mover's Distance, and Semantic Similarity.
- Score: 3.462843004438096
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Software bug reports submitted to bug-tracking systems often lack crucial
information that developers need to resolve them promptly, costing companies
billions of dollars. There has been significant research on eliciting
information from bug reporters more effectively, for example through the
templates that reporters must fill in, yet the need for asking follow-up
questions persists. Recent studies propose techniques to suggest these
follow-up questions so that developers can obtain the missing details, but
there has been little research on answering the follow-up questions, which
often remain unanswered. In this paper, we propose a novel approach that
combines CodeT5 with Lucene, an information retrieval engine that leverages
the relevance of different bug reports, their components, and follow-up
questions to recommend answers. These top-ranked answers, along with their
bug reports, serve as additional context (beyond the deficient bug report
itself) for the deep learning model when generating an answer. We evaluate
the recommended answers against manually annotated answers using similarity
metrics such as Normalized Smooth BLEU Score, METEOR, Word Mover's Distance,
and Semantic Similarity. We achieve a BLEU Score of up to 34 and a Semantic
Similarity of up to 64, which indicates that the generated answers are
understandable and good according to Google's standard and that our approach
outperforms multiple baselines.
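
The retrieve-then-generate pipeline described above can be sketched roughly as follows. This is a minimal illustration, assuming rank_bm25 as a stand-in for the Lucene index and the public Salesforce/codet5-base checkpoint; the corpus schema, prompt format, and helper names are hypothetical rather than the authors' implementation, and the model would need to be fine-tuned on (context, answer) pairs before its outputs are meaningful.

```python
# Minimal sketch of the retrieve-then-generate idea: retrieve answered
# follow-up questions from historical bug reports (rank_bm25 stands in for
# the Lucene index used in the paper), then feed the retrieved answers plus
# the deficient bug report to CodeT5 as extra context. The corpus schema,
# prompt format, and function names below are illustrative assumptions.
from rank_bm25 import BM25Okapi
from transformers import AutoTokenizer, T5ForConditionalGeneration

# Historical bug reports with answered follow-up questions (assumed schema).
corpus = [
    {"report": "App crashes on startup after the latest update ...",
     "question": "Which OS version are you running?",
     "answer": "Windows 11 22H2, build 22621."},
    # ... more historical entries ...
]

# Index report text and follow-up question together (stand-in for Lucene).
tokenized = [f"{d['report']} {d['question']}".lower().split() for d in corpus]
bm25 = BM25Okapi(tokenized)

def retrieve_context(bug_report: str, question: str, k: int = 2) -> str:
    """Return the top-k retrieved answers, with their reports, as one string."""
    query = f"{bug_report} {question}".lower().split()
    top = bm25.get_top_n(query, corpus, n=k)
    return " ".join(f"report: {d['report']} answer: {d['answer']}" for d in top)

# Public CodeT5 checkpoint; in practice it would be fine-tuned on
# (deficient report + retrieved context, ground-truth answer) pairs.
tokenizer = AutoTokenizer.from_pretrained("Salesforce/codet5-base")
model = T5ForConditionalGeneration.from_pretrained("Salesforce/codet5-base")

def answer_followup(bug_report: str, question: str) -> str:
    context = retrieve_context(bug_report, question)
    prompt = f"question: {question} report: {bug_report} context: {context}"
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=512)
    output = model.generate(**inputs, max_length=64, num_beams=4)
    return tokenizer.decode(output[0], skip_special_tokens=True)

print(answer_followup("Editor freezes when opening large files.",
                      "How large are the files that trigger the freeze?"))
```

On the evaluation side, two of the reported metrics can be sketched under assumed implementations, namely NLTK's smoothed sentence-level BLEU and a sentence-transformers embedding model for semantic similarity; the paper's exact metric implementations, as well as METEOR and Word Mover's Distance, are not shown.

```python
# Rough sketch of two of the reported metrics, under assumed implementations:
# smoothed sentence-level BLEU via NLTK and semantic similarity via sentence
# embeddings. METEOR and Word Mover's Distance would be computed analogously.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from sentence_transformers import SentenceTransformer, util

reference = "Windows 11 22H2, build 22621."
candidate = "I am on Windows 11, build 22621."

bleu = sentence_bleu([reference.split()], candidate.split(),
                     smoothing_function=SmoothingFunction().method4)

embedder = SentenceTransformer("all-MiniLM-L6-v2")
semantic_similarity = util.cos_sim(embedder.encode(reference),
                                   embedder.encode(candidate)).item()
print(f"smoothed BLEU: {bleu:.2f}, semantic similarity: {semantic_similarity:.2f}")
```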
Related papers
- Open Domain Question Answering with Conflicting Contexts [55.739842087655774]
We find that as much as 25% of unambiguous, open domain questions can lead to conflicting contexts when retrieved using Google Search.
We ask our annotators to provide explanations for their selections of correct answers.
arXiv Detail & Related papers (2024-10-16T07:24:28Z)
- I Could've Asked That: Reformulating Unanswerable Questions [89.93173151422636]
We evaluate open-source and proprietary models for reformulating unanswerable questions.
GPT-4 and Llama2-7B successfully reformulate questions only 26% and 12% of the time, respectively.
We publicly release the benchmark and the code to reproduce the experiments.
arXiv Detail & Related papers (2024-07-24T17:59:07Z)
- Localizing and Mitigating Errors in Long-form Question Answering [79.63372684264921]
Long-form question answering (LFQA) aims to provide thorough and in-depth answers to complex questions, enhancing comprehension.
This work introduces HaluQuestQA, the first hallucination dataset with localized error annotations for human-written and model-generated LFQA answers.
arXiv Detail & Related papers (2024-07-16T17:23:16Z)
- Auto-labelling of Bug Report using Natural Language Processing [0.0]
Rule- and query-based solutions recommend a long list of potentially similar bug reports with no clear ranking.
In this paper, we have proposed a solution using a combination of NLP techniques.
It uses a custom data transformer, a deep neural network, and a non-generalizing machine learning method to retrieve existing identical bug reports.
arXiv Detail & Related papers (2022-12-13T02:32:42Z)
- Explaining Software Bugs Leveraging Code Structures in Neural Machine Translation [5.079750706023254]
Bugsplainer generates natural language explanations for software bugs by learning from a large corpus of bug-fix commits.
Our evaluation using three performance metrics shows that Bugsplainer can generate understandable and good explanations according to Google's standard.
We also conduct a developer study involving 20 participants where the explanations from Bugsplainer were found to be more accurate, more precise, more concise, and more useful than the baselines.
arXiv Detail & Related papers (2022-12-08T22:19:45Z)
- Using Developer Discussions to Guide Fixing Bugs in Software [51.00904399653609]
We propose using bug report discussions, which are available before the task is performed and are also naturally occurring, avoiding the need for additional information from developers.
We demonstrate that various forms of natural language context derived from such discussions can aid bug-fixing, even leading to improved performance over using commit messages corresponding to the oracle bug-fixing commits.
arXiv Detail & Related papers (2022-11-11T16:37:33Z)
- Automatic Classification of Bug Reports Based on Multiple Text Information and Reports' Intention [37.67372105858311]
This paper proposes a new automatic classification method for bug reports.
The innovation is that when categorizing bug reports, in addition to using the text information of the report, the intention of the report is also considered.
Our proposed method achieves better performance, and its F-Measure ranges from 87.3% to 95.5%.
arXiv Detail & Related papers (2022-08-02T06:44:51Z)
- A Dataset of Information-Seeking Questions and Answers Anchored in Research Papers [66.11048565324468]
We present a dataset of 5,049 questions over 1,585 Natural Language Processing papers.
Each question is written by an NLP practitioner who read only the title and abstract of the corresponding paper, and the question seeks information present in the full text.
We find that existing models that do well on other QA tasks do not perform well on answering these questions, underperforming humans by at least 27 F1 points when answering them from entire papers.
arXiv Detail & Related papers (2021-05-07T00:12:34Z)
- Attention-based model for predicting question relatedness on Stack Overflow [0.0]
We propose an Attention-based Sentence pair Interaction Model (ASIM) to predict the relatedness between questions on Stack Overflow automatically.
ASIM has made significant improvement over the baseline approaches in Precision, Recall, and Micro-F1 evaluation metrics.
Our model also performs well in the duplicate question detection task of Ask Ubuntu.
arXiv Detail & Related papers (2021-03-19T12:18:03Z)
- Inquisitive Question Generation for High Level Text Comprehension [60.21497846332531]
We introduce INQUISITIVE, a dataset of 19K questions that are elicited while a person is reading through a document.
We show that readers engage in a series of pragmatic strategies to seek information.
We evaluate question generation models based on GPT-2 and show that our model is able to generate reasonable questions.
arXiv Detail & Related papers (2020-10-04T19:03:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.