Related papers: Duplicate Question Retrieval and Confirmation Time Prediction in Software Communities

URL: http://arxiv.org/abs/2309.05035v3
Date: Tue, 5 Mar 2024 09:29:19 GMT
Title: Duplicate Question Retrieval and Confirmation Time Prediction in Software Communities
Authors: Rima Hazra, Debanjan Saha, Amruit Sahoo, Somnath Banerjee, Animesh Mukherjee
Abstract summary: Community Question Answering (CQA) in different domains is growing rapidly because several platforms are available and users share a huge amount of information. With the rapid growth of such online platforms, the massive volume of archived data makes it difficult for moderators to retrieve possible duplicates of a new question. We propose a Siamese neural network based approach that exploits both text and network-based features.
Score: 4.721583392950402
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Community Question Answering (CQA) in different domains is growing rapidly because several platforms are available and users share a huge amount of information. With the rapid growth of such online platforms, the massive volume of archived data makes it difficult for moderators to retrieve possible duplicates of a new question and to identify and confirm existing question pairs as duplicates at the right time. The problem is even more critical in CQAs for large software systems like askubuntu, where moderators need domain expertise to recognize a question as a duplicate. The prime challenge on such platforms is that the moderators are themselves experts and are therefore usually extremely busy, with their time being extraordinarily expensive. To facilitate the moderators' task, this work tackles two significant problems for the askubuntu CQA platform: (1) retrieval of duplicate questions given a new question and (2) duplicate question confirmation time prediction. In the first task, we focus on retrieving duplicate questions from a question pool for a newly posted question. In the second task, we solve a regression problem to rank question pairs that could potentially take a long time to be confirmed as duplicates. For duplicate question retrieval, we propose a Siamese neural network based approach that exploits both text and network-based features and outperforms several state-of-the-art baselines; in particular, it outperforms DupPredictor and DUPE by 5% and 7%, respectively. For duplicate confirmation time prediction, we use both standard machine learning models and a neural network, along with text and graph-based features. We obtain Spearman's rank correlations of 0.20 and 0.213 (statistically significant) for text and graph-based features, respectively.
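The abstract names two technical pieces: a Siamese scorer that compares a new question against a question pool, and Spearman's rank correlation for judging how well predicted confirmation times rank question pairs. The Python below is a minimal sketch of both under loud assumptions: `encode` is a toy bag-of-words stand-in for the paper's trained twin encoder (it is not a neural network), tag overlap (`jaccard`) stands in for the network-based features, and `alpha`, `rank_candidates`, and the toy questions are hypothetical. Only the structure is meant to carry over: the same encoder is applied to both sides of every pair, and text and network signals are combined into one score used for ranking. `spearman_rho` follows the standard definition (Pearson correlation of average ranks, so ties are handled).

```python
import math
from collections import Counter


def encode(text: str) -> Counter:
    """Toy shared encoder (assumption): lowercase bag-of-words counts.
    Stands in for the learned twin encoder of a Siamese network."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def jaccard(x: set, y: set) -> float:
    """Tag overlap, a hypothetical stand-in for network-based features."""
    return len(x & y) / len(x | y) if x | y else 0.0


def rank_candidates(query, pool, alpha=0.7):
    """query: (text, tags); pool: {qid: (text, tags)}.
    Score = alpha * text similarity + (1 - alpha) * tag overlap.
    The same encode() is applied to both sides, mirroring shared weights."""
    q_vec = encode(query[0])
    scored = [
        (alpha * cosine(q_vec, encode(text)) + (1 - alpha) * jaccard(query[1], tags), qid)
        for qid, (text, tags) in pool.items()
    ]
    scored.sort(key=lambda p: p[0], reverse=True)  # stable: ties keep pool order
    return [qid for _, qid in scored]


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("constant input has undefined rank correlation")
    return cov / (sx * sy)


if __name__ == "__main__":
    # Toy data (hypothetical), not taken from askubuntu or the paper.
    pool = {
        "q1": ("how to install nvidia driver on ubuntu", {"drivers", "nvidia"}),
        "q2": ("change default terminal font size", {"terminal", "fonts"}),
    }
    print(rank_candidates(("install nvidia drivers ubuntu 22.04", {"nvidia", "drivers"}), pool))
    # Predicted vs. observed confirmation times (toy values, e.g. in days).
    print(spearman_rho([3, 10, 1, 7], [2.0, 9.5, 1.5, 4.0]))
```

In a real setup the encoder would be trained on confirmed duplicate pairs, and `spearman_rho` would be computed between predicted and observed confirmation times on held-out pairs; nothing here reproduces the paper's results.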

Related papers

Feature Engineering in Learning-to-Rank for Community Question Answering Task [2.5091819952713057]
Community question answering (CQA) forums are Internet-based platforms where users ask questions about a topic and other expert users try to provide solutions. Many CQA forums, such as Quora, Stack Overflow, Yahoo! Answers, and StackExchange, hold large amounts of user-generated data. These data are leveraged in automated CQA ranking systems, where similar questions (and answers) are presented in response to a user's query.
arXiv (2023-09-14)
Answering Ambiguous Questions with a Database of Questions, Answers, and Revisions [95.92276099234344]
We present a new state of the art for answering ambiguous questions that exploits a database of unambiguous questions generated from Wikipedia. Our method improves performance by 15% on recall measures and by 10% on measures that evaluate disambiguating questions from predicted outputs.
arXiv (2023-08-16)
Unsupervised Question Duplicate and Related Questions Detection in e-learning platforms [1.8749305679160364]
We propose a tool that can surface near-duplicate and semantically related questions without supervised data. The proposed tool follows an unsupervised hybrid pipeline of statistical and neural approaches. We demonstrate that QDup can detect near-duplicate questions and also suggest related questions for practice with remarkable accuracy and speed.
arXiv (2022-12-20)
Mining Duplicate Questions of Stack Overflow [5.924018537171331]
We propose two neural network based architectures for duplicate question detection on Stack Overflow. We also propose explicitly modeling the code present in questions to achieve results that surpass the state of the art.
arXiv (2022-10-04)
Attention-based model for predicting question relatedness on Stack Overflow [0.0]
We propose an Attention-based Sentence pair Interaction Model (ASIM) to automatically predict the relatedness between questions on Stack Overflow. ASIM significantly improves over the baseline approaches on the Precision, Recall, and Micro-F1 metrics. Our model also performs well on the Ask Ubuntu duplicate question detection task.
arXiv (2021-03-19)
Open Question Answering over Tables and Text [55.8412170633547]
In open question answering (QA), the answer to a question is produced by retrieving and then analyzing documents that might contain answers to the question. Most open QA systems have considered only retrieving information from unstructured text. We present a new large-scale dataset Open Table-and-Text Question Answering (OTT-QA) to evaluate performance on this task.
arXiv (2020-10-20)
Multi-hop Question Generation with Graph Convolutional Network [58.31752179830959]
Multi-hop Question Generation (QG) aims to generate answer-related questions by aggregating and reasoning over multiple pieces of scattered evidence from different paragraphs. We propose the Multi-Hop Encoding Fusion Network for Question Generation (MulQG), which performs context encoding in multiple hops. Our proposed model generates fluent questions with high completeness and outperforms the strongest baseline by 20.8% in the multi-hop evaluation.
arXiv (2020-10-19)
Better Early than Late: Fusing Topics with Word Embeddings for Neural Question Paraphrase Identification [24.574227630018758]
We propose two ways of merging topics with word embeddings in a new neural architecture for question paraphrase identification. Our results show that our system outperforms neural baselines on multiple CQA datasets.
arXiv (2020-07-22)
Match²: A Matching over Matching Model for Similar Question Identification [74.7142127303489]
Community Question Answering (CQA) has become a primary means for people to acquire knowledge, since anyone is free to ask questions or submit answers. Similar question identification has become a core task in CQA; it aims to find a similar question in the archived repository whenever a new question is asked. Properly measuring the similarity between two questions has long been a challenge because of the inherent variation of natural language: the same question can be asked in different ways, and different questions can share similar expressions. Traditional methods typically take a one-sided approach, which leverages the answer as some expanded representation of the …
arXiv (2020-06-21)
ClarQ: A large-scale and diverse dataset for Clarification Question Generation [67.1162903046619]
We devise a novel bootstrapping framework that assists in creating a diverse, large-scale dataset of clarification questions based on post comments extracted from StackExchange. We quantitatively demonstrate the utility of the newly created dataset by applying it to the downstream task of question answering. We release this dataset to foster research on clarification question generation, with the larger goal of enhancing dialog and question answering systems.
arXiv (2020-06-10)
Unsupervised Question Decomposition for Question Answering [102.56966847404287]
We propose an algorithm for One-to-N Unsupervised Sequence transduction (ONUS) that learns to map one hard, multi-hop question to many simpler, single-hop sub-questions. We show large QA improvements on HotpotQA over a strong baseline on the original, out-of-domain, and multi-hop dev sets.
arXiv (2020-02-22)

This list is automatically generated from the titles and abstracts of the papers on this site.