Duplicate Question Retrieval and Confirmation Time Prediction in
Software Communities
- URL: http://arxiv.org/abs/2309.05035v3
- Date: Tue, 5 Mar 2024 09:29:19 GMT
- Title: Duplicate Question Retrieval and Confirmation Time Prediction in
Software Communities
- Authors: Rima Hazra, Debanjan Saha, Amruit Sahoo, Somnath Banerjee, Animesh
Mukherjee
- Abstract summary: Community Question Answering (CQA) in different domains is growing at a large scale because of the availability of several platforms and huge shareable information among users.
With the rapid growth of such online platforms, a massive amount of archived data makes it difficult for moderators to retrieve possible duplicates for a new question.
We propose a Siamese neural network based approach by exploiting both text and network-based features.
- Score: 4.721583392950402
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Community Question Answering (CQA) in different domains is growing at a large
scale because of the availability of several platforms and huge shareable
information among users. With the rapid growth of such online platforms, a
massive amount of archived data makes it difficult for moderators to retrieve
possible duplicates for a new question and identify and confirm existing
question pairs as duplicates at the right time. This problem is even more
critical in CQAs corresponding to large software systems like askubuntu where
moderators need to be experts to comprehend something as a duplicate. Note that
the prime challenge in such CQA platforms is that the moderators are themselves
experts and are therefore usually extremely busy with their time being
extraordinarily expensive. To facilitate the task of the moderators, in this
work, we have tackled two significant issues for the askubuntu CQA platform:
(1) retrieval of duplicate questions given a new question and (2) duplicate
question confirmation time prediction. In the first task, we focus on
retrieving duplicate questions from a question pool for a particular newly
posted question. In the second task, we solve a regression problem to rank a
pair of questions that could potentially take a long time to get confirmed as
duplicates. For duplicate question retrieval, we propose a Siamese neural
network based approach by exploiting both text and network-based features,
which outperforms several state-of-the-art baseline techniques. Our method
outperforms DupPredictor and DUPE by 5% and 7% respectively. For duplicate
confirmation time prediction, we have used both the standard machine learning
models and neural network along with the text and graph-based features. We
obtain Spearman's rank correlation of 0.20 and 0.213 (statistically
significant) for text and graph based features respectively.
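The confirmation-time results above are reported as Spearman's rank correlation, which measures how well the ordering of predictions matches the ordering of the true values. The following is a minimal sketch of that metric in plain Python, not the authors' evaluation code. The function names (`average_ranks`, `spearman`) and the toy confirmation times are hypothetical and chosen only for illustration.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values equal to values[order[i]].
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in rx))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in ry))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("rank correlation is undefined for constant input")
    return cov / (sd_x * sd_y)


# Hypothetical toy data: predicted vs. actual days until a pair of
# questions is confirmed as duplicates (not taken from the paper).
predicted_days = [2.0, 10.0, 1.0, 30.0, 7.0]
actual_days = [1.0, 14.0, 3.0, 21.0, 5.0]
rho = spearman(predicted_days, actual_days)  # 0.9 up to floating-point rounding
```

Because only the ranks enter the computation, the metric rewards getting the ordering of slow-to-confirm pairs right rather than predicting the exact confirmation time.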
Related papers
- Feature Engineering in Learning-to-Rank for Community Question Answering
Task [2.5091819952713057]
Community question answering (CQA) forums are Internet-based platforms where users ask questions about a topic and other expert users try to provide solutions.
Many CQA forums, such as Quora, Stack Overflow, Yahoo! Answers, and StackExchange, host large amounts of user-generated data.
These data are leveraged in automated CQA ranking systems where similar questions (and answers) are presented in response to the query of the user.
arXiv Detail & Related papers (2023-09-14T11:18:26Z)
- Answering Ambiguous Questions with a Database of Questions, Answers, and
Revisions [95.92276099234344]
We present a new state-of-the-art for answering ambiguous questions that exploits a database of unambiguous questions generated from Wikipedia.
Our method improves performance by 15% on recall measures and 10% on measures which evaluate disambiguating questions from predicted outputs.
arXiv Detail & Related papers (2023-08-16T20:23:16Z)
- Unsupervised Question Duplicate and Related Questions Detection in
e-learning platforms [1.8749305679160364]
We propose a tool that can surface near-duplicate and semantically related questions without supervised data.
The proposed tool follows an unsupervised hybrid pipeline of statistical and neural approaches.
We demonstrate that QDup can detect near-duplicate questions and also suggest related questions for practice with remarkable accuracy and speed.
arXiv Detail & Related papers (2022-12-20T11:52:52Z)
- Mining Duplicate Questions of Stack Overflow [5.924018537171331]
We propose two neural network based architectures for duplicate question detection on Stack Overflow.
We also propose explicitly modeling the code present in questions to achieve results that surpass the state of the art.
arXiv Detail & Related papers (2022-10-04T14:34:59Z)
- Attention-based model for predicting question relatedness on Stack
Overflow [0.0]
We propose an Attention-based Sentence pair Interaction Model (ASIM) to predict the relatedness between questions on Stack Overflow automatically.
ASIM has made significant improvement over the baseline approaches in Precision, Recall, and Micro-F1 evaluation metrics.
Our model also performs well in the duplicate question detection task of Ask Ubuntu.
arXiv Detail & Related papers (2021-03-19T12:18:03Z)
- Open Question Answering over Tables and Text [55.8412170633547]
In open question answering (QA), the answer to a question is produced by retrieving and then analyzing documents that might contain answers to the question.
Most open QA systems have considered only retrieving information from unstructured text.
We present a new large-scale dataset Open Table-and-Text Question Answering (OTT-QA) to evaluate performance on this task.
arXiv Detail & Related papers (2020-10-20T16:48:14Z)
- Multi-hop Question Generation with Graph Convolutional Network [58.31752179830959]
Multi-hop Question Generation (QG) aims to generate answer-related questions by aggregating and reasoning over multiple scattered evidence from different paragraphs.
We propose the Multi-Hop Encoding Fusion Network for Question Generation (MulQG), which does context encoding in multiple hops.
Our proposed model is able to generate fluent questions with high completeness and outperforms the strongest baseline by 20.8% in the multi-hop evaluation.
arXiv Detail & Related papers (2020-10-19T06:15:36Z)
- Better Early than Late: Fusing Topics with Word Embeddings for Neural
Question Paraphrase Identification [24.574227630018758]
We propose two ways of merging topics with word embeddings in a new neural architecture for question paraphrase identification.
Our results show that our system outperforms neural baselines on multiple CQA datasets.
arXiv Detail & Related papers (2020-07-22T10:09:26Z)
- Match$^2$: A Matching over Matching Model for Similar Question
Identification [74.7142127303489]
Community Question Answering (CQA) has become a primary means for people to acquire knowledge, where people are free to ask questions or submit answers.
Similar question identification becomes a core task in CQA which aims to find a similar question from the archived repository whenever a new question is asked.
It has long been a challenge to properly measure the similarity between two questions due to the inherent variation of natural language: there can be different ways to ask the same question, and different questions can share similar expressions.
Traditional methods typically take a one-side usage, which leverages the answer as some expanded representation of the
arXiv Detail & Related papers (2020-06-21T05:59:34Z)
- ClarQ: A large-scale and diverse dataset for Clarification Question
Generation [67.1162903046619]
We devise a novel bootstrapping framework that assists in the creation of a diverse, large-scale dataset of clarification questions based on post comments extracted from stackexchange.
We quantitatively demonstrate the utility of the newly created dataset by applying it to the downstream task of question-answering.
We release this dataset in order to foster research into the field of clarification question generation with the larger goal of enhancing dialog and question answering systems.
arXiv Detail & Related papers (2020-06-10T17:56:50Z)
- Unsupervised Question Decomposition for Question Answering [102.56966847404287]
We propose an algorithm for One-to-N Unsupervised Sequence transduction (ONUS) that learns to map one hard, multi-hop question to many simpler, single-hop sub-questions.
We show large QA improvements on HotpotQA over a strong baseline on the original, out-of-domain, and multi-hop dev sets.
arXiv Detail & Related papers (2020-02-22T19:40:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.