Mining Duplicate Questions of Stack Overflow
- URL: http://arxiv.org/abs/2210.01637v1
- Date: Tue, 4 Oct 2022 14:34:59 GMT
- Title: Mining Duplicate Questions of Stack Overflow
- Authors: Mihir Kale, Anirudha Rayasam, Radhika Parik, Pranav Dheram
- Abstract summary: We propose two neural network based architectures for duplicate question detection on Stack Overflow.
We also propose explicitly modeling the code present in questions to achieve results that surpass the state of the art.
- Score: 5.924018537171331
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: There has been a significant rise in the use of Community Question
Answering sites (CQAs) over the last decade owing primarily to their ability to
leverage the wisdom of the crowd. Duplicate questions have a crippling effect
on the quality of these sites. Tackling duplicate questions is therefore an
important step towards improving the quality of CQAs. In this regard, we propose
two neural network based architectures for duplicate question detection on
Stack Overflow. We also propose explicitly modeling the code present in
questions to achieve results that surpass the state of the art.
Related papers
- Duplicate Question Retrieval and Confirmation Time Prediction in Software Communities [4.721583392950402]
Community Question Answering (CQA) in different domains is growing at a large scale because of the availability of several platforms and the large volume of information shared among users.
With the rapid growth of such online platforms, a massive amount of archived data makes it difficult for moderators to retrieve possible duplicates for a new question.
We propose a Siamese neural network based approach that exploits both text and network-based features.
(2023-09-10T14:13:54Z)
- Answering Ambiguous Questions with a Database of Questions, Answers, and Revisions [95.92276099234344]
We present a new state-of-the-art for answering ambiguous questions that exploits a database of unambiguous questions generated from Wikipedia.
Our method improves performance by 15% on recall measures and 10% on measures which evaluate disambiguating questions from predicted outputs.
(2023-08-16T20:23:16Z)
- Answer ranking in Community Question Answering: a deep learning approach [0.0]
This work aims to advance the state of the art in answer ranking for community Question Answering using a deep learning approach.
We created a large data set of questions and answers posted to the Stack Overflow website.
We leveraged the natural language processing capabilities of dense embeddings and LSTM networks to produce a prediction for the accepted answer attribute.
(2022-10-16T18:47:41Z)
- Multifaceted Improvements for Conversational Open-Domain Question Answering [54.913313912927045]
We propose a framework with Multifaceted Improvements for Conversational open-domain Question Answering (MICQA).
First, the proposed KL-divergence based regularization leads to better question understanding for retrieval and answer reading.
Second, the added post-ranker module pushes more relevant passages to the top positions, where they are selected for the reader under a two-aspect constraint.
Third, the well-designed curriculum learning strategy effectively narrows the gap between the golden passage settings of training and inference, and encourages the reader to find the true answer without the golden passage assistance.
(2022-04-01T07:54:27Z)
- Improving the Question Answering Quality using Answer Candidate Filtering based on Natural-Language Features [117.44028458220427]
We address the problem of how the Question Answering (QA) quality of a given system can be improved.
Our main contribution is an approach capable of identifying wrong answers provided by a QA system.
In particular, our approach has shown its potential by removing, in many cases, the majority of incorrect answers.
(2021-12-10T11:09:44Z)
- On the Feasibility of Predicting Questions being Forgotten in Stack Overflow [1.9403536652499676]
Questions on new technologies, technology features, and technology versions come up and have to be answered as technology evolves.
At the same time, other questions decline in importance over time, finally becoming irrelevant to users.
"Forgetting" questions, which have become redundant, is an important step for keeping the Stack Overflow content concise and useful.
(2021-10-29T15:59:11Z)
- Attention-based model for predicting question relatedness on Stack Overflow [0.0]
We propose an Attention-based Sentence pair Interaction Model (ASIM) to predict the relatedness between questions on Stack Overflow automatically.
ASIM has made significant improvement over the baseline approaches in Precision, Recall, and Micro-F1 evaluation metrics.
Our model also performs well in the duplicate question detection task of Ask Ubuntu.
(2021-03-19T12:18:03Z)
- Few-Shot Complex Knowledge Base Question Answering via Meta Reinforcement Learning [55.08037694027792]
Complex question-answering (CQA) involves answering complex natural-language questions on a knowledge base (KB).
The conventional neural program induction (NPI) approach exhibits uneven performance when the questions have different types.
This paper proposes a meta-reinforcement learning approach to program induction in CQA to tackle the potential distributional bias in questions.
(2020-10-29T18:34:55Z)
- ClarQ: A large-scale and diverse dataset for Clarification Question Generation [67.1162903046619]
We devise a novel bootstrapping framework that assists in the creation of a diverse, large-scale dataset of clarification questions based on post comments extracted from Stack Exchange.
We quantitatively demonstrate the utility of the newly created dataset by applying it to the downstream task of question-answering.
We release this dataset in order to foster research into the field of clarification question generation with the larger goal of enhancing dialog and question answering systems.
(2020-06-10T17:56:50Z)
- Unsupervised Question Decomposition for Question Answering [102.56966847404287]
We propose an algorithm for One-to-N Unsupervised Sequence transduction (ONUS) that learns to map one hard, multi-hop question to many simpler, single-hop sub-questions.
We show large QA improvements on HotpotQA over a strong baseline on the original, out-of-domain, and multi-hop dev sets.
(2020-02-22T19:40:35Z)