On the Feasibility of Predicting Questions being Forgotten in Stack Overflow
- URL: http://arxiv.org/abs/2110.15789v1
- Date: Fri, 29 Oct 2021 15:59:11 GMT
- Title: On the Feasibility of Predicting Questions being Forgotten in Stack Overflow
- Authors: Thi Huyen Nguyen, Tu Nguyen, Tuan-Anh Hoang, Claudia Niederée
- Abstract summary: Questions on new technologies, technology features as well as technology versions come up and have to be answered as technology evolves.
At the same time, other questions cease in importance over time, finally becoming irrelevant to users.
"Forgetting" questions, which have become redundant, is an important step for keeping the Stack Overflow content concise and useful.
- Score: 1.9403536652499676
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: To maintain their attractiveness, comprehensiveness, and dynamic coverage of
relevant topics, community-based question answering sites such as Stack Overflow
rely heavily on the engagement of their communities: Questions on new technologies,
technology features as well as technology versions come up and have to be
answered as technology evolves (and as community members gather experience with
it). At the same time, other questions cease in importance over time, finally
becoming irrelevant to users. Beyond filtering low-quality questions,
"forgetting" questions, which have become redundant, is an important step for
keeping the Stack Overflow content concise and useful. In this work, we study
this managed forgetting task for Stack Overflow. Our work is based on data from
more than a decade (2008-2019), covering 18.1M questions, which are made
publicly available by the site itself. To establish a deeper understanding,
we first analyze and characterize the set of questions about to be forgotten,
i.e., questions that get a considerable number of views in the current period
but become unattractive in the near future. Subsequently, we examine the
capability of a wide range of features in predicting such forgotten questions
in different categories. We find that questions in some categories are more
predictable than in others. We also discover that, surprisingly, text-based
features are not helpful in this prediction task, whereas meta information is
much more predictive.
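
The setup described in the abstract amounts to labeling questions by their near-future attention and training a classifier on per-question features. The sketch below is a minimal illustration of that framing, not the authors' pipeline: the input file, column names, and view-count thresholds are assumptions chosen for the example, and it restricts itself to meta-information features, in line with the abstract's finding that these are more predictive than text-based ones.

```python
# Minimal sketch (not the paper's actual pipeline): label questions as "about to
# be forgotten" from per-period view counts, then train a classifier on
# meta-information features only. The file name, column names, and thresholds
# below are illustrative assumptions, not values taken from the paper.
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Hypothetical per-question table derived from the public Stack Overflow dump:
# one row per question, with view counts aggregated for two consecutive periods.
df = pd.read_csv("questions_with_period_views.csv")  # assumed file

# Label: a considerable number of views in the current period, but few views
# (i.e., unattractive) in the next one. Both thresholds are guesses.
VIEWS_NOW_MIN, VIEWS_NEXT_MAX = 100, 10
df = df[df["views_current_period"] >= VIEWS_NOW_MIN]
df["forgotten"] = (df["views_next_period"] <= VIEWS_NEXT_MAX).astype(int)

# Meta-information features of the kind the abstract refers to; text-based
# features are deliberately left out of this sketch.
meta_features = [
    "score", "answer_count", "comment_count", "favorite_count",
    "num_tags", "question_age_days", "has_accepted_answer",
]
X_train, X_test, y_train, y_test = train_test_split(
    df[meta_features], df["forgotten"], test_size=0.2, random_state=42
)

clf = GradientBoostingClassifier(random_state=42)
clf.fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test)))
```

Under this framing, adding bag-of-words or embedding features for the question text and comparing the resulting scores would mirror the paper's comparison between text-based and meta-information features.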
Related papers
- How to Engage Your Readers? Generating Guiding Questions to Promote Active Reading [60.19226384241482]
We introduce GuidingQ, a dataset of 10K in-text questions from textbooks and scientific articles.
We explore various approaches to generate such questions using language models.
We conduct a human study to understand the implication of such questions on reading comprehension.
arXiv Detail & Related papers (2024-07-19T13:42:56Z) - Which questions should I answer? Salience Prediction of Inquisitive Questions [118.097974193544]
We show that highly salient questions are empirically more likely to be answered in the same article.
We further validate our findings by showing that answering salient questions is an indicator of summarization quality in news.
arXiv Detail & Related papers (2024-04-16T21:33:05Z) - Answering Ambiguous Questions with a Database of Questions, Answers, and
Revisions [95.92276099234344]
We present a new state-of-the-art for answering ambiguous questions that exploits a database of unambiguous questions generated from Wikipedia.
Our method improves performance by 15% on recall measures and 10% on measures which evaluate disambiguating questions from predicted outputs.
arXiv Detail & Related papers (2023-08-16T20:23:16Z) - Software Engineers' Questions and Answers on Stack Exchange [0.0]
We analyze the questions and answers on the Software Engineering Stack Exchange site that encompasses a broader set of areas.
We found that the asked questions are most frequently related to database systems, quality assurance, and agile software development.
The most attractive topics were career and teamwork problems, and the least attractive ones were network programming and software modeling.
arXiv Detail & Related papers (2023-06-20T13:39:49Z) - Best-Answer Prediction in Q&A Sites Using User Information [2.982218441172364]
Community Question Answering (CQA) sites have spread and multiplied significantly in recent years.
One practical way of finding good answers on such sites is automatically predicting the best candidate among the existing answers and comments.
We address this limitation with a novel method that predicts the best answer using the questioner's background information and other features.
arXiv Detail & Related papers (2022-12-15T02:28:52Z) - Mining Duplicate Questions of Stack Overflow [5.924018537171331]
We propose two neural network based architectures for duplicate question detection on Stack Overflow.
We also propose explicitly modeling the code present in questions to achieve results that surpass the state of the art.
arXiv Detail & Related papers (2022-10-04T14:34:59Z) - A Dataset of Information-Seeking Questions and Answers Anchored in
Research Papers [66.11048565324468]
We present a dataset of 5,049 questions over 1,585 Natural Language Processing papers.
Each question is written by an NLP practitioner who read only the title and abstract of the corresponding paper, and the question seeks information present in the full text.
We find that existing models that do well on other QA tasks do not perform well on answering these questions, underperforming humans by at least 27 F1 points when answering them from entire papers.
arXiv Detail & Related papers (2021-05-07T00:12:34Z) - GooAQ: Open Question Answering with Diverse Answer Types [63.06454855313667]
We present GooAQ, a large-scale dataset with a variety of answer types.
This dataset contains over 5 million questions and 3 million answers collected from Google.
arXiv Detail & Related papers (2021-04-18T05:40:39Z) - Attention-based model for predicting question relatedness on Stack
Overflow [0.0]
We propose an Attention-based Sentence pair Interaction Model (ASIM) to predict the relatedness between questions on Stack Overflow automatically.
ASIM achieves significant improvements over the baseline approaches in Precision, Recall, and Micro-F1.
Our model also performs well in the duplicate question detection task of Ask Ubuntu.
arXiv Detail & Related papers (2021-03-19T12:18:03Z) - Inquisitive Question Generation for High Level Text Comprehension [60.21497846332531]
We introduce INQUISITIVE, a dataset of 19K questions that are elicited while a person is reading through a document.
We show that readers engage in a series of pragmatic strategies to seek information.
We evaluate question generation models based on GPT-2 and show that our model is able to generate reasonable questions.
arXiv Detail & Related papers (2020-10-04T19:03:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.