Answer ranking in Community Question Answering: a deep learning approach
- URL: http://arxiv.org/abs/2212.01218v1
- Date: Sun, 16 Oct 2022 18:47:41 GMT
- Title: Answer ranking in Community Question Answering: a deep learning approach
- Authors: Lucas Valentin
- Abstract summary: This work tries to advance the state of the art on answer ranking for community Question Answering by proceeding with a deep learning approach.
We created a large data set of questions and answers posted to the Stack Overflow website.
We leveraged the natural language processing capabilities of dense embeddings and LSTM networks to produce a prediction for the accepted answer attribute.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Community Question Answering is the field of computational linguistics that
deals with problems derived from the questions and answers posted to websites
such as Quora or Stack Overflow. One of these problems is ranking the multiple
answers posted in reply to each question by how informative they are in
attempting to solve the original question. This work
tries to advance the state of the art on answer ranking for community Question
Answering by proceeding with a deep learning approach. We started off by
creating a large data set of questions and answers posted to the Stack Overflow
website.
We then leveraged the natural language processing capabilities of dense
embeddings and LSTM networks to produce a prediction for the accepted answer
attribute, and present the answers in a ranked form ordered by how likely they
are to be marked as accepted by the question asker. We also produced a set of
numerical features to assist with the answer ranking task. These numerical
features were either extracted from metadata found in the Stack Overflow posts
or derived from the question and answer texts. We compared the performance of
our deep learning models against a set of forest and boosted trees ensemble
methods and found that our models could not improve on the best baseline results.
We speculate that this lack of performance improvement versus the baseline
models may be caused by the large number of out-of-vocabulary words present in
the programming code snippets found in the question and answer texts. We
conclude that, while a deep learning approach may be helpful in answer ranking
problems, new methods should be developed to handle the large number of
out-of-vocabulary words present in the programming code snippets.
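The two ideas in the abstract, ranking answers by their predicted probability of being accepted and measuring how many tokens fall outside the embedding vocabulary, can be illustrated with a minimal Python sketch. This is not the author's implementation: the function names `rank_answers` and `oov_rate`, the fixed toy scores standing in for a trained model, and the toy vocabulary are all assumptions made for illustration.

```python
from typing import Callable, Iterable, List, Set, Tuple


def rank_answers(
    answers: List[str], score: Callable[[str], float]
) -> List[Tuple[str, float]]:
    """Sort answers by a predicted acceptance probability, highest first."""
    scored = [(answer, score(answer)) for answer in answers]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def oov_rate(tokens: Iterable[str], vocabulary: Set[str]) -> float:
    """Fraction of tokens that are missing from the embedding vocabulary."""
    tokens = list(tokens)
    if not tokens:
        return 0.0
    missing = sum(1 for token in tokens if token not in vocabulary)
    return missing / len(tokens)


# Hypothetical stand-in for a trained model: fixed scores per answer.
toy_scores = {"use a dict": 0.7, "try a loop": 0.2, "see the docs": 0.1}
ranking = rank_answers(list(toy_scores), toy_scores.get)

# Toy vocabulary (assumption); code identifiers such as "df.iloc" are
# the kind of token that tends to be absent from it.
vocab = {"use", "a", "dict", "to", "index", "rows"}
rate = oov_rate("use df.iloc to index rows".split(), vocab)
```

In practice the `score` callable would wrap whatever model produces the acceptance probability, and the vocabulary would come from the embedding table in use. A high `oov_rate` on code-heavy posts would be consistent with the explanation the abstract speculates about.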
Related papers
- Answering Ambiguous Questions with a Database of Questions, Answers, and
Revisions [95.92276099234344]
We present a new state-of-the-art for answering ambiguous questions that exploits a database of unambiguous questions generated from Wikipedia.
Our method improves performance by 15% on recall measures and 10% on measures which evaluate disambiguating questions from predicted outputs.
arXiv Detail & Related papers (2023-08-16T20:23:16Z)
- Best-Answer Prediction in Q&A Sites Using User Information [2.982218441172364]
Community Question Answering (CQA) sites have spread and multiplied significantly in recent years.
One practical way of finding such answers is automatically predicting the best candidate given existing answers and comments.
We address this limitation using a novel method for predicting the best answers using the questioner's background information and other features.
arXiv Detail & Related papers (2022-12-15T02:28:52Z)
- GooAQ: Open Question Answering with Diverse Answer Types [63.06454855313667]
We present GooAQ, a large-scale dataset with a variety of answer types.
This dataset contains over 5 million questions and 3 million answers collected from Google.
arXiv Detail & Related papers (2021-04-18T05:40:39Z)
- Attention-based model for predicting question relatedness on Stack Overflow [0.0]
We propose an Attention-based Sentence pair Interaction Model (ASIM) to predict the relatedness between questions on Stack Overflow automatically.
ASIM has made significant improvement over the baseline approaches in Precision, Recall, and Micro-F1 evaluation metrics.
Our model also performs well in the duplicate question detection task of Ask Ubuntu.
arXiv Detail & Related papers (2021-03-19T12:18:03Z)
- Features that Predict the Acceptability of Java and JavaScript Answers on Stack Overflow [5.332217496693262]
We studied the Stack Overflow dataset by analyzing questions and answers for the two most popular tags (Java and JavaScript).
Our findings reveal that the length of code in answers, reputation of users, similarity of the text between questions and answers, and the time lag between questions and answers have the highest predictive power for differentiating accepted and unaccepted answers.
arXiv Detail & Related papers (2021-01-08T03:09:38Z)
- Brain-inspired Search Engine Assistant based on Knowledge Graph [53.89429854626489]
DeveloperBot is a brain-inspired search engine assistant based on a knowledge graph.
It constructs a multi-layer query graph by splitting a complex multi-constraint query into several ordered constraints.
It then models the constraint reasoning process as a subgraph search process inspired by the spreading activation model of cognitive science.
arXiv Detail & Related papers (2020-12-25T06:36:11Z)
- Few-Shot Complex Knowledge Base Question Answering via Meta Reinforcement Learning [55.08037694027792]
Complex question-answering (CQA) involves answering complex natural-language questions on a knowledge base (KB).
The conventional neural program induction (NPI) approach exhibits uneven performance when the questions have different types.
This paper proposes a meta-reinforcement learning approach to program induction in CQA to tackle the potential distributional bias in questions.
arXiv Detail & Related papers (2020-10-29T18:34:55Z)
- Retrieve, Program, Repeat: Complex Knowledge Base Question Answering via Alternate Meta-learning [56.771557756836906]
We present a novel method that automatically learns a retrieval model alternately with the programmer from weak supervision.
Our system leads to state-of-the-art performance on a large-scale task for complex question answering over knowledge bases.
arXiv Detail & Related papers (2020-10-29T18:28:16Z)
- Improving Quality of a Post's Set of Answers in Stack Overflow [2.0625936401496237]
A large number of low-quality posts on Stack Overflow require improvement.
We propose an approach to automate the identification process of such posts and boost their set of answers.
arXiv Detail & Related papers (2020-05-30T19:40:19Z)
- Unsupervised Question Decomposition for Question Answering [102.56966847404287]
We propose an algorithm for One-to-N Unsupervised Sequence transduction (ONUS) that learns to map one hard, multi-hop question to many simpler, single-hop sub-questions.
We show large QA improvements on HotpotQA over a strong baseline on the original, out-of-domain, and multi-hop dev sets.
arXiv Detail & Related papers (2020-02-22T19:40:35Z)