Related papers: Answer ranking in Community Question Answering: a deep learning approach

URL: http://arxiv.org/abs/2212.01218v1
Date: Sun, 16 Oct 2022 18:47:41 GMT
Title: Answer ranking in Community Question Answering: a deep learning approach
Authors: Lucas Valentin
Abstract summary: This work tries to advance the state of the art on answer ranking for community Question Answering by proceeding with a deep learning approach. We created a large data set of questions and answers posted to the Stack Overflow website. We leveraged the natural language processing capabilities of dense embeddings and LSTM networks to produce a prediction for the accepted answer attribute.
Score: 0.0
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Community Question Answering is the field of computational linguistics that deals with problems derived from the questions and answers posted to websites such as Quora or Stack Overflow. Among these problems is the task of ranking the multiple answers posted in reply to each question by how informative they are in solving the original question. This work tries to advance the state of the art on answer ranking for community Question Answering with a deep learning approach. We started by creating a large data set of questions and answers posted to the Stack Overflow website. We then leveraged the natural language processing capabilities of dense embeddings and LSTM networks to produce a prediction for the accepted answer attribute, and present the answers in ranked form, ordered by how likely they are to be marked as accepted by the question asker. We also produced a set of numerical features to assist with the answer ranking task. These numerical features were either extracted from metadata found in the Stack Overflow posts or derived from the question and answer texts. We compared the performance of our deep learning models against a set of forest and boosted trees ensemble methods and found that our models could not improve on the best baseline results. We speculate that this lack of performance improvement versus the baseline models may be caused by the large number of out-of-vocabulary words present in the programming code snippets found in the question and answer texts. We conclude that while a deep learning approach may be helpful in answer ranking problems, new methods should be developed to assist with the large number of out-of-vocabulary words present in the programming code snippets.
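
The two ideas in the abstract that lend themselves to a small illustration are ranking answers by their predicted probability of being accepted, and measuring how many tokens fall outside the embedding vocabulary. The following is a minimal sketch only, not the paper's implementation: the function names `rank_answers` and `oov_rate`, the whitespace-style tokens, the example scores, and the toy vocabulary are all hypothetical.

```python
from typing import Dict, List, Set, Tuple


def rank_answers(scores: Dict[str, float]) -> List[Tuple[str, float]]:
    """Order answer ids by predicted acceptance probability, highest first."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def oov_rate(tokens: List[str], vocabulary: Set[str]) -> float:
    """Fraction of tokens that are missing from the embedding vocabulary."""
    if not tokens:
        return 0.0
    missing = sum(1 for token in tokens if token not in vocabulary)
    return missing / len(tokens)


# Hypothetical model outputs for three answers to one question.
predicted = {"answer_a": 0.12, "answer_b": 0.81, "answer_c": 0.45}
ranking = rank_answers(predicted)

# Hypothetical vocabulary and a tokenized answer that mixes prose and code.
vocab = {"use", "the", "function", "to", "sort", "a", "list"}
answer_tokens = ["use", "sorted(xs,", "key=len)", "to", "sort", "the", "list"]
rate = oov_rate(answer_tokens, vocab)
```

In this toy example the two code fragments are the only tokens absent from the vocabulary, which is the kind of effect the abstract speculates about for code snippets in Stack Overflow posts.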

Related papers

Self-Questioning Language Models [51.75087358141567]
We propose an asymmetric self-play framework where a proposer is given the topic and generates a question for a solver. Both the proposer and solver are trained via reinforcement learning. We study this asymmetric self-play framework on three benchmarks: three-digit multiplication, algebra problems from the OMEGA benchmark, and programming problems from Codeforces.
arXiv Detail & Related papers (2025-08-05T17:51:33Z)
Answering Ambiguous Questions with a Database of Questions, Answers, and Revisions [95.92276099234344]
We present a new state-of-the-art for answering ambiguous questions that exploits a database of unambiguous questions generated from Wikipedia. Our method improves performance by 15% on recall measures and 10% on measures which evaluate disambiguating questions from predicted outputs.
arXiv Detail & Related papers (2023-08-16T20:23:16Z)
Best-Answer Prediction in Q&A Sites Using User Information [2.982218441172364]
Community Question Answering (CQA) sites have spread and multiplied significantly in recent years. One practical way of finding such answers is automatically predicting the best candidate given existing answers and comments. We address this limitation using a novel method for predicting the best answers using the questioner's background information and other features.
arXiv Detail & Related papers (2022-12-15T02:28:52Z)
GooAQ: Open Question Answering with Diverse Answer Types [63.06454855313667]
We present GooAQ, a large-scale dataset with a variety of answer types. This dataset contains over 5 million questions and 3 million answers collected from Google.
arXiv Detail & Related papers (2021-04-18T05:40:39Z)
Attention-based model for predicting question relatedness on Stack Overflow [0.0]
We propose an Attention-based Sentence pair Interaction Model (ASIM) to predict the relatedness between questions on Stack Overflow automatically. ASIM has made significant improvement over the baseline approaches in Precision, Recall, and Micro-F1 evaluation metrics. Our model also performs well in the duplicate question detection task of Ask Ubuntu.
arXiv Detail & Related papers (2021-03-19T12:18:03Z)
Features that Predict the Acceptability of Java and JavaScript Answers on Stack Overflow [5.332217496693262]
We studied the Stack Overflow dataset by analyzing questions and answers for the two most popular tags (Java and JavaScript). Our findings reveal that the length of code in answers, reputation of users, similarity of the text between questions and answers, and the time lag between questions and answers have the highest predictive power for differentiating accepted and unaccepted answers.
arXiv Detail & Related papers (2021-01-08T03:09:38Z)
Brain-inspired Search Engine Assistant based on Knowledge Graph [53.89429854626489]
DeveloperBot is a brain-inspired search engine assistant based on a knowledge graph. It constructs a multi-layer query graph by splitting a complex multi-constraint query into several ordered constraints. It then models the constraint reasoning process as a subgraph search process inspired by the spreading activation model of cognitive science.
arXiv Detail & Related papers (2020-12-25T06:36:11Z)
Few-Shot Complex Knowledge Base Question Answering via Meta Reinforcement Learning [55.08037694027792]
Complex question-answering (CQA) involves answering complex natural-language questions on a knowledge base (KB). The conventional neural program induction (NPI) approach exhibits uneven performance when the questions have different types. This paper proposes a meta-reinforcement learning approach to program induction in CQA to tackle the potential distributional bias in questions.
arXiv Detail & Related papers (2020-10-29T18:34:55Z)
Retrieve, Program, Repeat: Complex Knowledge Base Question Answering via Alternate Meta-learning [56.771557756836906]
We present a novel method that automatically learns a retrieval model alternately with the programmer from weak supervision. Our system leads to state-of-the-art performance on a large-scale task for complex question answering over knowledge bases.
arXiv Detail & Related papers (2020-10-29T18:28:16Z)
Improving Quality of a Post's Set of Answers in Stack Overflow [2.0625936401496237]
A large number of low-quality posts on Stack Overflow require improvement. We propose an approach to automate the identification process of such posts and boost their set of answers.
arXiv Detail & Related papers (2020-05-30T19:40:19Z)
Unsupervised Question Decomposition for Question Answering [102.56966847404287]
We propose an algorithm for One-to-N Unsupervised Sequence transduction (ONUS) that learns to map one hard, multi-hop question to many simpler, single-hop sub-questions. We show large QA improvements on HotpotQA over a strong baseline on the original, out-of-domain, and multi-hop dev sets.
arXiv Detail & Related papers (2020-02-22T19:40:35Z)

This list is automatically generated from the titles and abstracts of the papers in this site.