PerCQA: Persian Community Question Answering Dataset
- URL: http://arxiv.org/abs/2112.13238v1
- Date: Sat, 25 Dec 2021 14:06:41 GMT
- Title: PerCQA: Persian Community Question Answering Dataset
- Authors: Naghme Jamali, Yadollah Yaghoobzadeh, Hesham Faili
- Abstract summary: Community Question Answering (CQA) forums provide answers for many real-life questions.
We present PerCQA, the first Persian dataset for CQA.
This dataset contains the questions and answers crawled from the most well-known Persian forum.
- Score: 2.503043323723241
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Community Question Answering (CQA) forums provide answers for many real-life
questions. Thanks to their large size, these forums are very popular among
machine learning researchers. Automatic answer selection, answer ranking,
question retrieval, expert finding, and fact-checking are example learning
tasks performed using CQA data. In this paper, we present PerCQA, the first
Persian dataset for CQA. This dataset contains the questions and answers
crawled from the most well-known Persian forum. After data acquisition, we
develop rigorous annotation guidelines in an iterative process and then annotate
the question-answer pairs in the SemEvalCQA format. PerCQA contains 989
questions and 21,915 annotated answers. We make PerCQA publicly available to
encourage more research in Persian CQA. We also build strong benchmarks for the
task of answer selection in PerCQA by using mono- and multi-lingual pre-trained
language models.
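As a rough, hypothetical sketch of the benchmarked answer-selection setup (not the authors' code), the snippet below scores question-answer pairs with a multilingual BERT cross-encoder. The model name, the binary Good/Bad labeling, and the example texts are assumptions for illustration; the model would need fine-tuning on PerCQA before its scores are meaningful.

```python
# Hypothetical answer-selection sketch with a multilingual pre-trained model.
# Not the PerCQA authors' implementation; labels and examples are assumed.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "bert-base-multilingual-cased"  # assumption: any mono-/multilingual PLM could be used

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
# Binary head: class 1 = relevant ("Good") answer, class 0 = irrelevant answer
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)
model.eval()

def score_answers(question: str, answers: list[str]) -> list[float]:
    """Return a relevance score for each candidate answer to the question."""
    enc = tokenizer(
        [question] * len(answers),  # pair the question with every candidate answer
        answers,
        padding=True,
        truncation=True,
        max_length=256,
        return_tensors="pt",
    )
    with torch.no_grad():
        logits = model(**enc).logits
    # Probability of the "relevant" class, used to rank the candidates
    return torch.softmax(logits, dim=-1)[:, 1].tolist()

# Toy Persian example (made up, not taken from PerCQA)
print(score_answers("چگونه ویندوز را دوباره نصب کنم؟",
                    ["از یک درایو بوت استفاده کنید.", "امروز هوا خوب است."]))
```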
Related papers
- Feature Engineering in Learning-to-Rank for Community Question Answering
Task [2.5091819952713057]
Community question answering (CQA) forums are Internet-based platforms where users ask questions about a topic and other expert users try to provide solutions.
Many CQA forums, such as Quora, Stack Overflow, Yahoo! Answers, and StackExchange, hold large amounts of user-generated data.
These data are leveraged in automated CQA ranking systems where similar questions (and answers) are presented in response to the query of the user.
arXiv Detail & Related papers (2023-09-14T11:18:26Z)
- Answering Ambiguous Questions with a Database of Questions, Answers, and Revisions [95.92276099234344]
We present a new state-of-the-art for answering ambiguous questions that exploits a database of unambiguous questions generated from Wikipedia.
Our method improves performance by 15% on recall measures and 10% on measures which evaluate disambiguating questions from predicted outputs.
arXiv Detail & Related papers (2023-08-16T20:23:16Z)
- IfQA: A Dataset for Open-domain Question Answering under Counterfactual Presuppositions [54.23087908182134]
We introduce the first large-scale counterfactual open-domain question-answering (QA) benchmark, named IfQA.
The IfQA dataset contains over 3,800 questions that were annotated by crowdworkers on relevant Wikipedia passages.
The unique challenges posed by the IfQA benchmark will push open-domain QA research on both retrieval and counterfactual reasoning fronts.
arXiv Detail & Related papers (2023-05-23T12:43:19Z)
- Summarizing Community-based Question-Answer Pairs [5.680726650578754]
We propose the novel CQA summarization task that aims to create a concise summary from CQA pairs.
Our data and code are publicly available.
arXiv Detail & Related papers (2022-11-17T21:09:41Z)
- Can Question Rewriting Help Conversational Question Answering? [13.484873786389471]
Question rewriting (QR) is a subtask of conversational question answering (CQA).
We investigate a reinforcement learning approach that integrates QR and CQA tasks and does not require corresponding QR datasets for targeted CQA.
We find, however, that the RL method is on par with the end-to-end baseline.
arXiv Detail & Related papers (2022-04-13T08:16:03Z)
- ConditionalQA: A Complex Reading Comprehension Dataset with Conditional Answers [93.55268936974971]
We describe a Question Answering dataset that contains complex questions with conditional answers.
We call this dataset ConditionalQA.
We show that ConditionalQA is challenging for many of the existing QA models, especially in selecting answer conditions.
arXiv Detail & Related papers (2021-10-13T17:16:46Z)
- GooAQ: Open Question Answering with Diverse Answer Types [63.06454855313667]
We present GooAQ, a large-scale dataset with a variety of answer types.
This dataset contains over 5 million questions and 3 million answers collected from Google.
arXiv Detail & Related papers (2021-04-18T05:40:39Z)
- ParaQA: A Question Answering Dataset with Paraphrase Responses for Single-Turn Conversation [5.087932295628364]
ParaQA is a dataset with multiple paraphrased responses for single-turn conversation over knowledge graphs (KGs).
The dataset was created using a semi-automated framework for generating diverse paraphrases of the answers using techniques such as back-translation.
arXiv Detail & Related papers (2021-03-13T18:53:07Z)
- Few-Shot Complex Knowledge Base Question Answering via Meta Reinforcement Learning [55.08037694027792]
Complex question answering (CQA) involves answering complex natural-language questions over a knowledge base (KB).
The conventional neural program induction (NPI) approach exhibits uneven performance when the questions have different types.
This paper proposes a meta-reinforcement learning approach to program induction in CQA to tackle the potential distributional bias in questions.
arXiv Detail & Related papers (2020-10-29T18:34:55Z)
- KQA Pro: A Dataset with Explicit Compositional Programs for Complex Question Answering over Knowledge Base [67.87878113432723]
We introduce KQA Pro, a dataset for Complex KBQA including 120K diverse natural language questions.
For each question, we provide the corresponding KoPL program and SPARQL query, so that KQA Pro serves both KBQA and semantic parsing tasks.
arXiv Detail & Related papers (2020-07-08T03:28:04Z)
- CQ-VQA: Visual Question Answering on Categorized Questions [3.0013352260516744]
This paper proposes CQ-VQA, a novel two-level hierarchical but end-to-end model for visual question answering (VQA).
The first level of CQ-VQA, referred to as question categorizer (QC), classifies questions to reduce the potential answer search space.
The second level, referred to as the answer predictor (AP), comprises a set of distinct classifiers, one per question category.
arXiv Detail & Related papers (2020-02-17T06:45:29Z)
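As a loose, hypothetical illustration of the two-level design summarized in the CQ-VQA entry above (a question categorizer routing to per-category answer predictors), the sketch below uses soft routing over category probabilities to stay end-to-end differentiable. The fused image-question feature, dimensions, and routing scheme are assumptions, not the authors' actual architecture.

```python
# Hypothetical two-level VQA sketch: a question categorizer (QC) routes a fused
# image-question feature to per-category answer predictors (AP). Sizes are made up.
import torch
import torch.nn as nn

class TwoLevelVQA(nn.Module):
    def __init__(self, feat_dim=1024, num_categories=8, answers_per_category=100):
        super().__init__()
        # Level 1: classify the question into a category to shrink the answer space
        self.categorizer = nn.Linear(feat_dim, num_categories)
        # Level 2: one distinct answer classifier per question category
        self.predictors = nn.ModuleList(
            nn.Linear(feat_dim, answers_per_category) for _ in range(num_categories)
        )

    def forward(self, fused_feat):
        cat_logits = self.categorizer(fused_feat)                 # (batch, C)
        cat_probs = torch.softmax(cat_logits, dim=-1)
        # Soft routing: weight each category's answer logits by its category probability
        per_cat = torch.stack([p(fused_feat) for p in self.predictors], dim=1)  # (batch, C, A)
        answer_logits = (cat_probs.unsqueeze(-1) * per_cat).sum(dim=1)          # (batch, A)
        return cat_logits, answer_logits

# Toy usage with random fused features
cat_logits, answer_logits = TwoLevelVQA()(torch.randn(4, 1024))
```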
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.