Qsnail: A Questionnaire Dataset for Sequential Question Generation
- URL: http://arxiv.org/abs/2402.14272v1
- Date: Thu, 22 Feb 2024 04:14:10 GMT
- Title: Qsnail: A Questionnaire Dataset for Sequential Question Generation
- Authors: Yan Lei, Liang Pang, Yuanzhuo Wang, Huawei Shen, Xueqi Cheng
- Abstract summary: We present the first dataset specifically constructed for the questionnaire generation task, which comprises 13,168 human-written questionnaires.
We conduct experiments on Qsnail, and the results reveal that retrieval models and traditional generative models do not fully align with the given research topic and intents.
Despite enhancements through chain-of-thought prompting and finetuning, questionnaires generated by language models still fall short of human-written questionnaires.
- Score: 76.616068047362
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The questionnaire is a professional research methodology used for both
qualitative and quantitative analysis of human opinions, preferences,
attitudes, and behaviors. However, designing and evaluating questionnaires
demands significant effort due to their intricate and complex structure.
Questionnaires entail a series of questions that must conform to intricate
constraints involving the questions, options, and overall structure.
Specifically, the questions should be relevant and specific to the given
research topic and intent. The options should be tailored to the questions,
ensuring they are mutually exclusive, complete, and sensibly ordered.
Moreover, the sequence of questions should follow a logical order, grouping
similar topics together. As a result, automatically generating questionnaires
presents a significant challenge, and this area has received limited attention
primarily due to the scarcity of high-quality datasets. To address these
issues, we present Qsnail, the first dataset specifically constructed for the
questionnaire generation task, which comprises 13,168 human-written
questionnaires gathered from online platforms. We further conduct experiments
on Qsnail, and the results reveal that retrieval models and traditional
generative models do not fully align with the given research topic and intents.
Large language models, while more closely related to the research topic and
intents, exhibit significant limitations in terms of diversity and specificity.
Despite enhancements through chain-of-thought prompting and finetuning,
questionnaires generated by language models still fall short of human-written
questionnaires. Therefore, questionnaire generation remains challenging and
warrants further exploration. The dataset is available at:
https://github.com/LeiyanGithub/qsnail.
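The structural constraints described in the abstract lend themselves to a nested record representation. The sketch below is a minimal, hypothetical Python encoding of a questionnaire, together with checks for the mechanically verifiable constraints; the field names and checks are illustrative assumptions, not the actual Qsnail schema (see the GitHub repository for the real format).

```python
from dataclasses import dataclass


@dataclass
class Question:
    text: str            # should be relevant and specific to the research topic/intent
    options: list[str]   # should be tailored to the question
    topic: str           # used to group similar questions together


@dataclass
class Questionnaire:
    research_topic: str
    intent: str
    questions: list[Question]


def check_structure(q: Questionnaire) -> list[str]:
    """Flag violations of the structural constraints named in the abstract.

    Only the mechanically checkable cases are covered here: true mutual
    exclusivity and completeness of options are semantic properties that
    require human or model judgment.
    """
    issues = []
    for i, question in enumerate(q.questions):
        if len(question.options) < 2:
            issues.append(f"Q{i}: fewer than two options")
        if len(set(question.options)) != len(question.options):
            issues.append(f"Q{i}: duplicate options (not mutually exclusive)")
    # Questions on the same topic should form one contiguous block.
    seen, previous = set(), None
    for i, question in enumerate(q.questions):
        if question.topic != previous and question.topic in seen:
            issues.append(f"Q{i}: topic '{question.topic}' is split across the questionnaire")
        seen.add(question.topic)
        previous = question.topic
    return issues
```

For example, a questionnaire that interleaves "demographics" and "habits" questions would be flagged by the grouping check, mirroring the requirement that similar topics be grouped together.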
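The chain-of-thought enhancement mentioned in the abstract can be illustrated with a staged prompt that walks a model through the same constraints. The wording below is an assumed example for demonstration, not the prompt used in the paper.

```python
# Illustrative chain-of-thought prompt for questionnaire generation.
# The phrasing is an assumption, not the paper's actual prompt.
COT_PROMPT = """You are designing a questionnaire.
Research topic: {topic}
Research intent: {intent}

Think step by step:
1. List the sub-topics the questionnaire should cover, in a logical order.
2. For each sub-topic, draft questions relevant and specific to the topic and intent.
3. For each question, write options that are mutually exclusive, complete,
   and sensibly ordered.
Then output the final questionnaire with similar questions grouped together."""

prompt = COT_PROMPT.format(
    topic="remote work habits",
    intent="measure employee satisfaction",
)
```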
Related papers
- Analyzing Human Questioning Behavior and Causal Curiosity through Natural Queries [91.70689724416698]
We present NatQuest, a collection of 13,500 naturally occurring questions from three diverse sources.
Our analysis reveals a significant presence of causal questions (up to 42%) within the dataset.
arXiv Detail & Related papers (2024-05-30T17:55:28Z)
- FOLLOWUPQG: Towards Information-Seeking Follow-up Question Generation [38.78216651059955]
We introduce the task of real-world information-seeking follow-up question generation (FQG).
We construct FOLLOWUPQG, a dataset of over 3K real-world (initial question, answer, follow-up question) tuples collected from a Reddit forum providing layman-friendly explanations for open-ended questions.
In contrast to existing datasets, questions in FOLLOWUPQG use more diverse pragmatic strategies to seek information and exhibit higher-order cognitive skills.
arXiv Detail & Related papers (2023-09-10T11:58:29Z)
- What should I Ask: A Knowledge-driven Approach for Follow-up Questions Generation in Conversational Surveys [63.51903260461746]
We propose a novel task for knowledge-driven follow-up question generation in conversational surveys.
We constructed a new human-annotated dataset of human-written follow-up questions with dialogue history and labeled knowledge.
We then propose a two-stage knowledge-driven model for the task, which generates informative and coherent follow-up questions.
arXiv Detail & Related papers (2022-05-23T00:57:33Z)
- Reinforcement Learning for Abstractive Question Summarization with Question-aware Semantic Rewards [20.342580435464072]
We introduce a reinforcement learning-based framework for abstractive question summarization.
We propose two novel rewards obtained from the downstream tasks of (i) question-type identification and (ii) question-focus recognition.
These rewards ensure the generation of semantically valid questions and encourage the inclusion of key medical entities/foci in the question summary.
arXiv Detail & Related papers (2021-07-01T02:06:46Z)
- A Dataset of Information-Seeking Questions and Answers Anchored in Research Papers [66.11048565324468]
We present a dataset of 5,049 questions over 1,585 Natural Language Processing papers.
Each question is written by an NLP practitioner who read only the title and abstract of the corresponding paper, and the question seeks information present in the full text.
We find that existing models that do well on other QA tasks do not perform well on answering these questions, underperforming humans by at least 27 F1 points when answering them from entire papers.
arXiv Detail & Related papers (2021-05-07T00:12:34Z)
- Challenges in Information-Seeking QA: Unanswerable Questions and Paragraph Retrieval [46.3246135936476]
We analyze why answering information-seeking queries is more challenging and where the prevalent unanswerable cases arise.
Our controlled experiments suggest two sources of headroom: paragraph selection and answerability prediction.
We manually annotate 800 unanswerable examples across six languages on what makes them challenging to answer.
arXiv Detail & Related papers (2020-10-22T17:48:17Z)
- Inquisitive Question Generation for High Level Text Comprehension [60.21497846332531]
We introduce INQUISITIVE, a dataset of 19K questions that are elicited while a person is reading through a document.
We show that readers engage in a series of pragmatic strategies to seek information.
We evaluate question generation models based on GPT-2 and show that our model is able to generate reasonable questions.
arXiv Detail & Related papers (2020-10-04T19:03:39Z)
- ClarQ: A large-scale and diverse dataset for Clarification Question Generation [67.1162903046619]
We devise a novel bootstrapping framework that assists in the creation of a diverse, large-scale dataset of clarification questions based on post-comment tuples extracted from StackExchange.
We quantitatively demonstrate the utility of the newly created dataset by applying it to the downstream task of question-answering.
We release this dataset in order to foster research into the field of clarification question generation with the larger goal of enhancing dialog and question answering systems.
arXiv Detail & Related papers (2020-06-10T17:56:50Z)
- Stay Hungry, Stay Focused: Generating Informative and Specific Questions in Information-Seeking Conversations [41.74162467619795]
We investigate the problem of generating informative questions in information-asymmetric conversations.
To generate pragmatic questions, we use reinforcement learning to optimize an informativeness metric.
We demonstrate that the resulting pragmatic questioner substantially improves the informativeness and specificity of questions generated over a baseline model.
arXiv Detail & Related papers (2020-04-30T00:49:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.