Related papers: Analyzing Human Questioning Behavior and Causal Curiosity through Natural Queries

URL: http://arxiv.org/abs/2405.20318v2
Date: Thu, 24 Oct 2024 09:21:38 GMT
Title: Analyzing Human Questioning Behavior and Causal Curiosity through Natural Queries
Authors: Roberto Ceraolo, Dmitrii Kharlapenko, Ahmad Khan, Amélie Reymond, Rada Mihalcea, Bernhard Schölkopf, Mrinmaya Sachan, Zhijing Jin
Abstract summary: We present NatQuest, a collection of 13,500 naturally occurring questions from three diverse sources. Our analysis reveals a significant presence of causal questions (up to 42%) within the dataset.
Score: 91.70689724416698
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: The recent development of Large Language Models (LLMs) has changed our role in interacting with them. Instead of primarily testing these models with questions we already know the answers to, we now use them to explore questions where the answers are unknown to us. This shift, which hasn't been fully addressed in existing datasets, highlights the growing need to understand naturally occurring human questions - that are more complex, open-ended, and reflective of real-world needs. To this end, we present NatQuest, a collection of 13,500 naturally occurring questions from three diverse sources: human-to-search-engine queries, human-to-human interactions, and human-to-LLM conversations. Our comprehensive collection enables a rich understanding of human curiosity across various domains and contexts. Our analysis reveals a significant presence of causal questions (up to 42%) within the dataset, for which we develop an iterative prompt improvement framework to identify all causal queries, and examine their unique linguistic properties, cognitive complexity, and source distribution. We also lay the groundwork to explore LLM performance on these questions and provide six efficient classification models to identify causal questions at scale for future work.

Related papers

Exploring Human-LLM Conversations: Mental Models and the Originator of Toxicity [1.4003044924094596]
This study explores real-world human interactions with large language models (LLMs) in diverse, unconstrained settings. Our findings show that although LLMs are rightfully accused of providing toxic content, it is mostly demanded or at least provoked by humans who actively seek such content.
arXiv Detail & Related papers (2024-07-08T14:20:05Z)
Qsnail: A Questionnaire Dataset for Sequential Question Generation [76.616068047362]
We present the first dataset specifically constructed for the questionnaire generation task, which comprises 13,168 human-written questionnaires. We conduct experiments on Qsnail, and the results reveal that retrieval models and traditional generative models do not fully align with the given research topic and intents. Despite enhancements through the chain-of-thought prompt and finetuning, questionnaires generated by language models still fall short of human-written questionnaires.
arXiv Detail & Related papers (2024-02-22T04:14:10Z)
A Comparative and Experimental Study on Automatic Question Answering Systems and its Robustness against Word Jumbling [0.49157446832511503]
Question answer generation is highly relevant because a frequently asked questions (FAQ) list can only contain a finite number of questions. A model that can perform question answer generation could answer completely new questions that are within the scope of the data. In commercial applications, it can be used to increase customer satisfaction and ease of use. However, much of the data is generated by humans, so it is susceptible to human error, which can adversely affect the model's performance.
arXiv Detail & Related papers (2023-11-27T03:17:09Z)
FOLLOWUPQG: Towards Information-Seeking Follow-up Question Generation [38.78216651059955]
We introduce the task of real-world information-seeking follow-up question generation (FQG). We construct FOLLOWUPQG, a dataset of over 3K real-world (initial question, answer, follow-up question) tuples collected from a Reddit forum providing layman-friendly explanations for open-ended questions. In contrast to existing datasets, questions in FOLLOWUPQG use more diverse pragmatic strategies to seek information, and they also show higher-order cognitive skills.
arXiv Detail & Related papers (2023-09-10T11:58:29Z)
Overinformative Question Answering by Humans and Machines [26.31070412632125]
We show that overinformativeness in human answering is driven by considerations of relevance to the questioner's goals. We show that GPT-3 is highly sensitive to the form of the prompt and only achieves human-like answer patterns when guided by an example and cognitively-motivated explanation.
arXiv Detail & Related papers (2023-05-11T21:41:41Z)
WebCPM: Interactive Web Search for Chinese Long-form Question Answering [104.676752359777]
Long-form question answering (LFQA) aims at answering complex, open-ended questions with detailed, paragraph-length responses. We introduce WebCPM, the first Chinese LFQA dataset. We collect 5,500 high-quality question-answer pairs, together with 14,315 supporting facts and 121,330 web search actions.
arXiv Detail & Related papers (2023-05-11T14:47:29Z)
Zero-shot Clarifying Question Generation for Conversational Search [25.514678546942754]
We propose a constrained clarifying question generation system which uses both question templates and query facets to guide the effective and precise question generation. Experiment results show that our method outperforms existing state-of-the-art zero-shot baselines by a large margin.
arXiv Detail & Related papers (2023-01-30T04:43:02Z)
JECC: Commonsense Reasoning Tasks Derived from Interactive Fictions [75.42526766746515]
We propose a new commonsense reasoning dataset based on human Interactive Fiction (IF) gameplay walkthroughs. Our dataset focuses on the assessment of functional commonsense knowledge rules rather than factual knowledge. Experiments show that the introduced dataset is challenging for previous machine reading models as well as the new large language models.
arXiv Detail & Related papers (2022-10-18T19:20:53Z)
ConvFinQA: Exploring the Chain of Numerical Reasoning in Conversational Finance Question Answering [70.6359636116848]
We propose a new large-scale dataset, ConvFinQA, to study the chain of numerical reasoning in conversational question answering. Our dataset poses great challenge in modeling long-range, complex numerical reasoning paths in real-world conversations.
arXiv Detail & Related papers (2022-10-07T23:48:50Z)
Evaluating Mixed-initiative Conversational Search Systems via User Simulation [9.066817876491053]
We propose a conversational User Simulator, called USi, for automatic evaluation of such search systems. We show that responses generated by USi are both in line with the underlying information need and comparable to human-generated answers.
arXiv Detail & Related papers (2022-04-17T16:27:33Z)
A Dataset of Information-Seeking Questions and Answers Anchored in Research Papers [66.11048565324468]
We present a dataset of 5,049 questions over 1,585 Natural Language Processing papers. Each question is written by an NLP practitioner who read only the title and abstract of the corresponding paper, and the question seeks information present in the full text. We find that existing models that do well on other QA tasks do not perform well on answering these questions, underperforming humans by at least 27 F1 points when answering them from entire papers.
arXiv Detail & Related papers (2021-05-07T00:12:34Z)
Inquisitive Question Generation for High Level Text Comprehension [60.21497846332531]
We introduce INQUISITIVE, a dataset of 19K questions that are elicited while a person is reading through a document. We show that readers engage in a series of pragmatic strategies to seek information. We evaluate question generation models based on GPT-2 and show that our model is able to generate reasonable questions.
arXiv Detail & Related papers (2020-10-04T19:03:39Z)

This list is automatically generated from the titles and abstracts of the papers on this site.