CausalQuest: Collecting Natural Causal Questions for AI Agents
- URL: http://arxiv.org/abs/2405.20318v1
- Date: Thu, 30 May 2024 17:55:28 GMT
- Authors: Roberto Ceraolo, Dmitrii Kharlapenko, Amélie Reymond, Rada Mihalcea, Mrinmaya Sachan, Bernhard Schölkopf, Zhijing Jin
- Abstract summary: CausalQuest is a dataset of 13,500 naturally occurring questions sourced from social networks, search engines, and AI assistants.
We formalize the definition of causal questions and establish a taxonomy for finer-grained classification.
We find that 42% of the questions humans ask are indeed causal, with the majority seeking to understand the causes behind given effects.
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Humans have an innate drive to seek out causality. Whether fuelled by curiosity or specific goals, we constantly question why things happen, how they are interconnected, and many other related phenomena. To develop AI agents capable of addressing this natural human quest for causality, we urgently need a comprehensive dataset of natural causal questions. Unfortunately, existing datasets either contain only artificially-crafted questions that do not reflect real AI usage scenarios or have limited coverage of questions from specific sources. To address this gap, we present CausalQuest, a dataset of 13,500 naturally occurring questions sourced from social networks, search engines, and AI assistants. We formalize the definition of causal questions and establish a taxonomy for finer-grained classification. Through a combined effort of human annotators and large language models (LLMs), we carefully label the dataset. We find that 42% of the questions humans ask are indeed causal, with the majority seeking to understand the causes behind given effects. Using this dataset, we train efficient classifiers (up to 2.85B parameters) for the binary task of identifying causal questions, achieving high performance with F1 scores of up to 0.877. We conclude with a rich set of future research directions that can build upon our data and models.
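The abstract frames a binary task: given a natural question, decide whether it is causal. As a minimal sketch of that task, here is a trivial surface-cue baseline. This is NOT the paper's trained classifiers (those are fine-tuned models of up to 2.85B parameters reaching F1 up to 0.877); the cue list and example labels below are illustrative assumptions only.

```python
# Toy baseline for the binary task of identifying causal questions.
# The cue list is a hand-picked assumption, not from the CausalQuest taxonomy.
CAUSAL_CUES = (
    "why", "cause", "causes", "caused", "because",
    "what happens if", "effect of", "lead to", "leads to",
)

def is_causal_question(question: str) -> bool:
    """Return True if the question contains a surface cue for causality."""
    q = question.lower()
    return any(cue in q for cue in CAUSAL_CUES)

examples = [
    ("Why does the sky look blue?", True),        # seeks a cause
    ("What is the capital of France?", False),    # factoid, not causal
    ("What happens if I skip breakfast?", True),  # seeks an effect
]
for question, expected in examples:
    assert is_causal_question(question) == expected
```

A keyword heuristic like this misses paraphrased causal intent (e.g. "How come planes stay in the air?"), which is precisely why the paper trains learned classifiers on labeled natural questions.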
Related papers
- CELLO: Causal Evaluation of Large Vision-Language Models [9.928321287432365]
Causal reasoning is fundamental to human intelligence and crucial for effective decision-making in real-world environments.
We introduce a fine-grained and unified definition of causality involving interactions between humans and objects.
We construct a novel dataset, CELLO, consisting of 14,094 causal questions across all four levels of causality.
arXiv Detail & Related papers (2024-06-27T12:34:52Z) - Causal Question Answering with Reinforcement Learning [0.3499042782396683]
Causal questions inquire about causal relationships between different events or phenomena.
In this paper, we aim to answer causal questions with a causality graph.
We introduce an Actor-Critic-based agent which learns to search through the graph to answer causal questions.
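To make the graph-search setting concrete, here is a minimal sketch of answering a "does X cause Y?" question on a causality graph using plain breadth-first search. The graph contents are invented for illustration, and BFS is a deterministic stand-in: the cited paper instead trains an Actor-Critic agent to learn the search policy.

```python
from collections import deque

# Toy causality graph as an adjacency map: an edge X -> Y means "X causes Y".
# The nodes and edges are illustrative assumptions only.
CAUSAL_GRAPH = {
    "smoking": ["tar buildup"],
    "tar buildup": ["lung damage"],
    "lung damage": ["coughing"],
}

def causes(graph: dict, cause: str, effect: str) -> bool:
    """Return True if a directed causal path exists from cause to effect."""
    queue, seen = deque([cause]), {cause}
    while queue:
        node = queue.popleft()
        if node == effect:
            return True
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

assert causes(CAUSAL_GRAPH, "smoking", "coughing")      # transitive causal path
assert not causes(CAUSAL_GRAPH, "coughing", "smoking")  # no reverse path
```

On a real knowledge graph, exhaustive search is too expensive, which motivates learning a policy that expands only promising nodes.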
arXiv Detail & Related papers (2023-11-05T20:33:18Z) - CLEVRER-Humans: Describing Physical and Causal Events the Human Way [55.44915246065028]
We present the CLEVRER-Humans benchmark, a video dataset for causal judgment of physical events with human labels.
We employ two techniques to improve data collection efficiency: first, a novel iterative event cloze task to elicit a new representation of events in videos, which we term Causal Event Graphs (CEGs); second, a data augmentation technique based on neural language generative models.
arXiv Detail & Related papers (2023-10-05T16:09:48Z) - FOLLOWUPQG: Towards Information-Seeking Follow-up Question Generation [38.78216651059955]
We introduce the task of real-world information-seeking follow-up question generation (FQG).
We construct FOLLOWUPQG, a dataset of over 3K real-world (initial question, answer, follow-up question) tuples collected from a Reddit forum providing layman-friendly explanations for open-ended questions.
In contrast to existing datasets, questions in FOLLOWUPQG use more diverse pragmatic strategies to seek information, and they also show higher-order cognitive skills.
arXiv Detail & Related papers (2023-09-10T11:58:29Z) - Active Bayesian Causal Inference [72.70593653185078]
We propose Active Bayesian Causal Inference (ABCI), a fully-Bayesian active learning framework for integrated causal discovery and reasoning.
ABCI jointly infers a posterior over causal models and queries of interest.
We show that our approach is more data-efficient than several baselines that only focus on learning the full causal graph.
arXiv Detail & Related papers (2022-06-04T22:38:57Z) - A Dataset of Information-Seeking Questions and Answers Anchored in Research Papers [66.11048565324468]
We present a dataset of 5,049 questions over 1,585 Natural Language Processing papers.
Each question is written by an NLP practitioner who read only the title and abstract of the corresponding paper, and the question seeks information present in the full text.
We find that existing models that do well on other QA tasks do not perform well on answering these questions, underperforming humans by at least 27 F1 points when answering them from entire papers.
arXiv Detail & Related papers (2021-05-07T00:12:34Z) - Understanding Unnatural Questions Improves Reasoning over Text [54.235828149899625]
Complex question answering (CQA) over raw text is a challenging task.
Learning an effective CQA model requires large amounts of human-annotated data.
We address the challenge of learning a high-quality programmer (parser) by projecting natural human-generated questions into unnatural machine-generated questions.
arXiv Detail & Related papers (2020-10-19T10:22:16Z) - Inquisitive Question Generation for High Level Text Comprehension [60.21497846332531]
We introduce INQUISITIVE, a dataset of 19K questions that are elicited while a person is reading through a document.
We show that readers engage in a series of pragmatic strategies to seek information.
We evaluate question generation models based on GPT-2 and show that our model is able to generate reasonable questions.
arXiv Detail & Related papers (2020-10-04T19:03:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.