Automated Hypothesis Validation with Agentic Sequential Falsifications
- URL: http://arxiv.org/abs/2502.09858v1
- Date: Fri, 14 Feb 2025 01:46:00 GMT
- Title: Automated Hypothesis Validation with Agentic Sequential Falsifications
- Authors: Kexin Huang, Ying Jin, Ryan Li, Michael Y. Li, Emmanuel Candès, Jure Leskovec
- Abstract summary: Many real-world hypotheses are abstract, high-level statements that are difficult to validate directly.
Here we propose Popper, an agentic framework for rigorous automated validation of free-form hypotheses.
- Score: 45.572893831500686
- Abstract: Hypotheses are central to information acquisition, decision-making, and discovery. However, many real-world hypotheses are abstract, high-level statements that are difficult to validate directly. This challenge is further intensified by the rise of hypothesis generation from Large Language Models (LLMs), which are prone to hallucination and produce hypotheses in volumes that make manual validation impractical. Here we propose Popper, an agentic framework for rigorous automated validation of free-form hypotheses. Guided by Karl Popper's principle of falsification, Popper validates a hypothesis using LLM agents that design and execute falsification experiments targeting its measurable implications. A novel sequential testing framework ensures strict Type-I error control while actively gathering evidence from diverse observations, whether drawn from existing data or newly conducted procedures. We demonstrate Popper on six domains including biology, economics, and sociology. Popper delivers robust error control, high power, and scalability. Furthermore, compared to human scientists, Popper achieved comparable performance in validating complex biological hypotheses while reducing validation time tenfold, providing a scalable, rigorous solution for hypothesis validation.
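The sequential testing idea in the abstract can be illustrated with a minimal e-value sketch. This is not the paper's exact procedure (which involves agent-designed falsification experiments); it only shows the standard anytime-valid mechanism on which such Type-I error control can rest: each experiment yields an e-value with expectation at most 1 under the null, the e-values are multiplied, and the null is rejected once the running product exceeds 1/alpha (valid at any stopping time by Ville's inequality). The function name and interface are illustrative assumptions.

```python
def sequential_e_test(e_values, alpha=0.05):
    """Anytime-valid sequential test via e-value accumulation.

    e_values: e-values from successive falsification experiments;
              under the null each has expectation <= 1.
    Rejects the null hypothesis as soon as the running product of
    e-values reaches 1/alpha, which bounds the probability of a
    false rejection (at any stopping time) by alpha.
    """
    product = 1.0
    for t, e in enumerate(e_values, start=1):
        product *= e
        if product >= 1.0 / alpha:
            return ("reject", t, product)
    # Evidence so far is insufficient; more experiments may be run.
    return ("continue", len(e_values), product)
```

Because the guarantee holds at every step, experiments can be added adaptively, one at a time, without inflating the Type-I error, which is what makes the sequential design compatible with an agent that decides on the fly which falsification test to run next.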
Related papers
- Causal Lifting of Neural Representations: Zero-Shot Generalization for Causal Inferences [56.23412698865433]
We focus on causal inferences on a target experiment with unlabeled factual outcomes, retrieved by a predictive model fine-tuned on a labeled similar experiment.
First, we show that factual outcome estimation via Empirical Risk Minimization (ERM) may fail to yield valid causal inferences on the target population.
We propose Deconfounded Empirical Risk Minimization (DERM), a new simple learning procedure minimizing the risk over a fictitious target population.
arXiv Detail & Related papers (2025-02-10T10:52:17Z)
- Resolving Multiple-Dynamic Model Uncertainty in Hypothesis-Driven Belief-MDPs [4.956709222278243]
We present a hypothesis-driven belief MDP that enables reasoning over multiple hypotheses.
We also present a new belief MDP that balances the goals of determining the (most likely) correct hypothesis and performing well in the underlying POMDP.
arXiv Detail & Related papers (2024-11-21T18:36:19Z)
- FactTest: Factuality Testing in Large Language Models with Finite-Sample and Distribution-Free Guarantees [41.78390564658645]
The tendency of Large Language Models (LLMs) to generate hallucinations and non-factual content undermines their reliability in high-stakes domains.
We introduce FactTest, a novel framework that statistically assesses whether an LLM can confidently provide correct answers to given questions.
We show that FactTest effectively detects hallucinations and improves the model's ability to abstain from answering unknown questions, leading to an over 40% accuracy improvement.
arXiv Detail & Related papers (2024-11-04T20:53:04Z)
- Source-Free Unsupervised Domain Adaptation with Hypothesis Consolidation of Prediction Rationale [53.152460508207184]
Source-Free Unsupervised Domain Adaptation (SFUDA) is a challenging task where a model needs to be adapted to a new domain without access to target domain labels or source domain data.
This paper proposes a novel approach that considers multiple prediction hypotheses for each sample and investigates the rationale behind each hypothesis.
To achieve the optimal performance, we propose a three-step adaptation process: model pre-adaptation, hypothesis consolidation, and semi-supervised learning.
arXiv Detail & Related papers (2024-02-02T05:53:22Z)
- Max-Rank: Efficient Multiple Testing for Conformal Prediction [43.56898111853698]
Multiple hypothesis testing (MHT) commonly arises in various scientific fields, from genomics to psychology, where testing many hypotheses simultaneously increases the risk of Type-I errors.
We propose a novel correction named $\texttt{max-rank}$ that leverages dependencies among the hypotheses, whilst ensuring that the joint Type-I error rate is efficiently controlled.
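The abstract above describes only the high-level idea. A generic max-statistic calibration in the same spirit (this is a hedged sketch, not the paper's actual max-rank procedure; the function name and exchangeability setup are assumptions) works as follows: given calibration rows of scores drawn under the joint null, rank each score within its hypothesis column, take the maximum rank per row, and use an empirical quantile of these max-ranks as one joint cutoff, which controls the family-wise Type-I error under exchangeability of the rows.

```python
import math

def max_rank_threshold(calib, alpha=0.05):
    """Joint rank cutoff from a max-rank statistic (illustrative sketch).

    calib: n rows of m scores each (one score per hypothesis),
           drawn under the joint null, rows exchangeable.
    Returns a single rank threshold: observing a score whose
    within-column rank exceeds it is a joint-level-alpha rejection.
    Ties are broken by stable sort order, which suffices for a sketch.
    """
    n, m = len(calib), len(calib[0])
    ranks = [[0] * m for _ in range(n)]
    for j in range(m):
        # rank each entry within its column, 1 = smallest
        order = sorted(range(n), key=lambda i: calib[i][j])
        for r, i in enumerate(order, start=1):
            ranks[i][j] = r
    max_ranks = sorted(max(row) for row in ranks)
    # conformal-style (1 - alpha) empirical quantile index
    k = min(n - 1, math.ceil((1 - alpha) * (n + 1)) - 1)
    return max_ranks[k]
```

Because a single max-rank quantile replaces per-hypothesis corrections such as Bonferroni, the cutoff adapts to dependence among the hypotheses: strongly correlated columns produce correlated ranks and hence a less conservative joint threshold.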
arXiv Detail & Related papers (2023-11-17T22:44:22Z)
- Large Language Models for Automated Open-domain Scientific Hypotheses Discovery [50.40483334131271]
This work proposes the first dataset for social science academic hypotheses discovery.
Unlike previous settings, the new dataset requires (1) using open-domain data (raw web corpus) as observations; and (2) proposing hypotheses even new to humanity.
A multi-module framework is developed for the task, including three different feedback mechanisms to boost performance.
arXiv Detail & Related papers (2023-09-06T05:19:41Z)
- Generating Scientific Claims for Zero-Shot Scientific Fact Checking [54.62086027306609]
Automated scientific fact checking is difficult due to the complexity of scientific language and a lack of significant amounts of training data.
We propose scientific claim generation, the task of generating one or more atomic and verifiable claims from scientific sentences.
We also demonstrate its usefulness in zero-shot fact checking for biomedical claims.
arXiv Detail & Related papers (2022-03-24T11:29:20Z)
- Learning Logic Programs From Noisy Failures [0.0]
We introduce the relaxed learning from failures approach to ILP, a noise-handling modification of the previously introduced learning from failures (LFF) approach.
We additionally introduce the novel Noisy Popper ILP system which implements this relaxed approach and is a modification of the existing Popper system.
arXiv Detail & Related papers (2021-12-28T16:48:00Z)
- Empirically Verifying Hypotheses Using Reinforcement Learning [58.09414653169534]
This paper formulates hypothesis verification as an RL problem.
We aim to build an agent that, given a hypothesis about the dynamics of the world, can take actions to generate observations which can help predict whether the hypothesis is true or false.
arXiv Detail & Related papers (2020-06-29T01:01:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.