Related papers: MOOSE-Chem3: Toward Experiment-Guided Hypothesis Ranking via Simulated Experimental Feedback

MOOSE-Chem3: Toward Experiment-Guided Hypothesis Ranking via Simulated Experimental Feedback

URL: http://arxiv.org/abs/2505.17873v2
Date: Fri, 30 May 2025 11:59:51 GMT
Title: MOOSE-Chem3: Toward Experiment-Guided Hypothesis Ranking via Simulated Experimental Feedback
Authors: Wanhao Liu, Zonglin Yang, Jue Wang, Lidong Bing, Di Zhang, Dongzhan Zhou, Yuqiang Li, Houqiang Li, Erik Cambria, Wanli Ouyang,
Abstract summary: We introduce the task of experiment-guided ranking, which aims to prioritize candidate hypotheses based on the results of previously tested ones.<n>We propose a simulator grounded in three domain-informed assumptions, modeling hypothesis performance as a function of similarity to a known ground truth hypothesis.<n>We curate a dataset of 124 chemistry hypotheses with experimentally reported outcomes to validate the simulator.
Score: 128.2992631982687
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Hypothesis ranking is a crucial component of automated scientific discovery, particularly in natural sciences where wet-lab experiments are costly and throughput-limited. Existing approaches focus on pre-experiment ranking, relying solely on large language model's internal reasoning without incorporating empirical outcomes from experiments. We introduce the task of experiment-guided ranking, which aims to prioritize candidate hypotheses based on the results of previously tested ones. However, developing such strategies is challenging due to the impracticality of repeatedly conducting real experiments in natural science domains. To address this, we propose a simulator grounded in three domain-informed assumptions, modeling hypothesis performance as a function of similarity to a known ground truth hypothesis, perturbed by noise. We curate a dataset of 124 chemistry hypotheses with experimentally reported outcomes to validate the simulator. Building on this simulator, we develop a pseudo experiment-guided ranking method that clusters hypotheses by shared functional characteristics and prioritizes candidates based on insights derived from simulated experimental feedback. Experiments show that our method outperforms pre-experiment baselines and strong ablations.

Related papers

MOOSE-Chem2: Exploring LLM Limits in Fine-Grained Scientific Hypothesis Discovery via Hierarchical Search [93.64235254640967]
Large language models (LLMs) have shown promise in automating scientific hypothesis generation.<n>We define the novel task of fine-grained scientific hypothesis discovery.<n>We propose a hierarchical search method that incrementally proposes and integrates details into the hypothesis.
arXiv Detail & Related papers (2025-05-25T16:13:46Z)
Causal Lifting of Neural Representations: Zero-Shot Generalization for Causal Inferences [56.23412698865433]
We focus on Prediction-Powered Causal Inferences (PPCI)<n> PPCI estimates the treatment effect in a target experiment with unlabeled factual outcomes, retrievable zero-shot from a pre-trained model.<n>We validate our method on synthetic and real-world scientific data, offering solutions to instances not solvable by vanilla Empirical Risk Minimization.
arXiv Detail & Related papers (2025-02-10T10:52:17Z)
BoxingGym: Benchmarking Progress in Automated Experimental Design and Model Discovery [24.630117520005257]
We introduce BoxingGym, a benchmark with 10 environments for evaluating experimental design and model discovery.<n>We compute the expected information gain (EIG), an information-theoretic quantity which measures how much an experiment reduces uncertainty about the parameters of a generative model.<n>We find that current LLMs, such as GPT-4o, struggle with both experimental design and model discovery.
arXiv Detail & Related papers (2025-01-02T21:15:57Z)
Simulating Field Experiments with Large Language Models [0.6144680854063939]
This paper pioneers the utilization of large language models (LLMs) for simulating field experiments. By introducing two novel prompting strategies, observer and participant modes, we demonstrate the ability of LLMs to both predict outcomes and replicate participant responses within complex field settings. Our findings indicate a promising alignment with actual experimental results in certain scenarios, achieving a stimulation accuracy of 66% in observer mode.
arXiv Detail & Related papers (2024-08-19T03:41:43Z)
LLM and Simulation as Bilevel Optimizers: A New Paradigm to Advance Physical Scientific Discovery [141.39722070734737]
We propose to enhance the knowledge-driven, abstract reasoning abilities of Large Language Models with the computational strength of simulations. We introduce Scientific Generative Agent (SGA), a bilevel optimization framework. We conduct experiments to demonstrate our framework's efficacy in law discovery and molecular design.
arXiv Detail & Related papers (2024-05-16T03:04:10Z)
The Challenge of Using LLMs to Simulate Human Behavior: A Causal Inference Perspective [0.27624021966289597]
Large Language Models (LLMs) have shown impressive potential to simulate human behavior.<n>We identify a fundamental challenge in using them to simulate experiments.<n>When LLM-simulated subjects are blind to the experimental design, variations in treatment systematically affect unspecified variables.
arXiv Detail & Related papers (2023-12-24T16:32:35Z)
Large Language Models for Automated Open-domain Scientific Hypotheses Discovery [50.40483334131271]
This work proposes the first dataset for social science academic hypotheses discovery. Unlike previous settings, the new dataset requires (1) using open-domain data (raw web corpus) as observations; and (2) proposing hypotheses even new to humanity. A multi- module framework is developed for the task, including three different feedback mechanisms to boost performance.
arXiv Detail & Related papers (2023-09-06T05:19:41Z)
A Double Machine Learning Approach to Combining Experimental and Observational Data [59.29868677652324]
We propose a double machine learning approach to combine experimental and observational studies. Our framework tests for violations of external validity and ignorability under milder assumptions.
arXiv Detail & Related papers (2023-07-04T02:53:11Z)
Optimal tests following sequential experiments [0.0]
The purpose of this paper is to aid in the development of optimal tests for sequential experiments by analyzing their properties. Our key finding is that the power function of any test can be matched by a test in a limit experiment. This result has important implications, including a powerful sufficiency result.
arXiv Detail & Related papers (2023-04-30T06:09:49Z)
Optimal Learning for Sequential Decisions in Laboratory Experimentation [0.0]
This tutorial is aimed to provide experimental scientists with a foundation in the science of making decisions. We introduce the concept of a learning policy, and review the major categories of policies. We then introduce a policy, known as the knowledge gradient, that maximizes the value of information from each experiment.
arXiv Detail & Related papers (2020-04-11T14:53:29Z)

This list is automatically generated from the titles and abstracts of the papers in this site.