MOOSE-Chem3: Toward Experiment-Guided Hypothesis Ranking via Simulated Experimental Feedback
- URL: http://arxiv.org/abs/2505.17873v3
- Date: Sat, 25 Oct 2025 14:00:54 GMT
- Title: MOOSE-Chem3: Toward Experiment-Guided Hypothesis Ranking via Simulated Experimental Feedback
- Authors: Wanhao Liu, Zonglin Yang, Jue Wang, Lidong Bing, Di Zhang, Dongzhan Zhou, Yuqiang Li, Houqiang Li, Erik Cambria, Wanli Ouyang
- Abstract summary: We introduce experiment-guided ranking, which prioritizes hypotheses based on feedback from prior tests. We frame experiment-guided ranking as a sequential decision-making problem. Our approach significantly outperforms pre-experiment baselines and strong ablations.
- Score: 136.27567671480156
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Hypothesis ranking is vital for automated scientific discovery, especially in cost-intensive, throughput-limited natural science domains. Current methods focus on pre-experiment ranking, relying solely on language model reasoning without empirical feedback. We introduce experiment-guided ranking, which prioritizes hypotheses based on feedback from prior tests. Due to the impracticality of real experiments, we propose a simulator grounded in domain-specific concepts that models hypothesis performance as a function of similarity to a hidden ground truth, perturbed by noise. Validated against 124 hypotheses with experimentally reported outcomes, the simulator approximates real results with consistent trend alignment. Although deviations exist, they mimic wet-lab noise, promoting more robust ranking strategies. We frame experiment-guided ranking as a sequential decision-making problem and propose an in-context reinforcement learning (ICRL) framework. Our LLM-based policy decomposes hypotheses into functional elements, clusters them by mechanistic roles, and prioritizes recombinations based on feedback. Experiments show our approach significantly outperforms pre-experiment baselines and strong ablations. Our toolkit, comprising the simulator and ICRL framework, enables systematic research on experiment-guided ranking, with the policy serving as a strong proof of concept.
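The abstract describes two ideas that a small example can make concrete: a simulator that scores a hypothesis by its similarity to a hidden ground truth plus noise, and a ranking loop that picks the next hypothesis to test based on feedback from earlier tests. The sketch below is a toy illustration only, not the authors' simulator or ICRL policy. The Jaccard similarity, the Gaussian noise model, the per-component credit averaging, and all function names are assumptions made for this example.

```python
import random


def jaccard(a, b):
    """Assumed stand-in similarity: overlap between two sets of components."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0


def simulate_experiment(hypothesis, hidden_truth, noise_std, rng):
    """Simulated feedback: similarity to the hidden ground truth plus Gaussian noise."""
    return jaccard(hypothesis, hidden_truth) + rng.gauss(0.0, noise_std)


def experiment_guided_ranking(candidates, hidden_truth, noise_std=0.05, seed=0):
    """Test candidates one at a time, choosing the next one from past feedback.

    Each hypothesis is a tuple of component names (a hypothetical encoding).
    A component's credit is the mean score of tested hypotheses containing it.
    """
    rng = random.Random(seed)
    remaining = list(candidates)
    credit = {}  # component -> list of scores observed so far
    order = []

    def priority(h):
        # Average credit over components that have already been observed.
        means = [sum(credit[c]) / len(credit[c]) for c in h if c in credit]
        return sum(means) / len(means) if means else 0.0

    while remaining:
        nxt = max(remaining, key=priority)  # ties resolve to the earliest candidate
        remaining.remove(nxt)
        score = simulate_experiment(nxt, hidden_truth, noise_std, rng)
        for c in nxt:
            credit.setdefault(c, []).append(score)
        order.append((nxt, score))
    return order


if __name__ == "__main__":
    candidates = [("a", "b"), ("c", "d"), ("a", "c")]
    for hypothesis, score in experiment_guided_ranking(candidates, ("a", "b")):
        print(hypothesis, round(score, 3))
```

In this toy setup, a hypothesis that shares components with an earlier high-scoring hypothesis is tested before one built from untested components. That is the sense in which the ordering is experiment-guided rather than fixed before any test is run.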
Related papers
- HEAL: A Hypothesis-Based Preference-Aware Analysis Framework [32.45006553398745]
This paper presents a Hypothesis-based Preference-aware Analysis Framework (HEAL). It formulates preference alignment as a re-ranking process within hypothesis spaces. The framework incorporates two complementary metrics: ranking accuracy for evaluating ordinal consistency and preference strength correlation for assessing continuous alignment.
arXiv Detail & Related papers (2025-08-27T14:30:08Z) - Simulation-Based Inference for Adaptive Experiments [38.841210420855276]
Multi-arm bandit experimental designs are increasingly being adopted over standard randomized trials. We propose a simulation-based approach for conducting hypothesis tests and constructing confidence intervals for arm-specific means. Our results show that our approach achieves the desired coverage while reducing confidence interval widths by up to 50%, with drastic improvements for arms not targeted by the design.
arXiv Detail & Related papers (2025-06-03T13:46:59Z) - MOOSE-Chem2: Exploring LLM Limits in Fine-Grained Scientific Hypothesis Discovery via Hierarchical Search [93.64235254640967]
Large language models (LLMs) have shown promise in automating scientific hypothesis generation. We define the novel task of fine-grained scientific hypothesis discovery. We propose a hierarchical search method that incrementally proposes and integrates details into the hypothesis.
arXiv Detail & Related papers (2025-05-25T16:13:46Z) - Causal Lifting of Neural Representations: Zero-Shot Generalization for Causal Inferences [56.23412698865433]
We focus on Prediction-Powered Causal Inferences (PPCI). PPCI estimates the treatment effect in a target experiment with unlabeled factual outcomes, retrievable zero-shot from a pre-trained model. We validate our method on synthetic and real-world scientific data, offering solutions to instances not solvable by vanilla Empirical Risk Minimization.
arXiv Detail & Related papers (2025-02-10T10:52:17Z) - BoxingGym: Benchmarking Progress in Automated Experimental Design and Model Discovery [24.630117520005257]
We introduce BoxingGym, a benchmark with 10 environments for evaluating experimental design and model discovery. We compute the expected information gain (EIG), an information-theoretic quantity which measures how much an experiment reduces uncertainty about the parameters of a generative model. We find that current LLMs, such as GPT-4o, struggle with both experimental design and model discovery.
arXiv Detail & Related papers (2025-01-02T21:15:57Z) - Simulating Field Experiments with Large Language Models [0.6144680854063939]
This paper pioneers the utilization of large language models (LLMs) for simulating field experiments.
By introducing two novel prompting strategies, observer and participant modes, we demonstrate the ability of LLMs to both predict outcomes and replicate participant responses within complex field settings.
Our findings indicate a promising alignment with actual experimental results in certain scenarios, achieving a simulation accuracy of 66% in observer mode.
arXiv Detail & Related papers (2024-08-19T03:41:43Z) - LLM and Simulation as Bilevel Optimizers: A New Paradigm to Advance Physical Scientific Discovery [141.39722070734737]
We propose to enhance the knowledge-driven, abstract reasoning abilities of Large Language Models with the computational strength of simulations.
We introduce Scientific Generative Agent (SGA), a bilevel optimization framework.
We conduct experiments to demonstrate our framework's efficacy in law discovery and molecular design.
arXiv Detail & Related papers (2024-05-16T03:04:10Z) - The Challenge of Using LLMs to Simulate Human Behavior: A Causal Inference Perspective [0.27624021966289597]
Large Language Models (LLMs) have shown impressive potential to simulate human behavior. We identify a fundamental challenge in using them to simulate experiments. When LLM-simulated subjects are blind to the experimental design, variations in treatment systematically affect unspecified variables.
arXiv Detail & Related papers (2023-12-24T16:32:35Z) - Large Language Models for Automated Open-domain Scientific Hypotheses Discovery [50.40483334131271]
This work proposes the first dataset for social science academic hypotheses discovery.
Unlike previous settings, the new dataset requires (1) using open-domain data (raw web corpus) as observations; and (2) proposing hypotheses even new to humanity.
A multi-module framework is developed for the task, including three different feedback mechanisms to boost performance.
arXiv Detail & Related papers (2023-09-06T05:19:41Z) - A Double Machine Learning Approach to Combining Experimental and Observational Data [59.29868677652324]
We propose a double machine learning approach to combine experimental and observational studies.
Our framework tests for violations of external validity and ignorability under milder assumptions.
arXiv Detail & Related papers (2023-07-04T02:53:11Z) - Optimal tests following sequential experiments [0.0]
The purpose of this paper is to aid in the development of optimal tests for sequential experiments by analyzing their properties.
Our key finding is that the power function of any test can be matched by a test in a limit experiment.
This result has important implications, including a powerful sufficiency result.
arXiv Detail & Related papers (2023-04-30T06:09:49Z) - Leveraging Global Parameters for Flow-based Neural Posterior Estimation [90.21090932619695]
Inferring the parameters of a model based on experimental observations is central to the scientific method.
A particularly challenging setting is when the model is strongly indeterminate, i.e., when distinct sets of parameters yield identical observations.
We present a method for cracking such indeterminacy by exploiting additional information conveyed by an auxiliary set of observations sharing global parameters.
arXiv Detail & Related papers (2021-02-12T12:23:13Z) - Optimal Learning for Sequential Decisions in Laboratory Experimentation [0.0]
This tutorial aims to provide experimental scientists with a foundation in the science of making decisions.
We introduce the concept of a learning policy, and review the major categories of policies.
We then introduce a policy, known as the knowledge gradient, that maximizes the value of information from each experiment.
arXiv Detail & Related papers (2020-04-11T14:53:29Z)