Principle-Evolvable Scientific Discovery via Uncertainty Minimization
- URL: http://arxiv.org/abs/2602.06448v1
- Date: Fri, 06 Feb 2026 07:19:27 GMT
- Title: Principle-Evolvable Scientific Discovery via Uncertainty Minimization
- Authors: Yingming Pu, Tao Lin, Hongyu Chen
- Abstract summary: We present PiEvo, a principle-evolvable framework that treats scientific discovery as Bayesian optimization over an expanding principle space. PiEvo achieves an average solution quality of up to 90.81%~93.15%, representing a 29.7%~31.1% improvement over the state-of-the-art.
- Score: 9.216546947535244
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Model (LLM)-based scientific agents have accelerated scientific discovery, yet they often suffer from significant inefficiencies due to adherence to fixed initial priors. Existing approaches predominantly operate within a static hypothesis space, which restricts the discovery of novel phenomena, resulting in computational waste when baseline theories fail. To address this, we propose shifting the focus from searching hypotheses to evolving the underlying scientific principles. We present PiEvo, a principle-evolvable framework that treats scientific discovery as Bayesian optimization over an expanding principle space. By integrating Information-Directed Hypothesis Selection via Gaussian Process and an anomaly-driven augmentation mechanism, PiEvo enables agents to autonomously refine their theoretical worldview. Evaluation across four benchmarks demonstrates that PiEvo (1) achieves an average solution quality of up to 90.81%~93.15%, representing a 29.7%~31.1% improvement over the state-of-the-art, (2) attains an 83.3% speedup in convergence step via significantly reduced sample complexity by optimizing the compact principle space, and (3) maintains robust performance across diverse scientific domains and LLM backbones.
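The abstract describes hypothesis selection with a Gaussian Process inside a Bayesian optimization loop. As a rough illustration of that general idea only, the sketch below fits a zero-mean GP to scored hypotheses and picks the next candidate with an upper-confidence-bound (UCB) rule. It is not PiEvo's Information-Directed Hypothesis Selection: the 1-D hypothesis encoding, the RBF kernel, the UCB rule, and the names `rbf_kernel`, `gp_posterior`, and `select_next` are all assumptions made for this example.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential kernel between two sets of 1-D points."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)

def gp_posterior(x_train, y_train, x_query, noise=1e-6, length_scale=1.0):
    """Posterior mean and variance of a zero-mean, unit-variance GP at x_query."""
    k_tt = rbf_kernel(x_train, x_train, length_scale) + noise * np.eye(len(x_train))
    k_qt = rbf_kernel(x_query, x_train, length_scale)
    mean = k_qt @ np.linalg.solve(k_tt, y_train)
    v = np.linalg.solve(k_tt, k_qt.T)
    # Prior variance is 1 for the RBF kernel; subtract the explained part.
    var = 1.0 - np.sum(k_qt * v.T, axis=1)
    return mean, np.clip(var, 0.0, None)

def select_next(x_train, y_train, candidates, beta=2.0):
    """Index of the candidate with the highest UCB score (assumed stand-in rule)."""
    x_train = np.asarray(x_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    candidates = np.asarray(candidates, dtype=float)
    mean, var = gp_posterior(x_train, y_train, candidates)
    ucb = mean + beta * np.sqrt(var)
    return int(np.argmax(ucb))

# Hypothetical data: three hypotheses already scored, four candidates to choose from.
tested_x = [0.0, 1.0, 2.0]
tested_y = [0.1, 0.5, 0.2]
pool = [0.0, 1.0, 2.0, 50.0]
next_index = select_next(tested_x, tested_y, pool, beta=2.0)
```

With a large `beta` the rule favours the unexplored candidate far from the tested points, because its posterior variance is high; with `beta=0` it falls back to the candidate with the best predicted score.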
Related papers
- P1-VL: Bridging Visual Perception and Scientific Reasoning in Physics Olympiads [91.05736019384489]
We introduce P1-VL, a family of open-source vision-language models engineered for advanced scientific reasoning. Our flagship P1-VL-235B-A22B becomes the first open-source Vision-Language Model to secure 12 gold medals and achieves state-of-the-art performance among open-source models.
arXiv Detail & Related papers (2026-02-10T06:28:08Z)
- Accelerating Social Science Research via Agentic Hypothesization and Experimentation [33.55093074029515]
EXPERIGEN is a framework that operationalizes end-to-end discovery through a Bayesian-optimization-inspired two-phase search. It consistently discovers 2-4x more statistically significant hypotheses that are 7-17 percent more predictive than prior approaches. We conduct the first A/B test of LLM-generated hypotheses, observing statistically significant results with p less than 1e-6 and a large effect size of 344 percent.
arXiv Detail & Related papers (2026-02-08T14:20:56Z)
- Probing Scientific General Intelligence of LLMs with Scientist-Aligned Workflows [203.3527268311731]
We present an operational SGI definition grounded in the Practical Inquiry Model (PIM). We operationalize it via four scientist-aligned tasks: deep research, idea generation, dry/wet experiments, and experimental reasoning. Our PIM-grounded definition, workflow-centric benchmark, and empirical insights establish a foundation for AI systems that genuinely participate in scientific discovery.
arXiv Detail & Related papers (2025-12-18T12:44:36Z)
- NewtonBench: Benchmarking Generalizable Scientific Law Discovery in LLM Agents [65.85967483058705]
Large language models are emerging as powerful tools for scientific law discovery. Existing benchmarks for this task suffer from a fundamental methodological trilemma. We introduce NewtonBench, a benchmark comprising 324 scientific law discovery tasks across 12 physics domains.
arXiv Detail & Related papers (2025-10-08T16:12:11Z)
- Beyond Optimization: Exploring Novelty Discovery in Autonomous Experiments [0.8086551202409836]
We introduce a novel framework, INS2ANE, to enhance the discovery of novel phenomena in autonomous experimentation. Our method integrates two key components: (1) a novelty scoring system that evaluates the uniqueness of experimental results, and (2) a strategic sampling mechanism that promotes exploration of under-sampled regions.
arXiv Detail & Related papers (2025-08-27T20:19:04Z)
- Bayes-Entropy Collaborative Driven Agents for Research Hypotheses Generation and Optimization [4.469102316542763]
This paper proposes a multi-agent collaborative framework called HypoAgents. It generates hypotheses through diversity sampling and establishes prior beliefs. It then employs retrieval-augmented generation (RAG) to gather external literature evidence. It identifies high-uncertainty hypotheses using information entropy $H = -\sum_i p_i \log p_i$ and actively refines them.
arXiv Detail & Related papers (2025-08-03T13:05:32Z)
- MOOSE-Chem3: Toward Experiment-Guided Hypothesis Ranking via Simulated Experimental Feedback [136.27567671480156]
We introduce experiment-guided ranking, which prioritizes hypotheses based on feedback from prior tests. We frame experiment-guided ranking as a sequential decision-making problem. Our approach significantly outperforms pre-experiment baselines and strong ablations.
arXiv Detail & Related papers (2025-05-23T13:24:50Z)
- InternAgent: When Agent Becomes the Scientist -- Building Closed-Loop System from Hypothesis to Verification [24.752098402554743]
InternAgent is a unified closed-loop multi-agent framework to conduct Autonomous Scientific Research. It has demonstrated its versatility across 12 scientific research tasks. It has achieved promising performance gains in several scientific fields with significantly less time cost compared to human efforts.
arXiv Detail & Related papers (2025-05-22T17:27:43Z)
- PiFlow: Principle-aware Scientific Discovery with Multi-Agent Collaboration [9.216546947535244]
We introduce PiFlow, an information-theoretical framework for automated scientific discovery. Our method significantly improves discovery efficiency, reflected by a 73.55% increase in the Area Under the Curve. Overall, PiFlow serves as a Plug-and-Play method, establishing a novel paradigm shift in highly efficient automated scientific discovery.
arXiv Detail & Related papers (2025-05-21T03:09:39Z)
- Heterogeneity-Aware Client Sampling: A Unified Solution for Consistent Federated Learning [31.50593149242509]
Federated learning (FL) commonly involves clients with diverse communication and computational capabilities. We reveal the fundamentally distinct mechanisms through which heterogeneous communication and computation drive inconsistency in FL. We propose Federated Heterogeneity-Aware Client Sampling, FedACS, a universal method to eliminate all types of objective inconsistency.
arXiv Detail & Related papers (2025-05-16T14:31:36Z)
- Prediction-Powered Causal Inferences [59.98498488132307]
We focus on Prediction-Powered Causal Inferences (PPCI). We first show that conditional calibration guarantees valid PPCI at population level. We then introduce a sufficient representation constraint transferring validity across experiments.
arXiv Detail & Related papers (2025-02-10T10:52:17Z)
- LLM and Simulation as Bilevel Optimizers: A New Paradigm to Advance Physical Scientific Discovery [141.39722070734737]
We propose to enhance the knowledge-driven, abstract reasoning abilities of Large Language Models with the computational strength of simulations.
We introduce Scientific Generative Agent (SGA), a bilevel optimization framework.
We conduct experiments to demonstrate our framework's efficacy in law discovery and molecular design.
arXiv Detail & Related papers (2024-05-16T03:04:10Z)
- Large Language Models for Automated Open-domain Scientific Hypotheses Discovery [50.40483334131271]
This work proposes the first dataset for social science academic hypotheses discovery.
Unlike previous settings, the new dataset requires (1) using open-domain data (raw web corpus) as observations; and (2) proposing hypotheses even new to humanity.
A multi-module framework is developed for the task, including three different feedback mechanisms to boost performance.
arXiv Detail & Related papers (2023-09-06T05:19:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.