Related papers: Unleashing Scientific Reasoning for Bio-experimental Protocol Generation via Structured Component-based Reward Mechanism

URL: http://arxiv.org/abs/2510.15600v1
Date: Fri, 17 Oct 2025 12:47:50 GMT
Title: Unleashing Scientific Reasoning for Bio-experimental Protocol Generation via Structured Component-based Reward Mechanism
Authors: Haoran Sun, Yankai Jiang, Zhenyu Tang, Yaning Pan, Shuang Gu, Zekai Lin, Lilong Wang, Wenjie Lou, Lei Liu, Lei Bai, Xiaosong Wang,
Abstract summary: We introduce SciRecipe, a large-scale dataset of over 12K structured protocols spanning 27 biological subfields. To further improve protocol generation, we propose the "Sketch-and-Fill" paradigm, which separates analysis, structuring, and expression. We develop Thoth, trained through a staged Knowledge-to-Action process that progresses from knowledge acquisition to operational reasoning.
Score: 26.631284893550056
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The foundation of reproducible science lies in protocols that are precise, logically ordered, and executable. The autonomous generation of these protocols through natural language queries could greatly improve the efficiency of the reproduction process. However, current leading large language models (LLMs) often generate incomplete or inconsistent protocols, limiting their utility. To address this limitation, we first introduce SciRecipe, a large-scale dataset of over 12K structured protocols spanning 27 biological subfields and encompassing both comprehension and problem-solving tasks. To further improve protocol generation, we propose the "Sketch-and-Fill" paradigm, which separates analysis, structuring, and expression to ensure each step is explicit and verifiable. Complementing this, the structured component-based reward mechanism evaluates step granularity, action order, and semantic fidelity, aligning model optimization with experimental reliability. Building on these components, we develop Thoth, trained through a staged Knowledge-to-Action process that progresses from knowledge acquisition to operational reasoning and ultimately to robust, executable protocol generation. Across multiple benchmarks, Thoth consistently surpasses both proprietary and open-source LLMs, achieving significant improvements in step alignment, logical sequencing, and semantic accuracy. Our approach paves the way for reliable scientific assistants that bridge knowledge with experimental execution. All data, code, and models will be released publicly.
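The abstract names three things the reward evaluates: step granularity, action order, and semantic fidelity. As a rough illustration of how such components could be combined into one scalar, here is a minimal sketch. It is not the paper's reward: the function names, the equal weights, the "first word of a step is its action" convention, and the use of token Jaccard overlap as a stand-in for semantic fidelity are all assumptions made for this example.

```python
from typing import List, Sequence, Tuple


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence of two sequences."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            if x == y:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]


def granularity_score(pred: List[str], ref: List[str]) -> float:
    """1.0 when both protocols have the same number of steps (assumed proxy)."""
    longest = max(len(pred), len(ref))
    if longest == 0:
        return 1.0
    return 1.0 - abs(len(pred) - len(ref)) / longest


def order_score(pred: List[str], ref: List[str]) -> float:
    """Share of actions that appear in the same relative order (LCS-based).

    Assumption: the action of a step is its first word, lowercased.
    """
    pred_actions = [s.split()[0].lower() for s in pred if s.split()]
    ref_actions = [s.split()[0].lower() for s in ref if s.split()]
    longest = max(len(pred_actions), len(ref_actions), 1)
    return lcs_length(pred_actions, ref_actions) / longest


def semantic_score(pred: List[str], ref: List[str]) -> float:
    """Token Jaccard overlap, a crude stand-in for semantic fidelity."""
    pred_tokens = {t.lower() for s in pred for t in s.split()}
    ref_tokens = {t.lower() for s in ref for t in s.split()}
    union = pred_tokens | ref_tokens
    if not union:
        return 1.0
    return len(pred_tokens & ref_tokens) / len(union)


def component_reward(
    pred: List[str],
    ref: List[str],
    weights: Tuple[float, float, float] = (1 / 3, 1 / 3, 1 / 3),  # assumed weights
) -> float:
    """Weighted sum of the three hypothetical component scores, in [0, 1]."""
    w_gran, w_order, w_sem = weights
    return (
        w_gran * granularity_score(pred, ref)
        + w_order * order_score(pred, ref)
        + w_sem * semantic_score(pred, ref)
    )


reference = ["Add buffer", "Mix gently", "Incubate sample"]
reward_same = component_reward(reference, reference)
reward_reversed = component_reward(list(reversed(reference)), reference)
```

In this sketch, a protocol with the right steps in reverse order keeps full granularity and token-overlap scores but loses most of its order score, which is the kind of separation a component-based reward is meant to expose.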

Related papers

Interaction-Grounded Learning for Contextual Markov Decision Processes with Personalized Feedback [59.287761696290865]
We propose a computationally efficient algorithm that achieves a sublinear regret guarantee for contextual episodic Markov Decision Processes (MDPs) with personalized feedback. We demonstrate the effectiveness of our method in learning personalized objectives from multi-turn interactions through experiments on both a synthetic episodic MDP and a real-world user booking dataset.
arXiv Detail & Related papers (2026-02-09T06:29:54Z)
Towards a Science of Collective AI: LLM-based Multi-Agent Systems Need a Transition from Blind Trial-and-Error to Rigorous Science [70.3658845234978]
Large Language Models (LLMs) have greatly extended the capabilities of Multi-Agent Systems (MAS). Despite this rapid progress, the field still relies heavily on empirical trial-and-error. This bottleneck stems from the ambiguity of attribution. We propose a factor attribution paradigm to systematically identify collaboration-driving factors.
arXiv Detail & Related papers (2026-02-05T04:19:52Z)
EvoSyn: Generalizable Evolutionary Data Synthesis for Verifiable Learning [63.03672166010434]
We introduce an evolutionary, task-agnostic, strategy-guided, executably-checkable data synthesis framework. It jointly synthesizes problems, diverse candidate solutions, and verification artifacts. It iteratively discovers strategies via a consistency-based evaluator that enforces agreement between human-annotated and strategy-induced checks.
arXiv Detail & Related papers (2025-10-20T11:56:35Z)
Optimizing LOCC Protocols on Product Stiefel Manifold [3.7715498552640434]
Local operations and classical communication (LOCC) is a foundational framework in quantum information from both theoretical and experimental perspectives. We develop a framework to optimize fixed-round LOCC via Riemannian optimization on the product Stiefel manifold. We demonstrate the applicability of this framework through key tasks in quantum information processing, such as entanglement distillation and state merging.
arXiv Detail & Related papers (2025-10-08T11:43:47Z)
Loong: Synthesize Long Chain-of-Thoughts at Scale through Verifiers [103.4410890572479]
We introduce the Loong Project: an open-source framework for scalable synthetic data generation and verification. LoongBench is a curated seed dataset containing 8,729 human-vetted examples across 12 domains. LoongEnv is a modular synthetic data generation environment that supports multiple prompting strategies to produce new question-answer-code triples.
arXiv Detail & Related papers (2025-09-03T06:42:40Z)
Rethinking Testing for LLM Applications: Characteristics, Challenges, and a Lightweight Interaction Protocol [83.83217247686402]
Large Language Models (LLMs) have evolved from simple text generators into complex software systems that integrate retrieval augmentation, tool invocation, and multi-turn interactions. Their inherent non-determinism, dynamism, and context dependence pose fundamental challenges for quality assurance. This paper decomposes LLM applications into a three-layer architecture: System Shell Layer, Prompt Orchestration Layer, and LLM Inference Core.
arXiv Detail & Related papers (2025-08-28T13:00:28Z)
ProtocolLLM: RTL Benchmark for SystemVerilog Generation of Communication Protocols [45.66401695351214]
We introduce ProtocolLLM, the first benchmark suite specifically targeting widely used SystemVerilog protocols. We observe that most of the models fail to generate SystemVerilog code for communication protocols that follow timing constraints.
arXiv Detail & Related papers (2025-06-09T17:10:47Z)
BioProBench: Comprehensive Dataset and Benchmark in Biological Protocol Understanding and Reasoning [31.739027752007928]
We present BioProBench, the first large-scale, multi-task benchmark for biological protocol understanding and reasoning. Built upon 27K original protocols, it yields nearly 556K high-quality structured instances.
arXiv Detail & Related papers (2025-05-11T09:42:24Z)
Hierarchically Encapsulated Representation for Protocol Design in Self-Driving Labs [8.340267449839681]
Self-driving laboratories have begun to replace human experimenters in performing single experimental skills or predetermined experimental protocols. Efforts to automate protocol design have been initiated, but the capabilities of knowledge-based machine designers have not been fully elicited. We propose a multi-faceted, multi-scale representation, where instance actions, generalized operations, and product flow models are hierarchically encapsulated.
arXiv Detail & Related papers (2025-04-04T12:05:15Z)
Understanding Token-level Topological Structures in Transformer-based Time Series Forecasting [52.364260925700485]
Transformer-based methods have achieved state-of-the-art performance in time series forecasting (TSF). It remains unclear whether existing Transformers fully leverage the intrinsic topological structure among tokens throughout intermediate layers. We propose the Topology Enhancement Method (TEM), a novel Transformer-based TSF method that explicitly and adaptively preserves token-level topology.
arXiv Detail & Related papers (2024-04-16T07:21:39Z)
Entropy-Regularized Token-Level Policy Optimization for Language Agent Reinforcement [67.1393112206885]
Large Language Models (LLMs) have shown promise as intelligent agents in interactive decision-making tasks. We introduce Entropy-Regularized Token-level Policy Optimization (ETPO), an entropy-augmented RL method tailored for optimizing LLMs at the token level. We assess the effectiveness of ETPO within a simulated environment that models data science code generation as a series of multi-step interactive tasks.
arXiv Detail & Related papers (2024-02-09T07:45:26Z)
BioPlanner: Automatic Evaluation of LLMs on Protocol Planning in Biology [41.952424120054914]
Large Language Models (LLMs) have impressive capabilities on a wide range of tasks. We present an automatic evaluation framework for the task of planning experimental protocols. We evaluate GPT-3 and GPT-4 on this task and explore their robustness.
arXiv Detail & Related papers (2023-10-16T17:54:20Z)
Provable Guarantees for Generative Behavior Cloning: Bridging Low-Level Stability and High-Level Behavior [51.60683890503293]
We propose a theoretical framework for studying behavior cloning of complex expert demonstrations using generative modeling. We show that pure supervised cloning can generate trajectories matching the per-time step distribution of arbitrary expert trajectories.
arXiv Detail & Related papers (2023-07-27T04:27:26Z)
Procedures as Programs: Hierarchical Control of Situated Agents through Natural Language [81.73820295186727]
We propose a formalism of procedures as programs, a powerful yet intuitive method of representing hierarchical procedural knowledge for agent command and control. We instantiate this framework on the IQA and ALFRED datasets for NL instruction following.
arXiv Detail & Related papers (2021-09-16T20:36:21Z)

This list is automatically generated from the titles and abstracts of the papers on this site.