Retrosynthesis Planning via Worst-path Policy Optimisation in Tree-structured MDPs
- URL: http://arxiv.org/abs/2509.10504v1
- Date: Mon, 01 Sep 2025 21:44:14 GMT
- Title: Retrosynthesis Planning via Worst-path Policy Optimisation in Tree-structured MDPs
- Authors: Mianchu Wang, Giovanni Montana,
- Abstract summary: Retrosynthesis planning aims to decompose target molecules into available building blocks, forming a tree where each internal node represents an intermediate compound.<n>Existing methods often optimise for average performance across branches, failing to account for this worst-case sensitivity.<n>We introduce Interactive Retrosynthesis Planning (InterRetro), a method that interacts with the tree MDP, learns a value function for worst-path outcomes.<n>InterRetro achieves state-of-the-art results, solving 100% of targets on the Retro*-190 benchmark, shortening synthetic routes by 4.9%, and achieving promising performance using only 10% of the training data.
- Score: 8.914988066868927
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Retrosynthesis planning aims to decompose target molecules into available building blocks, forming a synthesis tree where each internal node represents an intermediate compound and each leaf ideally corresponds to a purchasable reactant. However, this tree becomes invalid if any leaf node is not a valid building block, making the planning process vulnerable to the "weakest link" in the synthetic route. Existing methods often optimise for average performance across branches, failing to account for this worst-case sensitivity. In this paper, we reframe retrosynthesis as a worst-path optimisation problem within tree-structured Markov Decision Processes (MDPs). We prove that this formulation admits a unique optimal solution and offers monotonic improvement guarantees. Building on this insight, we introduce Interactive Retrosynthesis Planning (InterRetro), a method that interacts with the tree MDP, learns a value function for worst-path outcomes, and improves its policy through self-imitation, preferentially reinforcing past decisions with high estimated advantage. Empirically, InterRetro achieves state-of-the-art results, solving 100% of targets on the Retro*-190 benchmark, shortening synthetic routes by 4.9%, and achieving promising performance using only 10% of the training data - representing a significant advance in computational retrosynthesis planning.
Related papers
- RS-ORT: A Reduced-Space Branch-and-Bound Algorithm for Optimal Regression Trees [2.612627266839037]
Mixed-integer programming (MIP) has emerged as a powerful framework for learning optimal decision trees.<n>Naively binarizing continuous features sacrifices global optimality and often yields needlessly deep trees.<n>We recast the optimal regression-tree training as a two-stage optimization problem and propose Reduced-Space Optimal Regression Trees (RS-ORT)<n> RS-ORT is a specialized branch-and-bound (BB) algorithm that branches exclusively on tree-structural variables.
arXiv Detail & Related papers (2025-10-27T22:17:09Z) - Multi-Armed Bandits-Based Optimization of Decision Trees [0.0]
We propose a Multi-Armed Bandits (MAB)-based pruning approach, a reinforcement learning (RL)-based technique, that will dynamically prune the tree to generate an optimal decision tree with better generalization.<n>Our proposed approach assumes the pruning process as an exploration-exploitation problem, where we are utilizing the MAB algorithms to find optimal branch nodes to prune based on feedback from each pruning actions.
arXiv Detail & Related papers (2025-08-08T02:43:45Z) - Optimas: Optimizing Compound AI Systems with Globally Aligned Local Rewards [98.0983122471875]
We propose Optimas, a unified framework for effective optimization of compound systems.<n>In each iteration, Optimas efficiently adapts the Local Reward Function (LRF) to maintain this property while simultaneously maximizing each component's local reward.<n>We present extensive evaluations across five real-world compound systems to demonstrate that Optimas outperforms strong baselines by an average improvement of 11.92%.
arXiv Detail & Related papers (2025-07-03T07:12:48Z) - Birch SGD: A Tree Graph Framework for Local and Asynchronous SGD Methods [51.54704494242525]
We propose a new unifying framework, Birch SGD, for analyzing and designing distributed SGD methods.<n>Using Birch SGD, we design eight new methods and analyze them alongside previously known ones, with at least six of the new methods shown to have optimal computational time complexity.<n>Our research leads to two key insights: (i) all methods share the same "iteration rate" of $Oleft(frac(R + 1) L Deltavarepsilon + fracsigma2 L Deltavarepsilon2right)$, where $R$
arXiv Detail & Related papers (2025-05-14T08:37:45Z) - Soft regression trees: a model variant and a decomposition training algorithm [0.24578723416255752]
We propose a new variant of soft multivariate regression trees (SRTs) where, for every input vector, the prediction is defined as a linear regression associated to a single leaf node.<n>SRTs exhibit the conditional computational property, i.e., each prediction depends on a small number of nodes.<n> Experiments on 15 wellknown datasets indicate that our SRTs and decomposition algorithm yield higher accuracy and robustness compared with traditional soft regression trees.
arXiv Detail & Related papers (2025-01-10T13:06:36Z) - Calibrating LLMs with Preference Optimization on Thought Trees for Generating Rationale in Science Question Scoring [16.38771834692938]
We propose a novel framework capable of generating more faithful rationales and, more importantly, matching performance with black-box scoring systems.
We first mimic the human assessment process by querying Large Language Models (LLMs) to generate a thought tree.
We then summarise intermediate assessment decisions from each thought tree path for creating synthetic rationale data and rationale preference data.
arXiv Detail & Related papers (2024-06-28T14:33:05Z) - Retrosynthetic Planning with Dual Value Networks [107.97218669277913]
We propose a novel online training algorithm, called Planning with Dual Value Networks (PDVN)
PDVN alternates between the planning phase and updating phase to predict the synthesizability and cost of molecules.
On the widely-used USPTO dataset, our PDVN algorithm improves the search success rate of existing multi-step planners.
arXiv Detail & Related papers (2023-01-31T16:43:53Z) - FusionRetro: Molecule Representation Fusion via In-Context Learning for
Retrosynthetic Planning [58.47265392465442]
Retrosynthetic planning aims to devise a complete multi-step synthetic route from starting materials to a target molecule.
Current strategies use a decoupled approach of single-step retrosynthesis models and search algorithms.
We propose a novel framework that utilizes context information for improved retrosynthetic planning.
arXiv Detail & Related papers (2022-09-30T08:44:58Z) - bsnsing: A decision tree induction method based on recursive optimal
boolean rule composition [2.28438857884398]
This paper proposes a new mixed-integer programming (MIP) formulation to optimize split rule selection in the decision tree induction process.
It develops an efficient search solver that is able to solve practical instances faster than commercial solvers.
arXiv Detail & Related papers (2022-05-30T17:13:57Z) - Evolving Pareto-Optimal Actor-Critic Algorithms for Generalizability and
Stability [67.8426046908398]
Generalizability and stability are two key objectives for operating reinforcement learning (RL) agents in the real world.
This paper presents MetaPG, an evolutionary method for automated design of actor-critic loss functions.
arXiv Detail & Related papers (2022-04-08T20:46:16Z) - RetCL: A Selection-based Approach for Retrosynthesis via Contrastive
Learning [107.64562550844146]
Retrosynthesis is an emerging research area of deep learning.
We propose a new approach that reformulating retrosynthesis into a selection problem of reactants from a candidate set of commercially available molecules.
For learning the score functions, we also propose a novel contrastive training scheme with hard negative mining.
arXiv Detail & Related papers (2021-05-03T12:47:57Z) - Generalized and Scalable Optimal Sparse Decision Trees [56.35541305670828]
We present techniques that produce optimal decision trees over a variety of objectives.
We also introduce a scalable algorithm that produces provably optimal results in the presence of continuous variables.
arXiv Detail & Related papers (2020-06-15T19:00:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.