Trustworthy Retrosynthesis: Eliminating Hallucinations with a Diverse Ensemble of Reaction Scorers
- URL: http://arxiv.org/abs/2510.10645v2
- Date: Wed, 15 Oct 2025 15:58:16 GMT
- Authors: Michal Sadowski, Tadija Radusinović, Maria Wyrzykowska, Lukasz Sztukiewicz, Jan Rzymkowski, Paweł Włodarczyk-Pruszyński, Mikołaj Sacha, Piotr Kozakowski, Ruard van Workum, Stanislaw Kamil Jastrzebski,
- Abstract summary: We present RetroTrim, a retrosynthesis system that successfully avoids nonsensical plans on a set of challenging drug-like targets. Our system is not only the sole method that succeeds in filtering out hallucinated reactions, but it also results in the highest number of high-quality paths overall.
- Score: 1.3831711904009911
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Retrosynthesis is one of the domains transformed by the rise of generative models, and it is one where the problem of nonsensical or erroneous outputs (hallucinations) is particularly insidious: reliable assessment of synthetic plans is time-consuming, with automatic methods lacking. In this work, we present RetroTrim, a retrosynthesis system that successfully avoids nonsensical plans on a set of challenging drug-like targets. Compared to common baselines in the field, our system is not only the sole method that succeeds in filtering out hallucinated reactions, but it also results in the highest number of high-quality paths overall. The key insight behind RetroTrim is the combination of diverse reaction scoring strategies, based on machine learning models and existing chemical databases. We show that our scoring strategies capture different classes of hallucinations by analyzing them on a dataset of labeled retrosynthetic intermediates. This approach formed the basis of our winning solution to the Standard Industries $1 million Retrosynthesis Challenge. To measure the performance of retrosynthesis systems, we propose a novel evaluation protocol for reactions and synthetic paths based on a structured review by expert chemists. Using this protocol, we compare systems on a set of 32 novel targets, curated to reflect recent trends in drug structures. While the insights behind our methodology are broadly applicable to retrosynthesis, our focus is on targets in the drug-like domain. By releasing our benchmark targets and the details of our evaluation protocol, we hope to inspire further research into reliable retrosynthesis.
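The abstract describes combining diverse reaction scorers, some based on machine learning models and some on existing chemical databases, so that different classes of hallucinations are caught. It does not say how the scores are combined. The following is only a minimal sketch of one plausible combination rule (accept a reaction only if every scorer clears its own threshold), not RetroTrim's actual method. All names here (`filter_reactions`, `toy_model_scorer`, `toy_database_scorer`, `KNOWN_REACTIONS`) and the threshold values are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Assumption: a scorer maps a reaction SMILES string ("reactants>>product")
# to a plausibility score in [0, 1]. This interface is hypothetical.
Scorer = Callable[[str], float]


@dataclass
class Verdict:
    reaction: str
    scores: Dict[str, float]
    accepted: bool


def filter_reactions(
    reactions: List[str],
    scorers: Dict[str, Scorer],
    thresholds: Dict[str, float],
) -> List[Verdict]:
    """Score each reaction with every scorer and accept it only if all
    scorers clear their thresholds (an assumed, conservative rule)."""
    verdicts = []
    for rxn in reactions:
        scores = {name: fn(rxn) for name, fn in scorers.items()}
        accepted = all(scores[name] >= thresholds[name] for name in scorers)
        verdicts.append(Verdict(rxn, scores, accepted))
    return verdicts


def toy_model_scorer(rxn: str) -> float:
    """Stand-in for a learned model: only checks that both sides exist."""
    reactants, _, product = rxn.partition(">>")
    return 0.9 if reactants and product else 0.0


# Stand-in for a chemical database lookup (illustrative single entry).
KNOWN_REACTIONS = {"CC(=O)O.OCC>>CC(=O)OCC"}


def toy_database_scorer(rxn: str) -> float:
    """Stand-in for database support: high score only for known reactions."""
    return 1.0 if rxn in KNOWN_REACTIONS else 0.2


if __name__ == "__main__":
    candidates = ["CC(=O)O.OCC>>CC(=O)OCC", "CCO>>CCN", ">>CCO"]
    scorers = {"model": toy_model_scorer, "database": toy_database_scorer}
    thresholds = {"model": 0.5, "database": 0.5}
    for verdict in filter_reactions(candidates, scorers, thresholds):
        print(verdict.reaction, verdict.scores, verdict.accepted)
```

Requiring agreement from all scorers trades recall for precision: a reaction that only one scorer finds implausible is still dropped, which matches the stated goal of filtering hallucinations, though a real system could weight or calibrate the scorers differently.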
Related papers
- Retro-Rank-In: A Ranking-Based Approach for Inorganic Materials Synthesis Planning [1.3676986541298586]
Retrosynthesis strategically plans the synthesis of a chemical target compound from simpler, readily available precursor compounds. We propose Retro-Rank-In, a novel framework that reformulates the retrosynthesis problem by embedding target and precursor materials into a shared latent space. We show that Retro-Rank-In sets a new state-of-the-art, particularly in out-of-distribution generalization and candidate set ranking.
arXiv Detail & Related papers (2025-02-06T18:34:37Z)
- Chemist-aligned retrosynthesis by ensembling diverse inductive bias models [5.47805641978534]
RetroChimera is a frontier retrosynthesis model built upon two newly developed components with complementary inductive biases. We show it outperforms all major models by a large margin, demonstrating robustness outside the training data. We also demonstrate zero-shot transfer to an internal dataset from a major pharmaceutical company.
arXiv Detail & Related papers (2024-12-06T18:55:19Z)
- BatGPT-Chem: A Foundation Large Model For Retrosynthesis Prediction [65.93303145891628]
BatGPT-Chem is a large language model with 15 billion parameters, tailored for enhanced retrosynthesis prediction.
Our model captures a broad spectrum of chemical knowledge, enabling precise prediction of reaction conditions.
This development empowers chemists to adeptly address novel compounds, potentially expediting the innovation cycle in drug manufacturing and materials science.
arXiv Detail & Related papers (2024-08-19T05:17:40Z)
- Retro-prob: Retrosynthetic Planning Based on a Probabilistic Model [5.044138778500218]
Retrosynthesis is a fundamental but challenging task in organic chemistry.
Given a target molecule, the goal of retrosynthesis is to find a series of reactions that can be assembled into a synthetic route.
We propose a new retrosynthetic planning algorithm called retro-prob to maximize the successful synthesis probability of target molecules.
arXiv Detail & Related papers (2024-05-25T08:23:40Z)
- Re-evaluating Retrosynthesis Algorithms with Syntheseus [13.384695742156152]
We present a synthesis planning library with an extensive benchmarking framework, called syntheseus.
We demonstrate the capabilities of syntheseus by re-evaluating several previous retrosynthesis algorithms.
We end with guidance for future work in this area, and call on the community to engage in the discussion on how to improve benchmarks for synthesis planning.
arXiv Detail & Related papers (2023-10-30T17:59:04Z)
- Models Matter: The Impact of Single-Step Retrosynthesis on Synthesis Planning [0.8620335948752805]
Retrosynthesis consists of breaking down a chemical compound step-by-step into molecular precursors.
Its two primary research directions, single-step retrosynthesis prediction and multi-step synthesis planning, are inherently intertwined.
We show that the choice of the single-step model can improve the overall success rate of synthesis planning by up to +28%.
arXiv Detail & Related papers (2023-08-10T12:04:47Z)
- Recent advances in artificial intelligence for retrosynthesis [29.32667622776065]
Retrosynthesis is the cornerstone of organic chemistry, providing chemists in material and drug manufacturing access to poorly available and brand-new molecules.
Recent breakthroughs driven by artificial intelligence have revolutionized retrosynthesis.
arXiv Detail & Related papers (2023-01-14T09:29:39Z)
- FusionRetro: Molecule Representation Fusion via In-Context Learning for Retrosynthetic Planning [58.47265392465442]
Retrosynthetic planning aims to devise a complete multi-step synthetic route from starting materials to a target molecule.
Current strategies use a decoupled approach of single-step retrosynthesis models and search algorithms.
We propose a novel framework that utilizes context information for improved retrosynthetic planning.
arXiv Detail & Related papers (2022-09-30T08:44:58Z)
- RetCL: A Selection-based Approach for Retrosynthesis via Contrastive Learning [107.64562550844146]
Retrosynthesis is an emerging research area of deep learning.
We propose a new approach that reformulates retrosynthesis as the problem of selecting reactants from a candidate set of commercially available molecules.
For learning the score functions, we also propose a novel contrastive training scheme with hard negative mining.
arXiv Detail & Related papers (2021-05-03T12:47:57Z)
- RetroXpert: Decompose Retrosynthesis Prediction like a Chemist [60.463900712314754]
We devise a novel template-free algorithm for automatic retrosynthetic expansion.
Our method disassembles retrosynthesis into two steps.
While outperforming the state-of-the-art baselines, our model also provides chemically reasonable interpretation.
arXiv Detail & Related papers (2020-11-04T04:35:34Z)
- Retro*: Learning Retrosynthetic Planning with Neural Guided A* Search [83.22850633478302]
Retrosynthetic planning identifies a series of reactions that can lead to the synthesis of a target product.
Existing methods either require expensive, high-variance return estimation by rollout, or optimize for search speed rather than route quality.
We propose Retro*, a neural-based A*-like algorithm that finds high-quality synthetic routes efficiently.
arXiv Detail & Related papers (2020-06-29T05:53:33Z)
- Retrosynthesis Prediction with Conditional Graph Logic Network [118.70437805407728]
Computer-aided retrosynthesis is finding renewed interest from both chemistry and computer science communities.
We propose a new approach to this task using the Conditional Graph Logic Network, a conditional graphical model built upon graph neural networks.
arXiv Detail & Related papers (2020-01-06T05:36:57Z)