Test-Time Training Scaling for Chemical Exploration in Drug Design
- URL: http://arxiv.org/abs/2501.19153v1
- Date: Fri, 31 Jan 2025 14:11:10 GMT
- Title: Test-Time Training Scaling for Chemical Exploration in Drug Design
- Authors: Morgan Thomas, Albert Bou, Gianni De Fabritiis
- Abstract summary: We propose a new benchmark to discover dissimilar molecules that possess similar bioactivity. We show that a population of RL agents can solve the benchmark, while a single agent cannot. We also find that cooperative strategies are not significantly better than independent agents.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Chemical language models for molecular design have the potential to find solutions to multi-parameter optimization problems in drug discovery via reinforcement learning (RL). A key requirement to achieve this is the capacity to "search" chemical space to identify all molecules of interest. Here, we propose a challenging new benchmark to discover dissimilar molecules that possess similar bioactivity, a common scenario in drug discovery, but a hard problem to optimize. We show that a population of RL agents can solve the benchmark, while a single agent cannot. We also find that cooperative strategies are not significantly better than independent agents. Moreover, the performance on the benchmark scales log-linearly with the number of independent agents, showing a test-time training scaling law for chemical language models.
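The log-linear scaling claim can be made concrete. If performance scales log-linearly with the number of independent agents N, it can be modeled as score(N) = a + b·ln N, so every doubling of the population adds a constant b·ln 2. The minimal Python sketch below fits a and b by ordinary least squares on ln N. The helper names `fit_log_linear` and `predict_score` are hypothetical, the use of the natural log is an assumption (another base only rescales b), and the example data are synthetic, not results from the paper.

```python
import math


def fit_log_linear(agent_counts, scores):
    """Fit score = a + b * ln(n) by ordinary least squares.

    Hypothetical helper for illustration; not code from the paper.
    """
    if len(agent_counts) != len(scores):
        raise ValueError("agent_counts and scores must have equal length")
    xs = [math.log(n) for n in agent_counts]  # regress on ln(N), not on N
    if len(set(xs)) < 2:
        raise ValueError("need at least two distinct agent counts")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(scores) / len(scores)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, scores))
    b = sxy / sxx            # gain per unit increase of ln(N)
    a = mean_y - b * mean_x  # predicted score for a single agent (ln 1 = 0)
    return a, b


def predict_score(a, b, n_agents):
    """Extrapolate the fitted log-linear trend to n_agents."""
    return a + b * math.log(n_agents)


# Synthetic data generated from score = 1.0 + 2.0 * ln(N); NOT results from the paper.
counts = [1, 2, 4, 8, 16]
scores = [1.0 + 2.0 * math.log(n) for n in counts]
a, b = fit_log_linear(counts, scores)
gain_per_doubling = b * math.log(2)  # constant improvement each time N doubles
print(f"a={a:.3f}, b={b:.3f}, gain per doubling={gain_per_doubling:.3f}")
```

Because the model is linear in ln N, the same fit could be applied to real benchmark scores measured at several population sizes; extrapolating well beyond the measured range should be treated with caution.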
Related papers
- InversionGNN: A Dual Path Network for Multi-Property Molecular Optimization
InversionGNN is an effective yet sample-efficient dual-path graph neural network (GNN) for multi-objective drug discovery.
We train the model for multi-property prediction to acquire knowledge of the optimal combination of functional groups.
The learned chemical knowledge then guides the inversion generation path to generate molecules with the required properties.
(arXiv, 2025-03-03)
- Efficient Evolutionary Search Over Chemical Space with Large Language Models
Optimization objectives can be non-differentiable.
We introduce chemistry-aware Large Language Models (LLMs) into evolutionary algorithms.
Our algorithm improves both the quality of the final solution and the speed of convergence.
(arXiv, 2024-06-23)
- A Gaussian Process Model for Ordinal Data with Applications to Chemoinformatics
We present conditional Gaussian process models to predict ordinal outcomes from chemical experiments.
A novel aspect of our model is that the kernel contains a scaling parameter that controls the strength of the correlation between elements of the chemical space.
We also present a genetic algorithm to facilitate chemical discovery and to identify features important to a compound's efficacy.
(arXiv, 2024-05-16)
- ACEGEN: Reinforcement learning of generative chemical agents for drug discovery
ACEGEN is a comprehensive and streamlined toolkit for generative drug design.
It is built on TorchRL, a modern RL library that offers thoroughly tested, reusable components.
We show examples of ACEGEN applied in multiple drug discovery case studies.
(arXiv, 2024-05-07)
- Retrieval-based Controllable Molecule Generation
We propose a new retrieval-based framework for controllable molecule generation.
We use a small set of molecules to steer the pre-trained generative model towards synthesizing molecules that satisfy the given design criteria.
Our approach is agnostic to the choice of generative models and requires no task-specific fine-tuning.
(arXiv, 2022-08-23)
- Exploring Chemical Space with Score-based Out-of-distribution Generation
We propose Molecular Out-Of-distribution Diffusion (MOOD), a score-based diffusion scheme that incorporates out-of-distribution control in the generative stochastic differential equation (SDE).
Since some novel molecules may not meet the basic requirements of real-world drugs, MOOD performs conditional generation using the gradients of a property predictor.
We experimentally validate that MOOD is able to explore the chemical space beyond the training distribution, generating molecules that outscore ones found with existing methods, and even the top 0.01% of the original training pool.
(arXiv, 2022-06-06)
- Semi-Supervised GCN for learning Molecular Structure-Activity Relationships
We propose training a graph-to-graph neural network with semi-supervised learning to attribute structure-property relationships.
Ultimately, our approach could be a valuable tool for problems such as activity cliffs, lead optimization and de novo drug design.
(arXiv, 2022-01-25)
- CELLS: Cost-Effective Evolution in Latent Space for Goal-Directed Molecular Generation
We propose a cost-effective evolution strategy in latent space, which optimizes the molecular latent representation vectors.
We adopt a pre-trained molecular generative model to map between the latent and observation spaces.
We conduct extensive experiments on multiple optimization tasks, comparing the proposed framework to several advanced techniques.
(arXiv, 2021-11-30)
- Federated Learning of Molecular Properties in a Heterogeneous Setting
We introduce federated heterogeneous molecular learning to address these challenges.
Federated learning allows end-users to build a global model collaboratively while keeping the training data distributed over isolated clients.
FedChem should enable a new type of collaboration for improving AI in chemistry that mitigates concerns about valuable chemical data.
(arXiv, 2021-09-15)
- Optimizing Molecules using Efficient Queries from Property Evaluations
We propose QMO, a generic query-based molecule optimization framework.
QMO improves the desired properties of an input molecule based on efficient queries.
We show that QMO outperforms existing methods in the benchmark tasks of optimizing small organic molecules.
(arXiv, 2020-11-03)
- MIMOSA: Multi-constraint Molecule Sampling for Molecule Optimization
Generative models and reinforcement learning approaches have achieved initial success, but still struggle to optimize multiple drug properties simultaneously.
We propose the MultI-constraint MOlecule SAmpling (MIMOSA) approach, a sampling framework that uses the input molecule as an initial guess and samples molecules from the target distribution.
(arXiv, 2020-10-05)
- The Synthesizability of Molecules Proposed by Generative Models
Discovery of functional molecules is an expensive and time-consuming process.
One class of techniques of growing interest for early-stage drug discovery is de novo molecular generation and optimization.
These techniques can suggest novel molecular structures intended to maximize a multi-objective function.
However, the utility of these approaches is stymied by ignorance of synthesizability.
(arXiv, 2020-02-17)
This list is automatically generated from the titles and abstracts of the papers on this site.