TextOmics-Guided Diffusion for Hit-like Molecular Generation
- URL: http://arxiv.org/abs/2507.09982v1
- Date: Mon, 14 Jul 2025 06:56:37 GMT
- Title: TextOmics-Guided Diffusion for Hit-like Molecular Generation
- Authors: Hang Yuan, Chen Li, Wenjun Ma, Yuncheng Jiang,
- Abstract summary: Hit-like molecular generation with therapeutic potential is essential for target-specific drug discovery.<n>We introduce TextOmics, a pioneering benchmark that establishes one-to-one correspondences between omics expressions and molecular textual descriptions.<n>We propose ToDi, a generative framework that jointly conditions on omics expressions and molecular textual descriptions to produce biologically relevant, chemically valid, hit-like molecules.
- Score: 11.308365776402226
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Hit-like molecular generation with therapeutic potential is essential for target-specific drug discovery. However, the field lacks heterogeneous data and unified frameworks for integrating diverse molecular representations. To bridge this gap, we introduce TextOmics, a pioneering benchmark that establishes one-to-one correspondences between omics expressions and molecular textual descriptions. TextOmics provides a heterogeneous dataset that facilitates molecular generation through representations alignment. Built upon this foundation, we propose ToDi, a generative framework that jointly conditions on omics expressions and molecular textual descriptions to produce biologically relevant, chemically valid, hit-like molecules. ToDi leverages two encoders (OmicsEn and TextEn) to capture multi-level biological and semantic associations, and develops conditional diffusion (DiffGen) for controllable generation. Extensive experiments confirm the effectiveness of TextOmics and demonstrate ToDi outperforms existing state-of-the-art approaches, while also showcasing remarkable potential in zero-shot therapeutic molecular generation. Sources are available at: https://github.com/hala-ToDi.
Related papers
- Mol-CADiff: Causality-Aware Autoregressive Diffusion for Molecule Generation [13.401822039640297]
Mol-CADiff is a novel diffusion-based framework that uses causal attention mechanisms for text-conditional molecular generation.<n>Our approach explicitly models the causal relationship between textual prompts and molecular structures, overcoming key limitations in existing methods.<n>Our experiments demonstrate that Mol-CADiff outperforms state-of-the-art methods in generating diverse, novel, and chemically valid molecules.
arXiv Detail & Related papers (2025-03-07T15:10:37Z) - Knowledge-aware contrastive heterogeneous molecular graph learning [77.94721384862699]
We propose a paradigm shift by encoding molecular graphs into Heterogeneous Molecular Graph Learning (KCHML)<n>KCHML conceptualizes molecules through three distinct graph views-molecular, elemental, and pharmacological-enhanced by heterogeneous molecular graphs and a dual message-passing mechanism.<n>This design offers a comprehensive representation for property prediction, as well as for downstream tasks such as drug-drug interaction (DDI) prediction.
arXiv Detail & Related papers (2025-02-17T11:53:58Z) - Text-guided Diffusion Model for 3D Molecule Generation [26.09786612721824]
We introduce TextSMOG, a new Text-guided Small Molecule Generation Approach via 3D Diffusion Model.
This method uses textual conditions to guide molecule generation, enhancing both stability and diversity.
Experimental results show TextSMOG's proficiency in capturing and utilizing information from textual descriptions.
arXiv Detail & Related papers (2024-10-04T10:23:20Z) - FARM: Functional Group-Aware Representations for Small Molecules [55.281754551202326]
We introduce Functional Group-Aware Representations for Small Molecules (FARM)<n>FARM is a novel model designed to bridge the gap between SMILES, natural language, and molecular graphs.<n>We evaluate FARM on the MoleculeNet dataset, where it achieves state-of-the-art performance on 11 out of 13 tasks.
arXiv Detail & Related papers (2024-10-02T23:04:58Z) - LDMol: A Text-to-Molecule Diffusion Model with Structurally Informative Latent Space Surpasses AR Models [55.5427001668863]
We present a novel latent diffusion model dubbed LDMol for text-conditioned molecule generation.<n> Experiments show that LDMol outperforms the existing autoregressive baselines on the text-to-molecule generation benchmark.<n>We show that LDMol can be applied to downstream tasks such as molecule-to-text retrieval and text-guided molecule editing.
arXiv Detail & Related papers (2024-05-28T04:59:13Z) - Data-Efficient Molecular Generation with Hierarchical Textual Inversion [48.816943690420224]
We introduce Hierarchical textual Inversion for Molecular generation (HI-Mol), a novel data-efficient molecular generation method.
HI-Mol is inspired by the importance of hierarchical information, e.g., both coarse- and fine-grained features, in understanding the molecule distribution.
Compared to the conventional textual inversion method in the image domain using a single-level token embedding, our multi-level token embeddings allow the model to effectively learn the underlying low-shot molecule distribution.
arXiv Detail & Related papers (2024-05-05T08:35:23Z) - Leveraging Biomolecule and Natural Language through Multi-Modal
Learning: A Survey [75.47055414002571]
The integration of biomolecular modeling with natural language (BL) has emerged as a promising interdisciplinary area at the intersection of artificial intelligence, chemistry and biology.
We provide an analysis of recent advancements achieved through cross modeling of biomolecules and natural language.
arXiv Detail & Related papers (2024-03-03T14:59:47Z) - A Molecular Multimodal Foundation Model Associating Molecule Graphs with
Natural Language [63.60376252491507]
We propose a molecular multimodal foundation model which is pretrained from molecular graphs and their semantically related textual data.
We believe that our model would have a broad impact on AI-empowered fields across disciplines such as biology, chemistry, materials, environment, and medicine.
arXiv Detail & Related papers (2022-09-12T00:56:57Z) - Translation between Molecules and Natural Language [43.518805086280466]
We present a self-supervised learning framework for pretraining models on a vast amount of unlabeled natural language text and molecule strings.
$textbfMolT5$ allows for new, useful, and challenging analogs of traditional vision-language tasks, such as molecule captioning and text-based de novo molecule generation.
arXiv Detail & Related papers (2022-04-25T17:48:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.