MolHIT: Advancing Molecular-Graph Generation with Hierarchical Discrete Diffusion Models
- URL: http://arxiv.org/abs/2602.17602v1
- Date: Thu, 19 Feb 2026 18:27:11 GMT
- Title: MolHIT: Advancing Molecular-Graph Generation with Hierarchical Discrete Diffusion Models
- Authors: Hojung Jung, Rodrigo Hormazabal, Jaehyeong Jo, Youngrok Park, Kyunggeun Roh, Se-Young Yun, Sehui Han, Dae-Woong Jeong,
- Abstract summary: We introduce MolHIT, a powerful molecular graph generation framework that overcomes long-standing performance limitations in existing methods.<n>Overall, MolHIT achieves new state-of-the-art performance on the MOSES dataset with near-perfect validity for the first time in graph diffusion.
- Score: 37.89307688620534
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Molecular generation with diffusion models has emerged as a promising direction for AI-driven drug discovery and materials science. While graph diffusion models have been widely adopted due to the discrete nature of 2D molecular graphs, existing models suffer from low chemical validity and struggle to meet the desired properties compared to 1D modeling. In this work, we introduce MolHIT, a powerful molecular graph generation framework that overcomes long-standing performance limitations in existing methods. MolHIT is based on the Hierarchical Discrete Diffusion Model, which generalizes discrete diffusion to additional categories that encode chemical priors, and decoupled atom encoding that splits the atom types according to their chemical roles. Overall, MolHIT achieves new state-of-the-art performance on the MOSES dataset with near-perfect validity for the first time in graph diffusion, surpassing strong 1D baselines across multiple metrics. We further demonstrate strong performance in downstream tasks, including multi-property guided generation and scaffold extension.
Related papers
- Graph Diffusion that can Insert and Delete [18.092760343244898]
Generative models of graphs based on discrete Denoising Diffusion Probabilistic Models (DDPMs) offer a principled approach to molecular generation.<n>In this paper, we reformulate the noising and denoising processes to support monotonic insertion and deletion of nodes.<n>The resulting model, which we call GrIDDD, dynamically grows or shrinks the chemical graph during generation.
arXiv Detail & Related papers (2025-06-06T19:45:45Z) - LDMol: A Text-to-Molecule Diffusion Model with Structurally Informative Latent Space Surpasses AR Models [55.5427001668863]
We present a novel latent diffusion model dubbed LDMol for text-conditioned molecule generation.<n> Experiments show that LDMol outperforms the existing autoregressive baselines on the text-to-molecule generation benchmark.<n>We show that LDMol can be applied to downstream tasks such as molecule-to-text retrieval and text-guided molecule editing.
arXiv Detail & Related papers (2024-05-28T04:59:13Z) - Data-Efficient Molecular Generation with Hierarchical Textual Inversion [48.816943690420224]
We introduce Hierarchical textual Inversion for Molecular generation (HI-Mol), a novel data-efficient molecular generation method.
HI-Mol is inspired by the importance of hierarchical information, e.g., both coarse- and fine-grained features, in understanding the molecule distribution.
Compared to the conventional textual inversion method in the image domain using a single-level token embedding, our multi-level token embeddings allow the model to effectively learn the underlying low-shot molecule distribution.
arXiv Detail & Related papers (2024-05-05T08:35:23Z) - MultiModal-Learning for Predicting Molecular Properties: A Framework Based on Image and Graph Structures [2.5563339057415218]
MolIG is a novel MultiModaL molecular pre-training framework for predicting molecular properties based on Image and Graph structures.
It amalgamates the strengths of both molecular representation forms.
It exhibits enhanced performance in downstream tasks pertaining to molecular property prediction within benchmark groups.
arXiv Detail & Related papers (2023-11-28T10:28:35Z) - Improving Molecular Properties Prediction Through Latent Space Fusion [9.912768918657354]
We present a multi-view approach that combines latent spaces derived from state-of-the-art chemical models.
Our approach relies on two pivotal elements: the embeddings derived from MHG-GNN, which represent molecular structures as graphs, and MoLFormer embeddings rooted in chemical language.
We demonstrate the superior performance of our proposed multi-view approach compared to existing state-of-the-art methods.
arXiv Detail & Related papers (2023-10-20T20:29:32Z) - Molecule Design by Latent Space Energy-Based Modeling and Gradual
Distribution Shifting [53.44684898432997]
Generation of molecules with desired chemical and biological properties is critical for drug discovery.
We propose a probabilistic generative model to capture the joint distribution of molecules and their properties.
Our method achieves very strong performances on various molecule design tasks.
arXiv Detail & Related papers (2023-06-09T03:04:21Z) - Bi-level Contrastive Learning for Knowledge-Enhanced Molecule Representations [68.32093648671496]
We introduce GODE, which accounts for the dual-level structure inherent in molecules.<n> Molecules possess an intrinsic graph structure and simultaneously function as nodes within a broader molecular knowledge graph.<n>By pre-training two GNNs on different graph structures, GODE effectively fuses molecular structures with their corresponding knowledge graph substructures.
arXiv Detail & Related papers (2023-06-02T15:49:45Z) - Atomic and Subgraph-aware Bilateral Aggregation for Molecular
Representation Learning [57.670845619155195]
We introduce a new model for molecular representation learning called the Atomic and Subgraph-aware Bilateral Aggregation (ASBA)
ASBA addresses the limitations of previous atom-wise and subgraph-wise models by incorporating both types of information.
Our method offers a more comprehensive way to learn representations for molecular property prediction and has broad potential in drug and material discovery applications.
arXiv Detail & Related papers (2023-05-22T00:56:00Z) - MolHF: A Hierarchical Normalizing Flow for Molecular Graph Generation [4.517805235253331]
MolHF is a new hierarchical flow-based model that generates molecular graphs in a coarse-to-fine manner.
MolHF is the first flow-based model that can be applied to model larger molecules (polymer) with more than 100 heavy atoms.
arXiv Detail & Related papers (2023-05-15T08:59:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.