MolFM-Lite: Multi-Modal Molecular Property Prediction with Conformer Ensemble Attention and Cross-Modal Fusion
- URL: http://arxiv.org/abs/2602.22405v1
- Date: Wed, 25 Feb 2026 20:59:14 GMT
- Title: MolFM-Lite: Multi-Modal Molecular Property Prediction with Conformer Ensemble Attention and Cross-Modal Fusion
- Authors: Syed Omer Shah, Mohammed Maqsood Ahmed, Danish Mohiuddin Mohammed, Shahnawaz Alam, Mohd Vahaj ur Rahman,
- Abstract summary: MolFM-Lite is a multi-modal model that encodes SELFIES sequences (1D), molecular graphs (2D), and conformer ensembles (3D). Our main methodological contributions are: (1) a conformer ensemble attention mechanism that combines learnable attention with Boltzmann-weighted priors over multiple RDKit-generated conformers; and (2) a cross-modal fusion layer where each modality can attend to others.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Most machine learning models for molecular property prediction rely on a single molecular representation (either a sequence, a graph, or a 3D structure) and treat molecular geometry as static. We present MolFM-Lite, a multi-modal model that jointly encodes SELFIES sequences (1D), molecular graphs (2D), and conformer ensembles (3D) through cross-attention fusion, while conditioning predictions on experimental context via Feature-wise Linear Modulation (FiLM). Our main methodological contributions are: (1) a conformer ensemble attention mechanism that combines learnable attention with Boltzmann-weighted priors over multiple RDKit-generated conformers, capturing the thermodynamic distribution of molecular shapes; and (2) a cross-modal fusion layer where each modality can attend to others, enabling complementary information sharing. We evaluate on four MoleculeNet scaffold-split benchmarks using our model's own splits, and report all baselines re-evaluated under the same protocol. Comprehensive ablation studies across all four datasets confirm that each architectural component contributes independently, with tri-modal fusion providing 7-11% AUC improvement over single-modality baselines and conformer ensembles adding approximately 2% over single-conformer variants. Pre-training on ZINC250K (~250K molecules) using cross-modal contrastive and masked-atom objectives enables effective weight initialization at modest compute cost. We release all code, trained models, and data splits to support reproducibility.
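The two mechanisms named in the abstract can be sketched in a few lines of NumPy. This is an illustrative assumption, not the paper's released code: the function names, the thermal energy kT ≈ 0.593 kcal/mol (~298 K), and the idea that conformer energies come from an RDKit force field (e.g. MMFF94) are our own choices for the sketch.

```python
import numpy as np

def boltzmann_conformer_attention(conformer_feats, energies, learned_logits, kT=0.593):
    """Pool per-conformer embeddings with learned attention plus a Boltzmann prior.

    conformer_feats: (K, D) embeddings, one per RDKit-generated conformer
    energies:        (K,) conformer energies in kcal/mol (e.g. MMFF94)
    learned_logits:  (K,) scores from a learned attention head
    kT:              thermal energy in kcal/mol at ~298 K
    """
    log_prior = -np.asarray(energies, dtype=float) / kT   # Boltzmann log-weights
    logits = np.asarray(learned_logits, dtype=float) + log_prior
    logits -= logits.max()                                # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum()                              # softmax over conformers
    pooled = weights @ np.asarray(conformer_feats)        # (D,) ensemble embedding
    return pooled, weights

def film_condition(features, context, W_gamma, b_gamma, W_beta, b_beta):
    """FiLM: scale and shift features with parameters predicted from context."""
    gamma = context @ W_gamma + b_gamma                   # (D,) per-feature scale
    beta = context @ W_beta + b_beta                      # (D,) per-feature shift
    return gamma * features + beta
```

With equal learned logits, the lowest-energy conformer dominates the pooled embedding, which is the thermodynamic behavior the abstract describes; the learned logits let the model depart from that prior when the data warrant it.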
Related papers
- Zatom-1: A Multimodal Flow Foundation Model for 3D Molecules and Materials
We introduce Zatom-1, the first foundation model that unifies generative and predictive learning of 3D molecules and materials.
Zatom-1 is a Transformer trained with a multimodal flow matching objective that jointly models discrete atom types and continuous 3D geometries.
arXiv Detail & Related papers (2026-02-24T20:52:39Z) - Structure-Aware Fusion with Progressive Injection for Multimodal Molecular Representation Learning
Multimodal molecular models often suffer from 3D conformer unreliability and modality collapse, limiting their robustness and generalization.
We propose MuMo, a structured multimodal fusion framework that addresses these challenges through two key strategies.
Built on a state space backbone, MuMo supports long-range dependency modeling and robust information propagation.
arXiv Detail & Related papers (2025-10-08T14:02:51Z) - Unified Molecule Pre-training with Flexible 2D and 3D Modalities: Single and Paired Modality Integration
We propose FlexMol, a flexible molecule pre-training framework that learns unified molecular representations while supporting single-modality input.
Our approach employs separate models for 2D and 3D molecular data, leverages parameter sharing to improve computational efficiency, and utilizes a decoder to generate features for the missing modality.
arXiv Detail & Related papers (2025-09-11T13:37:56Z) - Composable Score-based Graph Diffusion Model for Multi-Conditional Molecular Generation
We propose the Composable Score-based Graph Diffusion model (CSGD), which extends score matching to discrete graphs via concrete scores.
We show that CSGD achieves state-of-the-art performance with a 15.3% average improvement in controllability over prior methods.
Our findings highlight the practical advantages of score-based modeling for discrete graph generation and its capacity for flexible, multi-property molecular design.
arXiv Detail & Related papers (2024-10-17T17:20:24Z) - DPLM-2: A Multimodal Diffusion Protein Language Model
We introduce DPLM-2, a multimodal protein foundation model that extends discrete diffusion protein language model (DPLM) to accommodate both sequences and structures.
DPLM-2 learns the joint distribution of sequence and structure, as well as their marginals and conditionals.
Empirical evaluation shows that DPLM-2 can simultaneously generate highly compatible amino acid sequences and their corresponding 3D structures.
arXiv Detail & Related papers (2024-10-10T14:36:58Z) - MolMix: A Simple Yet Effective Baseline for Multimodal Molecular Representation Learning
We propose a simple transformer-based baseline for multimodal molecular representation learning.
We integrate three distinct modalities: SMILES strings, 2D graph representations, and 3D conformers of molecules.
Despite its simplicity, our approach achieves state-of-the-art results across multiple datasets.
arXiv Detail & Related papers (2024-05-29T10:26:16Z) - UniIF: Unified Molecule Inverse Folding
We propose a unified model UniIF for inverse folding of all molecules.
Our proposed method surpasses state-of-the-art methods on all tasks.
arXiv Detail & Related papers (2023-12-29T07:19:42Z) - Integrating Chemical Language and Molecular Graph in Multimodal Fused Deep Learning for Drug Property Prediction
We construct multimodal deep learning models to cover different molecular representations.
Compared with mono-modal models, our multimodal fused deep learning (MMFDL) models outperform single models in accuracy, reliability, and resistance capability against noise.
arXiv Detail & Related papers (2023-07-12T15:27:06Z) - Unified Molecular Modeling via Modality Blending
We introduce MoleBLEND, a novel "blend-then-predict" self-supervised learning method.
MoleBLEND blends atom relations from different modalities into one unified relation for matrix encoding, then recovers modality-specific information for both 2D and 3D structures.
Experiments show that MoleBLEND achieves state-of-the-art performance across major 2D/3D benchmarks.
arXiv Detail & Related papers (2021-02-20T03:17:58Z) - Learning Neural Generative Dynamics for Molecular Conformation Generation
We study how to generate molecular conformations (i.e., 3D structures) from a molecular graph.
We propose a novel probabilistic framework to generate valid and diverse conformations given a molecular graph.
We propose a novel probabilistic framework to generate valid and diverse conformations given a molecular graph.
arXiv Detail & Related papers (2021-02-20T03:17:58Z) - MIMOSA: Multi-constraint Molecule Sampling for Molecule Optimization
Generative models and reinforcement learning approaches have seen initial success, but still face difficulties in simultaneously optimizing multiple drug properties.
We propose the MultI-constraint MOlecule SAmpling (MIMOSA) approach, a sampling framework that uses the input molecule as an initial guess and samples molecules from the target distribution.
arXiv Detail & Related papers (2020-10-05T20:18:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.