MolFM-Lite: Multi-Modal Molecular Property Prediction with Conformer Ensemble Attention and Cross-Modal Fusion
- URL: http://arxiv.org/abs/2602.22405v1
- Date: Wed, 25 Feb 2026 20:59:14 GMT
- Title: MolFM-Lite: Multi-Modal Molecular Property Prediction with Conformer Ensemble Attention and Cross-Modal Fusion
- Authors: Syed Omer Shah, Mohammed Maqsood Ahmed, Danish Mohiuddin Mohammed, Shahnawaz Alam, Mohd Vahaj ur Rahman,
- Abstract summary: MolFM-Lite is a multi-modal model that encodes SELFIES sequences (1D), molecular graphs (2D), and conformer ensembles (3D). Our main methodological contributions are: (1) a conformer ensemble attention mechanism that combines learnable attention with Boltzmann-weighted priors over multiple RDKit-generated conformers; and (2) a cross-modal fusion layer where each modality can attend to others.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Most machine learning models for molecular property prediction rely on a single molecular representation (either a sequence, a graph, or a 3D structure) and treat molecular geometry as static. We present MolFM-Lite, a multi-modal model that jointly encodes SELFIES sequences (1D), molecular graphs (2D), and conformer ensembles (3D) through cross-attention fusion, while conditioning predictions on experimental context via Feature-wise Linear Modulation (FiLM). Our main methodological contributions are: (1) a conformer ensemble attention mechanism that combines learnable attention with Boltzmann-weighted priors over multiple RDKit-generated conformers, capturing the thermodynamic distribution of molecular shapes; and (2) a cross-modal fusion layer where each modality can attend to others, enabling complementary information sharing. We evaluate on four MoleculeNet scaffold-split benchmarks using our model's own splits, and report all baselines re-evaluated under the same protocol. Comprehensive ablation studies across all four datasets confirm that each architectural component contributes independently, with tri-modal fusion providing 7-11% AUC improvement over single-modality baselines and conformer ensembles adding approximately 2% over single-conformer variants. Pre-training on ZINC250K (~250K molecules) using cross-modal contrastive and masked-atom objectives enables effective weight initialization at modest compute cost. We release all code, trained models, and data splits to support reproducibility.
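The two mechanisms named in the abstract can be sketched in a few lines of NumPy. This is an illustrative assumption, not the paper's released code: the function names, the thermal energy kT ≈ 0.593 kcal/mol (~298 K), and the idea that conformer energies come from an RDKit force field (e.g. MMFF94) are our own choices for the sketch.

```python
import numpy as np

def boltzmann_conformer_attention(conformer_feats, energies, learned_logits, kT=0.593):
    """Pool per-conformer embeddings with learned attention plus a Boltzmann prior.

    conformer_feats: (K, D) embeddings, one per RDKit-generated conformer
    energies:        (K,) conformer energies in kcal/mol (e.g. MMFF94)
    learned_logits:  (K,) scores from a learned attention head
    kT:              thermal energy in kcal/mol at ~298 K
    """
    log_prior = -np.asarray(energies, dtype=float) / kT   # Boltzmann log-weights
    logits = np.asarray(learned_logits, dtype=float) + log_prior
    logits -= logits.max()                                # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum()                              # softmax over conformers
    pooled = weights @ np.asarray(conformer_feats)        # (D,) ensemble embedding
    return pooled, weights

def film_condition(features, context, W_gamma, b_gamma, W_beta, b_beta):
    """FiLM: scale and shift features with parameters predicted from context."""
    gamma = context @ W_gamma + b_gamma                   # (D,) per-feature scale
    beta = context @ W_beta + b_beta                      # (D,) per-feature shift
    return gamma * features + beta
```

With equal learned logits, the lowest-energy conformer dominates the pooled embedding, which is the thermodynamic behavior the abstract describes; the learned logits let the model depart from that prior when the data warrant it.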
Related papers
- Zatom-1: A Multimodal Flow Foundation Model for 3D Molecules and Materials
We introduce Zatom-1, the first foundation model that unifies generative and predictive learning of 3D molecules and materials.
Zatom-1 is a Transformer trained with a multimodal flow matching objective that jointly models discrete atom types and continuous 3D geometries.
arXiv Detail & Related papers (2026-02-24T20:52:39Z) - Structure-Aware Fusion with Progressive Injection for Multimodal Molecular Representation Learning
Multimodal molecular models often suffer from 3D conformer unreliability and modality collapse, limiting their robustness and generalization.
We propose MuMo, a structured multimodal fusion framework that addresses these challenges through two key strategies.
Built on a state space backbone, MuMo supports long-range dependency modeling and robust information propagation.
arXiv Detail & Related papers (2025-10-08T14:02:51Z) - Unified Molecule Pre-training with Flexible 2D and 3D Modalities: Single and Paired Modality Integration
We propose FlexMol, a flexible molecule pre-training framework that learns unified molecular representations while supporting single-modality input.
Our approach employs separate models for 2D and 3D molecular data, leverages parameter sharing to improve computational efficiency, and utilizes a decoder to generate features for the missing modality.
arXiv Detail & Related papers (2025-09-11T13:37:56Z) - Composable Score-based Graph Diffusion Model for Multi-Conditional Molecular Generation
We propose the Composable Score-based Graph Diffusion model (CSGD), which extends score matching to discrete graphs via concrete scores.
We show that CSGD achieves state-of-the-art performance with a 15.3% average improvement in controllability over prior methods.
Our findings highlight the practical advantages of score-based modeling for discrete graph generation and its capacity for flexible, multi-property molecular design.
arXiv Detail & Related papers (2024-10-17T17:20:24Z) - DPLM-2: A Multimodal Diffusion Protein Language Model
We introduce DPLM-2, a multimodal protein foundation model that extends discrete diffusion protein language model (DPLM) to accommodate both sequences and structures.
DPLM-2 learns the joint distribution of sequence and structure, as well as their marginals and conditionals.
Empirical evaluation shows that DPLM-2 can simultaneously generate highly compatible amino acid sequences and their corresponding 3D structures.
arXiv Detail & Related papers (2024-10-10T14:36:58Z) - MolMix: A Simple Yet Effective Baseline for Multimodal Molecular Representation Learning
We propose a simple transformer-based baseline for multimodal molecular representation learning.
We integrate three distinct modalities: SMILES strings, 2D graph representations, and 3D conformers of molecules.
Despite its simplicity, our approach achieves state-of-the-art results across multiple datasets.
arXiv Detail & Related papers (2024-05-29T10:26:16Z) - UniIF: Unified Molecule Inverse Folding
We propose a unified model UniIF for inverse folding of all molecules.
Our proposed method surpasses state-of-the-art methods on all tasks.
arXiv Detail & Related papers (2023-12-29T07:19:42Z) - Integrating Chemical Language and Molecular Graph in Multimodal Fused Deep Learning for Drug Property Prediction
We construct multimodal deep learning models to cover different molecular representations.
Compared with mono-modal models, our multimodal fused deep learning (MMFDL) models outperform single models in accuracy, reliability, and resistance capability against noise.
arXiv Detail & Related papers (2023-07-12T15:27:06Z) - Unified Molecular Modeling via Modality Blending
We introduce MoleBLEND, a novel "blend-then-predict" self-supervised learning method.
MoleBLEND blends atom relations from different modalities into one unified relation for matrix encoding, then recovers modality-specific information for both 2D and 3D structures.
Experiments show that MoleBLEND achieves state-of-the-art performance across major 2D/3D benchmarks.
arXiv Detail & Related papers (2021-02-20T03:17:58Z) - Learning Neural Generative Dynamics for Molecular Conformation Generation
We study how to generate molecular conformations (i.e., 3D structures) from a molecular graph.
We propose a novel probabilistic framework to generate valid and diverse conformations given a molecular graph.
We propose a novel probabilistic framework to generate valid and diverse conformations given a molecular graph.
arXiv Detail & Related papers (2021-02-20T03:17:58Z) - MIMOSA: Multi-constraint Molecule Sampling for Molecule Optimization
Generative models and reinforcement learning approaches have seen initial success, but still face difficulties in simultaneously optimizing multiple drug properties.
We propose the MultI-constraint MOlecule SAmpling (MIMOSA) approach, a sampling framework that uses the input molecule as an initial guess and samples molecules from the target distribution.
arXiv Detail & Related papers (2020-10-05T20:18:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.