$\text{M}^{2}$LLM: Multi-view Molecular Representation Learning with Large Language Models
- URL: http://arxiv.org/abs/2508.08657v1
- Date: Tue, 12 Aug 2025 05:46:47 GMT
- Title: $\text{M}^{2}$LLM: Multi-view Molecular Representation Learning with Large Language Models
- Authors: Jiaxin Ju, Yizhen Zheng, Huan Yee Koh, Can Wang, Shirui Pan
- Abstract summary: We propose a multi-view framework that integrates three perspectives: the molecular structure view, the molecular task view, and the molecular rules view. Experiments demonstrate that $\text{M}^{2}$LLM achieves state-of-the-art performance on multiple benchmarks across classification and regression tasks.
- Score: 59.125833618091846
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Accurate molecular property prediction is a critical challenge with wide-ranging applications in chemistry, materials science, and drug discovery. Molecular representation methods, including fingerprints and graph neural networks (GNNs), achieve state-of-the-art results by effectively deriving features from molecular structures. However, these methods often overlook decades of accumulated semantic and contextual knowledge. Recent advancements in large language models (LLMs) demonstrate remarkable reasoning abilities and prior knowledge across scientific domains, leading us to hypothesize that LLMs can generate rich molecular representations when guided to reason in multiple perspectives. To address these gaps, we propose $\text{M}^{2}$LLM, a multi-view framework that integrates three perspectives: the molecular structure view, the molecular task view, and the molecular rules view. These views are fused dynamically to adapt to task requirements, and experiments demonstrate that $\text{M}^{2}$LLM achieves state-of-the-art performance on multiple benchmarks across classification and regression tasks. Moreover, we demonstrate that representation derived from LLM achieves exceptional performance by leveraging two core functionalities: the generation of molecular embeddings through their encoding capabilities and the curation of molecular features through advanced reasoning processes.
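The abstract describes fusing three view embeddings "dynamically to adapt to task requirements." The paper does not specify the fusion mechanism here, but a common realization is softmax-gated weighting over per-view embeddings. The following is a minimal illustrative sketch under that assumption; the function and parameter names (`fuse_views`, `gate_logits`) are hypothetical and not taken from the paper.

```python
import numpy as np

def fuse_views(structure_emb, task_emb, rules_emb, gate_logits):
    """Fuse three molecular view embeddings via softmax gating.

    Each view contributes in proportion to its gate weight, so a
    downstream task can emphasize, e.g., the structure view by
    raising its logit. All embeddings share dimension d.
    """
    views = np.stack([structure_emb, task_emb, rules_emb])   # (3, d)
    shifted = gate_logits - np.max(gate_logits)              # numerical stability
    weights = np.exp(shifted) / np.exp(shifted).sum()        # softmax over views
    return weights @ views                                   # (d,) fused embedding

# Toy example: 4-dimensional embeddings for one molecule.
rng = np.random.default_rng(0)
s, t, r = rng.normal(size=(3, 4))
fused = fuse_views(s, t, r, gate_logits=np.array([2.0, 0.5, 0.5]))
```

With equal logits this reduces to a plain average of the three views; in a learned system the logits would typically be produced by a small network conditioned on the task.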
Related papers
- Improving Large Molecular Language Model via Relation-aware Multimodal Collaboration [34.099746438477816]
We propose CoLLaMo, a large language model-based molecular assistant equipped with a multi-level molecular modality-collaborative projector. Our experiments demonstrate that CoLLaMo enhances the molecular modality generalization capabilities of LMLMs.
arXiv Detail & Related papers (2026-01-18T04:38:19Z) - KnowMol: Advancing Molecular Large Language Models with Multi-Level Chemical Knowledge [73.51130155601824]
We introduce KnowMol-100K, a large-scale dataset with 100K fine-grained molecular annotations across multiple levels. We also propose a chemically-informative molecular representation, effectively addressing limitations in existing molecular representation strategies. KnowMol achieves superior performance across molecular understanding and generation tasks.
arXiv Detail & Related papers (2025-10-22T11:23:58Z) - Reasoning-Enhanced Large Language Models for Molecular Property Prediction [19.593493317167646]
Molecular property prediction is crucial for drug discovery and materials science. Existing approaches suffer from limited interpretability, poor cross-task generalization, and a lack of chemical reasoning capabilities. We propose MPPReasoner, a multimodal large language model that incorporates chemical reasoning for molecular property prediction.
arXiv Detail & Related papers (2025-10-11T15:05:45Z) - Knowledge-aware contrastive heterogeneous molecular graph learning [77.94721384862699]
We propose Knowledge-aware Contrastive Heterogeneous Molecular graph Learning (KCHML). KCHML conceptualizes molecules through three distinct graph views (molecular, elemental, and pharmacological), enhanced by heterogeneous molecular graphs and a dual message-passing mechanism. This design offers a comprehensive representation for property prediction, as well as for downstream tasks such as drug-drug interaction (DDI) prediction.
arXiv Detail & Related papers (2025-02-17T11:53:58Z) - FARM: Functional Group-Aware Representations for Small Molecules [55.281754551202326]
We introduce Functional Group-Aware Representations for Small Molecules (FARM), a novel model designed to bridge the gap between SMILES, natural language, and molecular graphs. We evaluate FARM on the MoleculeNet dataset, where it achieves state-of-the-art performance on 11 out of 13 tasks.
arXiv Detail & Related papers (2024-10-02T23:04:58Z) - Learning Multi-view Molecular Representations with Structured and Unstructured Knowledge [14.08112359246334]
We present MV-Mol, a representation learning model that harvests multi-view molecular expertise from chemical structures, unstructured knowledge from biomedical texts, and structured knowledge from knowledge graphs.
We show that MV-Mol provides improved representations that substantially benefit molecular property prediction.
arXiv Detail & Related papers (2024-06-14T08:48:10Z) - Instruction Multi-Constraint Molecular Generation Using a Teacher-Student Large Language Model [49.64512917330373]
We introduce a multi-constraint molecular generation large language model, TSMMG, akin to a student.
To train TSMMG, we construct a large set of text-molecule pairs by extracting molecular knowledge from these 'teachers'.
We experimentally show that TSMMG remarkably performs in generating molecules meeting complex, natural language-described property requirements.
arXiv Detail & Related papers (2024-03-20T02:15:55Z) - Learning Over Molecular Conformer Ensembles: Datasets and Benchmarks [44.934084652800976]
We introduce the first MoleculAR Conformer Ensemble Learning benchmark to thoroughly evaluate the potential of learning on conformer ensembles.
Our findings reveal that direct learning from a conformer space can improve performance on a variety of tasks and models.
arXiv Detail & Related papers (2023-09-29T20:06:46Z) - A Molecular Multimodal Foundation Model Associating Molecule Graphs with Natural Language [63.60376252491507]
We propose a molecular multimodal foundation model which is pretrained from molecular graphs and their semantically related textual data.
We believe that our model would have a broad impact on AI-empowered fields across disciplines such as biology, chemistry, materials, environment, and medicine.
arXiv Detail & Related papers (2022-09-12T00:56:57Z) - Do Large Scale Molecular Language Representations Capture Important Structural Information? [31.76876206167457]
We present molecular embeddings obtained by training an efficient transformer encoder model, referred to as MoLFormer.
Experiments show that the learned molecular representation performs competitively, when compared to graph-based and fingerprint-based supervised learning baselines.
arXiv Detail & Related papers (2021-06-17T14:33:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.