MV-CLAM: Multi-View Molecular Interpretation with Cross-Modal Projection via Language Model
- URL: http://arxiv.org/abs/2503.04780v1
- Date: Sun, 23 Feb 2025 14:38:29 GMT
- Title: MV-CLAM: Multi-View Molecular Interpretation with Cross-Modal Projection via Language Model
- Authors: Sumin Ha, Jun Hyeong Kim, Yinhua Piao, Sun Kim,
- Abstract summary: We propose MV-CLAM, a novel framework that aligns multi-view molecular representations into a unified textual space.<n>Our approach ensures cross-view consistency while a token-level contrastive loss preserves diverse molecular features.<n>MV-CLAM enhances molecular reasoning, improving retrieval and captioning accuracy.
- Score: 5.246680705885042
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Human expertise in chemistry and biomedicine relies on contextual molecular understanding, a capability that large language models (LLMs) can extend through fine-grained alignment between molecular structures and text. Recent multimodal learning advances focus on cross-modal alignment, but existing molecule-text models ignore complementary information in different molecular views and rely on single-view representations, limiting molecular understanding. Moreover, na\"ive multi-view alignment strategies face two challenges: (1) separate aligned spaces with inconsistent mappings between molecule and text embeddings, and that (2) existing loss objectives fail to preserve complementary information for fine-grained alignment. This can limit the LLM's ability to fully understand the molecular properties. To address these issues, we propose MV-CLAM, a novel framework that aligns multi-view molecular representations into a unified textual space using a multi-query transformer (MQ-Former). Our approach ensures cross-view consistency while a token-level contrastive loss preserves diverse molecular features across textual queries. MV-CLAM enhances molecular reasoning, improving retrieval and captioning accuracy. The source code of MV-CLAM is available in https://github.com/sumin124/mv-clam.git.
Related papers
- Mol-LLaMA: Towards General Understanding of Molecules in Large Molecular Language Model [55.87790704067848]
Mol-LLaMA is a large molecular language model that grasps the general knowledge centered on molecules.<n>We introduce a module that integrates complementary information from different molecular encoders.<n>Our experimental results demonstrate that Mol-LLaMA is capable of comprehending the general features of molecules.
arXiv Detail & Related papers (2025-02-19T05:49:10Z) - Knowledge-aware contrastive heterogeneous molecular graph learning [77.94721384862699]
We propose a paradigm shift by encoding molecular graphs into Heterogeneous Molecular Graph Learning (KCHML)
KCHML conceptualizes molecules through three distinct graph views-molecular, elemental, and pharmacological-enhanced by heterogeneous molecular graphs and a dual message-passing mechanism.
This design offers a comprehensive representation for property prediction, as well as for downstream tasks such as drug-drug interaction (DDI) prediction.
arXiv Detail & Related papers (2025-02-17T11:53:58Z) - Exploring Optimal Transport-Based Multi-Grained Alignments for Text-Molecule Retrieval [24.061535843472427]
We introduce the Optimal TRansport-based Multi-grained Alignments model (ORMA)
ORMA is a novel approach that facilitates multi-grained alignments between textual descriptions and molecules.
Experimental results on the ChEBI-20 and PCdes datasets demonstrate that ORMA significantly outperforms existing state-of-the-art (SOTA) models.
arXiv Detail & Related papers (2024-11-04T06:30:52Z) - UniMoT: Unified Molecule-Text Language Model with Discrete Token Representation [35.51027934845928]
We introduce UniMoT, a Unified Molecule-Text LLM adopting a tokenizer-based architecture.
A Vector Quantization-driven tokenizer transforms molecules into sequences of molecule tokens with causal dependency.
UniMoT emerges as a multi-modal generalist capable of performing both molecule-to-text and text-to-molecule tasks.
arXiv Detail & Related papers (2024-08-01T18:31:31Z) - Vision Language Model is NOT All You Need: Augmentation Strategies for Molecule Language Models [43.26037039251725]
We propose AMOLE, which augments molecule-text pairs with structural similarity preserving loss.
We also propose an expertise reconstruction loss to transfer knowledge from molecules that have extensive expertise to those with less expertise.
arXiv Detail & Related papers (2024-07-12T07:09:10Z) - Learning Multi-view Molecular Representations with Structured and Unstructured Knowledge [14.08112359246334]
We present MV-Mol, a representation learning model that harvests multi-view molecular expertise from chemical structures, unstructured knowledge from biomedical texts, and structured knowledge from knowledge graphs.
We show that MV-Mol provides improved representations that substantially benefit molecular property prediction.
arXiv Detail & Related papers (2024-06-14T08:48:10Z) - MolX: Enhancing Large Language Models for Molecular Learning with A Multi-Modal Extension [34.586861881519134]
Large Language Models (LLMs) with their strong task-handling capabilities have shown remarkable advancements across a spectrum of fields.
This study seeks to enhance the ability of LLMs to comprehend molecules by equipping them with a multi-modal external module, namely MolX.
In particular, instead of directly using a SMILES string to represent a molecule, we utilize specific encoders to extract fine-grained features from both SMILES string and 2D molecular graph representations.
arXiv Detail & Related papers (2024-06-10T20:25:18Z) - MolCA: Molecular Graph-Language Modeling with Cross-Modal Projector and
Uni-Modal Adapter [91.77292826067465]
Language Models (LMs) have demonstrated impressive molecule understanding ability on various 1D text-related tasks.
However, they inherently lack 2D graph perception.
We propose MolCA: Molecular Graph-Language Modeling with Cross-Modal Projector and Uni-Modal Adapter.
arXiv Detail & Related papers (2023-10-19T14:52:58Z) - Empowering Molecule Discovery for Molecule-Caption Translation with Large Language Models: A ChatGPT Perspective [53.300288393173204]
Large Language Models (LLMs) have shown remarkable performance in various cross-modal tasks.
In this work, we propose an In-context Few-Shot Molecule Learning paradigm for molecule-caption translation.
We evaluate the effectiveness of MolReGPT on molecule-caption translation, including molecule understanding and text-based molecule generation.
arXiv Detail & Related papers (2023-06-11T08:16:25Z) - Implicit Geometry and Interaction Embeddings Improve Few-Shot Molecular
Property Prediction [53.06671763877109]
We develop molecular embeddings that encode complex molecular characteristics to improve the performance of few-shot molecular property prediction.
Our approach leverages large amounts of synthetic data, namely the results of molecular docking calculations.
On multiple molecular property prediction benchmarks, training from the embedding space substantially improves Multi-Task, MAML, and Prototypical Network few-shot learning performance.
arXiv Detail & Related papers (2023-02-04T01:32:40Z) - A Molecular Multimodal Foundation Model Associating Molecule Graphs with
Natural Language [63.60376252491507]
We propose a molecular multimodal foundation model which is pretrained from molecular graphs and their semantically related textual data.
We believe that our model would have a broad impact on AI-empowered fields across disciplines such as biology, chemistry, materials, environment, and medicine.
arXiv Detail & Related papers (2022-09-12T00:56:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.