RTMol: Rethinking Molecule-text Alignment in a Round-trip View
- URL: http://arxiv.org/abs/2511.12135v2
- Date: Fri, 21 Nov 2025 09:48:05 GMT
- Title: RTMol: Rethinking Molecule-text Alignment in a Round-trip View
- Authors: Letian Chen, Runhan Shi, Gufeng Yu, Yang Yang
- Abstract summary: We propose RTMol, a bidirectional alignment framework that unifies molecular captioning and text-to-SMILES generation through self-supervised round-trip learning. Experiments demonstrate that RTMol enhances bidirectional alignment performance by up to 47% across various LLMs.
- Score: 4.597922051722059
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Aligning molecular sequence representations (e.g., SMILES notations) with textual descriptions is critical for applications spanning drug discovery, materials design, and automated chemical literature analysis. Existing methodologies typically treat molecular captioning (molecule-to-text) and text-based molecular design (text-to-molecule) as separate tasks, relying on supervised fine-tuning or contrastive learning pipelines. These approaches face three key limitations: (i) conventional metrics like BLEU prioritize linguistic fluency over chemical accuracy, (ii) training datasets frequently contain chemically ambiguous narratives with incomplete specifications, and (iii) independent optimization of generation directions leads to bidirectional inconsistency. To address these issues, we propose RTMol, a bidirectional alignment framework that unifies molecular captioning and text-to-SMILES generation through self-supervised round-trip learning. The framework introduces novel round-trip evaluation metrics and enables unsupervised training for molecular captioning without requiring paired molecule-text corpora. Experiments demonstrate that RTMol enhances bidirectional alignment performance by up to 47% across various LLMs, establishing an effective paradigm for joint molecule-text understanding and generation.
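To make the round-trip idea concrete, the sketch below captions a molecule, regenerates a molecule from the caption, and checks whether both canonicalize to the same structure. This is a minimal illustration only: the abstract does not spell out RTMol's exact metric definitions, RDKit is assumed for canonicalization, and `molecule_to_text` / `text_to_molecule` are hypothetical stand-ins for the two generation directions.

```python
# Minimal round-trip consistency sketch (not RTMol's exact metrics).
# Assumes RDKit; `molecule_to_text` and `text_to_molecule` are hypothetical
# callables standing in for the captioning and design models.
from rdkit import Chem

def canonical(smiles: str) -> str | None:
    """Return RDKit-canonical SMILES, or None if the string does not parse."""
    mol = Chem.MolFromSmiles(smiles)
    return Chem.MolToSmiles(mol) if mol is not None else None

def round_trip_match(smiles: str, molecule_to_text, text_to_molecule) -> bool:
    """molecule -> caption -> molecule, scored by exact canonical-SMILES match."""
    caption = molecule_to_text(smiles)         # captioning direction
    regenerated = text_to_molecule(caption)    # design direction
    src, dst = canonical(smiles), canonical(regenerated)
    return src is not None and src == dst

def round_trip_accuracy(smiles_list, molecule_to_text, text_to_molecule) -> float:
    """Fraction of molecules recovered exactly after a full round trip."""
    hits = sum(round_trip_match(s, molecule_to_text, text_to_molecule)
               for s in smiles_list)
    return hits / len(smiles_list)
```

Because the comparison target is the input molecule itself, a signal like this requires no paired molecule-text corpus, which is what makes the unsupervised captioning training described in the abstract possible.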
Related papers
- How well can off-the-shelf LLMs elucidate molecular structures from mass spectra using chain-of-thought reasoning? [51.286853421822705]
Large language models (LLMs) have shown promise for reasoning-intensive scientific tasks, but their capability for chemical interpretation is still unclear. We introduce a Chain-of-Thought (CoT) prompting framework and benchmark that evaluate how LLMs reason about mass spectral data to predict molecular structures. Our evaluation across metrics of SMILES validity, formula consistency, and structural similarity reveals that while LLMs can produce syntactically valid and partially plausible structures, they fail to achieve chemical accuracy or link reasoning to correct molecular predictions. (A minimal reading of these three metrics is sketched after this list.)
arXiv Detail & Related papers (2026-01-09T20:08:42Z)
- $\text{M}^{2}$LLM: Multi-view Molecular Representation Learning with Large Language Models [59.125833618091846]
We propose a multi-view framework that integrates three perspectives: the molecular structure view, the molecular task view, and the molecular rules view. Experiments demonstrate that $\text{M}^{2}$LLM achieves state-of-the-art performance on multiple benchmarks across classification and regression tasks.
arXiv Detail & Related papers (2025-08-12T05:46:47Z)
- MV-CLAM: Multi-View Molecular Interpretation with Cross-Modal Projection via Language Model [5.246680705885042]
We propose MV-CLAM, a novel framework that aligns multi-view molecular representations into a unified textual space. Our approach ensures cross-view consistency while a token-level contrastive loss preserves diverse molecular features. MV-CLAM enhances molecular reasoning, improving retrieval and captioning accuracy. (A generic form of such a contrastive objective is sketched after this list.)
arXiv Detail & Related papers (2025-02-23T14:38:29Z)
- MolReFlect: Towards In-Context Fine-grained Alignments between Molecules and Texts [23.53304253421472]
MolReFlect is a teacher-student framework designed to perform molecule-caption alignment in-context and at a fine-grained level.
Our experimental results demonstrate that MolReFlect enables LLMs like Mistral-7B to significantly outperform previous baselines.
arXiv Detail & Related papers (2024-11-22T04:28:56Z)
- Exploring Optimal Transport-Based Multi-Grained Alignments for Text-Molecule Retrieval [24.061535843472427]
We introduce the Optimal TRansport-based Multi-grained Alignments model (ORMA), a novel approach that facilitates multi-grained alignments between textual descriptions and molecules.
Experimental results on the ChEBI-20 and PCdes datasets demonstrate that ORMA significantly outperforms existing state-of-the-art (SOTA) models.
arXiv Detail & Related papers (2024-11-04T06:30:52Z)
- FARM: Functional Group-Aware Representations for Small Molecules [55.281754551202326]
We introduce Functional Group-Aware Representations for Small Molecules (FARM), a novel model designed to bridge the gap between SMILES, natural language, and molecular graphs. We evaluate FARM on the MoleculeNet dataset, where it achieves state-of-the-art performance on 11 out of 13 tasks.
arXiv Detail & Related papers (2024-10-02T23:04:58Z)
- Advancing Molecular Graph-Text Pre-training via Fine-grained Alignment [36.130217091969335]
FineMolTex is a novel Fine-grained Molecular graph-Text pre-training framework. It learns coarse-grained molecule-level knowledge and fine-grained motif-level knowledge. FineMolTex successfully captures fine-grained knowledge, potentially offering valuable insights for drug discovery and catalyst design.
arXiv Detail & Related papers (2024-09-21T11:19:15Z)
- Atomas: Hierarchical Alignment on Molecule-Text for Unified Molecule Understanding and Generation [42.08917809689811]
Cross-modal representation learning has emerged as a promising direction for enhancing the quality of molecular representation. We propose Atomas, a hierarchical molecular representation learning framework that jointly learns representations from SMILES strings and text. Atomas achieves superior performance across 12 tasks on 11 datasets, outperforming 11 baseline models.
arXiv Detail & Related papers (2024-04-23T12:35:44Z)
- GIMLET: A Unified Graph-Text Model for Instruction-Based Molecule Zero-Shot Learning [71.89623260998934]
This study investigates the feasibility of employing natural language instructions to accomplish molecule-related tasks in a zero-shot setting.
Existing molecule-text models perform poorly in this setting due to inadequate treatment of instructions and limited capacity for graphs.
We propose GIMLET, which unifies language models for both graph and text data.
arXiv Detail & Related papers (2023-05-28T18:27:59Z)
- MolXPT: Wrapping Molecules with Text for Generative Pre-training [141.0924452870112]
MolXPT is a unified language model of text and molecules pre-trained on SMILES wrapped by text.
MolXPT outperforms strong baselines for molecular property prediction on MoleculeNet.
arXiv Detail & Related papers (2023-05-18T03:58:19Z)
- Multi-modal Molecule Structure-text Model for Text-based Retrieval and Editing [107.49804059269212]
We present a multi-modal molecule structure-text model, MoleculeSTM, by jointly learning molecules' chemical structures and textual descriptions.
In experiments, MoleculeSTM achieves state-of-the-art generalization to novel biochemical concepts.
arXiv Detail & Related papers (2022-12-21T06:18:31Z)
- Translation between Molecules and Natural Language [43.518805086280466]
We present a self-supervised learning framework for pretraining models on a vast amount of unlabeled natural language text and molecule strings.
$\textbf{MolT5}$ allows for new, useful, and challenging analogs of traditional vision-language tasks, such as molecule captioning and text-based de novo molecule generation.
arXiv Detail & Related papers (2022-04-25T17:48:09Z)
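The chain-of-thought evaluation entry above scores predictions by SMILES validity, formula consistency, and structural similarity. The sketch below gives one plausible RDKit reading of those three checks; the entry's summary does not define them precisely, so the fingerprint choice (Morgan/ECFP4 with Tanimoto similarity) is an assumption.

```python
# Hedged sketch of SMILES-validity, formula-consistency, and structural-
# similarity checks; the referenced paper's exact definitions are not given
# in its summary, so treat these as illustrative RDKit implementations.
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem, rdMolDescriptors

def is_valid_smiles(smiles: str) -> bool:
    """SMILES validity: the string parses into an RDKit molecule."""
    return Chem.MolFromSmiles(smiles) is not None

def formula_consistent(pred_smiles: str, ref_smiles: str) -> bool:
    """Formula consistency: prediction and reference share a molecular formula."""
    pred, ref = Chem.MolFromSmiles(pred_smiles), Chem.MolFromSmiles(ref_smiles)
    if pred is None or ref is None:
        return False
    return (rdMolDescriptors.CalcMolFormula(pred)
            == rdMolDescriptors.CalcMolFormula(ref))

def structural_similarity(pred_smiles: str, ref_smiles: str) -> float:
    """Structural similarity: Tanimoto over Morgan (ECFP4) bit fingerprints."""
    pred, ref = Chem.MolFromSmiles(pred_smiles), Chem.MolFromSmiles(ref_smiles)
    if pred is None or ref is None:
        return 0.0
    fp_pred = AllChem.GetMorganFingerprintAsBitVect(pred, 2, nBits=2048)
    fp_ref = AllChem.GetMorganFingerprintAsBitVect(ref, 2, nBits=2048)
    return DataStructs.TanimotoSimilarity(fp_pred, fp_ref)
```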
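Similarly, the MV-CLAM entry mentions a token-level contrastive loss. Its exact formulation is not given in the summary, so the sketch below shows only a generic symmetric InfoNCE objective over pooled molecule and text embeddings, which is the usual starting point for such losses.

```python
# Generic symmetric InfoNCE sketch (not MV-CLAM's token-level variant).
# mol_emb and txt_emb hold embeddings of matched molecule/text pairs.
import torch
import torch.nn.functional as F

def info_nce(mol_emb: torch.Tensor, txt_emb: torch.Tensor,
             temperature: float = 0.07) -> torch.Tensor:
    """mol_emb, txt_emb: (batch, dim); row i of each is a matched pair."""
    mol_emb = F.normalize(mol_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = mol_emb @ txt_emb.T / temperature   # (batch, batch) similarities
    targets = torch.arange(logits.size(0), device=logits.device)
    # Symmetric loss: molecule->text and text->molecule retrieval directions.
    return (F.cross_entropy(logits, targets)
            + F.cross_entropy(logits.T, targets)) / 2
```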