Exploring Optimal Transport-Based Multi-Grained Alignments for Text-Molecule Retrieval
- URL: http://arxiv.org/abs/2411.11875v1
- Date: Mon, 04 Nov 2024 06:30:52 GMT
- Title: Exploring Optimal Transport-Based Multi-Grained Alignments for Text-Molecule Retrieval
- Authors: Zijun Min, Bingshuai Liu, Liang Zhang, Jia Song, Jinsong Su, Song He, Xiaochen Bo
- Abstract summary: We introduce the Optimal TRansport-based Multi-grained Alignments model (ORMA)
ORMA is a novel approach that facilitates multi-grained alignments between textual descriptions and molecules.
Experimental results on the ChEBI-20 and PCdes datasets demonstrate that ORMA significantly outperforms existing state-of-the-art (SOTA) models.
- Score: 24.061535843472427
- Abstract: The field of bioinformatics has seen significant progress, making the cross-modal text-molecule retrieval task increasingly vital. This task aims to retrieve molecule structures accurately from textual descriptions by effectively aligning the two modalities, assisting researchers in identifying suitable molecular candidates. However, many existing approaches overlook the details inherent in molecule sub-structures. In this work, we introduce the Optimal TRansport-based Multi-grained Alignments model (ORMA), a novel approach that facilitates multi-grained alignments between textual descriptions and molecules. Our model features a text encoder and a molecule encoder. The text encoder processes textual descriptions to generate both token-level and sentence-level representations, while molecules are modeled as hierarchical heterogeneous graphs, encompassing atom, motif, and molecule nodes to extract representations at these three levels. A key innovation in ORMA is the application of Optimal Transport (OT) to align tokens with motifs, creating multi-token representations that integrate multiple token alignments with their corresponding motifs. Additionally, we employ contrastive learning to refine cross-modal alignments at three distinct scales: token-atom, multi-token-motif, and sentence-molecule, ensuring that the similarities between correctly matched text-molecule pairs are maximized while those of unmatched pairs are minimized. To our knowledge, this is the first attempt to explore alignments at both the motif and multi-token levels. Experimental results on the ChEBI-20 and PCdes datasets demonstrate that ORMA significantly outperforms existing state-of-the-art (SOTA) models.
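To make the abstract's OT step concrete, here is a minimal sketch, assuming entropic optimal transport solved with Sinkhorn iterations (a standard solver; the abstract does not name one), uniform marginals, and a cosine-distance cost. The shapes, names, and pooling scheme are illustrative assumptions, not ORMA's actual implementation: the transport plan soft-aligns text tokens with motifs, and transport-weighted pooling of tokens yields one multi-token representation per motif.

```python
# Hedged sketch of OT-based token-motif alignment (not ORMA's code).
import numpy as np

def sinkhorn(cost, eps=0.1, n_iters=200):
    """Entropic-OT transport plan between tokens (rows) and motifs (cols)."""
    n, m = cost.shape
    K = np.exp(-cost / eps)                  # Gibbs kernel
    a, b = np.ones(n) / n, np.ones(m) / m    # uniform marginals (assumed)
    u, v = np.ones(n), np.ones(m)
    for _ in range(n_iters):                 # alternating scaling updates
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]       # transport plan T (n x m)

rng = np.random.default_rng(0)
tokens = rng.normal(size=(6, 8))             # 6 token embeddings, dim 8
motifs = rng.normal(size=(3, 8))             # 3 motif embeddings, dim 8

# Cost = 1 - cosine similarity for every token/motif pair.
t = tokens / np.linalg.norm(tokens, axis=1, keepdims=True)
g = motifs / np.linalg.norm(motifs, axis=1, keepdims=True)
T = sinkhorn(1.0 - t @ g.T)

# Multi-token representation per motif: transport-weighted token pooling.
weights = T / T.sum(axis=0, keepdims=True)   # each column of T sums to 1
multi_token = weights.T @ tokens             # (3, 8): one vector per motif
```

The contrastive objective at the three scales (token-atom, multi-token-motif, sentence-molecule) can likewise be sketched as a symmetric InfoNCE-style loss over a batch of matched pairs; the temperature and pairing scheme below are assumptions, not values from the paper.

```python
# Hedged sketch of a symmetric InfoNCE-style contrastive loss, applied
# once per alignment scale; hyperparameters are illustrative assumptions.
import numpy as np

def info_nce(text_emb, mol_emb, temperature=0.07):
    """Row i of each matrix is assumed to be a matched text-molecule pair."""
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    m = mol_emb / np.linalg.norm(mol_emb, axis=1, keepdims=True)
    logits = (t @ m.T) / temperature          # batch x batch similarities
    idx = np.arange(len(logits))              # matched pairs on the diagonal
    lse_t = np.log(np.exp(logits).sum(axis=1))    # text -> molecule direction
    lse_m = np.log(np.exp(logits).sum(axis=0))    # molecule -> text direction
    return 0.5 * ((lse_t - logits[idx, idx]).mean()
                  + (lse_m - logits[idx, idx]).mean())
```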
Related papers
- FineMolTex: Towards Fine-grained Molecular Graph-Text Pre-training [36.130217091969335]
FineMolTex is a novel Fine-grained Molecular graph-Text pre-training framework to jointly learn coarse-grained molecule-level knowledge and fine-grained motif-level knowledge.
We conduct experiments across three downstream tasks, achieving up to 230% improvement in the text-based molecule editing task.
arXiv Detail & Related papers (2024-09-21T11:19:15Z)
- UniMoT: Unified Molecule-Text Language Model with Discrete Token Representation [35.51027934845928]
We introduce UniMoT, a Unified Molecule-Text LLM adopting a tokenizer-based architecture.
A Vector Quantization-driven tokenizer transforms molecules into sequences of molecule tokens with causal dependency (see the sketch after this entry).
UniMoT emerges as a multi-modal generalist capable of performing both molecule-to-text and text-to-molecule tasks.
arXiv Detail & Related papers (2024-08-01T18:31:31Z)
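As a concrete illustration of the vector-quantization tokenization described in the UniMoT entry above (a hedged sketch; the codebook size, dimensions, and distance metric are assumptions, not UniMoT's implementation), each continuous molecule-node embedding is snapped to its nearest codebook entry, yielding a discrete token sequence a language model can consume:

```python
# Hedged sketch of VQ tokenization: nearest-codebook lookup per node.
import numpy as np

rng = np.random.default_rng(0)
codebook = rng.normal(size=(32, 16))      # 32 code vectors, dim 16 (assumed)
node_embs = rng.normal(size=(5, 16))      # 5 molecule-node embeddings

# Squared Euclidean distance from every node to every code vector.
dists = ((node_embs[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
token_ids = dists.argmin(axis=1)          # discrete molecule-token sequence
print(token_ids)                          # length-5 sequence of codebook indices
```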
- Adapting Differential Molecular Representation with Hierarchical Prompts for Multi-label Property Prediction [2.344198904343022]
HiPM stands for Hierarchical Prompted Molecular representation learning framework.
Our framework comprises two core components: the Molecular Representation Encoder (MRE) and the Task-Aware Prompter (TAP).
arXiv Detail & Related papers (2024-05-29T03:10:21Z)
- Data-Efficient Molecular Generation with Hierarchical Textual Inversion [48.816943690420224]
We introduce Hierarchical textual Inversion for Molecular generation (HI-Mol), a novel data-efficient molecular generation method.
HI-Mol is inspired by the importance of hierarchical information, e.g., both coarse- and fine-grained features, in understanding the molecule distribution.
Compared to the conventional textual inversion method in the image domain using a single-level token embedding, our multi-level token embeddings allow the model to effectively learn the underlying low-shot molecule distribution.
arXiv Detail & Related papers (2024-05-05T08:35:23Z)
- Atomas: Hierarchical Alignment on Molecule-Text for Unified Molecule Understanding and Generation [42.08917809689811]
We propose Atomas, a multi-modal molecular representation learning framework that jointly learns representations from SMILES strings and text.
In the retrieval task, Atomas exhibits robust generalization ability and outperforms the baseline by 30.8% in recall@1 on average.
In the generation task, Atomas achieves state-of-the-art results in both the molecule captioning and molecule generation tasks.
arXiv Detail & Related papers (2024-04-23T12:35:44Z)
- Leveraging Biomolecule and Natural Language through Multi-Modal Learning: A Survey [75.47055414002571]
The integration of biomolecular modeling with natural language (BL) has emerged as a promising interdisciplinary area at the intersection of artificial intelligence, chemistry and biology.
We provide an analysis of recent advancements achieved through the cross-modeling of biomolecules and natural language.
arXiv Detail & Related papers (2024-03-03T14:59:47Z)
- MolCA: Molecular Graph-Language Modeling with Cross-Modal Projector and Uni-Modal Adapter [91.77292826067465]
Language Models (LMs) have demonstrated impressive molecule understanding ability on various 1D text-related tasks.
However, they inherently lack 2D graph perception.
We propose MolCA: Molecular Graph-Language Modeling with Cross-Modal Projector and Uni-Modal Adapter (a projector sketch follows this entry).
arXiv Detail & Related papers (2023-10-19T14:52:58Z)
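To illustrate the cross-modal projector named in the MolCA title (a minimal sketch under assumed shapes; MolCA's actual projector is more elaborate than this), a learned map takes 2D graph-encoder outputs into the language model's embedding space so they can be prepended as soft prompt tokens:

```python
# Hedged sketch of a cross-modal projector: a linear map from graph-
# encoder space to the LM token-embedding space (shapes are assumptions).
import numpy as np

rng = np.random.default_rng(0)
graph_dim, lm_dim = 300, 768
W = rng.normal(scale=0.02, size=(graph_dim, lm_dim))  # projector weights

node_feats = rng.normal(size=(10, graph_dim))  # ten graph-node embeddings
soft_prompts = node_feats @ W                  # (10, lm_dim) LM-space inputs
# These projected vectors would be prepended to the text-token embeddings
# before the combined sequence is fed to the language model.
print(soft_prompts.shape)
```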
- Multi-modal Molecule Structure-text Model for Text-based Retrieval and Editing [107.49804059269212]
We present a multi-modal molecule structure-text model, MoleculeSTM, by jointly learning molecules' chemical structures and textual descriptions.
In experiments, MoleculeSTM obtains the state-of-the-art generalization ability to novel biochemical concepts.
arXiv Detail & Related papers (2022-12-21T06:18:31Z)
- A Molecular Multimodal Foundation Model Associating Molecule Graphs with Natural Language [63.60376252491507]
We propose a molecular multimodal foundation model which is pretrained from molecular graphs and their semantically related textual data.
We believe that our model would have a broad impact on AI-empowered fields across disciplines such as biology, chemistry, materials, environment, and medicine.
arXiv Detail & Related papers (2022-09-12T00:56:57Z)
- Translation between Molecules and Natural Language [43.518805086280466]
We present a self-supervised learning framework for pretraining models on a vast amount of unlabeled natural language text and molecule strings.
MolT5 allows for new, useful, and challenging analogs of traditional vision-language tasks, such as molecule captioning and text-based de novo molecule generation.
arXiv Detail & Related papers (2022-04-25T17:48:09Z)