MolCA: Molecular Graph-Language Modeling with Cross-Modal Projector and
Uni-Modal Adapter
- URL: http://arxiv.org/abs/2310.12798v4
- Date: Thu, 18 Jan 2024 09:03:24 GMT
- Title: MolCA: Molecular Graph-Language Modeling with Cross-Modal Projector and
Uni-Modal Adapter
- Authors: Zhiyuan Liu, Sihang Li, Yanchen Luo, Hao Fei, Yixin Cao, Kenji
Kawaguchi, Xiang Wang, Tat-Seng Chua
- Abstract summary: Language Models (LMs) have demonstrated impressive molecule understanding ability on various 1D text-related tasks.
However, they inherently lack 2D graph perception.
We propose MolCA: Molecular Graph-Language Modeling with Cross-Modal Projector and Uni-Modal Adapter.
- Score: 91.77292826067465
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Language Models (LMs) have demonstrated impressive molecule understanding
ability on various 1D text-related tasks. However, they inherently lack 2D
graph perception - a critical ability of human professionals in comprehending
molecules' topological structures. To bridge this gap, we propose MolCA:
Molecular Graph-Language Modeling with Cross-Modal Projector and Uni-Modal
Adapter. MolCA enables an LM (e.g., Galactica) to understand both text- and
graph-based molecular contents via the cross-modal projector. Specifically, the
cross-modal projector is implemented as a Q-Former to connect a graph encoder's
representation space and an LM's text space. Further, MolCA employs a uni-modal
adapter (i.e., LoRA) for the LM's efficient adaptation to downstream tasks.
Unlike previous studies that couple an LM with a graph encoder via cross-modal
contrastive learning, MolCA retains the LM's ability for open-ended text
generation and augments it with 2D graph information. To showcase its
effectiveness, we extensively benchmark MolCA on tasks of molecule captioning,
IUPAC name prediction, and molecule-text retrieval, on which MolCA
significantly outperforms the baselines. Our codes and checkpoints can be found
at https://github.com/acharkq/MolCA.
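As an illustration of the architecture the abstract describes, below is a minimal, hypothetical sketch of the two named components: a Q-Former-style cross-modal projector (learnable query tokens that cross-attend to graph-encoder outputs and are projected into the LM's embedding space) and a LoRA-style uni-modal adapter. All module names, dimensions, and the dummy tensors are assumptions made for illustration; the authors' actual implementation lives in the repository linked above.

```python
import torch
import torch.nn as nn


class CrossModalProjector(nn.Module):
    """Learnable query tokens cross-attend to graph node embeddings,
    then are projected into the LM's text embedding space."""

    def __init__(self, graph_dim=300, lm_dim=2048, num_queries=8):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_queries, graph_dim))
        self.cross_attn = nn.MultiheadAttention(graph_dim, num_heads=4, batch_first=True)
        self.to_lm = nn.Linear(graph_dim, lm_dim)  # map into the LM's embedding width

    def forward(self, node_embeds):                    # (batch, num_nodes, graph_dim)
        q = self.queries.expand(node_embeds.size(0), -1, -1)
        fused, _ = self.cross_attn(q, node_embeds, node_embeds)
        return self.to_lm(fused)                       # (batch, num_queries, lm_dim)


class LoRALinear(nn.Module):
    """A frozen linear layer plus a trainable low-rank update (uni-modal adapter)."""

    def __init__(self, base, rank=8, alpha=16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():               # freeze the base weights
            p.requires_grad_(False)
        self.lora_a = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.lora_a.T @ self.lora_b.T)


# Toy forward pass with dummy tensors: the projector's "graph tokens" are
# prepended to the text token embeddings so the LM can condition on 2D structure.
node_embeds = torch.randn(2, 20, 300)    # stand-in for a graph encoder's node outputs
text_embeds = torch.randn(2, 16, 2048)   # stand-in for the LM's text input embeddings
graph_tokens = CrossModalProjector()(node_embeds)
lm_inputs = torch.cat([graph_tokens, text_embeds], dim=1)
adapted = LoRALinear(nn.Linear(2048, 2048))(lm_inputs)  # e.g. one adapted LM projection
print(lm_inputs.shape, adapted.shape)
```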
Related papers
- LLaMo: Large Language Model-based Molecular Graph Assistant [16.52956645156377]
We propose LLaMo: Large Language Model-based Molecular graph assistant.
We present the multi-level graph projector that transforms graph representations into graph tokens.
We also introduce machine-generated molecular graph instruction data to instruction-tune the large molecular graph-language model.
arXiv Detail & Related papers (2024-10-31T03:56:05Z)
- Chemical Language Model Linker: blending text and molecules with modular adapters [2.2667044928324747]
We propose a lightweight adapter-based strategy named Chemical Language Model Linker (ChemLML)
ChemLML blends the two single domain models and obtains conditional molecular generation from text descriptions.
We find that the choice of molecular representation used within ChemLML, SMILES versus SELFIES, has a strong influence on conditional molecular generation performance.
arXiv Detail & Related papers (2024-10-26T13:40:13Z)
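As a hedged aside on the representation choice the ChemLML summary above highlights, the snippet below writes the same molecule as SMILES and as SELFIES using the open-source `selfies` package; the example molecule (caffeine) is an illustrative assumption, not taken from the paper.

```python
import selfies as sf

smiles = "CN1C=NC2=C1C(=O)N(C(=O)N2C)C"   # caffeine written as SMILES
selfies_str = sf.encoder(smiles)           # SMILES -> SELFIES
roundtrip = sf.decoder(selfies_str)        # SELFIES -> SMILES

print(selfies_str)   # bracketed SELFIES tokens, e.g. "[C][N][C][=N]..."
print(roundtrip)     # a SMILES string equivalent to the input

# SELFIES is robust by construction: any sequence of valid SELFIES tokens
# decodes to some syntactically valid molecule, which matters when a language
# model generates molecules token by token.
```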
- MolX: Enhancing Large Language Models for Molecular Learning with A Multi-Modal Extension [34.586861881519134]
Large Language Models (LLMs) with their strong task-handling capabilities have shown remarkable advancements across a spectrum of fields.
This study seeks to enhance the ability of LLMs to comprehend molecules by equipping them with a multi-modal external module, namely MolX.
In particular, instead of directly using a SMILES string to represent a molecule, we utilize specific encoders to extract fine-grained features from both the SMILES string and the 2D molecular graph representation.
arXiv Detail & Related papers (2024-06-10T20:25:18Z)
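The following is a hypothetical PyTorch sketch of the dual-encoder idea in the MolX summary: pooled features from a SMILES encoder and a 2D-graph encoder are fused and projected into a soft prompt token for a frozen LLM. Encoder internals are stubbed with random tensors, and every dimension and name is an assumption rather than the paper's code.

```python
import torch
import torch.nn as nn


class DualEncoderFusion(nn.Module):
    """Fuse SMILES-encoder and graph-encoder features into one LLM soft token."""

    def __init__(self, smiles_dim=256, graph_dim=300, llm_dim=4096):
        super().__init__()
        self.fuse = nn.Linear(smiles_dim + graph_dim, llm_dim)

    def forward(self, smiles_feat, graph_feat):
        # Mean-pool each modality, concatenate, and project to the LLM's
        # embedding width so the result can be prepended to the text prompt.
        pooled = torch.cat([smiles_feat.mean(dim=1), graph_feat.mean(dim=1)], dim=-1)
        return self.fuse(pooled).unsqueeze(1)     # (batch, 1, llm_dim)


smiles_feat = torch.randn(2, 32, 256)   # stand-in SMILES encoder output
graph_feat = torch.randn(2, 20, 300)    # stand-in 2D graph encoder output
soft_token = DualEncoderFusion()(smiles_feat, graph_feat)
print(soft_token.shape)                 # torch.Size([2, 1, 4096])
```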
- GIT-Mol: A Multi-modal Large Language Model for Molecular Science with Graph, Image, and Text [25.979382232281786]
We introduce GIT-Mol, a multi-modal large language model that integrates the Graph, Image, and Text information.
We achieve a 5%-10% accuracy increase in property prediction and a 20.2% boost in molecule generation validity.
arXiv Detail & Related papers (2023-08-14T03:12:29Z)
- Empowering Molecule Discovery for Molecule-Caption Translation with Large Language Models: A ChatGPT Perspective [53.300288393173204]
Large Language Models (LLMs) have shown remarkable performance in various cross-modal tasks.
In this work, we propose an In-context Few-Shot Molecule Learning paradigm for molecule-caption translation.
We evaluate the effectiveness of MolReGPT on molecule-caption translation, including molecule understanding and text-based molecule generation.
arXiv Detail & Related papers (2023-06-11T08:16:25Z)
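To illustrate the in-context few-shot paradigm mentioned in the MolReGPT summary, here is a hedged sketch that retrieves similar molecules with RDKit Morgan fingerprints and assembles their captions into a few-shot prompt; the toy corpus, the prompt wording, and the fingerprint choice are assumptions made only for illustration.

```python
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

corpus = [  # (SMILES, caption) pairs standing in for a reference dataset
    ("CCO", "Ethanol is a simple primary alcohol."),
    ("CC(=O)O", "Acetic acid is a short-chain carboxylic acid."),
    ("c1ccccc1", "Benzene is an aromatic hydrocarbon."),
]


def fingerprint(smiles):
    # Morgan (ECFP-like) bit-vector fingerprint for similarity search.
    return AllChem.GetMorganFingerprintAsBitVect(Chem.MolFromSmiles(smiles), 2, nBits=2048)


def build_prompt(query_smiles, k=2):
    # Rank corpus molecules by Tanimoto similarity to the query and keep top-k.
    query_fp = fingerprint(query_smiles)
    ranked = sorted(
        corpus,
        key=lambda pair: DataStructs.TanimotoSimilarity(query_fp, fingerprint(pair[0])),
        reverse=True,
    )[:k]
    examples = "\n".join(f"SMILES: {s}\nCaption: {c}" for s, c in ranked)
    return f"{examples}\nSMILES: {query_smiles}\nCaption:"  # few-shot prompt for the LLM


print(build_prompt("CCCO"))  # propanol-like query; similar examples go in-context
```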
- GIMLET: A Unified Graph-Text Model for Instruction-Based Molecule Zero-Shot Learning [71.89623260998934]
This study investigates the feasibility of employing natural language instructions to accomplish molecule-related tasks in a zero-shot setting.
Existing molecule-text models perform poorly in this setting due to inadequate treatment of instructions and limited capacity for graphs.
We propose GIMLET, which unifies language models for both graph and text data.
arXiv Detail & Related papers (2023-05-28T18:27:59Z)
- One Transformer Can Understand Both 2D & 3D Molecular Data [94.93514673086631]
We develop a novel Transformer-based molecular model called Transformer-M.
It can take molecular data of 2D or 3D formats as input and generate meaningful semantic representations.
All empirical results show that Transformer-M can simultaneously achieve strong performance on 2D and 3D tasks.
arXiv Detail & Related papers (2022-10-04T17:30:31Z)
- A Molecular Multimodal Foundation Model Associating Molecule Graphs with Natural Language [63.60376252491507]
We propose a molecular multimodal foundation model which is pretrained from molecular graphs and their semantically related textual data.
We believe that our model would have a broad impact on AI-empowered fields across disciplines such as biology, chemistry, materials, environment, and medicine.
arXiv Detail & Related papers (2022-09-12T00:56:57Z)
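The MolCA abstract above contrasts its design with earlier models that couple a graph encoder and an LM through cross-modal contrastive learning, and this foundation-model entry is commonly cited as such prior work. Below is a minimal sketch of a symmetric InfoNCE-style graph-text contrastive objective; the summary does not state the paper's exact loss, so treat this formulation as an assumption.

```python
import torch
import torch.nn.functional as F


def graph_text_contrastive_loss(graph_emb, text_emb, temperature=0.07):
    # Normalize, then treat matching (graph_i, text_i) pairs as positives and
    # every other pairing within the batch as a negative.
    g = F.normalize(graph_emb, dim=-1)
    t = F.normalize(text_emb, dim=-1)
    logits = g @ t.T / temperature                 # (batch, batch) similarity matrix
    targets = torch.arange(g.size(0))
    return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.T, targets)) / 2


# Dummy embeddings standing in for a graph encoder and a text encoder.
loss = graph_text_contrastive_loss(torch.randn(8, 256), torch.randn(8, 256))
print(loss.item())
```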
- MolScribe: Robust Molecular Structure Recognition with Image-To-Graph Generation [28.93523736883784]
MolScribe is an image-to-graph model that explicitly predicts atoms and bonds, along with their geometric layouts, to construct the molecular structure.
MolScribe significantly outperforms previous models, achieving 76-93% accuracy on public benchmarks.
arXiv Detail & Related papers (2022-05-28T03:03:45Z)
- Self-Supervised Graph Transformer on Large-Scale Molecular Data [73.3448373618865]
We propose a novel framework, GROVER, for molecular representation learning.
GROVER can learn rich structural and semantic information of molecules from enormous unlabelled molecular data.
We pre-train GROVER with 100 million parameters on 10 million unlabelled molecules -- the biggest GNN and the largest training dataset in molecular representation learning.
arXiv Detail & Related papers (2020-06-18T08:37:04Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.