Towards 3D Molecule-Text Interpretation in Language Models
- URL: http://arxiv.org/abs/2401.13923v2
- Date: Sun, 17 Mar 2024 08:51:45 GMT
- Title: Towards 3D Molecule-Text Interpretation in Language Models
- Authors: Sihang Li, Zhiyuan Liu, Yanchen Luo, Xiang Wang, Xiangnan He, Kenji Kawaguchi, Tat-Seng Chua, Qi Tian
- Abstract summary: 3D-MoLM enables an LM to interpret and analyze 3D molecules by equipping the LM with a 3D molecular encoder.
This integration is achieved by a 3D molecule-text projector, bridging the 3D molecular encoder's representation space and the LM's input space.
We meticulously curated a 3D molecule-centric instruction tuning dataset -- 3D-MoIT.
- Score: 125.56693661827181
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Language Models (LMs) have greatly influenced diverse domains. However, their inherent limitation in comprehending 3D molecular structures has considerably constrained their potential in the biomolecular domain. To bridge this gap, we focus on 3D molecule-text interpretation, and propose 3D-MoLM: 3D-Molecular Language Modeling. Specifically, 3D-MoLM enables an LM to interpret and analyze 3D molecules by equipping the LM with a 3D molecular encoder. This integration is achieved by a 3D molecule-text projector, bridging the 3D molecular encoder's representation space and the LM's input space. Moreover, to enhance 3D-MoLM's ability of cross-modal molecular understanding and instruction following, we meticulously curated a 3D molecule-centric instruction tuning dataset -- 3D-MoIT. Through 3D molecule-text alignment and 3D molecule-centric instruction tuning, 3D-MoLM establishes an integration of 3D molecular encoder and LM. It significantly surpasses existing baselines on downstream tasks, including molecule-text retrieval, molecule captioning, and more challenging open-text molecular QA tasks, especially focusing on 3D-dependent properties. We release our codes and datasets at https://github.com/lsh0520/3D-MoLM.
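The architecture described above hinges on the 3D molecule-text projector, which bridges the 3D molecular encoder's representation space and the LM's input space (the paper implements this projector as a Q-Former). Below is a minimal PyTorch sketch of that bridging idea; the dimensions, module layout, and single cross-attention block are illustrative assumptions, not the released 3D-MoLM implementation.

```python
# Minimal sketch of a 3D molecule-text projector: learnable query tokens attend over
# atom-level features from a 3D molecular encoder and are projected into the LM's
# token-embedding space as "soft tokens". All sizes are illustrative assumptions.
import torch
import torch.nn as nn

class MoleculeTextProjector(nn.Module):
    def __init__(self, enc_dim=512, lm_dim=4096, num_queries=8, num_heads=8):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(1, num_queries, enc_dim) * 0.02)
        self.cross_attn = nn.MultiheadAttention(enc_dim, num_heads, batch_first=True)
        self.to_lm = nn.Linear(enc_dim, lm_dim)  # bridge into the LM's input space

    def forward(self, atom_feats, atom_padding_mask=None):
        # atom_feats: (B, N_atoms, enc_dim) from the 3D molecular encoder
        q = self.queries.expand(atom_feats.size(0), -1, -1)
        pooled, _ = self.cross_attn(q, atom_feats, atom_feats,
                                    key_padding_mask=atom_padding_mask)
        return self.to_lm(pooled)  # (B, num_queries, lm_dim)

# Usage: prepend the projected molecule tokens to the embedded text prompt and feed
# the combined sequence to the language model.
projector = MoleculeTextProjector()
atom_feats = torch.randn(2, 30, 512)      # 2 molecules, 30 atoms each
mol_tokens = projector(atom_feats)        # (2, 8, 4096)
text_embeds = torch.randn(2, 16, 4096)    # embedded text tokens of the prompt
lm_inputs = torch.cat([mol_tokens, text_embeds], dim=1)
```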
Related papers
- NExT-Mol: 3D Diffusion Meets 1D Language Modeling for 3D Molecule Generation [72.22099363325145]
We propose a foundation model -- NExT-Mol: 3D Diffusion Meets 1D Language Modeling for 3D Molecule Generation.
NExT-Mol uses an extensively pretrained molecule LM for 1D molecule generation, and subsequently predicts the generated molecule's 3D conformers.
We enhance NExT-Mol's performance by scaling up the LM's model size, refining the diffusion neural architecture, and applying 1D to 3D transfer learning.
arXiv Detail & Related papers (2025-02-18T08:40:13Z)
- 3UR-LLM: An End-to-End Multimodal Large Language Model for 3D Scene Understanding [49.15555885075644]
We develop a pipeline based on open-source 2D MLLMs and LLMs to generate high-quality 3D-text pairs.
We introduce the 3UR-LLM model, an end-to-end 3D MLLM designed for precise interpretation of 3D scenes.
arXiv Detail & Related papers (2025-01-14T03:50:23Z)
- 3D-MolT5: Towards Unified 3D Molecule-Text Modeling with 3D Molecular Tokenization [41.07090635630771]
3D-MolT5 is a unified framework designed to model both 1D molecular sequence and 3D molecular structure.
The key innovation lies in its methodology for mapping fine-grained 3D substructure representations to a specialized 3D token vocabulary (a generic sketch of this tokenization idea appears after the list below).
3D-MolT5 outperforms existing methods on molecular property prediction, molecule captioning, and text-based molecule generation tasks.
arXiv Detail & Related papers (2024-06-09T14:20:55Z)
- VP-LLM: Text-Driven 3D Volume Completion with Large Language Models through Patchification [56.211321810408194]
Large language models (LLMs) have shown great potential in multi-modal understanding and generation tasks.
We present Volume Patch LLM (VP-LLM), which leverages LLMs to perform conditional 3D completion in a single-forward pass.
Our results demonstrate a strong ability of LLMs to interpret complex text instructions and understand 3D objects, surpassing state-of-the-art diffusion-based 3D completion models in generation quality.
arXiv Detail & Related papers (2024-06-08T18:17:09Z)
- 3M-Diffusion: Latent Multi-Modal Diffusion for Language-Guided Molecular Structure Generation [18.55127917150268]
3M-Diffusion is a novel multi-modal molecular graph generation method.
It generates diverse, ideally novel molecular structures with desired properties.
arXiv Detail & Related papers (2024-03-11T21:44:54Z)
- 3D-Mol: A Novel Contrastive Learning Framework for Molecular Property Prediction with 3D Information [1.1777304970289215]
3D-Mol is a novel approach designed for more accurate spatial structure representation.
It deconstructs molecules into three hierarchical graphs to better extract geometric information.
We compare 3D-Mol with various state-of-the-art baselines on 7 benchmarks and demonstrate its superior performance.
arXiv Detail & Related papers (2023-09-28T10:05:37Z)
- 3D-LLM: Injecting the 3D World into Large Language Models [60.43823088804661]
Large language models (LLMs) and Vision-Language Models (VLMs) have been proven to excel at multiple tasks, such as commonsense reasoning.
We propose to inject the 3D world into large language models and introduce a new family of 3D-LLMs.
Specifically, 3D-LLMs can take 3D point clouds and their features as input and perform a diverse set of 3D-related tasks.
arXiv Detail & Related papers (2023-07-24T17:59:02Z)
- One Transformer Can Understand Both 2D & 3D Molecular Data [94.93514673086631]
We develop a novel Transformer-based molecular model called Transformer-M.
It can take molecular data of 2D or 3D formats as input and generate meaningful semantic representations.
All empirical results show that Transformer-M can simultaneously achieve strong performance on 2D and 3D tasks.
arXiv Detail & Related papers (2022-10-04T17:30:31Z)
- Scalable Fragment-Based 3D Molecular Design with Reinforcement Learning [68.8204255655161]
We introduce a novel framework for scalable 3D design that uses a hierarchical agent to build molecules.
In a variety of experiments, we show that our agent, guided only by energy considerations, can efficiently learn to produce molecules with over 100 atoms.
arXiv Detail & Related papers (2022-02-01T18:54:24Z)
- 3D-Transformer: Molecular Representation with Transformer in 3D Space [11.947499562836953]
3D-Transformer is a variant of the Transformer for molecular representations that incorporates 3D spatial information.
Our experiments show significant improvements over state-of-the-art models on the crystal property prediction task and the protein-ligand binding affinity prediction task.
arXiv Detail & Related papers (2021-10-04T05:11:23Z)
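As noted in the 3D-MolT5 entry above, that framework maps fine-grained 3D substructure representations to a specialized 3D token vocabulary. The sketch below is a generic, hypothetical illustration of one way such a discretization step can work (nearest-neighbor lookup in a codebook); it conveys the general idea only and is not 3D-MolT5's actual procedure.

```python
# Hypothetical sketch of 3D tokenization: continuous substructure embeddings are
# discretized by nearest-neighbor lookup in a codebook, yielding ids into a "3D
# vocabulary" that a sequence model can consume alongside 1D sequence tokens.
# Codebook size and embedding dimension are assumptions; this is not 3D-MolT5's code.
import torch

def tokenize_3d(substructure_embeds: torch.Tensor, codebook: torch.Tensor) -> torch.Tensor:
    """Map (N, D) substructure embeddings to (N,) token ids via the nearest codebook entry."""
    dists = torch.cdist(substructure_embeds, codebook)  # (N, V) pairwise distances
    return dists.argmin(dim=-1)                         # nearest-entry ids

codebook = torch.randn(1024, 64)   # 1024-entry 3D token vocabulary (assumed size)
embeds = torch.randn(12, 64)       # 12 substructure embeddings from a 3D encoder
token_ids = tokenize_3d(embeds, codebook)
```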
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.