Related papers: LLaMo: Large Language Model-based Molecular Graph Assistant

URL: http://arxiv.org/abs/2411.00871v1
Date: Thu, 31 Oct 2024 03:56:05 GMT
Title: LLaMo: Large Language Model-based Molecular Graph Assistant
Authors: Jinyoung Park, Minseong Bae, Dohwan Ko, Hyunwoo J. Kim
Abstract summary: We propose LLaMo: Large Language Model-based Molecular graph assistant. We present the multi-level graph projector that transforms graph representations into graph tokens. We also introduce machine-generated molecular graph instruction data to instruction-tune the large molecular graph-language model.
Score: 16.52956645156377
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large Language Models (LLMs) have demonstrated remarkable generalization and instruction-following capabilities with instruction tuning. The advancements in LLMs and instruction tuning have led to the development of Large Vision-Language Models (LVLMs). However, the competency of the LLMs and instruction tuning have been less explored in the molecular domain. Thus, we propose LLaMo: Large Language Model-based Molecular graph assistant, which is an end-to-end trained large molecular graph-language model. To bridge the discrepancy between the language and graph modalities, we present the multi-level graph projector that transforms graph representations into graph tokens by abstracting the output representations of each GNN layer and motif representations with the cross-attention mechanism. We also introduce machine-generated molecular graph instruction data to instruction-tune the large molecular graph-language model for general-purpose molecule and language understanding. Our extensive experiments demonstrate that LLaMo shows the best performance on diverse tasks, such as molecular description generation, property prediction, and IUPAC name prediction. The code of LLaMo is available at https://github.com/mlvlab/LLaMo.
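The abstract describes the multi-level graph projector only at a high level. Learnable queries use cross-attention to abstract the node representations from every GNN layer, plus the motif representations, into graph tokens for the LLM. The plain-Python sketch below illustrates that idea under several assumptions that are not from the paper: single-head scaled dot-product attention, one hypothetical set of query vectors per level, no learned key/value/output projections, and toy dimensions. It is not the LLaMo implementation; the linked repository contains the authors' code.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Vector) -> Vector:
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries: Matrix, keys_values: Matrix) -> Matrix:
    """Single-head scaled dot-product cross-attention.

    Keys and values are the same vectors here (assumption: no learned
    K/V projections). Each query yields one output vector.
    """
    d = len(keys_values[0])
    out: Matrix = []
    for q in queries:
        weights = softmax([dot(q, k) / math.sqrt(d) for k in keys_values])
        out.append([sum(w * v[j] for w, v in zip(weights, keys_values)) for j in range(d)])
    return out


def multi_level_graph_tokens(layer_outputs: List[Matrix], motif_reprs: Matrix,
                             level_queries: List[Matrix]) -> Matrix:
    """Abstract each GNN layer's node representations and the motif
    representations into a fixed number of tokens, then concatenate them.

    level_queries[i] holds the (hypothetical) learnable queries for level i;
    the last entry is used for the motif level.
    """
    levels = layer_outputs + [motif_reprs]
    if len(levels) != len(level_queries):
        raise ValueError("need one query set per GNN layer plus one for motifs")
    tokens: Matrix = []
    for reprs, queries in zip(levels, level_queries):
        tokens.extend(cross_attend(queries, reprs))
    return tokens


layer_outputs = [
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],  # GNN layer 1: 3 nodes, dim 2
    [[0.5, 0.5], [0.2, 0.8], [0.9, 0.1]],  # GNN layer 2: same 3 nodes
]
motif_reprs = [[0.3, 0.7]]  # a single toy motif representation
level_queries = [[[1.0, 0.0], [0.0, 1.0]] for _ in range(3)]  # 2 queries per level
tokens = multi_level_graph_tokens(layer_outputs, motif_reprs, level_queries)
print(len(tokens), len(tokens[0]))  # 6 2
```

In this sketch each level contributes a fixed number of query outputs, so the number of graph tokens passed to the LLM does not depend on how many atoms the molecule has.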

Related papers

G1: Teaching LLMs to Reason on Graphs with Reinforcement Learning
Reinforcement Learning (RL) on synthetic graph-theoretic tasks can significantly scale graph reasoning abilities. With RL on Erdos, G1 obtains substantial improvements in graph reasoning; the finetuned 3B model even outperforms Qwen2.5-72B-Instruct, a model 24x its size. Our findings offer an efficient, scalable path for building strong graph reasoners by finetuning LLMs with RL on graph-theoretic tasks.
arXiv (2025-05-24)
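In synthetic graph-theoretic tasks, the correct answer can be computed exactly from the graph, so a rule-based reward is straightforward to define. The sketch below shows only that reward component and omits the policy-optimization step. It assumes a shortest-path-length task on a random G(n, p) undirected graph and an exact-match check on the model's final token. The task generator, prompt wording, and `reward` function are hypothetical illustrations, not G1's code or the Erdos data format.

```python
import random
from collections import deque
from typing import Dict, List, Optional, Tuple

Graph = Dict[int, List[int]]


def random_graph(n: int, p: float, rng: random.Random) -> Graph:
    """G(n, p) random undirected graph stored as an adjacency list."""
    adj: Graph = {v: [] for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].append(v)
                adj[v].append(u)
    return adj


def shortest_path_length(adj: Graph, src: int, dst: int) -> Optional[int]:
    """Breadth-first search; returns None if dst is unreachable."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            return dist[u]
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return None


def make_task(n: int, p: float, seed: int) -> Tuple[str, str]:
    """Build one (prompt, ground-truth answer) pair; deterministic per seed."""
    rng = random.Random(seed)
    adj = random_graph(n, p, rng)
    src, dst = rng.sample(range(n), 2)
    edges = sorted((u, v) for u in adj for v in adj[u] if u < v)
    prompt = (f"Undirected graph with nodes 0..{n - 1} and edges {edges}. "
              f"What is the length of the shortest path from {src} to {dst}? "
              "Answer with an integer, or 'none' if no path exists.")
    length = shortest_path_length(adj, src, dst)
    answer = "none" if length is None else str(length)
    return prompt, answer


def reward(model_output: str, answer: str) -> float:
    """Binary rule-based reward: 1.0 if the last whitespace-separated token
    (trailing period stripped) equals the ground truth, else 0.0."""
    tokens = model_output.strip().lower().split()
    return 1.0 if tokens and tokens[-1].rstrip(".") == answer else 0.0


prompt, answer = make_task(n=8, p=0.3, seed=0)
print(prompt)
print("ground truth:", answer)
print(reward(f"The shortest path has length {answer}.", answer))  # 1.0
```

The reward is verifiable because it compares the model output against an exact BFS answer, so no learned reward model is required in this sketch.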
Bridging Molecular Graphs and Large Language Models
Large Language Models (LLMs) have shown exceptional generalization capabilities, but their ability to process graph data, such as molecular structures, remains limited. This paper proposes Graph2Token, an efficient solution that aligns graph tokens to LLM tokens. Extensive experiments on molecular classification and regression tasks demonstrate the effectiveness of Graph2Token.
arXiv (2025-03-05)
LLaGA: Large Language and Graph Assistant
Large Language and Graph Assistant (LLaGA) is a model designed to handle the complexities of graph-structured data. LLaGA excels in versatility, generalizability, and interpretability, allowing it to perform consistently well across different datasets and tasks. Our experiments show that LLaGA delivers outstanding performance across four datasets and three tasks using a single model.
arXiv (2024-02-13)
Large Language Models on Graphs: A Comprehensive Survey
We provide a systematic review of scenarios and techniques related to large language models on graphs. We first summarize potential scenarios of adopting LLMs on graphs into three categories, namely pure graphs, text-attributed graphs, and text-paired graphs. We discuss the real-world applications of such methods and summarize open-source codes and benchmark datasets.
arXiv (2023-12-05)
MolCA: Molecular Graph-Language Modeling with Cross-Modal Projector and Uni-Modal Adapter
Language Models (LMs) have demonstrated impressive molecule understanding ability on various 1D text-related tasks. However, they inherently lack 2D graph perception. We propose MolCA: Molecular Graph-Language Modeling with Cross-Modal Projector and Uni-Modal Adapter.
arXiv (2023-10-19)
Language is All a Graph Needs
We propose InstructGLM (Instruction-finetuned Graph Language Model) with highly scalable prompts based on natural language instructions. Our method surpasses all GNN baselines on the ogbn-arxiv, Cora, and PubMed datasets.
arXiv (2023-08-14)
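InstructGLM describes graph structure to the language model through natural-language prompts. As a rough illustration, here is a prompt builder that describes a target node and its neighbors grouped by hop distance for a node-classification query on a citation graph. The template wording, the toy graph, and the `build_prompt` and `k_hop_neighbors` helpers are hypothetical and are not the paper's templates.

```python
from typing import Dict, List, Set

Graph = Dict[str, List[str]]


def k_hop_neighbors(adj: Graph, node: str, k: int) -> List[List[str]]:
    """Return neighbors grouped by exact hop distance 1..k (BFS layers)."""
    seen: Set[str] = {node}
    frontier = [node]
    layers: List[List[str]] = []
    for _ in range(k):
        nxt: List[str] = []
        for u in frontier:
            for v in adj.get(u, []):
                if v not in seen:
                    seen.add(v)
                    nxt.append(v)
        layers.append(sorted(nxt))
        frontier = nxt
    return layers


def build_prompt(adj: Graph, titles: Dict[str, str], node: str,
                 labels: List[str], k: int = 2) -> str:
    """Describe the node and its k-hop neighborhood in natural language."""
    parts = [f'Paper {node} is titled "{titles[node]}".']
    for hop, layer in enumerate(k_hop_neighbors(adj, node, k), start=1):
        if layer:
            parts.append(f"Its {hop}-hop neighbors are: {', '.join(layer)}.")
    parts.append(f"Which category does paper {node} belong to? "
                 f"Choose one of: {', '.join(labels)}.")
    return " ".join(parts)


adj = {"p0": ["p1", "p2"], "p1": ["p0", "p3"], "p2": ["p0"], "p3": ["p1"]}
titles = {"p0": "A toy paper on graph learning"}
prompt = build_prompt(adj, titles, "p0", ["cs.LG", "cs.CL"])
print(prompt)
```

Because the prompt is plain text, the same template applies to any graph whose neighborhood description fits in the context window; in this sketch the hop depth `k` is the main control on prompt length.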
GIT-Mol: A Multi-modal Large Language Model for Molecular Science with Graph, Image, and Text
We introduce GIT-Mol, a multi-modal large language model that integrates graph, image, and text information. We achieve a 5%-10% accuracy increase in property prediction and a 20.2% boost in molecule generation validity.
arXiv (2023-08-14)
GIMLET: A Unified Graph-Text Model for Instruction-Based Molecule Zero-Shot Learning
This study investigates the feasibility of employing natural language instructions to accomplish molecule-related tasks in a zero-shot setting. Existing molecule-text models perform poorly in this setting due to inadequate treatment of instructions and limited capacity for graphs. We propose GIMLET, which unifies language models for both graph and text data.
arXiv (2023-05-28)
MolGraph: a Python package for the implementation of molecular graphs and graph neural networks with TensorFlow and Keras
MolGraph is a graph neural network (GNN) package for molecular machine learning (ML). MolGraph implements a chemistry module to accommodate the generation of small molecular graphs, which can be passed to a GNN algorithm to solve a molecular ML problem. GNNs proved useful for molecular identification and improved the interpretability of chromatographic retention time data.
arXiv (2022-08-21)
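To make "molecular graph passed to a GNN" concrete, the plain-Python sketch below builds the heavy-atom graph of ethanol (C-C-O, hydrogens implicit) and applies one message-passing step. The three-symbol atom vocabulary, the one-hot features, and `message_passing_step` are illustrative assumptions. The update is a simplified GIN-style rule with epsilon = 0 and a single linear layer plus ReLU in place of an MLP. None of this is MolGraph's TensorFlow/Keras API.

```python
from typing import Dict, List, Tuple

ATOM_TYPES = ["C", "N", "O"]  # assumed minimal atom vocabulary

Features = List[List[float]]
Adjacency = Dict[int, List[int]]


def one_hot(symbol: str) -> List[float]:
    return [1.0 if symbol == t else 0.0 for t in ATOM_TYPES]


def build_graph(atoms: List[str], bonds: List[Tuple[int, int]]) -> Tuple[Features, Adjacency]:
    """Node feature matrix plus an undirected adjacency list."""
    features = [one_hot(a) for a in atoms]
    adj: Adjacency = {i: [] for i in range(len(atoms))}
    for u, v in bonds:
        adj[u].append(v)
        adj[v].append(u)
    return features, adj


def message_passing_step(features: Features, adj: Adjacency,
                         weight: List[List[float]]) -> Features:
    """h_i' = ReLU(W (h_i + sum of h_j over neighbors j of i))."""
    out: Features = []
    for i, h in enumerate(features):
        agg = list(h)  # start from the node's own features (epsilon = 0)
        for j in adj[i]:
            agg = [a + b for a, b in zip(agg, features[j])]
        z = [sum(w * a for w, a in zip(row, agg)) for row in weight]
        out.append([max(0.0, x) for x in z])
    return out


# Ethanol heavy atoms: C0-C1-O2 (hydrogens implicit).
features, adj = build_graph(["C", "C", "O"], [(0, 1), (1, 2)])
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
h = message_passing_step(features, adj, identity)
print(h)  # [[2.0, 0.0, 0.0], [2.0, 0.0, 1.0], [1.0, 0.0, 1.0]]
```

With the identity weight, each output row simply counts the atom types in a node's closed neighborhood. After one step, the central carbon (row 1) is the only atom that sees both a carbon neighbor and the oxygen.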
Keeping it Simple: Language Models can learn Complex Molecular Distributions
We introduce several challenging generative modeling tasks by compiling especially complex distributions of molecules. The results demonstrate that language models are powerful generative models, capable of adeptly learning complex molecular distributions.
arXiv (2021-12-06)
Self-Supervised Graph Transformer on Large-Scale Molecular Data
We propose a novel framework, GROVER, for molecular representation learning. GROVER can learn rich structural and semantic information of molecules from enormous unlabelled molecular data. We pre-train GROVER with 100 million parameters on 10 million unlabelled molecules: the biggest GNN and the largest training dataset in molecular representation learning.
arXiv (2020-06-18)

This list is automatically generated from the titles and abstracts of the papers on this site.