ModuLM: Enabling Modular and Multimodal Molecular Relational Learning with Large Language Models
- URL: http://arxiv.org/abs/2506.00880v1
- Date: Sun, 01 Jun 2025 07:44:16 GMT
- Title: ModuLM: Enabling Modular and Multimodal Molecular Relational Learning with Large Language Models
- Authors: Zhuo Chen, Yizhen Zheng, Huan Yee Koh, Hongxin Xiang, Linjiang Chen, Wenjie Du, Yang Wang,
- Abstract summary: Molecular Relational Learning (MRL) aims to understand interactions between molecular pairs, playing a critical role in advancing biochemical research. With the recent development of large language models (LLMs), a growing number of studies have explored the integration of MRL with LLMs. We propose ModuLM, a framework designed to support flexible LLM-based model construction and diverse molecular representations.
- Score: 15.585313548455517
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Molecular Relational Learning (MRL) aims to understand interactions between molecular pairs, playing a critical role in advancing biochemical research. With the recent development of large language models (LLMs), a growing number of studies have explored the integration of MRL with LLMs and achieved promising results. However, the increasing availability of diverse LLMs and molecular structure encoders has significantly expanded the model space, presenting major challenges for benchmarking. Currently, there is no LLM framework that supports both flexible molecular input formats and dynamic architectural switching. To address these challenges, reduce redundant coding, and ensure fair model comparison, we propose ModuLM, a framework designed to support flexible LLM-based model construction and diverse molecular representations. ModuLM provides a rich suite of modular components, including 8 types of 2D molecular graph encoders, 11 types of 3D molecular conformation encoders, 7 types of interaction layers, and 7 mainstream LLM backbones. Owing to its highly flexible model assembly mechanism, ModuLM enables the dynamic construction of over 50,000 distinct model configurations. In addition, we provide comprehensive results to demonstrate the effectiveness of ModuLM in supporting LLM-based MRL tasks.
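The abstract describes ModuLM as a pool of interchangeable components (2D graph encoders, 3D conformation encoders, interaction layers, and LLM backbones) assembled dynamically into concrete models. The sketch below illustrates what such an assembly mechanism could look like; the registries, class names, and forward pass are hypothetical placeholders, not ModuLM's actual API.

```python
# Minimal sketch of a modular assembly mechanism of the kind the abstract describes.
# All registry contents, class names, and the forward pass are hypothetical placeholders.
import itertools
import torch.nn as nn

GRAPH_ENCODERS_2D  = {"gcn": nn.Identity, "gin": nn.Identity}          # paper lists 8 types
CONF_ENCODERS_3D   = {"schnet": nn.Identity, "egnn": nn.Identity}      # paper lists 11 types
INTERACTION_LAYERS = {"concat": nn.Identity, "bilinear": nn.Identity}  # paper lists 7 types
LLM_BACKBONES      = {"llama": nn.Identity, "galactica": nn.Identity}  # paper lists 7 backbones

class MRLModel(nn.Module):
    """One assembled configuration: molecular encoders -> interaction layer -> LLM backbone."""
    def __init__(self, enc2d: str, enc3d: str, interaction: str, llm: str):
        super().__init__()
        self.enc2d = GRAPH_ENCODERS_2D[enc2d]()
        self.enc3d = CONF_ENCODERS_3D[enc3d]()
        self.interaction = INTERACTION_LAYERS[interaction]()
        self.llm = LLM_BACKBONES[llm]()

    def forward(self, pair_features):
        # Encode the molecular pair, fuse the representations, then condition the LLM.
        fused = self.interaction(self.enc2d(pair_features) + self.enc3d(pair_features))
        return self.llm(fused)

# The benchmarkable model space is simply the Cartesian product of the registries.
configs = list(itertools.product(GRAPH_ENCODERS_2D, CONF_ENCODERS_3D,
                                 INTERACTION_LAYERS, LLM_BACKBONES))
model = MRLModel(*configs[0])
```

Under a design like this, swapping an encoder or backbone is a one-line configuration change, which is what makes fair comparison across tens of thousands of configurations tractable.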
Related papers
- LLaVA-KD: A Framework of Distilling Multimodal Large Language Models [70.19607283302712]
We propose a novel framework to transfer knowledge from a large MLLM (l-MLLM) to a small MLLM (s-MLLM).
Specifically, we introduce Multimodal Distillation (MDist) to minimize the divergence between the visual-textual output distributions of l-MLLM and s-MLLM.
We also propose a three-stage training scheme to fully exploit the potential of s-MLLM.
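For context, matching the output distributions of a large and a small model is commonly written as a temperature-scaled KL divergence between their logits; the sketch below shows that generic formulation, not necessarily LLaVA-KD's exact MDist objective.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 2.0) -> torch.Tensor:
    """Temperature-scaled KL divergence between teacher and student output distributions.
    A generic distillation loss; LLaVA-KD's MDist may differ in detail."""
    s = F.log_softmax(student_logits / temperature, dim=-1)
    t = F.softmax(teacher_logits / temperature, dim=-1)
    # batchmean averages the KL over the batch; T^2 keeps gradient scale stable.
    return F.kl_div(s, t, reduction="batchmean") * temperature ** 2
```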
arXiv Detail & Related papers (2024-10-21T17:41:28Z) - FlexMol: A Flexible Toolkit for Benchmarking Molecular Relational Learning [46.68373573330836]
We introduce FlexMol, a comprehensive toolkit designed to facilitate the construction and evaluation of diverse model architectures.
FlexMol offers a robust suite of preset model components, including 16 drug encoders, 13 protein sequence encoders, 9 protein structure encoders, and 7 interaction layers.
With its easy-to-use API and flexibility, FlexMol supports the dynamic construction of over 70,000 distinct combinations of model architectures.
arXiv Detail & Related papers (2024-10-19T06:53:11Z) - Multimodal Large Language Models for Inverse Molecular Design with Retrosynthetic Planning [32.745100532916204]
Large language models (LLMs) have integrated images, but adapting them to graphs remains challenging.
We introduce Llamole, the first multimodal LLM capable of interleaved text and graph generation.
Llamole significantly outperforms 14 adapted LLMs across 12 metrics for controllable molecular design and retrosynthetic planning.
arXiv Detail & Related papers (2024-10-05T16:35:32Z) - Configurable Foundation Models: Building LLMs from a Modular Perspective [115.63847606634268]
A growing tendency to decompose LLMs into numerous functional modules allows inference with only a subset of the modules and dynamic assembly of modules to tackle complex tasks.
We coin the term brick to represent each functional module, designating the modularized structure as customizable foundation models.
We present four brick-oriented operations: retrieval and routing, merging, updating, and growing.
We find that the FFN layers follow modular patterns with functional specialization of neurons and functional neuron partitions.
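As an illustration of the retrieval-and-routing operation, a gating function can score each brick for the current input and dispatch to the top-scoring ones; the BrickRouter below is a generic mixture-style sketch and does not reproduce the paper's mechanism.

```python
import torch
import torch.nn as nn

class BrickRouter(nn.Module):
    """Generic top-k routing over a pool of functional modules ("bricks").
    Illustrative only; not the paper's actual retrieval-and-routing operation."""
    def __init__(self, bricks: list[nn.Module], hidden_dim: int, k: int = 2):
        super().__init__()
        self.bricks = nn.ModuleList(bricks)
        self.gate = nn.Linear(hidden_dim, len(bricks))  # scores each brick for the input
        self.k = k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, hidden_dim). Pool over the sequence to score bricks.
        scores = torch.softmax(self.gate(x.mean(dim=1)), dim=-1)   # (batch, n_bricks)
        weights, indices = scores.topk(self.k, dim=-1)             # (batch, k) each
        out = torch.zeros_like(x)
        for j in range(self.k):
            for b in range(x.size(0)):                             # plain loop for clarity
                brick = self.bricks[int(indices[b, j])]
                out[b] += weights[b, j] * brick(x[b : b + 1]).squeeze(0)
        return out
```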
arXiv Detail & Related papers (2024-09-04T17:01:02Z) - Many-Shot In-Context Learning for Molecular Inverse Design [56.65345962071059]
Large Language Models (LLMs) have demonstrated great performance in few-shot In-Context Learning (ICL).
We develop a new semi-supervised learning method that overcomes the lack of experimental data available for many-shot ICL.
As we show, the new method greatly improves upon existing ICL methods for molecular design while being accessible and easy to use for scientists.
arXiv Detail & Related papers (2024-07-26T21:10:50Z) - Is Modularity Transferable? A Case Study through the Lens of Knowledge Distillation [59.37775534633868]
We present an extremely straightforward approach to transferring pre-trained, task-specific PEFT modules between same-family PLMs.
We also propose a method that allows the transfer of modules between incompatible PLMs without any change in the inference complexity.
arXiv Detail & Related papers (2024-03-27T17:50:00Z) - Model Composition for Multimodal Large Language Models [71.5729418523411]
We propose a new paradigm, model composition of existing MLLMs, to create a new model that retains the modal understanding capabilities of each original model.
Our basic implementation, NaiveMC, demonstrates the effectiveness of this paradigm by reusing modality encoders and merging LLM parameters.
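In its simplest form, merging LLM parameters is an element-wise average of matching weight tensors; the helper below illustrates only that baseline idea and is not NaiveMC's actual procedure.

```python
def merge_state_dicts(state_dicts, weights=None):
    """Element-wise weighted average of matching parameter tensors (PyTorch state dicts).
    A generic merging baseline, not the paper's NaiveMC implementation."""
    if weights is None:
        weights = [1.0 / len(state_dicts)] * len(state_dicts)
    merged = {}
    for name in state_dicts[0]:
        merged[name] = sum(w * sd[name].float() for w, sd in zip(weights, state_dicts))
    return merged

# Usage sketch: merged = merge_state_dicts([model_a.state_dict(), model_b.state_dict()])
```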
arXiv Detail & Related papers (2024-02-20T06:38:10Z) - MolTC: Towards Molecular Relational Modeling In Language Models [28.960416816491392]
We propose MolTC, a novel framework for Molecular inTeraction prediction that follows Chain-of-Thought (CoT) theory.
Our experiments, conducted across various datasets involving over 4,000,000 molecular pairs, exhibit the superiority of our method over current GNN and LLM-based baselines.
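Chain-of-Thought prompting for a molecular pair typically asks the model to reason about each molecule before predicting the interaction; the template below is purely illustrative and is not MolTC's actual prompt.

```python
def build_cot_prompt(smiles_a: str, smiles_b: str) -> str:
    """Illustrative CoT-style prompt for pairwise molecular interaction prediction."""
    return (
        f"Molecule A (SMILES): {smiles_a}\n"
        f"Molecule B (SMILES): {smiles_b}\n"
        "First, describe the key functional groups and properties of each molecule. "
        "Then, reasoning step by step, predict whether and how the two molecules interact."
    )

print(build_cot_prompt("CCO", "CC(=O)O"))  # toy pair: ethanol and acetic acid
```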
arXiv Detail & Related papers (2024-02-06T07:51:56Z) - Macaw-LLM: Multi-Modal Language Modeling with Image, Audio, Video, and Text Integration [50.94902442781148]
We propose a novel multi-modal large language model (LLM) that seamlessly integrates visual, audio, and textual information.
Macaw-LLM consists of three main components: a modality module for encoding multi-modal data, a cognitive module for harnessing pretrained LLMs, and an alignment module for harmonizing diverse representations.
We construct a large-scale multi-modal instruction dataset in the form of multi-turn dialogues, including 69K image instances and 50K video instances.
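The three components described above compose naturally as encode, align, then condition the LLM; the layout below is a generic sketch of that pipeline with hypothetical names, not Macaw-LLM's actual classes.

```python
import torch
import torch.nn as nn

class MultimodalLM(nn.Module):
    """Generic encode -> align -> LLM layout; a sketch, not Macaw-LLM's implementation."""
    def __init__(self, modality_encoder: nn.Module, llm: nn.Module,
                 feat_dim: int, llm_dim: int):
        super().__init__()
        self.modality_encoder = modality_encoder        # e.g. a frozen image/audio/video encoder
        self.alignment = nn.Linear(feat_dim, llm_dim)   # maps modality features into the LLM space
        self.llm = llm                                  # pretrained language model backbone

    def forward(self, modality_input: torch.Tensor, text_embeddings: torch.Tensor):
        # Expects modality features of shape (batch, n_tokens, feat_dim).
        aligned = self.alignment(self.modality_encoder(modality_input))
        # Prepend the aligned modality tokens to the text embeddings and run the LLM.
        return self.llm(torch.cat([aligned, text_embeddings], dim=1))
```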
arXiv Detail & Related papers (2023-06-15T12:45:25Z)