Elucidating the Design Space of Multimodal Protein Language Models
- URL: http://arxiv.org/abs/2504.11454v2
- Date: Wed, 16 Apr 2025 02:35:11 GMT
- Title: Elucidating the Design Space of Multimodal Protein Language Models
- Authors: Cheng-Yen Hsieh, Xinyou Wang, Daiheng Zhang, Dongyu Xue, Fei Ye, Shujian Huang, Zaixiang Zheng, Quanquan Gu
- Abstract summary: Multimodal protein language models (PLMs) integrate sequence and token-based structural information. This paper systematically elucidates the design space of multimodal PLMs to overcome their limitations. Our advancements approach finer-grained supervision, demonstrating that token-based multimodal PLMs can achieve robust structural modeling.
- Score: 69.3650883370033
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Multimodal protein language models (PLMs) integrate sequence and token-based structural information, serving as a powerful foundation for protein modeling, generation, and design. However, the reliance on tokenizing 3D structures into discrete tokens causes substantial loss of fidelity in fine-grained structural details and correlations. In this paper, we systematically elucidate the design space of multimodal PLMs to overcome their limitations. We identify tokenization loss and inaccurate structure token predictions by the PLMs as major bottlenecks. To address these, our proposed design space covers improved generative modeling, structure-aware architectures and representation learning, and data exploration. Our advancements approach finer-grained supervision, demonstrating that token-based multimodal PLMs can achieve robust structural modeling. The effective design methods dramatically improve structure generation diversity and, notably, the folding ability of our 650M model, reducing RMSD from 5.52 to 2.36 on the PDB test set, outperforming 3B baselines and performing on par with specialized folding models.
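The folding improvement above is reported as RMSD between predicted and reference structures. For readers who want that metric concrete, below is a minimal sketch of Kabsch-aligned RMSD over C-alpha coordinates; the array shapes and toy data are illustrative assumptions, not the paper's evaluation code.

```python
import numpy as np

def kabsch_rmsd(pred: np.ndarray, ref: np.ndarray) -> float:
    """RMSD between two (N, 3) coordinate sets after optimal superposition.

    A generic Kabsch-alignment sketch; not the paper's evaluation pipeline.
    """
    # Center both structures at the origin.
    p = pred - pred.mean(axis=0)
    q = ref - ref.mean(axis=0)

    # Optimal rotation via SVD of the covariance matrix (Kabsch algorithm).
    u, _, vt = np.linalg.svd(p.T @ q)
    d = np.sign(np.linalg.det(vt.T @ u.T))        # guard against reflections
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T

    aligned = p @ rot.T
    return float(np.sqrt(((aligned - q) ** 2).sum(axis=1).mean()))

# Toy usage with random C-alpha coordinates (illustrative only).
rng = np.random.default_rng(0)
ref = rng.normal(size=(128, 3))
pred = ref + rng.normal(scale=0.5, size=ref.shape)
print(f"RMSD: {kabsch_rmsd(pred, ref):.2f} Å")
```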
Related papers
- SDIGLM: Leveraging Large Language Models and Multi-Modal Chain of Thought for Structural Damage Identification [2.9239817922453333]
This study introduces SDIGLM, an innovative multi-modal structural damage identification model.
By leveraging this multi-modal CoT, SDIGLM surpasses general-purpose LMMs in structural damage identification, achieving an accuracy of 95.24% across various infrastructure types.
arXiv Detail & Related papers (2025-04-12T11:37:10Z)
- Aligning Large Language Models and Geometric Deep Models for Protein Representation [57.59506688299817]
Latent representation alignment is used to map embeddings from different modalities into a shared space, often aligned with the embedding space of large language models (LLMs). Preliminary protein-focused multimodal large language models (MLLMs) have emerged, but they have predominantly relied on approaches lacking a fundamental understanding of optimal alignment practices across representations. In this study, we explore the alignment of multimodal representations between LLMs and Geometric Deep Models (GDMs) in the protein domain. Our work examines alignment factors from both model and protein perspectives, identifying challenges in current alignment methodologies and proposing strategies to improve the alignment process.
arXiv Detail & Related papers (2024-11-08T04:15:08Z)
- DPLM-2: A Multimodal Diffusion Protein Language Model [75.98083311705182]
We introduce DPLM-2, a multimodal protein foundation model that extends the discrete diffusion protein language model (DPLM) to accommodate both sequences and structures.
DPLM-2 learns the joint distribution of sequence and structure, as well as their marginals and conditionals.
Empirical evaluation shows that DPLM-2 can simultaneously generate highly compatible amino acid sequences and their corresponding 3D structures.
arXiv Detail & Related papers (2024-10-17T17:20:24Z)
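To make the "marginals and conditionals" in the DPLM-2 entry concrete: a model of this kind pairs each residue's amino-acid token with a structure token, and the standard tasks correspond to masking one of the two tracks. The sketch below is a hypothetical data layout (the MASK sentinel and token names are made up); it illustrates the idea rather than DPLM-2's actual interface.

```python
from dataclasses import dataclass

MASK = "<mask>"  # hypothetical mask sentinel, not DPLM-2's vocabulary

@dataclass
class MultimodalProtein:
    """Per-residue pairing of an amino-acid token and a structure token."""
    aa_tokens: list[str]      # e.g. ["M", "K", "V", ...]
    struct_tokens: list[str]  # e.g. ["s17", "s03", ...] from a structure tokenizer

    def folding_input(self) -> "MultimodalProtein":
        # Condition on sequence, predict structure: p(structure | sequence).
        return MultimodalProtein(self.aa_tokens, [MASK] * len(self.struct_tokens))

    def inverse_folding_input(self) -> "MultimodalProtein":
        # Condition on structure, predict sequence: p(sequence | structure).
        return MultimodalProtein([MASK] * len(self.aa_tokens), self.struct_tokens)

    def unconditional_input(self) -> "MultimodalProtein":
        # Mask both tracks: sample from the joint p(sequence, structure).
        n = len(self.aa_tokens)
        return MultimodalProtein([MASK] * n, [MASK] * n)

# Toy example: a 4-residue protein with made-up structure tokens.
prot = MultimodalProtein(["M", "K", "V", "L"], ["s17", "s03", "s88", "s42"])
print(prot.folding_input())          # sequence kept, structure masked
print(prot.inverse_folding_input())  # structure kept, sequence masked
```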
- Cliqueformer: Model-Based Optimization with Structured Transformers [102.55764949282906]
Large neural networks excel at prediction tasks, but their application to design problems, such as protein engineering or materials discovery, requires solving offline model-based optimization (MBO) problems.
We present Cliqueformer, a transformer-based architecture that learns the black-box function's structure through functional graphical models (FGMs).
Across various domains, including chemical and genetic design tasks, Cliqueformer demonstrates superior performance compared to existing methods.
arXiv Detail & Related papers (2024-10-17T00:35:47Z)
- Interpreting token compositionality in LLMs: A robustness analysis [10.777646083061395]
Constituent-Aware Pooling (CAP) is a methodology designed to analyse how large language models process linguistic structures. CAP intervenes in model activations through constituent-based pooling at various model levels. Our findings highlight fundamental limitations in current transformer architectures regarding compositional semantics processing and model interpretability.
arXiv Detail & Related papers (2024-10-16T18:10:50Z)
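As a rough illustration of the constituent-based pooling described in the CAP entry: per-token activations inside each constituent span can be replaced by the span mean before later layers see them. The snippet below is a generic sketch of that intervention; the span boundaries, tensor shapes, and function name are assumptions, not CAP's published implementation.

```python
import numpy as np

def constituent_pool(hidden: np.ndarray, spans: list[tuple[int, int]]) -> np.ndarray:
    """Replace token activations inside each constituent span with the span mean.

    hidden: (seq_len, d_model) activations from one layer.
    spans:  half-open (start, end) index pairs marking constituents.
    """
    pooled = hidden.copy()
    for start, end in spans:
        pooled[start:end] = hidden[start:end].mean(axis=0)
    return pooled

# Toy usage: 6 tokens, 4-dim states, two constituents [0, 2) and [3, 6).
h = np.arange(24, dtype=float).reshape(6, 4)
print(constituent_pool(h, [(0, 2), (3, 6)]))
```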
- A Large Language Model and Denoising Diffusion Framework for Targeted Design of Microstructures with Commands in Natural Language [0.0]
We propose a framework that integrates Natural Language Processing (NLP), Large Language Models (LLMs), and Denoising Diffusion Probabilistic Models (DDPMs).
Our framework employs contextual data augmentation, driven by a pretrained LLM, to generate and expand a diverse dataset of microstructure descriptors.
A retrained NER model extracts relevant microstructure descriptors from user-provided natural language inputs, which are then used by the DDPM to generate microstructures with targeted mechanical properties and topological features.
arXiv Detail & Related papers (2024-09-22T14:45:22Z)
- 3D-MolT5: Leveraging Discrete Structural Information for Molecule-Text Modeling [41.07090635630771]
We propose 3D-MolT5, a unified framework designed to model molecules in both sequence and 3D structure spaces. The key innovation of our approach lies in mapping fine-grained 3D substructure representations into a specialized 3D token vocabulary. Our approach significantly improves cross-modal interaction and alignment, addressing key challenges in previous work.
arXiv Detail & Related papers (2024-06-09T14:20:55Z)
- StructLM: Towards Building Generalist Models for Structured Knowledge Grounding [49.10029030628653]
The ability of large language models (LLMs) to process structured data lags behind that of state-of-the-art (SoTA) models by an average of 35%.
We train a series of models, referred to as StructLM, based on the Mistral and CodeLlama model families, ranging from 7B to 34B parameters.
Our StructLM series surpasses task-specific models on 16 out of 18 evaluated datasets and establishes new SoTA performance on 8 SKG tasks.
arXiv Detail & Related papers (2024-02-26T15:47:01Z)
- Role of Structural and Conformational Diversity for Machine Learning Potentials [4.608732256350959]
We investigate the relationship between data biases and model generalization for machine learning potentials trained on quantum mechanics (QM) data.
Our results reveal nuanced patterns in generalization metrics.
These findings provide valuable insights and guidelines for QM data generation efforts.
arXiv Detail & Related papers (2023-10-30T19:33:12Z)
- Autoregressive Structured Prediction with Language Models [73.11519625765301]
We describe an approach to model structures as sequences of actions in an autoregressive manner with PLMs.
Our approach achieves new state-of-the-art results on all the structured prediction tasks we evaluated.
arXiv Detail & Related papers (2022-10-26T13:27:26Z)
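One reading of "structures as sequences of actions" in the entry above: a structured output, such as a set of labeled spans, can be linearized into a left-to-right action string that an autoregressive LM emits token by token. The bracketing scheme below is made up for illustration and is not the paper's action set.

```python
def spans_to_actions(tokens: list[str], spans: list[tuple[int, int, str]]) -> list[str]:
    """Linearize labeled spans (start, end, label) into a bracketed action sequence.

    Spans use half-open indices; nested or overlapping spans are out of scope here.
    """
    opens = {start: label for start, _, label in spans}
    closes = {end for _, end, _ in spans}
    actions: list[str] = []
    for i, tok in enumerate(tokens):
        if i in opens:
            actions.append(f"[{opens[i]}")   # action: open a labeled span
        actions.append(tok)                  # action: copy the input token
        if i + 1 in closes:
            actions.append("]")              # action: close the span
    return actions

# "Marie Curie worked in Paris" with PER and LOC spans.
toks = ["Marie", "Curie", "worked", "in", "Paris"]
print(spans_to_actions(toks, [(0, 2, "PER"), (4, 5, "LOC")]))
# ['[PER', 'Marie', 'Curie', ']', 'worked', 'in', '[LOC', 'Paris', ']']
```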
- Parameter-Efficient Mixture-of-Experts Architecture for Pre-trained Language Models [68.9288651177564]
We present a novel MoE architecture based on matrix product operators (MPO) from quantum many-body physics.
With the decomposed MPO structure, we can reduce the parameters of the original MoE architecture.
Experiments with GPT-2 on three well-known downstream natural language datasets show improved performance and efficiency when increasing model capacity.
arXiv Detail & Related papers (2022-03-02T13:44:49Z)
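For intuition on the MPO idea in the entry above: a dense weight matrix can be reshaped and split by a truncated SVD into two smaller local tensors whose contraction approximates the original, which is where the parameter reduction comes from. The snippet is a generic two-core decomposition sketch with illustrative shapes and bond dimension, not the architecture from the paper.

```python
import numpy as np

def two_core_mpo(w: np.ndarray, i_dims, j_dims, bond_dim: int):
    """Factor w of shape (i1*i2, j1*j2) into cores A (i1, j1, r) and B (r, i2, j2)."""
    (i1, i2), (j1, j2) = i_dims, j_dims
    # Group (i1, j1) against (i2, j2), then split with a truncated SVD.
    t = w.reshape(i1, i2, j1, j2).transpose(0, 2, 1, 3).reshape(i1 * j1, i2 * j2)
    u, s, vt = np.linalg.svd(t, full_matrices=False)
    r = min(bond_dim, s.size)
    core_a = u[:, :r].reshape(i1, j1, r)
    core_b = (np.diag(s[:r]) @ vt[:r]).reshape(r, i2, j2)
    return core_a, core_b

def reconstruct(core_a: np.ndarray, core_b: np.ndarray) -> np.ndarray:
    """Contract the two cores back into an (i1*i2, j1*j2) matrix."""
    i1, j1, _ = core_a.shape
    _, i2, j2 = core_b.shape
    t = np.einsum("abr,rcd->abcd", core_a, core_b)        # (i1, j1, i2, j2)
    return t.transpose(0, 2, 1, 3).reshape(i1 * i2, j1 * j2)

# Toy usage: a 64x64 weight treated as (8*8) x (8*8) with bond dimension 16.
rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64))
a, b = two_core_mpo(w, (8, 8), (8, 8), bond_dim=16)
w_hat = reconstruct(a, b)
print("params:", w.size, "->", a.size + b.size)
print("relative error:", np.linalg.norm(w - w_hat) / np.linalg.norm(w))
```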
This list is automatically generated from the titles and abstracts of the papers on this site. The site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.