LightNobel: Improving Sequence Length Limitation in Protein Structure Prediction Model via Adaptive Activation Quantization
- URL: http://arxiv.org/abs/2505.05893v1
- Date: Fri, 09 May 2025 09:01:10 GMT
- Title: LightNobel: Improving Sequence Length Limitation in Protein Structure Prediction Model via Adaptive Activation Quantization
- Authors: Seunghee Han, Soongyu Choi, Joo-Young Kim
- Abstract summary: We present LightNobel, the first hardware-software co-designed accelerator to overcome scalability limitations on the sequence length in Protein Structure Prediction Models (PPMs). At the software level, we propose Token-wise Adaptive Activation Quantization (AAQ) to enable fine-grained quantization techniques without compromising accuracy. At the hardware level, LightNobel integrates the multi-precision reconfigurable matrix processing unit (RMPU) and versatile vector processing unit (VVPU) to enable the efficient execution of AAQ.
- Score: 0.7373617024876725
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advances in Protein Structure Prediction Models (PPMs), such as AlphaFold2 and ESMFold, have revolutionized computational biology by achieving unprecedented accuracy in predicting three-dimensional protein folding structures. However, these models face significant scalability challenges, particularly when processing proteins with long amino acid sequences (e.g., sequence length > 1,000). The primary bottleneck arises from the exponential growth in activation sizes, driven by the unique data structure in PPMs, which introduces an additional dimension that leads to substantial memory and computational demands. These limitations have hindered the effective scaling of PPMs for real-world applications, such as analyzing large proteins or complex multimers with critical biological and pharmaceutical relevance. In this paper, we present LightNobel, the first hardware-software co-designed accelerator developed to overcome scalability limitations on the sequence length in PPMs. At the software level, we propose Token-wise Adaptive Activation Quantization (AAQ), which leverages unique token-wise characteristics, such as distogram patterns in PPM activations, to enable fine-grained quantization techniques without compromising accuracy. At the hardware level, LightNobel integrates the multi-precision reconfigurable matrix processing unit (RMPU) and versatile vector processing unit (VVPU) to enable the efficient execution of AAQ. Through these innovations, LightNobel achieves up to 8.44x, 8.41x speedup and 37.29x, 43.35x higher power efficiency over the latest NVIDIA A100 and H100 GPUs, respectively, while maintaining negligible accuracy loss. It also reduces the peak memory requirement by up to 120.05x in PPMs, enabling scalable processing for proteins with long sequences.
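The abstract describes AAQ only at a high level. As a point of reference, the sketch below shows what generic token-wise activation quantization with a per-token bit-width choice can look like; the function names, the kurtosis-based selection rule, and the 4-/8-bit settings are illustrative assumptions, not the paper's actual AAQ algorithm or its distogram-aware criteria.

```python
import numpy as np

def quantize_per_token(acts: np.ndarray, bits: int = 8):
    """Symmetric per-token (row-wise) quantization of an activation matrix.

    acts: [num_tokens, hidden] float32 activations; each row gets its own scale.
    Generic illustration only -- not the paper's AAQ implementation.
    """
    qmax = 2 ** (bits - 1) - 1                       # e.g. 127 for 8-bit
    scales = np.abs(acts).max(axis=1, keepdims=True) / qmax
    scales = np.maximum(scales, 1e-8)                # avoid division by zero
    codes = np.clip(np.round(acts / scales), -qmax - 1, qmax).astype(np.int8)
    dequant = codes.astype(np.float32) * scales
    return codes, scales, dequant

def adaptive_bits(acts: np.ndarray, low: int = 4, high: int = 8,
                  kurtosis_cut: float = 10.0) -> np.ndarray:
    """Toy per-token bit-width selector: heavy-tailed (outlier-prone) tokens
    keep more bits. The statistic and threshold are assumptions for
    illustration, not the criteria used in the paper."""
    centered = acts - acts.mean(axis=1, keepdims=True)
    var = centered.var(axis=1) + 1e-8
    kurtosis = (centered ** 4).mean(axis=1) / var ** 2
    return np.where(kurtosis > kurtosis_cut, high, low)

# Usage on a dummy activation slice shaped [tokens, hidden]:
acts = np.random.randn(16, 128).astype(np.float32)
bits = adaptive_bits(acts)
for b in np.unique(bits):
    rows = acts[bits == b]
    _, _, dequant = quantize_per_token(rows, bits=int(b))
    print(f"{int(b)}-bit tokens: {rows.shape[0]:3d}, "
          f"mean abs error {np.abs(rows - dequant).mean():.4f}")
```

In LightNobel, the low-bit codes would presumably feed the multi-precision RMPU datapath; the error printout here is only a sanity check on the quantize/dequantize round trip.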
Related papers
- SaDiT: Efficient Protein Backbone Design via Latent Structural Tokenization and Diffusion Transformers [50.18388227899971]
We present SaDiT, a novel framework that accelerates protein backbone generation by integrating SaProt Tokenization with a Diffusion Transformer (DiT) architecture. Experiments demonstrate that SaDiT outperforms state-of-the-art models, including RFDiffusion and Proteina, in both computational speed and structural viability.
arXiv Detail & Related papers (2026-02-06T13:50:13Z) - Self Distillation Fine-Tuning of Protein Language Models Improves Versatility in Protein Design [61.2846583160056]
Supervised fine-tuning (SFT) is a standard approach for adapting large language models to specialized domains, but it is harder to apply to protein language models (PLMs), in part because high-quality annotated data are far more difficult to obtain for proteins than for natural language. We present a simple and general recipe for fast SFT of PLMs, designed to improve the fidelity, reliability, and novelty of generated protein sequences.
arXiv Detail & Related papers (2025-12-10T05:34:47Z) - Triangle Multiplication Is All You Need For Biomolecular Structure Representations [56.26342479807906]
We introduce Pairmixer, a streamlined alternative that eliminates triangle attention while preserving higher-order geometric reasoning capabilities. Pairmixer substantially improves computational efficiency, matching state-of-the-art structure predictors across folding and docking benchmarks. Within BoltzDesign, for example, Pairmixer delivers over 2x faster sampling and scales to sequences 30% longer than the memory limits of Pairformer.
arXiv Detail & Related papers (2025-10-21T17:59:02Z) - Protenix-Mini+: efficient structure prediction model with scalable pairformer [17.839471210239186]
Protenix-Mini+ is a highly lightweight and scalable variant of the Protenix model. Within an acceptable range of performance degradation, it substantially improves computational efficiency.
arXiv Detail & Related papers (2025-10-13T21:54:35Z) - ProteinAE: Protein Diffusion Autoencoders for Structure Encoding [64.77182442408254]
We introduce ProteinAE, a novel and streamlined protein diffusion autoencoder. ProteinAE directly maps protein backbone coordinates from E(3) into a continuous, compact latent space. We demonstrate that ProteinAE achieves state-of-the-art reconstruction quality, outperforming existing autoencoders.
arXiv Detail & Related papers (2025-10-12T14:30:32Z) - Reparameterized LLM Training via Orthogonal Equivalence Transformation [54.80172809738605]
We present POET, a novel training algorithm that uses Orthogonal Equivalence Transformation to optimize neurons. POET can stably optimize the objective function with improved generalization. We develop efficient approximations that make POET flexible and scalable for training large-scale neural networks.
arXiv Detail & Related papers (2025-06-09T17:59:34Z) - ProteinZero: Self-Improving Protein Generation via Online Reinforcement Learning [49.2607661375311]
We present ProteinZero, a novel framework that enables computationally scalable, automated, and continuous self-improvement of the inverse folding model. ProteinZero substantially outperforms existing methods across every key metric in protein design. Notably, the entire RL run on CATH-4.3 can be completed on a single 8-GPU node in under 3 days, including reward computation.
arXiv Detail & Related papers (2025-06-09T06:08:59Z) - Prot2Token: A Unified Framework for Protein Modeling via Next-Token Prediction [19.164841536081568]
We introduce Prot2Token, a unified framework that converts a wide spectrum of protein-related prediction tasks into a next-token prediction format. At its core, Prot2Token employs an autoregressive decoder, conditioned on embeddings from pre-trained protein encoders and guided by learnable task tokens. We present extensive experimental validation across a variety of benchmarks, demonstrating Prot2Token's strong predictive power in different types of protein-prediction tasks.
arXiv Detail & Related papers (2025-05-26T23:50:36Z) - FFN Fusion: Rethinking Sequential Computation in Large Language Models [16.8637819797503]
We introduce FFN Fusion, an architectural optimization technique that reduces sequential computation in large language models. We develop a principled methodology for identifying and fusing sequences of feed-forward (FFN) layers, transforming them into parallel operations. Applying these techniques to Llama-3.1-405B-Instruct, we create an efficient and soon-to-be publicly available model that achieves a 1.71x speedup in inference latency and 35x lower per-token cost.
arXiv Detail & Related papers (2025-03-24T17:20:35Z) - Design and Implementation of an FPGA-Based Tiled Matrix Multiplication Accelerator for Transformer Self-Attention on the Xilinx KV260 SoM [0.0]
Transformer-based large language models rely heavily on matrix multiplications for attention and feed-forward layers. We introduce a highly optimized tiled matrix multiplication accelerator on a resource-constrained Xilinx KV260 FPGA. Our design exploits persistent on-chip storage, a robust two-level tiling strategy for maximal data reuse, and a systolic-like unrolled compute engine.
arXiv Detail & Related papers (2025-03-20T22:15:42Z) - MeToken: Uniform Micro-environment Token Boosts Post-Translational Modification Prediction [65.33218256339151]
Post-translational modifications (PTMs) profoundly expand the complexity and functionality of the proteome.
Existing computational approaches predominantly focus on protein sequences to predict PTM sites, driven by the recognition of sequence-dependent motifs.
We introduce the MeToken model, which tokenizes the micro-environment of each amino acid, integrating both sequence and structural information into unified discrete tokens.
arXiv Detail & Related papers (2024-11-04T07:14:28Z) - Fast Matrix Multiplications for Lookup Table-Quantized LLMs [58.11584672945781]
FLUTE is a flexible lookup table engine for LUT-quantized LLMs. At a batch size of 32 and a quantization group size of 128, the FLUTE kernel can be 2-4x faster than existing GEMM kernels (the lookup-table idea is sketched after this list).
arXiv Detail & Related papers (2024-07-15T17:55:42Z) - Token-Mol 1.0: Tokenized drug design with large language model [10.258299488278514]
Token-Mol is a token-only 3D drug design model that encodes all molecular information, including 2D and 3D structures, as well as molecular property data, into tokens.
It is built on the transformer decoder architecture and trained using random causal masking techniques.
Compared to existing molecular pre-trained models, Token-Mol exhibits superior proficiency in handling a wider range of downstream tasks.
arXiv Detail & Related papers (2024-07-10T07:22:15Z) - xTrimoPGLM: Unified 100B-Scale Pre-trained Transformer for Deciphering the Language of Protein [74.64101864289572]
We propose a unified protein language model, xTrimoPGLM, to address protein understanding and generation tasks simultaneously. xTrimoPGLM significantly outperforms other advanced baselines in 18 protein understanding benchmarks across four categories. It can also generate de novo protein sequences following the principles of natural ones, and can perform programmable generation after supervised fine-tuning.
arXiv Detail & Related papers (2024-01-11T15:03:17Z) - EPIM: Efficient Processing-In-Memory Accelerators based on Epitome [78.79382890789607]
We introduce the Epitome, a lightweight neural operator offering convolution-like functionality.
On the software side, we evaluate epitomes' latency and energy on PIM accelerators.
We introduce a PIM-aware layer-wise design method to enhance their hardware efficiency.
arXiv Detail & Related papers (2023-11-12T17:56:39Z) - OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models [57.27101446992148]
Large language models (LLMs) have revolutionized natural language processing tasks.
Recent post-training quantization (PTQ) methods are effective in reducing memory footprint and improving the computational efficiency of LLMs.
We introduce an Omnidirectionally calibrated Quantization technique for LLMs, which achieves good performance in diverse quantization settings.
arXiv Detail & Related papers (2023-08-25T02:28:35Z) - LUT-GEMM: Quantized Matrix Multiplication based on LUTs for Efficient Inference in Large-Scale Generative Language Models [9.727062803700264]
We introduce LUT-GEMM, an efficient kernel for quantized matrix multiplication.
LUT-GEMM eliminates the resource-intensive dequantization process and reduces computational costs.
We show experimentally that when applied to the OPT-175B model with 3-bit quantization, LUT-GEMM substantially reduces token generation latency.
arXiv Detail & Related papers (2022-06-20T03:48:17Z)
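Two of the entries above, FLUTE and LUT-GEMM, share the idea of replacing dequantize-then-multiply with table lookups over low-bit weight codes. The toy sketch below illustrates only that core idea, under assumed simplifications (group size MU=4, binary {-1, +1} weight codes, one scale per output column); the published kernels are GPU-specific and support non-uniform and multi-bit codes, which this sketch does not.

```python
import numpy as np

MU = 4  # activations per lookup-table group (2**MU table entries per group)

def build_luts(x: np.ndarray) -> np.ndarray:
    """For each group of MU activation values, precompute all 2**MU signed
    sums sum_i s_i * x_i with s_i in {-1, +1}.  Shape: [groups, 2**MU]."""
    groups = x.reshape(-1, MU)                                     # [G, MU]
    patterns = np.array([[1.0 if (p >> i) & 1 else -1.0 for i in range(MU)]
                         for p in range(2 ** MU)], dtype=x.dtype)  # [2**MU, MU]
    return groups @ patterns.T                                     # [G, 2**MU]

def lut_matvec(luts: np.ndarray, codes: np.ndarray, scales: np.ndarray) -> np.ndarray:
    """y[j] = scales[j] * sum_g luts[g, codes[g, j]] -- lookups and adds only."""
    gathered = np.take_along_axis(luts, codes, axis=1)             # [G, out]
    return scales * gathered.sum(axis=0)

# Toy example: binary-coded weights in {-1, +1} with one scale per output column.
rng = np.random.default_rng(0)
hidden, out = 64, 8
x = rng.standard_normal(hidden).astype(np.float32)
signs = rng.choice([-1.0, 1.0], size=(hidden, out)).astype(np.float32)
scales = rng.uniform(0.5, 1.5, size=out).astype(np.float32)

# Pack each group of MU weight signs into a table index (bit i set => +1).
bits = (signs.reshape(-1, MU, out) > 0).astype(np.int64)           # [G, MU, out]
codes = (bits * (2 ** np.arange(MU))[None, :, None]).sum(axis=1)   # [G, out]

y_lut = lut_matvec(build_luts(x), codes, scales)
y_ref = (x @ signs) * scales
print("max |error|:", np.abs(y_lut - y_ref).max())  # ~1e-6, float rounding only
```

Because the 2**MU partial sums per activation group are computed once and reused by every output column, the inner loop reduces to integer-indexed lookups and additions, which is what makes dequantization-free low-bit GEMM attractive when inference is memory-bound.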