Fast and Low-Cost Genomic Foundation Models via Outlier Removal
- URL: http://arxiv.org/abs/2505.00598v2
- Date: Fri, 02 May 2025 09:34:03 GMT
- Title: Fast and Low-Cost Genomic Foundation Models via Outlier Removal
- Authors: Haozheng Luo, Chenghao Qiu, Maojiang Su, Zhihan Zhou, Zoe Mehta, Guo Ye, Jerry Yao-Chieh Hu, Han Liu
- Abstract summary: We introduce GERM, a genomic foundation model with strong compression performance and fast adaptability. We replace the vanilla attention layer with an outlier-free mechanism inspired by associative memory models. We propose GERM-T, a strategy that employs small-step continual learning within the outlier-free framework.
- Score: 6.493357255196893
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: To address the challenge of scarce computational resources in genomic modeling, we introduce GERM, a genomic foundation model with strong compression performance and fast adaptability. GERM improves upon models like DNABERT-2 by eliminating outliers that hinder low-rank adaptation and post-training quantization, enhancing both efficiency and robustness. We replace the vanilla attention layer with an outlier-free mechanism inspired by associative memory models. By removing outliers during both pre-training and fine-tuning, this approach accelerates adaptation, reduces computational costs, and enhances quantization robustness within acceptable loss margins. Additionally, we propose GERM-T, a strategy that employs small-step continual learning within the outlier-free framework, leveraging original checkpoints to avoid retraining from scratch. Empirically, GERM improves fine-tuning performance by 37.98% and quantization by 64.34% over the baseline model. It also reduces average kurtosis by 92.14% and maximum infinity norm by 82.77%. Compared to leading methods, GERM consistently delivers superior performance, offering a practical solution for genomic modeling in resource-constrained settings. Code is available at https://github.com/MAGICS-LAB/GERM.
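To make the reported outlier statistics concrete, below is a minimal PyTorch sketch (not the authors' code) of the two diagnostics the abstract quantifies, average kurtosis and maximum infinity norm, together with a softmax_1 normalization often proposed in the outlier-free, associative-memory-inspired attention literature; whether GERM uses exactly this variant is an assumption here.

```python
# Minimal sketch (not the authors' code): the two outlier diagnostics the
# abstract reports, plus an illustrative outlier-free softmax variant.
import torch

def avg_kurtosis(acts: torch.Tensor) -> torch.Tensor:
    """Mean kurtosis over feature channels; heavy-tailed (outlier-prone)
    channels push this value far above the Gaussian baseline of 3."""
    x = acts.reshape(-1, acts.shape[-1]).float()
    mu = x.mean(dim=0)
    var = x.var(dim=0, unbiased=False)
    fourth = ((x - mu) ** 4).mean(dim=0)
    return (fourth / (var ** 2 + 1e-12)).mean()

def max_inf_norm(acts: torch.Tensor) -> torch.Tensor:
    """Maximum infinity norm: the single largest-magnitude activation,
    which dominates the dynamic range a quantizer must cover."""
    return acts.abs().max()

def softmax_1(logits: torch.Tensor, dim: int = -1) -> torch.Tensor:
    """Softmax with an implicit extra zero logit, so rows can sum to < 1.
    Heads may then attend to 'nothing' instead of manufacturing the
    extreme logits that create activation outliers. Illustrative only;
    GERM's exact mechanism may differ."""
    m = logits.amax(dim=dim, keepdim=True).clamp(min=0.0)  # stability shift
    e = (logits - m).exp()
    return e / (torch.exp(-m) + e.sum(dim=dim, keepdim=True))

# Example: diagnose a batch of hidden states (batch, tokens, hidden).
h = torch.randn(8, 128, 768)
print(avg_kurtosis(h).item(), max_inf_norm(h).item())
```

Tracking these two statistics on hidden states before and after fine-tuning is a cheap way to check whether a model's activation distribution is quantization-friendly.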
Related papers
- GRAM-MAMBA: Holistic Feature Alignment for Wireless Perception with Adaptive Low-Rank Compensation [8.217823995127201]
Multi-modal fusion is crucial for Internet of Things (IoT) perception, widely deployed in smart homes, intelligent transport, industrial automation, and healthcare. Existing systems often face challenges: high model complexity hinders deployment in resource-constrained environments, unidirectional modal alignment neglects inter-modal relationships, and robustness suffers when sensor data is missing. We propose GRAM-MAMBA, which utilizes the linear-complexity Mamba model for efficient sensor time-series processing, combined with an optimized GRAM matrix strategy for pairwise alignment among modalities.
arXiv Detail & Related papers (2025-07-18T10:30:37Z) - Benchmarking the Spatial Robustness of DNNs via Natural and Adversarial Localized Corruptions [49.546479320670464]
This paper introduces specialized metrics for benchmarking the spatial robustness of segmentation models.
We propose region-aware multi-attack adversarial analysis, a method that enables a deeper understanding of model robustness.
The results reveal that models respond to these two types of threats differently.
arXiv Detail & Related papers (2025-04-02T11:37:39Z) - Entropy-Based Adaptive Weighting for Self-Training [15.089334734753677]
We propose Entropy-Based Adaptive Weighting for Self-Training (EAST), an adaptive weighting strategy designed to prioritize uncertain data during self-training. We evaluate our approach on the GSM8K and MATH benchmarks.
arXiv Detail & Related papers (2025-03-31T10:04:35Z) - MARS: Unleashing the Power of Variance Reduction for Training Large Models [56.47014540413659]
We propose MARS, a unified training framework for deep neural networks. We introduce three instances of MARS that leverage preconditioned gradient optimization. Results indicate that MARS consistently outperforms Adam.
arXiv Detail & Related papers (2024-11-15T18:57:39Z) - Game-Theoretic Defenses for Robust Conformal Prediction Against Adversarial Attacks in Medical Imaging [12.644923600594176]
Adversarial attacks pose significant threats to the reliability and safety of deep learning models. This paper introduces a novel framework that integrates conformal prediction with game-theoretic defensive strategies.
arXiv Detail & Related papers (2024-11-07T02:20:04Z) - Self-Data Distillation for Recovering Quality in Pruned Large Language Models [1.5665059604715017]
One-shot pruning results in significant quality degradation, particularly in tasks requiring multi-step reasoning. To recover lost quality, supervised fine-tuning (SFT) is commonly applied, but it can lead to catastrophic forgetting. In this work, we utilize self-data distilled fine-tuning to address these challenges.
arXiv Detail & Related papers (2024-10-13T19:53:40Z) - MIBench: A Comprehensive Framework for Benchmarking Model Inversion Attack and Defense [42.56467639172508]
Model Inversion (MI) attacks aim at leveraging the output information of target models to reconstruct privacy-sensitive training data. We build the first practical benchmark named MIBench for systematic evaluation of model inversion attacks and defenses.
arXiv Detail & Related papers (2024-10-07T16:13:49Z) - GRIN: GRadient-INformed MoE [132.87651078514122]
Mixture-of-Experts (MoE) models scale more effectively than dense models due to sparse computation through expert routing.
We introduce GRIN (GRadient-INformed MoE training), which incorporates sparse gradient estimation for expert routing.
Our model, with only 6.6B activated parameters, outperforms a 7B dense model and matches the performance of a 14B dense model trained on the same data.
arXiv Detail & Related papers (2024-09-18T17:00:20Z) - Semi-Supervised Reward Modeling via Iterative Self-Training [52.48668920483908]
We propose Semi-Supervised Reward Modeling (SSRM), an approach that enhances RM training using unlabeled data.
We demonstrate that SSRM significantly improves reward models without incurring additional labeling costs.
Overall, SSRM substantially reduces the dependency on large volumes of human-annotated data, thereby decreasing the overall cost and time involved in training effective reward models.
arXiv Detail & Related papers (2024-09-10T22:57:58Z) - Correlation Analysis of Adversarial Attack in Time Series Classification [6.117704456424016]
This study investigates the vulnerability of time series classification models to adversarial attacks.
Regularization techniques and noise introduction are shown to enhance the effectiveness of attacks.
Models designed to prioritize global information are revealed to possess greater resistance to adversarial manipulations.
arXiv Detail & Related papers (2024-08-21T01:11:32Z) - Robust Image Classification: Defensive Strategies against FGSM and PGD Adversarial Attacks [0.0]
Adversarial attacks pose significant threats to the robustness of deep learning models in image classification.
This paper explores and refines defense mechanisms against these attacks to enhance the resilience of neural networks.
arXiv Detail & Related papers (2024-08-20T02:00:02Z) - GenBench: A Benchmarking Suite for Systematic Evaluation of Genomic Foundation Models [56.63218531256961]
We introduce GenBench, a benchmarking suite specifically tailored for evaluating the efficacy of Genomic Foundation Models.
GenBench offers a modular and expandable framework that encapsulates a variety of state-of-the-art methodologies.
We provide a nuanced analysis of the interplay between model architecture and dataset characteristics on task-specific performance.
arXiv Detail & Related papers (2024-06-01T08:01:05Z) - PUMA: margin-based data pruning [51.12154122266251]
We focus on data pruning, where some training samples are removed based on their distance to the model's classification boundary (i.e., margin).
We propose PUMA, a new data pruning strategy that computes the margin using DeepFool.
We show that PUMA can be used on top of the current state-of-the-art methodology in robustness and, unlike existing data pruning strategies, significantly improves model performance.
arXiv Detail & Related papers (2024-05-10T08:02:20Z) - WARM: On the Benefits of Weight Averaged Reward Models [63.08179139233774]
We propose Weight Averaged Reward Models (WARM) to mitigate reward hacking.
Experiments on summarization tasks, using best-of-N and RL methods, show that WARM improves the overall quality and alignment of LLM predictions.
arXiv Detail & Related papers (2024-01-22T18:27:08Z) - Model Stealing Attack against Graph Classification with Authenticity, Uncertainty and Diversity [80.16488817177182]
GNNs are vulnerable to the model stealing attack, a nefarious endeavor geared towards duplicating the target model via query permissions.
We introduce three model stealing attacks to adapt to different actual scenarios.
arXiv Detail & Related papers (2023-12-18T05:42:31Z) - Soft Merging: A Flexible and Robust Soft Model Merging Approach for Enhanced Neural Network Performance [6.599368083393398]
Stochastic Gradient Descent (SGD) is often limited to converging to local optima, which constrains model performance.
The proposed soft merging method combines multiple local-optima models to mitigate their individually undesirable results.
Experiments underscore the effectiveness of the merged networks.
arXiv Detail & Related papers (2023-09-21T17:07:31Z) - GREAT Score: Global Robustness Evaluation of Adversarial Perturbation using Generative Models [60.48306899271866]
We present a new framework, called GREAT Score, for global robustness evaluation of adversarial perturbation using generative models.
We show high correlation and significantly reduced cost of GREAT Score when compared to the attack-based model ranking on RobustBench.
GREAT Score can be used for remote auditing of privacy-sensitive black-box models.
arXiv Detail & Related papers (2023-04-19T14:58:27Z) - From Environmental Sound Representation to Robustness of 2D CNN Models Against Adversarial Attacks [82.21746840893658]
This paper investigates the impact of different standard environmental sound representations (spectrograms) on the recognition performance and adversarial attack robustness of a victim residual convolutional neural network.
We show that while the ResNet-18 model trained on DWT spectrograms achieves a high recognition accuracy, attacking this model is relatively more costly for the adversary.
arXiv Detail & Related papers (2022-04-14T15:14:08Z) - Model-Agnostic Meta-Attack: Towards Reliable Evaluation of Adversarial Robustness [53.094682754683255]
We propose a Model-Agnostic Meta-Attack (MAMA) approach to discover stronger attack algorithms automatically.
Our method learns the optimizer in adversarial attacks, parameterized by a recurrent neural network. We develop a model-agnostic training algorithm to improve the generalization ability of the learned optimizer when attacking unseen defenses.
arXiv Detail & Related papers (2021-10-13T13:54:24Z) - Cauchy-Schwarz Regularized Autoencoder [68.80569889599434]
Variational autoencoders (VAE) are a powerful and widely-used class of generative models.
We introduce a new constrained objective based on the Cauchy-Schwarz divergence, which can be computed analytically for GMMs (see the closed form sketched after this list).
Our objective improves upon variational auto-encoding models in density estimation, unsupervised clustering, semi-supervised learning, and face analysis.
arXiv Detail & Related papers (2021-01-06T17:36:26Z)