Related papers: OmniGenBench: Automating Large-scale in-silico Benchmarking for Genomic Foundation Models

URL: http://arxiv.org/abs/2410.01784v1
Date: Wed, 2 Oct 2024 17:40:44 GMT
Title: OmniGenBench: Automating Large-scale in-silico Benchmarking for Genomic Foundation Models
Authors: Heng Yang, Jack Cole, Ke Li
Abstract summary: We introduce GFMBench, a framework dedicated to genomic foundation models (GFMs) benchmarking. It integrates millions of genomic sequences across hundreds of genomic tasks from four large-scale benchmarks. GFMBench is released as open-source software, offering user-friendly interfaces and diverse tutorials.
Score: 6.781852451887055
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The advancements in artificial intelligence in recent years, such as Large Language Models (LLMs), have fueled expectations for breakthroughs in genomic foundation models (GFMs). The code of nature, hidden in diverse genomes since the very beginning of life's evolution, holds immense potential for impacting humans and ecosystems through genome modeling. Recent breakthroughs in GFMs, such as Evo, have attracted significant investment and attention to genomic modeling, as they address long-standing challenges and transform in-silico genomic studies into automated, reliable, and efficient paradigms. In the context of this flourishing era of consecutive technological revolutions in genomics, GFM studies face two major challenges: the lack of GFM benchmarking tools and the absence of open-source software for diverse genomics. These challenges hinder the rapid evolution of GFMs and their wide application in tasks such as understanding and synthesizing genomes, problems that have persisted for decades. To address these challenges, we introduce GFMBench, a framework dedicated to GFM-oriented benchmarking. GFMBench standardizes benchmark suites and automates benchmarking for a wide range of open-source GFMs. It integrates millions of genomic sequences across hundreds of genomic tasks from four large-scale benchmarks, democratizing GFMs for a wide range of in-silico genomic applications. Additionally, GFMBench is released as open-source software, offering user-friendly interfaces and diverse tutorials, applicable for AutoBench and complex tasks like RNA design and structure prediction. To facilitate further advancements in genome modeling, we have launched a public leaderboard showcasing the benchmark performance derived from AutoBench. GFMBench represents a step toward standardizing GFM benchmarking and democratizing GFM applications.

Related papers

SafeGenes: Evaluating the Adversarial Robustness of Genomic Foundation Models [8.019763193322298]
We propose SafeGenes: a framework for Secure analysis of genomic foundation models. We assess the adversarial vulnerabilities of GFMs using two approaches: the Fast Gradient Sign Method and a soft prompt attack. Targeted soft prompt attacks led to substantial performance degradation, even in large models such as ESM1b and ESM1v.
arXiv Detail & Related papers (2025-06-01T03:54:03Z)
OmniGenBench: A Modular Platform for Reproducible Genomic Foundation Models Benchmarking [21.177773831820673]
Genomic Foundation Models (GFMs) have emerged as a transformative approach to decoding the genome. As GFMs scale up and reshape the landscape of AI-driven genomics, the field faces an urgent need for rigorous and reproducible evaluation. We present OmniGenBench, a modular benchmarking platform designed to unify the data, model, benchmarking, and interpretability layers across GFMs.
arXiv Detail & Related papers (2025-05-20T14:16:25Z)
GENERator: A Long-Context Generative Genomic Foundation Model [66.46537421135996]
We present GENERator, a generative genomic foundation model featuring a context length of 98k base pairs (bp) and 1.2B parameters. Trained on an expansive dataset comprising 386B bp of DNA, the GENERator demonstrates state-of-the-art performance across both established and newly proposed benchmarks. It also shows significant promise in sequence optimization, particularly through the prompt-responsive generation of enhancer sequences with specific activity profiles.
arXiv Detail & Related papers (2025-02-11T05:39:49Z)
GFM-RAG: Graph Foundation Model for Retrieval Augmented Generation [84.41557981816077]
We introduce GFM-RAG, a novel graph foundation model (GFM) for retrieval augmented generation. GFM-RAG is powered by an innovative graph neural network that reasons over graph structure to capture complex query-knowledge relationships. It achieves state-of-the-art performance while maintaining efficiency and alignment with neural scaling laws.
arXiv Detail & Related papers (2025-02-03T07:04:29Z)
GDM4MMIMO: Generative Diffusion Models for Massive MIMO Communications [61.56610953012228]
The generative diffusion model (GDM) is one of the state-of-the-art families of generative models. GDM demonstrates an exceptional capability to learn implicit prior knowledge and robust generalization capabilities. A case study shows GDM's promising potential for facilitating efficient ultra-dimensional channel state information acquisition.
arXiv Detail & Related papers (2024-12-24T08:42:01Z)
PANGAEA: A Global and Inclusive Benchmark for Geospatial Foundation Models [3.9031647202359667]
PANGAEA is a standardized evaluation protocol that covers a diverse set of datasets, tasks, resolutions, sensor modalities, and temporalities. We evaluate the most popular GFMs openly available on this benchmark and analyze their performance across several domains. Our findings highlight the limitations of GFMs, under different scenarios, showing that they do not consistently outperform supervised models.
arXiv Detail & Related papers (2024-12-05T14:40:41Z)
Lingma SWE-GPT: An Open Development-Process-Centric Language Model for Automated Software Improvement [62.94719119451089]
The Lingma SWE-GPT series learns from and simulates real-world code submission activities. Lingma SWE-GPT 72B resolves 30.20% of GitHub issues, marking a significant improvement in automatic issue resolution.
arXiv Detail & Related papers (2024-11-01T14:27:16Z)
dnaGrinder: a lightweight and high-capacity genomic foundation model [11.646351318648499]
Current genomic foundation models often face a critical tradeoff: smaller models with mediocre performance versus large models with improved performance. We introduce dnaGrinder, a unique and efficient genomic foundation model. dnaGrinder excels at managing long-range dependencies within genomic sequences while minimizing computational costs without compromising performance.
arXiv Detail & Related papers (2024-09-24T03:20:07Z)
GenBench: A Benchmarking Suite for Systematic Evaluation of Genomic Foundation Models [56.63218531256961]
We introduce GenBench, a benchmarking suite specifically tailored for evaluating the efficacy of Genomic Foundation Models. GenBench offers a modular and expandable framework that encapsulates a variety of state-of-the-art methodologies. We provide a nuanced analysis of the interplay between model architecture and dataset characteristics on task-specific performance.
arXiv Detail & Related papers (2024-06-01T08:01:05Z)
Trackable Agent-based Evolution Models at Wafer Scale [0.0]
We focus on the problem of extracting phylogenetic information from agent-based evolution on the 850,000-processor Cerebras Wafer Scale Engine (WSE). We present an asynchronous island-based genetic algorithm (GA) framework for WSE hardware. We validate phylogenetic reconstructions from these trials and demonstrate their suitability for inference of underlying evolutionary conditions.
arXiv Detail & Related papers (2024-04-16T19:24:14Z)
Cancer-inspired Genomics Mapper Model for the Generation of Synthetic DNA Sequences with Desired Genomics Signatures [0.0]
The cancer-inspired genomics mapper model (CGMM) combines genetic algorithm (GA) and deep learning (DL) methods. We demonstrate that CGMM can generate synthetic genomes of selected phenotypes such as ancestry and cancer.
arXiv Detail & Related papers (2023-05-01T07:16:40Z)
Generalized Visual Quality Assessment of GAN-Generated Face Images [79.47386781978531]
We study the subjective and objective quality towards generalized quality assessment of GAN-generated face images (GFIs). We develop a quality assessment model that is able to deliver accurate quality predictions for GFIs from both available and unseen GAN algorithms.
arXiv Detail & Related papers (2022-01-28T07:54:49Z)
Result Diversification by Multi-objective Evolutionary Algorithms with Theoretical Guarantees [94.72461292387146]
We propose to reformulate the result diversification problem as a bi-objective search problem, and solve it by a multi-objective evolutionary algorithm (EA). We theoretically prove that the GSEMO can achieve the optimal-time approximation ratio, $1/2$. When the objective function changes dynamically, the GSEMO can maintain this approximation ratio in running time, addressing the open question proposed by Borodin et al.
arXiv Detail & Related papers (2021-10-18T14:00:22Z)
Cauchy-Schwarz Regularized Autoencoder [68.80569889599434]
Variational autoencoders (VAE) are a powerful and widely-used class of generative models. We introduce a new constrained objective based on the Cauchy-Schwarz divergence, which can be computed analytically for GMMs. Our objective improves upon variational auto-encoding models in density estimation, unsupervised clustering, semi-supervised learning, and face analysis.
arXiv Detail & Related papers (2021-01-06T17:36:26Z)
Benchmarking Graph Neural Networks [75.42159546060509]
Graph neural networks (GNNs) have become the standard toolkit for analyzing and learning from data on graphs. For any successful field to become mainstream and reliable, benchmarks must be developed to quantify progress. The GitHub repository has reached 1,800 stars and 339 forks, which demonstrates the utility of the proposed open-source framework.
arXiv Detail & Related papers (2020-03-02T15:58:46Z)