Related papers: SciHorizon-GENE: Benchmarking LLM for Life Sciences Inference from Gene Knowledge to Functional Understanding

URL: http://arxiv.org/abs/2601.12805v2
Date: Wed, 21 Jan 2026 05:31:52 GMT
Title: SciHorizon-GENE: Benchmarking LLM for Life Sciences Inference from Gene Knowledge to Functional Understanding
Authors: Xiaohan Huang, Meng Xiao, Chuan Qin, Qingqing Long, Jinmiao Chen, Yuanchun Zhou, Hengshu Zhu
Abstract summary: Large language models (LLMs) have shown growing promise in biomedical research, particularly for knowledge-driven interpretation tasks. We introduce SciHorizon-GENE, a large-scale gene-centric benchmark constructed from authoritative biological databases. The benchmark integrates curated knowledge for over 190K human genes and comprises more than 540K questions covering diverse gene-to-function reasoning scenarios.
Score: 30.790301729371475
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large language models (LLMs) have shown growing promise in biomedical research, particularly for knowledge-driven interpretation tasks. However, their ability to reliably reason from gene-level knowledge to functional understanding, a core requirement for knowledge-enhanced cell atlas interpretation, remains largely underexplored. To address this gap, we introduce SciHorizon-GENE, a large-scale gene-centric benchmark constructed from authoritative biological databases. The benchmark integrates curated knowledge for over 190K human genes and comprises more than 540K questions covering diverse gene-to-function reasoning scenarios relevant to cell type annotation, functional interpretation, and mechanism-oriented analysis. Motivated by behavioral patterns observed in preliminary examinations, SciHorizon-GENE evaluates LLMs along four biologically critical perspectives: research attention sensitivity, hallucination tendency, answer completeness, and literature influence, explicitly targeting failure modes that limit the safe adoption of LLMs in biological interpretation pipelines. We systematically evaluate a wide range of state-of-the-art general-purpose and biomedical LLMs, revealing substantial heterogeneity in gene-level reasoning capabilities and persistent challenges in generating faithful, complete, and literature-grounded functional interpretations. Our benchmark establishes a systematic foundation for analyzing LLM behavior at the gene scale and offers insights for model selection and development, with direct relevance to knowledge-enhanced biological interpretation.

Related papers

SC-Arena: A Natural Language Benchmark for Single-Cell Reasoning with Knowledge-Augmented Evaluation [24.956743572453153]
We present SC-ARENA, a natural language evaluation framework tailored to single-cell foundation models. SC-ARENA formalizes a virtual cell abstraction that unifies evaluation targets by representing both intrinsic attributes and gene-level interactions.
arXiv Detail & Related papers (2026-02-26T16:50:28Z)
BABE: Biology Arena BEnchmark [51.53220868983288]
BABE is a benchmark designed to evaluate the experimental reasoning capabilities of biological AI systems. Our benchmark provides a robust framework for assessing how well AI systems can reason like practicing scientists.
arXiv Detail & Related papers (2026-02-05T16:39:20Z)
Contrastive Learning Enhances Language Model Based Cell Embeddings for Low-Sample Single Cell Transcriptomics [3.7907528918903797]
Large language models (LLMs) have shown ability in generating rich representations across domains such as natural language processing and generation, computer vision, and multimodal learning. We present a computational framework that integrates single-cell RNA sequencing (scRNA-seq) with LLMs to derive knowledge-informed gene embeddings.
arXiv Detail & Related papers (2025-09-28T00:45:39Z)
GenOM: Ontology Matching with Description Generation and Large Language Model [19.917106654694894]
This paper introduces GenOM, a large language model (LLM)-based ontology alignment framework. Experiments conducted on the OAEI Bio-ML track demonstrate that GenOM can often achieve competitive performance.
arXiv Detail & Related papers (2025-08-14T14:48:09Z)
Unveiling Knowledge Utilization Mechanisms in LLM-based Retrieval-Augmented Generation [77.10390725623125]
Retrieval-augmented generation (RAG) is widely employed to expand the knowledge scope of LLMs. Since RAG has shown promise in knowledge-intensive tasks like open-domain question answering, its broader application to complex tasks and intelligent assistants has further advanced its utility. We present a systematic investigation of the intrinsic mechanisms by which RAGs integrate internal (parametric) and external (retrieved) knowledge.
arXiv Detail & Related papers (2025-05-17T13:13:13Z)
CellVerse: Do Large Language Models Really Understand Cell Biology? [74.34984441715517]
We introduce CellVerse, a unified language-centric question-answering benchmark that integrates four types of single-cell multi-omics data. We systematically evaluate the performance across 14 open-source and closed-source LLMs ranging from 160M to 671B on CellVerse.
arXiv Detail & Related papers (2025-05-09T06:47:23Z)
Contextualizing biological perturbation experiments through language [3.704686482174365]
PerturbQA is a benchmark for structured reasoning over perturbation experiments. We evaluate state-of-the-art machine learning and statistical approaches for modeling perturbations. As a proof of feasibility, we introduce Summer (SUMMarize, retrievE, and answeR), a simple, domain-informed LLM framework.
arXiv Detail & Related papers (2025-02-28T18:15:31Z)
BioMaze: Benchmarking and Enhancing Large Language Models for Biological Pathway Reasoning [49.487327661584686]
We introduce BioMaze, a dataset with 5.1K complex pathway problems from real research. Our evaluation of methods such as CoT and graph-augmented reasoning shows that LLMs struggle with pathway reasoning. To address this, we propose PathSeeker, an LLM agent that enhances reasoning through interactive subgraph-based navigation.
arXiv Detail & Related papers (2025-02-23T17:38:10Z)
GENERator: A Long-Context Generative Genomic Foundation Model [66.46537421135996]
We present GENERator, a generative genomic foundation model featuring a context length of 98k base pairs (bp) and 1.2B parameters. Trained on an expansive dataset comprising 386B bp of DNA, the GENERator demonstrates state-of-the-art performance across both established and newly proposed benchmarks. It also shows significant promise in sequence optimization, particularly through the prompt-responsive generation of enhancer sequences with specific activity profiles.
arXiv Detail & Related papers (2025-02-11T05:39:49Z)
Genomic Language Models: Opportunities and Challenges [0.2912705470788796]
Genomic Language Models (gLMs) have the potential to significantly advance our understanding of genomes. We highlight key applications of gLMs, including functional constraint prediction, sequence design, and transfer learning. We discuss major considerations for developing and evaluating gLMs.
arXiv Detail & Related papers (2024-07-16T06:57:35Z)
Understanding Biology in the Age of Artificial Intelligence [4.299566787216408]
Modern life sciences research is increasingly relying on artificial intelligence approaches to model biological systems. Although machine learning (ML) models are useful for identifying patterns in large, complex data sets, their widespread application in biological sciences represents a significant deviation from traditional methods of scientific inquiry. Here, we identify general principles that can guide the design and application of ML systems to model biological phenomena and advance scientific knowledge.
arXiv Detail & Related papers (2024-03-06T23:20:34Z)
Causal machine learning for single-cell genomics [94.28105176231739]
We discuss the application of machine learning techniques to single-cell genomics and their challenges. We first present the model that underlies most current causal approaches to single-cell biology. We then identify open problems in the application of causal approaches to single-cell data.
arXiv Detail & Related papers (2023-10-23T13:35:24Z)

This list is automatically generated from the titles and abstracts of the papers on this site.