Related papers: Auxiliary Gene Learning: Spatial Gene Expression Estimation by Auxiliary Gene Selection

URL: http://arxiv.org/abs/2511.18336v1
Date: Sun, 23 Nov 2025 08:22:20 GMT
Title: Auxiliary Gene Learning: Spatial Gene Expression Estimation by Auxiliary Gene Selection
Authors: Kaito Shiku, Kazuya Nishimura, Shinnosuke Matsuo, Yasuhiro Kojima, Ryoma Bise
Abstract summary: We propose Auxiliary Gene Learning (AGL) that utilizes the benefit of the ignored genes by reformulating their expression estimation as auxiliary tasks. To effectively leverage auxiliary genes, we must select a subset of auxiliary genes that positively influence the prediction of the target genes. The experiments confirm the effectiveness of incorporating auxiliary genes and show that the proposed method outperforms conventional auxiliary task learning approaches.
Score: 7.959841510571622
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Spatial transcriptomics (ST) is a novel technology that enables the observation of gene expression at the resolution of individual spots within pathological tissues. ST quantifies the expression of tens of thousands of genes in a tissue section; however, heavy observational noise is often introduced during measurement. In prior studies, to ensure meaningful assessment, both training and evaluation have been restricted to only a small subset of highly variable genes, and genes outside this subset have also been excluded from the training process. However, since there are likely co-expression relationships between genes, low-expression genes may still contribute to the estimation of the evaluation target. In this paper, we propose Auxiliary Gene Learning (AGL) that utilizes the benefit of the ignored genes by reformulating their expression estimation as auxiliary tasks and training them jointly with the primary tasks. To effectively leverage auxiliary genes, we must select a subset of auxiliary genes that positively influence the prediction of the target genes. However, this is a challenging optimization problem due to the vast number of possible combinations. To overcome this challenge, we propose Prior-Knowledge-Based Differentiable Top-$k$ Gene Selection via Bi-level Optimization (DkGSB), a method that ranks genes by leveraging prior knowledge and relaxes the combinatorial selection problem into a differentiable top-$k$ selection problem. The experiments confirm the effectiveness of incorporating auxiliary genes and show that the proposed method outperforms conventional auxiliary task learning approaches.
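Illustrative sketch (not taken from the paper): the abstract describes relaxing a combinatorial gene selection into a differentiable top-$k$ selection. One generic way to relax a hard top-k choice is to pass per-gene scores through a sigmoid centered between the k-th and (k+1)-th largest score, then use the resulting soft mask to weight per-gene auxiliary losses. The function names, the temperature parameter, and the loss weighting below are all assumptions made for illustration; DkGSB's actual formulation, its use of prior knowledge, and its bi-level optimization are not reproduced here.

```python
import math


def _sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def soft_top_k_mask(scores, k, temperature=0.1):
    """Relaxed top-k mask (hypothetical helper).

    Returns one value in (0, 1) per score; values approach 1 for the k
    highest scores and 0 for the rest as the temperature goes to 0.
    """
    if not 0 < k < len(scores):
        raise ValueError("k must satisfy 0 < k < len(scores)")
    ranked = sorted(scores, reverse=True)
    # Threshold halfway between the k-th and (k+1)-th largest score.
    threshold = 0.5 * (ranked[k - 1] + ranked[k])
    return [_sigmoid((s - threshold) / temperature) for s in scores]


def joint_loss(primary_loss, aux_losses, mask):
    """Primary loss plus the mask-weighted mean of auxiliary losses.

    This weighting scheme is an assumption, not the paper's objective.
    """
    weighted = sum(m * l for m, l in zip(mask, aux_losses))
    return primary_loss + weighted / sum(mask)


# Four candidate auxiliary genes with hypothetical relevance scores.
scores = [2.0, 0.5, 1.5, -1.0]
mask = soft_top_k_mask(scores, k=2, temperature=0.1)
total = joint_loss(1.0, [0.2, 5.0, 0.4, 9.0], mask)
```

With a small temperature the mask is close to a hard top-2 indicator, so the two low-scoring genes contribute almost nothing to the joint loss; a larger temperature gives a smoother mask, which would let gradients reach the scores in an autodiff framework.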

Related papers

Beyond Independent Genes: Learning Module-Inductive Representations for Gene Perturbation Prediction [48.80217316452559]
scBIG is a module-inductive prediction framework that explicitly models coordinated gene programs. scBIG consistently outperforms state-of-the-art methods, particularly on unseen and perturbation settings.
arXiv Detail & Related papers (2026-02-03T16:43:40Z)
GRAPE: Heterogeneous Graph Representation Learning for Genetic Perturbation with Coding and Non-Coding Biotype [51.58774936662233]
Building gene regulatory networks (GRN) is essential to understand and predict the effects of genetic perturbations. In this work, we leverage a pre-trained large language model and a DNA sequence model to extract features from gene descriptions and DNA sequence data. We introduce gene biotype information for the first time in genetic perturbation, simulating the distinct roles of genes with different biotypes in regulating cellular processes.
arXiv Detail & Related papers (2025-05-06T03:35:24Z)
BOLIMES: Boruta and LIME optiMized fEature Selection for Gene Expression Classification [0.0937465283958018]
BOLIMES is a novel feature selection algorithm designed to enhance gene expression classification. It combines exhaustive feature selection with interpretability-driven refinement, offering a powerful solution for high-dimensional gene expression analysis.
arXiv Detail & Related papers (2025-02-18T17:33:41Z)
Survey and Improvement Strategies for Gene Prioritization with Large Language Models [61.24568051916653]
Large language models (LLMs) have performed well in medical exams, but their effectiveness in diagnosing rare genetic diseases has not been assessed. We used multi-agent and Human Phenotype Ontology (HPO) classification to categorize patients based on phenotypes and solvability levels. At baseline, GPT-4 outperformed other LLMs, achieving near 30% accuracy in ranking causal genes correctly.
arXiv Detail & Related papers (2025-01-30T23:03:03Z)
GeneQuery: A General QA-based Framework for Spatial Gene Expression Predictions from Histology Images [41.732831871866516]
Whole-slide hematoxylin and eosin stained histological images are readily accessible and allow for detailed examinations of tissue structure and composition at the microscopic level. Recent advancements have utilized these histological images to predict spatially resolved gene expression profiles. GeneQuery aims to solve this gene expression prediction task in a question-answering (QA) manner for better generality and flexibility.
arXiv Detail & Related papers (2024-11-27T14:33:13Z)
Enhanced Gene Selection in Single-Cell Genomics: Pre-Filtering Synergy and Reinforced Optimization [16.491060073775884]
We introduce an iterative gene panel selection strategy applicable to clustering tasks in single-cell genomics. Our method integrates results from other gene selection algorithms, providing valuable preliminary boundaries. We incorporate the nature of the exploration process in reinforcement learning (RL) and its capability for continuous optimization.
arXiv Detail & Related papers (2024-06-11T16:21:33Z)
Predicting Genetic Mutation from Whole Slide Images via Biomedical-Linguistic Knowledge Enhanced Multi-label Classification [119.13058298388101]
We develop a Biological-knowledge enhanced PathGenomic multi-label Transformer to improve genetic mutation prediction performances. BPGT first establishes a novel gene encoder that constructs gene priors by two carefully designed modules. BPGT then designs a label decoder that finally performs genetic mutation prediction by two tailored modules.
arXiv Detail & Related papers (2024-06-05T06:42:27Z)
Efficient and Scalable Fine-Tune of Language Models for Genome Understanding [49.606093223945734]
We present Lingo: Language prefix fIne-tuning for GenOmes. Unlike DNA foundation models, Lingo strategically leverages natural language foundation models' contextual cues. Lingo further accommodates numerous downstream fine-tune tasks by an adaptive rank sampling method.
arXiv Detail & Related papers (2024-02-12T21:40:45Z)
Single-Cell Deep Clustering Method Assisted by Exogenous Gene Information: A Novel Approach to Identifying Cell Types [50.55583697209676]
We develop an attention-enhanced graph autoencoder, which is designed to efficiently capture the topological features between cells. During the clustering process, we integrated both sets of information and reconstructed the features of both cells and genes to generate a discriminative representation. This research offers enhanced insights into the characteristics and distribution of cells, thereby laying the groundwork for early diagnosis and treatment of diseases.
arXiv Detail & Related papers (2023-11-28T09:14:55Z)
Epigenetics Algorithms: Self-Reinforcement-Attention mechanism to regulate chromosomes expression [0.0]
This paper proposes a new epigenetics algorithm that mimics the epigenetics phenomenon known as methylation. The novelty of our epigenetics algorithms lies primarily in taking advantage of attention mechanisms and deep learning, which fits well with the genes/silencing concept.
arXiv Detail & Related papers (2023-03-15T21:33:21Z)
Unsupervised ensemble-based phenotyping helps enhance the discoverability of genes related to heart morphology [57.25098075813054]
We propose a new framework for gene discovery entitled Unsupervised Phenotype Ensembles. It builds a redundant yet highly expressive representation by pooling a set of phenotypes learned in an unsupervised manner. These phenotypes are then analyzed via genome-wide association studies (GWAS), retaining only highly confident and stable associations.
arXiv Detail & Related papers (2023-01-07T18:36:44Z)