Transcending the Annotation Bottleneck: AI-Powered Discovery in Biology and Medicine
- URL: http://arxiv.org/abs/2602.20100v1
- Date: Mon, 23 Feb 2026 18:15:30 GMT
- Title: Transcending the Annotation Bottleneck: AI-Powered Discovery in Biology and Medicine
- Authors: Soumick Chatterjee
- Abstract summary: Self-supervised learning is currently unlocking the latent potential of biobank-scale datasets. This article synthesises seminal and recent advances in "learning without labels", highlighting how unsupervised frameworks can derive heritable cardiac traits, predict spatial gene expression in histology, and detect pathologies with performance that rivals or exceeds supervised counterparts.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The dependence on expert annotation has long constituted the primary rate-limiting step in the application of artificial intelligence to biomedicine. While supervised learning drove the initial wave of clinical algorithms, a paradigm shift towards unsupervised and self-supervised learning (SSL) is currently unlocking the latent potential of biobank-scale datasets. By learning directly from the intrinsic structure of data - whether pixels in a magnetic resonance image (MRI), voxels in a volumetric scan, or tokens in a genomic sequence - these methods facilitate the discovery of novel phenotypes, the linkage of morphology to genetics, and the detection of anomalies without human bias. This article synthesises seminal and recent advances in "learning without labels," highlighting how unsupervised frameworks can derive heritable cardiac traits, predict spatial gene expression in histology, and detect pathologies with performance that rivals or exceeds supervised counterparts.
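A contrastive objective is one common way to make "learning directly from the intrinsic structure of data" concrete. Two views of the same unlabelled sample form a positive pair, and the rest of the batch acts as negatives, so the training targets come from the pairing rather than from an annotator. The sketch below is a minimal NumPy version of an InfoNCE-style loss. It assumes the views have already been encoded into embeddings. The function name `info_nce_loss` and the default `temperature=0.1` are illustrative assumptions, not the method of this article or of any paper listed here.

```python
import numpy as np


def info_nce_loss(z_a: np.ndarray, z_b: np.ndarray, temperature: float = 0.1) -> float:
    """InfoNCE-style contrastive loss for a batch of paired embeddings (illustrative sketch).

    Row i of z_a and row i of z_b are assumed to be embeddings of two views
    (e.g. two augmentations) of the same unlabelled sample; every other row
    of z_b serves as a negative for row i of z_a.
    """
    # L2-normalise so that dot products are cosine similarities.
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    # (N, N) matrix of scaled similarities; entry [i, j] compares sample i to view j.
    logits = (z_a @ z_b.T) / temperature
    # Numerically stable row-wise log-softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # The "correct class" for row i is column i, so the targets come from
    # the pairing itself rather than from expert annotation.
    return float(-np.mean(np.diag(log_probs)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(8, 16))                # stand-in embeddings
    view = x + 0.01 * rng.normal(size=x.shape)  # slightly perturbed second view
    # Correctly paired views should score far lower than mismatched pairs.
    print(info_nce_loss(x, view), info_nce_loss(x, view[::-1]))
```

In a real pipeline the embeddings would come from a trainable encoder, and this loss would be minimised by gradient descent. NumPy is used here only to keep the objective itself visible.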
Related papers
- Lost in Tokenization: Context as the Key to Unlocking Biomolecular Understanding in Scientific LLMs
Sci-LLMs have emerged as a promising frontier for accelerating biological discovery. Current strategies limit Sci-LLMs' reasoning capacity when processing raw biomolecular sequences. We show that a more effective strategy is to provide Sci-LLMs with high-level structured context.
arXiv (2025-10-27T09:03:21Z)
- Self-Supervised Cross-Encoder for Neurodegenerative Disease Diagnosis
We propose a novel self-supervised cross-encoder framework that leverages the temporal continuity in longitudinal MRI scans for supervision. This framework disentangles learned representations into two components: a static representation, constrained by contrastive learning, which captures stable anatomical features; and a dynamic representation, guided by input-gradient regularization, which reflects temporal changes. Experimental results on the Alzheimer's Disease Neuroimaging Initiative dataset demonstrate that our method achieves superior classification accuracy and improved interpretability.
arXiv (2025-09-09T11:52:24Z)
- E-ABIN: an Explainable module for Anomaly detection in BIological Networks
E-ABIN is a general-purpose, explainable framework for Anomaly detection in Biological Networks. It combines classical machine learning and graph-based deep learning techniques within a unified, user-friendly platform. We demonstrate the utility of E-ABIN through case studies of bladder cancer and coeliac disease.
arXiv (2025-06-25T08:25:17Z)
- Identifying Critical Phases for Disease Onset with Sparse Haematological Biomarkers
Clinical blood tests are an emerging molecular data source for large-scale biomedical research. Traditional imputation approaches distort learning signals and bias predictions while lacking biological interpretability. We propose a novel methodology using Graph Neural Additive Networks (GNAN) to model delta biomarker trajectories.
arXiv (2025-03-18T07:29:45Z)
- Knowledge-Guided Biomarker Identification for Label-Free Single-Cell RNA-Seq Data: A Reinforcement Learning Perspective
We present an iterative gene panel selection strategy that harnesses ensemble knowledge from existing gene selection algorithms to establish preliminary boundaries or prior knowledge. We incorporate reinforcement learning through a reward function shaped by expert behavior, enabling dynamic refinement and targeted selection of gene panels. Our results underscore the potential of this approach to advance single-cell genomics data analysis.
arXiv (2025-01-02T07:57:41Z)
- Spatial Temporal Graph Convolution with Graph Structure Self-learning for Early MCI Detection
We propose a spatial temporal graph convolutional network with a novel graph structure self-learning mechanism for early mild cognitive impairment (EMCI) detection. Results on the Alzheimer's Disease Neuroimaging Initiative database show that our method outperforms state-of-the-art approaches.
arXiv (2022-11-11T12:29:00Z)
- fMRI from EEG is only Deep Learning away: the use of interpretable DL to unravel EEG-fMRI relationships
We present an interpretable, domain-grounded solution to recover the activity of several subcortical regions from multichannel EEG data. We recover individual spatial and time-frequency patterns of scalp EEG predictive of the hemodynamic signal in the subcortical nuclei.
arXiv (2022-10-23T15:11:37Z)
- Data-driven emergence of convolutional structure in neural networks
We show how fully-connected neural networks solving a discrimination task can learn a convolutional structure directly from their inputs. By carefully designing data models, we show that the emergence of this pattern is triggered by the non-Gaussian, higher-order local structure of the inputs.
arXiv (2022-02-01T17:11:13Z)
- Deep Metric Learning with Locality Sensitive Angular Loss for Self-Correcting Source Separation of Neural Spiking Signals
We propose a methodology based on deep metric learning to address the need for automated post-hoc cleaning and robust separation filters. We validate this method with an artificially corrupted label set based on source-separated high-density surface electromyography recordings. This approach enables a neural network to learn to accurately decode neurophysiological time series using any imperfect method of labelling the signal.
arXiv (2021-10-13T21:51:56Z)
- G-MIND: An End-to-End Multimodal Imaging-Genetics Framework for Biomarker Identification and Disease Classification
We propose a novel deep neural network architecture to integrate imaging and genetics data, as guided by diagnosis, that provides interpretable biomarkers. We have evaluated our model on a population study of schizophrenia that includes two functional MRI (fMRI) paradigms and Single Nucleotide Polymorphism (SNP) data.
arXiv (2021-01-27T19:28:04Z)
- Uncovering the structure of clinical EEG signals with self-supervised learning
Supervised learning paradigms are often limited by the amount of labeled data that is available. This phenomenon is particularly problematic in clinically relevant data, such as electroencephalography (EEG). By extracting information from unlabeled data, it might be possible to reach competitive performance with deep neural networks.
arXiv (2020-07-31T14:34:47Z)
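The EEG entry ("Uncovering the structure of clinical EEG signals with self-supervised learning") rests on extracting information from unlabeled recordings. One pretext task used in this line of work is relative positioning. Pairs of windows that start close together in time are labelled 1, and pairs that start far apart are labelled 0, so a network can be trained without any clinical annotation. The NumPy sketch below shows only the pair-sampling step. The function name `relative_positioning_pairs`, the thresholds `tau_pos` and `tau_neg` (in samples), and the alternating positive/negative balancing are illustrative assumptions, not the paper's code.

```python
import numpy as np


def relative_positioning_pairs(signal, window_len, tau_pos, tau_neg, n_pairs, rng):
    """Build a labelled pretext dataset from one unlabelled multichannel recording.

    signal:   array of shape (n_channels, n_samples)
    tau_pos:  max start-to-start gap (in samples) for a "close" pair   -> label 1
    tau_neg:  min start-to-start gap (in samples) for a "distant" pair -> label 0
    Pairs alternate positive/negative so the two classes are balanced.
    """
    if not 0 <= tau_pos < tau_neg:
        raise ValueError("require 0 <= tau_pos < tau_neg")
    max_start = signal.shape[1] - window_len  # last valid window start
    if max_start < tau_neg:
        raise ValueError("recording too short for the chosen tau_neg")

    first, second, labels = [], [], []
    for i in range(n_pairs):
        if i % 2 == 0:
            # Positive pair: second window starts within tau_pos samples of the first.
            a = int(rng.integers(0, max_start + 1))
            b = int(rng.integers(max(0, a - tau_pos), min(max_start, a + tau_pos) + 1))
            label = 1
        else:
            # Negative pair: rejection-sample until the windows are far enough apart.
            while True:
                a, b = (int(s) for s in rng.integers(0, max_start + 1, size=2))
                if abs(a - b) >= tau_neg:
                    break
            label = 0
        first.append(signal[:, a:a + window_len])
        second.append(signal[:, b:b + window_len])
        labels.append(label)
    return np.stack(first), np.stack(second), np.array(labels)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    eeg = rng.normal(size=(2, 3000))  # synthetic stand-in: 2 channels, 3000 samples
    xa, xb, y = relative_positioning_pairs(eeg, window_len=200, tau_pos=100,
                                           tau_neg=1000, n_pairs=10, rng=rng)
    print(xa.shape, xb.shape, y)
```

A downstream model would then learn to predict `y` from each window pair, typically by comparing embeddings of the two windows. Gaps between `tau_pos` and `tau_neg` are deliberately never sampled, which keeps the two classes unambiguous.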
This list is automatically generated from the titles and abstracts of the papers on this site.