Conquering the Retina: Bringing Visual in-Context Learning to OCT
- URL: http://arxiv.org/abs/2506.15200v1
- Date: Wed, 18 Jun 2025 07:28:47 GMT
- Title: Conquering the Retina: Bringing Visual in-Context Learning to OCT
- Authors: Alessio Negrini, Simon Reiß, et al.
- Abstract summary: In this work, we explore how to train generalist models for the domain of retinal optical coherence tomography using visual in-context learning (VICL). We extensively evaluate a state-of-the-art medical VICL approach on multiple retinal OCT datasets, establishing a first baseline that highlights the potential and current limitations of in-context learning for OCT.
- Score: 5.012883033803268
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advancements in medical image analysis have led to the development of highly specialized models tailored to specific clinical tasks. These models have demonstrated exceptional performance and remain a crucial research direction. Yet, their applicability is limited to predefined tasks, requiring expertise and extensive resources for development and adaptation. In contrast, generalist models offer a different form of utility: allowing medical practitioners to define tasks on the fly without the need for task-specific model development. In this work, we explore how to train generalist models for the domain of retinal optical coherence tomography using visual in-context learning (VICL), i.e., training models to generalize across tasks based on a few examples provided at inference time. To facilitate rigorous assessment, we propose a broad evaluation protocol tailored to VICL in OCT. We extensively evaluate a state-of-the-art medical VICL approach on multiple retinal OCT datasets, establishing a first baseline to highlight the potential and current limitations of in-context learning for OCT. To foster further research and practical adoption, we openly release our code.
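To make the in-context setup concrete, the sketch below illustrates what VICL inference could look like for OCT segmentation: a handful of annotated support pairs defines the task at inference time, and no gradient updates are performed. This is a minimal, hypothetical sketch in PyTorch; `InContextSegmenter`, the tensor shapes, and the dummy forward pass are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch of visual in-context learning (VICL) inference for OCT
# segmentation: the task is defined on the fly by a few annotated examples
# supplied at inference time. All names and shapes here are illustrative
# assumptions, not the paper's actual API; consult the released code.
import torch
import torch.nn as nn


class InContextSegmenter(nn.Module):
    """Placeholder for a VICL model that conditions on context pairs."""

    def forward(
        self,
        context_images: torch.Tensor,  # (k, 1, H, W) support OCT B-scans
        context_masks: torch.Tensor,   # (k, 1, H, W) their annotations
        query_image: torch.Tensor,     # (1, 1, H, W) scan to segment
    ) -> torch.Tensor:
        # A real model would fuse query and context features; here we
        # simply return a dummy logit map with the query's spatial shape.
        return torch.zeros_like(query_image)


# The key property of VICL: no task-specific training or gradient updates
# at inference time -- the context pairs alone define the task.
model = InContextSegmenter().eval()
k, H, W = 4, 256, 256                      # k annotated examples define the task
context_images = torch.randn(k, 1, H, W)   # e.g. retinal OCT B-scans
context_masks = torch.randint(0, 2, (k, 1, H, W)).float()  # e.g. fluid masks
query_image = torch.randn(1, 1, H, W)

with torch.no_grad():
    logits = model(context_images, context_masks, query_image)
    prediction = (logits.sigmoid() > 0.5).long()  # binary segmentation map
```

The defining property shown here is that switching tasks, e.g. from fluid segmentation to layer segmentation, only requires swapping the context pairs rather than retraining the model.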
Related papers
- Challenging Vision-Language Models with Surgical Data: A New Dataset and Broad Benchmarking Study [0.6120768859742071]
We present the first large-scale study assessing the capabilities of Vision Language Models (VLMs) for endoscopic tasks. Using a diverse set of state-of-the-art models, multiple surgical datasets, and extensive human reference annotations, we address three key research questions. Our results reveal that VLMs can effectively perform basic surgical perception tasks, such as object counting and localization, with performance levels comparable to general domain tasks.
arXiv Detail & Related papers (2025-06-06T16:53:12Z)
- Retrieval Augmented Generation and Understanding in Vision: A Survey and New Outlook [85.43403500874889]
Retrieval-augmented generation (RAG) has emerged as a pivotal technique in artificial intelligence (AI). The survey reviews recent advancements in RAG for embodied AI, with a particular focus on applications in planning, task execution, multimodal perception, interaction, and specialized domains.
arXiv Detail & Related papers (2025-03-23T10:33:28Z)
- Medical Vision-Language Pre-Training for Brain Abnormalities [96.1408455065347]
We show how to automatically collect medical image-text aligned data for pretraining from public resources such as PubMed.
In particular, we present a pipeline that streamlines the pre-training process by initially collecting a large brain image-text dataset.
We also investigate the unique challenge of mapping subfigures to subcaptions in the medical domain.
arXiv Detail & Related papers (2024-04-27T05:03:42Z)
- Towards a clinically accessible radiology foundation model: open-access and lightweight, with automated evaluation [113.5002649181103]
We train open-source small multimodal models (SMMs) to bridge competency gaps for unmet clinical needs in radiology.
For training, we assemble a large dataset of over 697 thousand radiology image-text pairs.
For evaluation, we propose CheXprompt, a GPT-4-based metric for factuality evaluation, and demonstrate its parity with expert evaluation.
Inference with LLaVA-Rad is fast and can be performed on a single V100 GPU in private settings, offering a promising state-of-the-art tool for real-world clinical applications.
arXiv Detail & Related papers (2024-03-12T18:12:02Z)
- DiCoM -- Diverse Concept Modeling towards Enhancing Generalizability in Chest X-Ray Studies [6.83819481805979]
Chest X-Ray (CXR) is a widely used clinical imaging modality.
Self-supervised pre-training has proven to outperform supervised pre-training in numerous downstream vision tasks.
We introduce Diverse Concept Modeling (DiCoM), a novel self-supervised training paradigm.
arXiv Detail & Related papers (2024-02-22T20:51:37Z) - Foundational Models in Medical Imaging: A Comprehensive Survey and
Future Vision [6.2847894163744105]
Foundation models are large-scale, pre-trained deep-learning models adapted to a wide range of downstream tasks.
These models facilitate contextual reasoning, generalization, and prompt capabilities at test time.
Capitalizing on advances in computer vision, the medical imaging community has also shown growing interest in these models.
arXiv Detail & Related papers (2023-10-28T12:08:12Z)
- Empirical Analysis of a Segmentation Foundation Model in Prostate Imaging [9.99042549094606]
We consider a recently developed foundation model for medical image segmentation, UniverSeg.
We conduct an empirical evaluation study in the context of prostate imaging and compare it against the conventional approach of training a task-specific segmentation model.
arXiv Detail & Related papers (2023-07-06T20:00:52Z)
- LVM-Med: Learning Large-Scale Self-Supervised Vision Models for Medical Imaging via Second-order Graph Matching [59.01894976615714]
We introduce LVM-Med, the first family of deep networks trained on large-scale medical datasets.
We have collected approximately 1.3 million medical images from 55 publicly available datasets.
LVM-Med empirically outperforms a number of state-of-the-art supervised, self-supervised, and foundation models.
arXiv Detail & Related papers (2023-06-20T22:21:34Z)
- Towards General Purpose Medical AI: Continual Learning Medical Foundation Model [22.03086588403621]
Inevitable domain and task discrepancies in real-world scenarios can impair the generalization performance of the pre-trained deep models for medical data.
We audaciously propose that we should build a general-purpose medical AI system that can be seamlessly adapted to downstream domains/tasks.
arXiv Detail & Related papers (2023-03-12T05:27:22Z)
- Domain Generalization on Medical Imaging Classification using Episodic Training with Task Augmentation [62.49837463676111]
We propose a novel scheme of episodic training with task augmentation on medical imaging classification.
Motivated by the limited number of source domains in real-world medical deployment, we address the unique problem of task-level overfitting.
arXiv Detail & Related papers (2021-06-13T03:56:59Z)
- GaNDLF: A Generally Nuanced Deep Learning Framework for Scalable End-to-End Clinical Workflows in Medical Imaging [76.38169390121057]
We present the community-driven Generally Nuanced Deep Learning Framework (GaNDLF).
GaNDLF makes the mechanism of DL development, training, and inference more stable, reproducible, interpretable, and scalable.
We demonstrate the ability of GaNDLF to analyze both radiology and histology images, with built-in support for k-fold cross-validation.
arXiv Detail & Related papers (2021-02-26T02:24:52Z)