Related papers: VET-DINO: Learning Anatomical Understanding Through Multi-View Distillation in Veterinary Imaging

URL: http://arxiv.org/abs/2505.15248v1
Date: Wed, 21 May 2025 08:23:48 GMT
Title: VET-DINO: Learning Anatomical Understanding Through Multi-View Distillation in Veterinary Imaging
Authors: Andre Dourson, Kylie Taylor, Xiaoli Qiao, Michael Fitzke
Abstract summary: VET-DINO is a framework that leverages the availability of multiple standardized views from the same study. We demonstrate our approach on a dataset of 5 million veterinary radiographs from 668,000 canine studies. We show that learning from real multi-view pairs leads to superior anatomical understanding compared to purely synthetic augmentations.
Score: 0.17999333451993946
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Self-supervised learning has emerged as a powerful paradigm for training deep neural networks, particularly in medical imaging where labeled data is scarce. While current approaches typically rely on synthetic augmentations of single images, we propose VET-DINO, a framework that leverages a unique characteristic of medical imaging: the availability of multiple standardized views from the same study. Using a series of clinical veterinary radiographs from the same patient study, we enable models to learn view-invariant anatomical structures and develop an implied 3D understanding from 2D projections. We demonstrate our approach on a dataset of 5 million veterinary radiographs from 668,000 canine studies. Through extensive experimentation, including view synthesis and downstream task performance, we show that learning from real multi-view pairs leads to superior anatomical understanding compared to purely synthetic augmentations. VET-DINO achieves state-of-the-art performance on various veterinary imaging tasks. Our work establishes a new paradigm for self-supervised learning in medical imaging that leverages domain-specific properties rather than merely adapting natural image techniques.

Related papers

Does DINOv3 Set a New Medical Vision Standard? [67.33543059306938]
This report investigates whether DINOv3 can serve as a powerful unified encoder for medical vision tasks without domain-specific pre-training. We benchmark DINOv3 across common medical vision tasks, including 2D/3D classification and segmentation. Remarkably, it can even outperform medical-specific foundation models like BiomedCLIP and CT-Net on several tasks.
arXiv Detail & Related papers (2025-09-08T09:28:57Z)
Xeno-learning: knowledge transfer across species in deep learning-based spectral image analysis [27.30616253053021]
"Xeno-learning" is a cross-species knowledge transfer paradigm inspired by xeno-transplantation. We show that although spectral signatures of organs differ across species, relative changes resulting from pathologies or surgical manipulation are comparable. The resulting ethical, monetary, and performance benefits promise a high impact of the proposed knowledge transfer paradigm.
arXiv Detail & Related papers (2024-10-15T16:25:16Z)
Autoregressive Sequence Modeling for 3D Medical Image Representation [48.706230961589924]
We introduce a pioneering method for learning 3D medical image representations through an autoregressive sequence pre-training framework. Our approach arranges various 3D medical images based on spatial, contrast, and semantic correlations, treating them as interconnected visual tokens within a token sequence.
arXiv Detail & Related papers (2024-09-13T10:19:10Z)
Unified Medical Image Pre-training in Language-Guided Common Semantic Space [39.61770813855078]
We propose a Unified Medical Image Pre-training framework, namely UniMedI. UniMedI uses diagnostic reports as a common semantic space to create unified representations for diverse modalities of medical images. We evaluate its performance on both 2D and 3D images across 10 different datasets.
arXiv Detail & Related papers (2023-11-24T22:01:12Z)
BiomedJourney: Counterfactual Biomedical Image Generation by Instruction-Learning from Multimodal Patient Journeys [99.7082441544384]
We present BiomedJourney, a novel method for counterfactual biomedical image generation by instruction-learning. We use GPT-4 to process the corresponding imaging reports and generate a natural language description of disease progression. The resulting triples are then used to train a latent diffusion model for counterfactual biomedical image generation.
arXiv Detail & Related papers (2023-10-16T18:59:31Z)
Towards a Visual-Language Foundation Model for Computational Pathology [5.72536252929528]
We introduce CONtrastive learning from Captions for Histopathology (CONCH). CONCH is a visual-language foundation model developed using diverse sources of histopathology images, biomedical text, and task-agnostic pretraining. It is evaluated on a suite of 13 diverse benchmarks, achieving state-of-the-art performance on histology image classification, segmentation, captioning, text-to-image and image-to-text retrieval.
arXiv Detail & Related papers (2023-07-24T16:13:43Z)
LVM-Med: Learning Large-Scale Self-Supervised Vision Models for Medical Imaging via Second-order Graph Matching [59.01894976615714]
We introduce LVM-Med, the first family of deep networks trained on large-scale medical datasets. We have collected approximately 1.3 million medical images from 55 publicly available datasets. LVM-Med empirically outperforms a number of state-of-the-art supervised, self-supervised, and foundation models.
arXiv Detail & Related papers (2023-06-20T22:21:34Z)
Parameter-Efficient Fine-Tuning for Medical Image Analysis: The Missed Opportunity [15.404013190033242]
The application of Parameter-Efficient Fine-Tuning (PEFT) in medical image analysis is relatively unexplored. This study fills this gap by evaluating 17 distinct PEFT algorithms on image classification and text-to-image generation tasks. Our findings demonstrate PEFT's effectiveness, particularly in low data regimes common in medical imaging.
arXiv Detail & Related papers (2023-05-14T21:18:18Z)
Medical Image Understanding with Pretrained Vision Language Models: A Comprehensive Study [8.547751745702156]
We show that well-designed medical prompts are the key to eliciting knowledge from pre-trained vision language models (VLMs). We develop three approaches for automatic generation of medical prompts, which can inject expert-level medical knowledge and image-specific information into the prompts for fine-grained grounding.
arXiv Detail & Related papers (2022-09-30T15:06:13Z)
Mine yOur owN Anatomy: Revisiting Medical Image Segmentation with Extremely Limited Labels [54.58539616385138]
We introduce a novel semi-supervised 2D medical image segmentation framework termed Mine yOur owN Anatomy (MONA). First, prior work argues that every pixel equally matters to the model training; we observe empirically that this alone is unlikely to define meaningful anatomical features. Second, we construct a set of objectives that encourage the model to be capable of decomposing medical images into a collection of anatomical features.
arXiv Detail & Related papers (2022-09-27T15:50:31Z)
A Multi-Stage Attentive Transfer Learning Framework for Improving COVID-19 Diagnosis [49.3704402041314]
We propose a multi-stage attentive transfer learning framework for improving COVID-19 diagnosis. Our proposed framework consists of three stages to train accurate diagnosis models through learning knowledge from multiple source tasks and data of different domains. Importantly, we propose a novel self-supervised learning method to learn multi-scale representations for lung CT images.
arXiv Detail & Related papers (2021-01-14T01:39:19Z)
Cross-Modal Information Maximization for Medical Imaging: CMIM [62.28852442561818]
In hospitals, data are siloed to specific information systems that make the same information available under different modalities. This offers unique opportunities to obtain and use at train-time those multiple views of the same information that might not always be available at test-time. We propose an innovative framework that makes the most of available data by learning good representations of a multi-modal input that are resilient to modality dropping at test-time.
arXiv Detail & Related papers (2020-10-20T20:05:35Z)

This list is automatically generated from the titles and abstracts of the papers on this site.