Related papers: iMedImage Technical Report

URL: http://arxiv.org/abs/2503.21836v1
Date: Thu, 27 Mar 2025 03:25:28 GMT
Title: iMedImage Technical Report
Authors: Ran Wei, ZhiXiong Lan, Qing Yan, Ning Song, Ming Lv, LongQing Ye,
Abstract summary: Chromosome karyotype analysis is crucial for diagnosing hereditary diseases, yet detecting structural abnormalities remains challenging. We developed iMedImage, an end-to-end model for general medical image recognition, demonstrating strong performance across multiple imaging tasks.
Score: 5.0953390013898705
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Background: Chromosome karyotype analysis is crucial for diagnosing hereditary diseases, yet detecting structural abnormalities remains challenging. While AI has shown promise in medical imaging, its effectiveness varies across modalities. Leveraging advances in Foundation Models that integrate multimodal medical imaging for robust feature extraction and accurate diagnosis, we developed iMedImage, an end-to-end model for general medical image recognition, demonstrating strong performance across multiple imaging tasks, including chromosome abnormality detection. Materials and Methods: We constructed a comprehensive medical image dataset encompassing multiple modalities from common medical domains, including chromosome, cell, pathology, ultrasound, X-ray, CT, and MRI images. Based on this dataset, we developed the iMedImage model, which incorporates the following key features: (1) a unified representation method for diverse modality inputs and medical imaging tasks; (2) multi-level (case-level, image-level, patch-level) image recognition capabilities enhanced by Chain of Thought (CoT) embedding and Mixture of Experts (MoE) strategies. Results: The test set comprised data from 12 institutions across six regions in China, covering three mainstream scanning devices, and included naturally distributed, unscreened abnormal cases. On this diverse dataset, the model achieved a fully automated chromosome analysis workflow, including segmentation, karyotyping, and abnormality detection, reaching a sensitivity of 92.75% and a specificity of 91.5%. Conclusion: We propose iMedImage, an end-to-end foundation model for medical image analysis, demonstrating its superior performance across various medical imaging tasks. iMedImage provides clinicians with a precise imaging analysis tool and contributes to improving diagnostic accuracy and disease screening.
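The abstract reports results as sensitivity and specificity. As a reminder of how these two metrics are defined, here is a minimal Python sketch. The function names and the confusion-matrix counts are illustrative assumptions; the counts are not taken from the paper and do not reproduce its figures.

```python
def sensitivity(tp: int, fn: int) -> float:
    """True-positive rate: share of truly abnormal cases flagged as abnormal."""
    if tp + fn == 0:
        raise ValueError("sensitivity is undefined without positive cases")
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """True-negative rate: share of truly normal cases reported as normal."""
    if tn + fp == 0:
        raise ValueError("specificity is undefined without negative cases")
    return tn / (tn + fp)


if __name__ == "__main__":
    # Hypothetical counts for illustration only (NOT from the paper).
    tp, fn = 90, 10    # abnormal cases: detected / missed
    tn, fp = 180, 20   # normal cases: correctly cleared / falsely flagged
    print(f"sensitivity = {sensitivity(tp, fn):.2%}")
    print(f"specificity = {specificity(tn, fp):.2%}")
```

In a screening setting, sensitivity reflects how few abnormal cases are missed, while specificity reflects how few normal cases are flagged for unnecessary review.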

Related papers

InSight: AI Mobile Screening Tool for Multiple Eye Disease Detection using Multimodal Fusion [0.0]
Age-related macular degeneration, glaucoma, diabetic retinopathy (DR), diabetic macular edema, and pathological myopia affect hundreds of millions of people worldwide. We develop InSight, an AI-based app that combines patient metadata with fundus images for accurate diagnosis of five common eye diseases.
arXiv Detail & Related papers (2025-07-16T23:00:10Z)
RadIR: A Scalable Framework for Multi-Grained Medical Image Retrieval via Radiology Report Mining [48.21287619304126]
We propose a novel methodology that leverages dense radiology reports to define image-wise similarity ordering at multiple granularities. We construct two comprehensive medical imaging retrieval datasets: MIMIC-IR for Chest X-rays and CTRATE-IR for CT scans. We develop two retrieval systems, RadIR-CXR and RadIR-ChestCT, which demonstrate superior performance in traditional image-image and image-report retrieval tasks.
arXiv Detail & Related papers (2025-03-06T17:43:03Z)
Large-scale and Fine-grained Vision-language Pre-training for Enhanced CT Image Understanding [17.783231335173486]
We propose a fine-grained vision-language model (fVLM) for anatomy-level CT image interpretation. Fine-grained alignment, however, faces considerable false-negative challenges. We curated the largest CT dataset to date, comprising imaging and report data from 69,086 patients.
arXiv Detail & Related papers (2025-01-24T14:50:48Z)
MiniGPT-Med: Large Language Model as a General Interface for Radiology Diagnosis [28.421857904824627]
MiniGPT-Med is a vision-language model derived from large-scale language models and tailored for medical applications. It is capable of performing tasks such as medical report generation, visual question answering (VQA), and disease identification within medical imagery. It achieves state-of-the-art performance on medical report generation, exceeding the previous best model by 19% in accuracy.
arXiv Detail & Related papers (2024-07-04T18:21:10Z)
Lightening Anything in Medical Images [23.366303785451684]
We introduce a pioneering training-free Diffusion Model for Universal Medical Image Enhancement, named UniMIE. UniMIE demonstrates its unsupervised enhancement capabilities across various medical image modalities without the need for any fine-tuning. We conduct a comprehensive evaluation on 13 imaging modalities and over 15 medical types, demonstrating better qualities, robustness, and accuracy than other modality-specific and data-inefficient models.
arXiv Detail & Related papers (2024-06-01T05:07:50Z)
HyperFusion: A Hypernetwork Approach to Multimodal Integration of Tabular and Medical Imaging Data for Predictive Modeling [4.44283662576491]
We present a novel framework based on hypernetworks to fuse clinical imaging and tabular data by conditioning the image processing on the EHR's values and measurements. This approach aims to leverage the complementary information present in these modalities to enhance the accuracy of various medical applications.
arXiv Detail & Related papers (2024-03-20T05:50:04Z)
QUBIQ: Uncertainty Quantification for Biomedical Image Segmentation Challenge [93.61262892578067]
Uncertainty in medical image segmentation tasks, especially inter-rater variability, presents a significant challenge. This variability directly impacts the development and evaluation of automated segmentation algorithms. We report the set-up and summarize the benchmark results of the Quantification of Uncertainties in Biomedical Image Quantification Challenge (QUBIQ).
arXiv Detail & Related papers (2024-03-19T17:57:24Z)
Adapting Visual-Language Models for Generalizable Anomaly Detection in Medical Images [68.42215385041114]
This paper introduces a novel lightweight multi-level adaptation and comparison framework to repurpose the CLIP model for medical anomaly detection. Our approach integrates multiple residual adapters into the pre-trained visual encoder, enabling a stepwise enhancement of visual features across different levels. Our experiments on medical anomaly detection benchmarks demonstrate that our method significantly surpasses current state-of-the-art models.
arXiv Detail & Related papers (2024-03-19T09:28:19Z)
Large-scale Long-tailed Disease Diagnosis on Radiology Images [51.453990034460304]
RadDiag is a foundational model supporting 2D and 3D inputs across various modalities and anatomies. Our dataset, RP3D-DiagDS, contains 40,936 cases with 195,010 scans covering 5,568 disorders.
arXiv Detail & Related papers (2023-12-26T18:20:48Z)
LVM-Med: Learning Large-Scale Self-Supervised Vision Models for Medical Imaging via Second-order Graph Matching [59.01894976615714]
We introduce LVM-Med, the first family of deep networks trained on large-scale medical datasets. We have collected approximately 1.3 million medical images from 55 publicly available datasets. LVM-Med empirically outperforms a number of state-of-the-art supervised, self-supervised, and foundation models.
arXiv Detail & Related papers (2023-06-20T22:21:34Z)
A Transformer-based representation-learning model with unified processing of multimodal input for clinical diagnostics [63.106382317917344]
We report a Transformer-based representation-learning model as a clinical diagnostic aid that processes multimodal input in a unified manner. The unified model outperformed an image-only model and non-unified multimodal diagnosis models in the identification of pulmonary diseases.
arXiv Detail & Related papers (2023-06-01T16:23:47Z)
Segment Anything in Medical Images [21.43661408153244]
We present MedSAM, a foundation model designed for enabling universal medical image segmentation. The model is developed on a large-scale medical image dataset with 1,570,263 image-mask pairs, covering 10 imaging modalities and over 30 cancer types.
arXiv Detail & Related papers (2023-04-24T17:56:12Z)
Convolutional-LSTM for Multi-Image to Single Output Medical Prediction [55.41644538483948]
A common scenario in developing countries is for the volume metadata to be lost for multiple reasons. It is possible to build a multi-image to single-output diagnostic model that mimics a human doctor's diagnostic process.
arXiv Detail & Related papers (2020-10-20T04:30:09Z)

This list is automatically generated from the titles and abstracts of the papers on this site.