A Labeled Ophthalmic Ultrasound Dataset with Medical Report Generation Based on Cross-modal Deep Learning
- URL: http://arxiv.org/abs/2407.18667v1
- Date: Fri, 26 Jul 2024 11:03:18 GMT
- Title: A Labeled Ophthalmic Ultrasound Dataset with Medical Report Generation Based on Cross-modal Deep Learning
- Authors: Jing Wang, Junyan Fan, Meng Zhou, Yanzhu Zhang, Mingyu Shi
- Abstract summary: We present a labeled ophthalmic dataset for precise analysis and automated exploration of medical images along with their associated reports.
It comprises three data modalities, namely ultrasound images, blood flow information, and examination reports, collected from 2,417 patients at an ophthalmology hospital in Shenyang, China.
- Score: 8.733721267033705
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Ultrasound imaging reveals eye morphology and aids in diagnosing and treating eye diseases. However, interpreting diagnostic reports requires specialized physicians. We present a labeled ophthalmic dataset for precise analysis and automated exploration of medical images along with their associated reports. It comprises three data modalities, namely ultrasound images, blood flow information, and examination reports, collected from 2,417 patients at an ophthalmology hospital in Shenyang, China, during 2018; all patient information is de-identified for privacy protection. To the best of our knowledge, it is the only ophthalmic dataset that contains these three modalities simultaneously. It consists of 4,858 images with corresponding free-text reports, which describe 15 typical imaging findings of intraocular diseases and their anatomical locations. Each image is accompanied by three kinds of blood flow indices measured at three specific arteries, i.e., nine parameter values that describe the spectral characteristics of the blood flow distribution. The reports were written by ophthalmologists during clinical care. The proposed dataset is applied to medical report generation using a cross-modal deep learning model. The experimental results demonstrate that our dataset is suitable for training supervised models on cross-modal medical data.
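Based on the abstract, each sample couples one ultrasound image with nine blood flow parameter values (three indices at three arteries) and a free-text report labeled with up to 15 typical findings. The snippet below is a minimal Python sketch of such a record; the field names, the example indices (PSV/EDV/RI), and the validation rules are illustrative assumptions, not the dataset's actual schema.

```python
from dataclasses import dataclass, field
from typing import List

# Minimal sketch of one record in the dataset, using hypothetical field
# names; the actual release defines its own file layout and schema.
@dataclass
class OphthalmicUltrasoundSample:
    patient_id: str                 # de-identified patient identifier
    image_path: str                 # path to the ultrasound image
    # Three blood flow indices (e.g., PSV, EDV, RI) measured at three
    # arteries yield nine scalar values per image.
    blood_flow: List[float] = field(default_factory=lambda: [0.0] * 9)
    report: str = ""                # free-text report written by an ophthalmologist
    findings: List[str] = field(default_factory=list)  # subset of the 15 typical findings


def validate(sample: OphthalmicUltrasoundSample) -> None:
    """Basic sanity checks mirroring the dataset description."""
    assert len(sample.blood_flow) == 9, "expected 3 indices x 3 arteries"
    assert sample.report, "each image has an associated free-text report"
```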
Related papers
- Integrating Medical Imaging and Clinical Reports Using Multimodal Deep Learning for Advanced Disease Analysis [3.8758525789991896]
An innovative multi-modal deep learning model is proposed to deeply integrate heterogeneous information from medical images and clinical reports.
For medical images, convolutional neural networks were used to extract high-dimensional features and capture key visual information.
For clinical report text, a bidirectional long short-term memory (BiLSTM) network combined with an attention mechanism is used for deep semantic understanding.
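That CNN-plus-BiLSTM description maps onto a common cross-modal encoder pattern. Below is a minimal PyTorch sketch assuming a ResNet-18 backbone, additive attention pooling, and fusion by concatenation; these choices are illustrative and not the architecture reported in the paper.

```python
import torch
import torch.nn as nn
import torchvision.models as models

class CnnBiLstmFusion(nn.Module):
    """Illustrative cross-modal encoder: CNN image features fused with
    attention-pooled BiLSTM text features (not the authors' exact model)."""

    def __init__(self, vocab_size: int, embed_dim: int = 256, hidden: int = 256):
        super().__init__()
        backbone = models.resnet18(weights=None)                    # any CNN backbone works
        self.cnn = nn.Sequential(*list(backbone.children())[:-1])   # globally pooled features
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.bilstm = nn.LSTM(embed_dim, hidden, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)                        # additive attention scores
        self.proj = nn.Linear(512 + 2 * hidden, hidden)             # joint representation

    def forward(self, image: torch.Tensor, tokens: torch.Tensor) -> torch.Tensor:
        img_feat = self.cnn(image).flatten(1)            # (B, 512) visual features
        txt, _ = self.bilstm(self.embed(tokens))         # (B, T, 2H) contextual token states
        weights = torch.softmax(self.attn(txt), dim=1)   # (B, T, 1) attention over tokens
        txt_feat = (weights * txt).sum(dim=1)            # attention-pooled text vector
        return self.proj(torch.cat([img_feat, txt_feat], dim=-1))
```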
arXiv Detail & Related papers (2024-05-23T02:22:10Z)
- DeViDe: Faceted medical knowledge for improved medical vision-language pre-training [1.6567372257085946]
Vision-language pre-training for chest X-rays has made significant strides, primarily by utilizing paired radiographs and radiology reports.
We propose DeViDe, a transformer-based method that leverages radiographic descriptions from the open web.
DeViDe incorporates three key features for knowledge-augmented vision language alignment: First, a large-language model-based augmentation is employed to homogenise medical knowledge from diverse sources.
In zero-shot settings, DeViDe performs comparably to fully supervised models on external datasets and achieves state-of-the-art results on three large-scale datasets.
arXiv Detail & Related papers (2024-04-04T17:40:06Z)
- Eye-gaze Guided Multi-modal Alignment for Medical Representation Learning [65.54680361074882]
The Eye-gaze Guided Multi-modal Alignment (EGMA) framework harnesses eye-gaze data for better alignment of medical visual and textual features.
We conduct downstream tasks of image classification and image-text retrieval on four medical datasets.
arXiv Detail & Related papers (2024-03-19T03:59:14Z)
- Radiology Report Generation Using Transformers Conditioned with Non-imaging Data [55.17268696112258]
This paper proposes a novel multi-modal transformer network that integrates chest x-ray (CXR) images and associated patient demographic information.
The proposed network uses a convolutional neural network to extract visual features from CXRs and a transformer-based encoder-decoder network that combines the visual features with semantic text embeddings of patient demographic information.
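The conditioning scheme described here, a CNN for visual features and a transformer encoder-decoder that also consumes demographic text embeddings, can be illustrated with a small PyTorch module whose decoder cross-attends over the concatenated visual and demographic embeddings. The backbone, dimensions, and the shared token vocabulary for demographics are assumptions made for the sketch, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torchvision.models as models

class ConditionedReportDecoder(nn.Module):
    """Sketch of a report generator whose decoder attends over both CXR
    visual features and embedded patient demographic text (illustrative only)."""

    def __init__(self, vocab_size: int, d_model: int = 512):
        super().__init__()
        backbone = models.resnet18(weights=None)
        self.cnn = nn.Sequential(*list(backbone.children())[:-2])  # keep the spatial grid
        self.vis_proj = nn.Linear(512, d_model)
        # Demographic info (age, sex, ...) tokenized into the same vocabulary for simplicity.
        self.demo_embed = nn.Embedding(vocab_size, d_model)
        self.tok_embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=4)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, image, demo_tokens, report_tokens):
        grid = self.cnn(image)                                  # (B, 512, H, W)
        vis = self.vis_proj(grid.flatten(2).transpose(1, 2))    # (B, H*W, d_model)
        demo = self.demo_embed(demo_tokens)                     # (B, L_demo, d_model)
        memory = torch.cat([vis, demo], dim=1)                  # joint conditioning context
        tgt = self.tok_embed(report_tokens)
        mask = nn.Transformer.generate_square_subsequent_mask(tgt.size(1))
        out = self.decoder(tgt, memory, tgt_mask=mask)          # causal decoding over memory
        return self.lm_head(out)                                # next-token logits
```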
arXiv Detail & Related papers (2023-11-18T14:52:26Z)
- Beyond Images: An Integrative Multi-modal Approach to Chest X-Ray Report Generation [47.250147322130545]
Image-to-text radiology report generation aims to automatically produce radiology reports that describe the findings in medical images.
Most existing methods focus solely on the image data, disregarding the other patient information accessible to radiologists.
We present a novel multi-modal deep neural network framework for generating chest X-ray reports by integrating structured patient data, such as vital signs and symptoms, alongside unstructured clinical notes.
arXiv Detail & Related papers (2023-11-18T14:37:53Z)
- Medical Image Captioning via Generative Pretrained Transformers [57.308920993032274]
We combine two language models, the Show-Attend-Tell and the GPT-3, to generate comprehensive and descriptive radiology records.
The proposed model is tested on two medical datasets, Open-I and MIMIC-CXR, as well as the general-purpose MS-COCO.
arXiv Detail & Related papers (2022-09-28T10:27:10Z)
- OLIVES Dataset: Ophthalmic Labels for Investigating Visual Eye Semantics [11.343658407664918]
We introduce the Ophthalmic Labels for Investigating Visual Eye Semantics (OLIVES) dataset.
This is the first OCT and near-IR fundus dataset that includes clinical labels, biomarker labels, disease labels, and time-series patient treatment information.
The dataset covers 96 eyes followed over a period of at least two years, with each eye treated for an average of 66 weeks and receiving an average of 7 injections.
arXiv Detail & Related papers (2022-09-22T17:36:40Z)
- Cross-modal Clinical Graph Transformer for Ophthalmic Report Generation [116.87918100031153]
We propose a Cross-modal clinical Graph Transformer (CGT) for ophthalmic report generation (ORG).
CGT injects clinical relation triples into the visual features as prior knowledge to drive the decoding procedure.
Experiments on the large-scale FFA-IR benchmark demonstrate that the proposed CGT is able to outperform previous benchmark methods.
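The triple-injection idea summarized above can be approximated by letting visual tokens cross-attend over embedded (subject, relation, object) triples and adding the result back as a residual. The sketch below uses assumed shapes and module names and is not the CGT architecture itself.

```python
import torch
import torch.nn as nn

class TripleKnowledgeInjector(nn.Module):
    """Rough sketch of injecting clinical relation triples into visual
    features via cross-attention; the actual CGT formulation differs."""

    def __init__(self, vocab_size: int, d_model: int = 512):
        super().__init__()
        self.ent_rel_embed = nn.Embedding(vocab_size, d_model)
        self.cross_attn = nn.MultiheadAttention(d_model, num_heads=8, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, visual_tokens, triples):
        # triples: (B, N, 3) integer ids for (subject, relation, object);
        # each triple is summarized by the mean of its three embeddings.
        triple_emb = self.ent_rel_embed(triples).mean(dim=2)        # (B, N, d_model)
        knowledge, _ = self.cross_attn(visual_tokens, triple_emb, triple_emb)
        # Residual addition lets the triples act as prior knowledge that
        # conditions the subsequent report-decoding step.
        return self.norm(visual_tokens + knowledge)
```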
arXiv Detail & Related papers (2022-06-04T13:16:30Z)
- A Benchmark for Studying Diabetic Retinopathy: Segmentation, Grading, and Transferability [76.64661091980531]
People with diabetes are at risk of developing diabetic retinopathy (DR).
Computer-aided DR diagnosis is a promising tool for early detection of DR and severity grading.
This dataset has 1,842 images with pixel-level DR-related lesion annotations, and 1,000 images with image-level labels graded by six board-certified ophthalmologists.
arXiv Detail & Related papers (2020-08-22T07:48:04Z)
- A Patient-Centric Dataset of Images and Metadata for Identifying Melanomas Using Clinical Context [39.10946113351587]
The 2020 SIIM-ISIC Melanoma Classification challenge dataset was constructed to address the discrepancy between prior challenges and clinical practice.
The dataset represents 2,056 patients from three continents with an average of 16 lesions per patient.
arXiv Detail & Related papers (2020-08-07T20:22:23Z)