Is Dataset Quality Still a Concern in Diagnosis Using Large Foundation Model?
- URL: http://arxiv.org/abs/2405.12584v1
- Date: Tue, 21 May 2024 08:27:35 GMT
- Title: Is Dataset Quality Still a Concern in Diagnosis Using Large Foundation Model?
- Authors: Ziqin Lin, Heng Li, Zinan Li, Huazhu Fu, Jiang Liu
- Abstract summary: An LFM has been developed for fundus images using the Vision Transformer (ViT) and a self-supervised learning framework.
To investigate the influence of data quality on LFM, we conducted explorations in two fundus diagnosis tasks using datasets of varying quality.
Our investigation found that LFM exhibits greater resilience to dataset quality issues, including image quality and dataset bias, compared to typical convolutional networks.
- Score: 33.71784955496207
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advancements in pre-trained large foundation models (LFM) have yielded significant breakthroughs across various domains, including natural language processing and computer vision. These models have been particularly impactful in medical diagnostic tasks. With abundant unlabeled data, an LFM has been developed for fundus images using the Vision Transformer (ViT) and a self-supervised learning framework. This LFM has shown promising performance in fundus disease diagnosis across multiple datasets. On the other hand, deep learning models have long been challenged by dataset quality issues, such as image quality and dataset bias. To investigate the influence of data quality on the LFM, we conducted explorations in two fundus diagnosis tasks using datasets of varying quality. Specifically, we explored the following questions: Is the LFM more robust to image quality? Is the LFM affected by dataset bias? Can fine-tuning techniques alleviate these effects? Our investigation found that the LFM exhibits greater resilience to dataset quality issues, including image quality and dataset bias, compared to typical convolutional networks. Furthermore, we found that overall (full-parameter) fine-tuning is an effective adaptation strategy for the LFM, mitigating the impact of dataset quality issues.
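To make the "overall fine-tuning" setup concrete, the sketch below shows what full-parameter fine-tuning of a ViT-based fundus foundation model could look like in PyTorch. It is only an illustration under assumed details: the checkpoint file `fundus_ssl_pretrained.pth`, the two-class diagnosis head, and the hyperparameters are placeholders, not taken from the paper.

```python
import torch
import torch.nn as nn
from torchvision.models import vit_b_16

# Hypothetical setup: a ViT-B/16 backbone pre-trained with self-supervised
# learning on unlabeled fundus images. Checkpoint path and class count are
# placeholders for illustration, not details from the paper.
NUM_CLASSES = 2  # e.g., diseased vs. healthy in a binary fundus diagnosis task

model = vit_b_16(weights=None)
state = torch.load("fundus_ssl_pretrained.pth", map_location="cpu")  # hypothetical checkpoint
model.load_state_dict(state, strict=False)  # SSL checkpoints usually lack a classification head

# Replace the classification head for the downstream diagnosis task.
model.heads.head = nn.Linear(model.heads.head.in_features, NUM_CLASSES)

# "Overall" (full-parameter) fine-tuning: every weight is updated, rather than
# freezing the backbone and training only the new head.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5, weight_decay=0.05)
criterion = nn.CrossEntropyLoss()

def train_step(images, labels):
    """One fine-tuning step on a labeled fundus batch."""
    model.train()
    optimizer.zero_grad()
    logits = model(images)
    loss = criterion(logits, labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```

Freezing the backbone and training only the new head would be a lighter-weight alternative; the abstract's finding is that updating all parameters is the more effective way to adapt the LFM when dataset quality is a concern.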
Related papers
- Latent Drifting in Diffusion Models for Counterfactual Medical Image Synthesis [55.959002385347645]
Scaling by training on large datasets has been shown to enhance the quality and fidelity of image generation and manipulation with diffusion models.
Latent Drifting enables diffusion models to be conditioned on medical images for the complex task of counterfactual image generation.
Our results demonstrate significant performance gains in various scenarios when combined with different fine-tuning schemes.
arXiv Detail & Related papers (2024-12-30T01:59:34Z) - Assessing and Enhancing Large Language Models in Rare Disease Question-answering [64.32570472692187]
We introduce a rare disease question-answering (ReDis-QA) dataset to evaluate the performance of Large Language Models (LLMs) in diagnosing rare diseases.
We collected 1360 high-quality question-answer pairs within the ReDis-QA dataset, covering 205 rare diseases.
We then benchmarked several open-source LLMs, revealing that diagnosing rare diseases remains a significant challenge for these models.
Experimental results demonstrate that the proposed ReCOP can effectively improve the accuracy of LLMs on the ReDis-QA dataset by an average of 8%.
arXiv Detail & Related papers (2024-08-15T21:09:09Z) - Investigating the Quality of DermaMNIST and Fitzpatrick17k Dermatological Image Datasets [17.01966057343415]
Several factors can impact data quality, such as the presence of duplicates, data leakage across train-test partitions, mislabeled images, and the absence of a well-defined test partition.
We conduct meticulous analyses of three popular dermatological image datasets: DermaMNIST, its source HAM10000, and Fitzpatrick17k.
arXiv Detail & Related papers (2024-01-25T20:29:01Z) - GAN-GA: A Generative Model based on Genetic Algorithm for Medical Image
Generation [0.0]
Generative models offer a promising solution for addressing medical image shortage problems.
This paper proposes the GAN-GA, a generative model optimized by embedding a genetic algorithm.
The proposed model enhances image fidelity and diversity while preserving distinctive features.
arXiv Detail & Related papers (2023-12-30T20:16:45Z) - AlignTransformer: Hierarchical Alignment of Visual Regions and Disease
Tags for Medical Report Generation [50.21065317817769]
We propose an AlignTransformer framework, which includes the Align Hierarchical Attention (AHA) and the Multi-Grained Transformer (MGT) modules.
Experiments on the public IU-Xray and MIMIC-CXR datasets show that the AlignTransformer can achieve results competitive with state-of-the-art methods on the two datasets.
arXiv Detail & Related papers (2022-03-18T13:43:53Z) - Learn to Ignore: Domain Adaptation for Multi-Site MRI Analysis [1.3079444139643956]
We present a novel method that learns to ignore the scanner-related features present in the images, while learning features relevant for the classification task.
Our method outperforms state-of-the-art domain adaptation methods on a classification task between Multiple Sclerosis patients and healthy subjects.
arXiv Detail & Related papers (2021-10-13T15:40:50Z) - Fader Networks for domain adaptation on fMRI: ABIDE-II study [68.5481471934606]
We use 3D convolutional autoencoders to build the domain irrelevant latent space image representation and demonstrate this method to outperform existing approaches on ABIDE data.
arXiv Detail & Related papers (2020-10-14T16:50:50Z) - Deep Mining External Imperfect Data for Chest X-ray Disease Screening [57.40329813850719]
We argue that incorporating an external CXR dataset leads to imperfect training data, which raises new challenges.
We formulate the multi-label disease classification problem as weighted independent binary tasks according to the categories (a minimal sketch of this formulation appears after the list).
Our framework simultaneously models and tackles the domain and label discrepancies, enabling superior knowledge mining ability.
arXiv Detail & Related papers (2020-06-06T06:48:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.