Is Dataset Quality Still a Concern in Diagnosis Using Large Foundation Model?
- URL: http://arxiv.org/abs/2405.12584v1
- Date: Tue, 21 May 2024 08:27:35 GMT
- Title: Is Dataset Quality Still a Concern in Diagnosis Using Large Foundation Model?
- Authors: Ziqin Lin, Heng Li, Zinan Li, Huazhu Fu, Jiang Liu
- Abstract summary: An LFM has been developed for fundus images using the Vision Transformer (ViT) and a self-supervised learning framework.
To investigate the influence of data quality on LFM, we conducted explorations in two fundus diagnosis tasks using datasets of varying quality.
Our investigation found that LFM exhibits greater resilience to dataset quality issues, including image quality and dataset bias, compared to typical convolutional networks.
- Score: 33.71784955496207
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advancements in pre-trained large foundation models (LFM) have yielded significant breakthroughs across various domains, including natural language processing and computer vision. These models have been particularly impactful in medical diagnostic tasks. With abundant unlabeled data, an LFM has been developed for fundus images using the Vision Transformer (ViT) and a self-supervised learning framework. This LFM has shown promising performance in fundus disease diagnosis across multiple datasets. On the other hand, deep learning models have long been challenged by dataset quality issues, such as image quality and dataset bias. To investigate the influence of data quality on the LFM, we conducted explorations in two fundus diagnosis tasks using datasets of varying quality. Specifically, we explored the following questions: Is the LFM more robust to image quality? Is the LFM affected by dataset bias? Can fine-tuning techniques alleviate these effects? Our investigation found that the LFM exhibits greater resilience to dataset quality issues, including image quality and dataset bias, compared to typical convolutional networks. Furthermore, we found that overall (full-parameter) fine-tuning is an effective adaptation strategy for the LFM, mitigating the impact of dataset quality issues.
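To make the "overall fine-tuning" setup concrete, the sketch below shows what full-parameter fine-tuning of a ViT-based fundus foundation model could look like in PyTorch. It is only an illustration under assumed details: the checkpoint file `fundus_ssl_pretrained.pth`, the two-class diagnosis head, and the hyperparameters are placeholders, not taken from the paper.

```python
import torch
import torch.nn as nn
from torchvision.models import vit_b_16

# Hypothetical setup: a ViT-B/16 backbone pre-trained with self-supervised
# learning on unlabeled fundus images. Checkpoint path and class count are
# placeholders for illustration, not details from the paper.
NUM_CLASSES = 2  # e.g., diseased vs. healthy in a binary fundus diagnosis task

model = vit_b_16(weights=None)
state = torch.load("fundus_ssl_pretrained.pth", map_location="cpu")  # hypothetical checkpoint
model.load_state_dict(state, strict=False)  # SSL checkpoints usually lack a classification head

# Replace the classification head for the downstream diagnosis task.
model.heads.head = nn.Linear(model.heads.head.in_features, NUM_CLASSES)

# "Overall" (full-parameter) fine-tuning: every weight is updated, rather than
# freezing the backbone and training only the new head.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5, weight_decay=0.05)
criterion = nn.CrossEntropyLoss()

def train_step(images, labels):
    """One fine-tuning step on a labeled fundus batch."""
    model.train()
    optimizer.zero_grad()
    logits = model(images)
    loss = criterion(logits, labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```

Freezing the backbone and training only the new head would be a lighter-weight alternative; the abstract's finding is that updating all parameters is the more effective way to adapt the LFM when dataset quality is a concern.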
Related papers
- Latent Drifting in Diffusion Models for Counterfactual Medical Image Synthesis [55.959002385347645]
Scaling by training on large datasets has been shown to enhance the quality and fidelity of image generation and manipulation with diffusion models.
Latent Drifting enables diffusion models to be conditioned on medical images for the complex task of counterfactual image generation.
Our results demonstrate significant performance gains in various scenarios when combined with different fine-tuning schemes.
arXiv Detail & Related papers (2024-12-30T01:59:34Z) - Assessing and Enhancing Large Language Models in Rare Disease Question-answering [64.32570472692187]
We introduce a rare disease question-answering (ReDis-QA) dataset to evaluate the performance of Large Language Models (LLMs) in diagnosing rare diseases.
We collected 1360 high-quality question-answer pairs within the ReDis-QA dataset, covering 205 rare diseases.
We then benchmarked several open-source LLMs, revealing that diagnosing rare diseases remains a significant challenge for these models.
Experimental results demonstrate that the proposed ReCOP can effectively improve the accuracy of LLMs on the ReDis-QA dataset by an average of 8%.
arXiv Detail & Related papers (2024-08-15T21:09:09Z) - Investigating the Quality of DermaMNIST and Fitzpatrick17k Dermatological Image Datasets [17.01966057343415]
Several factors can impact data quality, such as the presence of duplicates, data leakage across train-test partitions, mislabeled images, and the absence of a well-defined test partition.
We conduct meticulous analyses of three popular dermatological image datasets: DermaMNIST, its source HAM10000, and Fitzpatrick17k.
arXiv Detail & Related papers (2024-01-25T20:29:01Z) - GAN-GA: A Generative Model based on Genetic Algorithm for Medical Image
Generation [0.0]
Generative models offer a promising solution for addressing medical image shortage problems.
This paper proposes the GAN-GA, a generative model optimized by embedding a genetic algorithm.
The proposed model enhances image fidelity and diversity while preserving distinctive features.
arXiv Detail & Related papers (2023-12-30T20:16:45Z) - AlignTransformer: Hierarchical Alignment of Visual Regions and Disease
Tags for Medical Report Generation [50.21065317817769]
We propose an AlignTransformer framework, which includes the Align Hierarchical Attention (AHA) and the Multi-Grained Transformer (MGT) modules.
Experiments on the public IU-Xray and MIMIC-CXR datasets show that the AlignTransformer can achieve results competitive with state-of-the-art methods on the two datasets.
arXiv Detail & Related papers (2022-03-18T13:43:53Z) - Learn to Ignore: Domain Adaptation for Multi-Site MRI Analysis [1.3079444139643956]
We present a novel method that learns to ignore the scanner-related features present in the images, while learning features relevant for the classification task.
Our method outperforms state-of-the-art domain adaptation methods on a classification task between Multiple Sclerosis patients and healthy subjects.
arXiv Detail & Related papers (2021-10-13T15:40:50Z) - Fader Networks for domain adaptation on fMRI: ABIDE-II study [68.5481471934606]
We use 3D convolutional autoencoders to build the domain irrelevant latent space image representation and demonstrate this method to outperform existing approaches on ABIDE data.
arXiv Detail & Related papers (2020-10-14T16:50:50Z) - Deep Mining External Imperfect Data for Chest X-ray Disease Screening [57.40329813850719]
We argue that incorporating an external CXR dataset leads to imperfect training data, which raises new challenges.
We formulate the multi-label disease classification problem as weighted independent binary tasks according to the categories (a minimal sketch of this formulation appears after the list).
Our framework simultaneously models and tackles the domain and label discrepancies, enabling superior knowledge mining ability.
arXiv Detail & Related papers (2020-06-06T06:48:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.