Is More Data All You Need? A Causal Exploration
- URL: http://arxiv.org/abs/2206.02409v1
- Date: Mon, 6 Jun 2022 08:02:54 GMT
- Title: Is More Data All You Need? A Causal Exploration
- Authors: Athanasios Vlontzos, Hadrien Reynaud, Bernhard Kainz
- Abstract summary: Causal analysis is often used in medicine and economics to gain insights about the effects of actions and policies.
In this paper we explore the effect of dataset interventions on the output of image classification models.
- Score: 4.756600446882457
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Curating a large-scale medical imaging dataset for machine learning
applications is both time-consuming and expensive. Balancing the workload
between model development, data collection and annotations is difficult for
machine learning practitioners, especially under time constraints. Causal
analysis is often used in medicine and economics to gain insights about the
effects of actions and policies. In this paper we explore the effect of dataset
interventions on the output of image classification models. Through a causal
approach we investigate the effects of the quantity and type of data we need to
incorporate in a dataset to achieve better performance for specific subtasks.
The main goal of this paper is to highlight the potential of causal analysis as
a tool for resource optimization for developing medical imaging ML
applications. We explore this concept with a synthetic dataset and an exemplary
use-case for Diabetic Retinopathy image analysis.
Related papers
- Deep Learning with HM-VGG: AI Strategies for Multi-modal Image Analysis [10.01246918773756]
This study introduces the Hybrid Multi-modal VGG model, a cutting-edge deep learning approach for the early diagnosis of glaucoma.
The model's performance is underscored by its high metrics in Precision, Accuracy, and F1-Score.
The HM-VGG model offers a promising tool for doctors, streamlining the diagnostic process and improving patient outcomes.
arXiv Detail & Related papers (2024-10-31T15:42:24Z)
- Medical Vision-Language Pre-Training for Brain Abnormalities [96.1408455065347]
We show how to automatically collect medical image-text aligned data for pretraining from public resources such as PubMed.
In particular, we present a pipeline that streamlines the pre-training process by initially collecting a large brain image-text dataset.
We also investigate the unique challenge of mapping subfigures to subcaptions in the medical domain.
arXiv Detail & Related papers (2024-04-27T05:03:42Z)
- Less is more: Ensemble Learning for Retinal Disease Recognition Under Limited Resources [12.119196313470887]
This paper introduces a novel ensemble learning mechanism designed for recognizing retinal diseases under limited resources.
The mechanism leverages insights from multiple pre-trained models, facilitating the transfer and adaptation of their knowledge to Retinal OCT images.
arXiv Detail & Related papers (2024-02-15T06:58:25Z)
- Medical Report Generation based on Segment-Enhanced Contrastive Representation Learning [39.17345313432545]
We propose MSCL (Medical image with Contrastive Learning) to segment organs, abnormalities, bones, etc.
We introduce a supervised contrastive loss that assigns more weight to reports that are semantically similar to the target while training.
Experimental results demonstrate the effectiveness of our proposed model, where we achieve state-of-the-art performance on the IU X-Ray public dataset.
arXiv Detail & Related papers (2023-12-26T03:33:48Z)
- Few Shot Learning for Medical Imaging: A Comparative Analysis of Methodologies and Formal Mathematical Framework [0.0]
Scarcity of problem-dependent training data has become a major obstacle to the easy application of deep learning in the medical sector.
Few-shot learning algorithms aim to solve data limitation problems by extracting characteristics from a small dataset.
In the medical sector, there is frequently a shortage of available datasets for some confidential diseases.
arXiv Detail & Related papers (2023-05-08T01:05:22Z)
- Understanding the Tricks of Deep Learning in Medical Image Segmentation: Challenges and Future Directions [66.40971096248946]
In this paper, we collect a series of MedISeg tricks for different model implementation phases.
We experimentally explore the effectiveness of these tricks on consistent baselines.
We also open-sourced a strong MedISeg repository, where each component has the advantage of plug-and-play.
arXiv Detail & Related papers (2022-09-21T12:30:05Z)
- Deep Co-Attention Network for Multi-View Subspace Learning [73.3450258002607]
We propose a deep co-attention network for multi-view subspace learning.
It aims to extract both the common information and the complementary information in an adversarial setting.
In particular, it uses a novel cross reconstruction loss and leverages the label information to guide the construction of the latent representation.
arXiv Detail & Related papers (2021-02-15T18:46:44Z)
- Select-ProtoNet: Learning to Select for Few-Shot Disease Subtype Prediction [55.94378672172967]
We focus on the few-shot disease subtype prediction problem, identifying subgroups of similar patients.
We introduce meta learning techniques to develop a new model, which can extract the common experience or knowledge from interrelated clinical tasks.
Our new model is built upon a carefully designed meta-learner, called Prototypical Network, that is a simple yet effective meta learning machine for few-shot image classification.
arXiv Detail & Related papers (2020-09-02T02:50:30Z)
- Suggestive Annotation of Brain Tumour Images with Gradient-guided Sampling [14.092503407739422]
We propose an efficient annotation framework for brain tumour images that is able to suggest informative sample images for human experts to annotate.
Experiments show that training a segmentation model with only 19% suggestively annotated patient scans from the BraTS 2019 dataset can achieve performance comparable to training a model on the full dataset for the whole tumour segmentation task.
arXiv Detail & Related papers (2020-06-26T13:39:49Z)
- Modeling Shared Responses in Neuroimaging Studies through MultiView ICA [94.31804763196116]
Group studies involving large cohorts of subjects are important to draw general conclusions about brain functional organization.
We propose a novel MultiView Independent Component Analysis model for group studies, where data from each subject are modeled as a linear combination of shared independent sources plus noise.
We demonstrate the usefulness of our approach first on fMRI data, where our model demonstrates improved sensitivity in identifying common sources among subjects.
arXiv Detail & Related papers (2020-06-11T17:29:53Z)
- From ImageNet to Image Classification: Contextualizing Progress on Benchmarks [99.19183528305598]
We study how specific design choices in the ImageNet creation process impact the fidelity of the resulting dataset.
Our analysis pinpoints how a noisy data collection pipeline can lead to a systematic misalignment between the resulting benchmark and the real-world task it serves as a proxy for.
arXiv Detail & Related papers (2020-05-22T17:39:16Z)
This list is automatically generated from the titles and abstracts of the papers on this site.