Analyzing the Effects of Handling Data Imbalance on Learned Features
from Medical Images by Looking Into the Models
- URL: http://arxiv.org/abs/2204.01729v1
- Date: Mon, 4 Apr 2022 09:38:38 GMT
- Title: Analyzing the Effects of Handling Data Imbalance on Learned Features
from Medical Images by Looking Into the Models
- Authors: Ashkan Khakzar, Yawei Li, Yang Zhang, Mirac Sanisoglu, Seong Tae Kim,
Mina Rezaei, Bernd Bischl, Nassir Navab
- Abstract summary: Training a model on an imbalanced dataset can introduce unique challenges to the learning problem.
We look deeper into the internal units of neural networks to observe how handling data imbalance affects the learned features.
- Score: 50.537859423741644
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: One challenging property lurking in medical datasets is the
imbalanced data distribution, where the frequency of samples across the
different classes is not balanced. Training a model on an imbalanced dataset
can introduce unique challenges to the learning problem, as the model becomes
biased towards the more frequent classes. Many methods have been proposed to
tackle these distributional differences and the resulting class imbalance
problem. However, the impact of these approaches on the learned features is
not well studied. In this paper, we look deeper into the internal units of
neural networks to observe how handling data imbalance affects the learned
features. We study several popular cost-sensitive approaches for handling
data imbalance and analyze the feature maps of the trained convolutional
neural networks from multiple perspectives: the alignment of salient features
with pathologies, and the pathology-related concepts encoded by the networks.
Our study reveals differences and insights regarding the trained models that
are not reflected by quantitative metrics such as AUROC and AP and that show
up only by looking inside the models.
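As a concrete illustration of the two ingredients above, here is a minimal
PyTorch sketch pairing one popular cost-sensitive approach (inverse-frequency
class weighting of the loss) with a simple gradient-saliency check against a
pathology mask. The ResNet-18 backbone, class counts, mask coordinates, and
top-5% saliency threshold are illustrative assumptions, not details taken
from the paper.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

# Cost-sensitive training: weight the loss inversely to class frequency.
# The class counts below are hypothetical placeholders.
class_counts = torch.tensor([9000.0, 1000.0])        # e.g. healthy vs. pathology
class_weights = class_counts.sum() / (2 * class_counts)
criterion = nn.CrossEntropyLoss(weight=class_weights)

model = resnet18(num_classes=2)
image = torch.randn(1, 3, 224, 224, requires_grad=True)  # stand-in for an X-ray
label = torch.tensor([1])

logits = model(image)
loss = criterion(logits, label)      # minority-class mistakes now cost more

# Saliency analysis: gradient of the target logit w.r.t. the input pixels,
# then overlap (IoU) of the most salient pixels with a pathology mask.
model.zero_grad()
logits[0, label.item()].backward()
saliency = image.grad.abs().max(dim=1).values.squeeze(0)  # (224, 224) map

pathology_mask = torch.zeros(224, 224, dtype=torch.bool)
pathology_mask[80:140, 90:160] = True                # assumed annotation box

salient = saliency > saliency.quantile(0.95)         # top-5% salient pixels
iou = (salient & pathology_mask).sum() / (salient | pathology_mask).sum()
print(f"saliency/pathology IoU: {iou.item():.3f}")
```

In a real pipeline the weighted loss would drive the optimizer over the whole
dataset; the saliency-to-pathology overlap is the kind of model-internal
signal the paper examines alongside AUROC and AP.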
Related papers
- A Critical Assessment of Interpretable and Explainable Machine Learning for Intrusion Detection [0.0]
We study the use of overly complex and opaque ML models, unaccounted-for data imbalances and correlated features, influential features that are inconsistent across different explanation methods, and the implausible utility of explanations.
Specifically, we advise avoiding complex opaque models such as Deep Neural Networks and instead using interpretable ML models such as Decision Trees.
We find that feature-based model explanations are most often inconsistent across different settings.
arXiv Detail & Related papers (2024-07-04T15:35:42Z)
- Few-shot learning for COVID-19 Chest X-Ray Classification with
Imbalanced Data: An Inter vs. Intra Domain Study [49.5374512525016]
Medical image datasets are essential for training models used in computer-aided diagnosis, treatment planning, and medical research.
Some challenges are associated with these datasets, including variability in data distribution, data scarcity, and transfer learning issues when using models pre-trained on generic images.
We propose a methodology based on Siamese neural networks in which a series of techniques are integrated to mitigate the effects of data scarcity and distribution imbalance.
arXiv Detail & Related papers (2024-01-18T16:59:27Z)
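For reference on the entry above, the following is a generic Siamese setup in
PyTorch: a shared encoder embeds image pairs and a contrastive loss pulls
same-class pairs together. The encoder architecture, embedding size, and
margin are illustrative assumptions; the paper integrates further techniques
on top of this basic pattern.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SiameseEncoder(nn.Module):
    def __init__(self, embed_dim: int = 64):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(4),
        )
        self.head = nn.Linear(32 * 4 * 4, embed_dim)

    def forward(self, x):
        # The SAME weights embed both members of a pair; this weight
        # sharing is what makes the network "Siamese".
        return self.head(self.features(x).flatten(1))

def contrastive_loss(z1, z2, same_class, margin=1.0):
    # Pull same-class pairs together, push different-class pairs apart.
    d = F.pairwise_distance(z1, z2)
    return (same_class * d.pow(2)
            + (1 - same_class) * F.relu(margin - d).pow(2)).mean()

encoder = SiameseEncoder()
x1 = torch.randn(8, 1, 64, 64)            # synthetic grayscale image pairs
x2 = torch.randn(8, 1, 64, 64)
same = torch.randint(0, 2, (8,)).float()  # 1 if the pair shares a label
loss = contrastive_loss(encoder(x1), encoder(x2), same)
```

Weight sharing between the two branches is the defining design choice: the
encoder learns a similarity metric rather than per-class decision boundaries,
which is what makes few-shot comparison feasible.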
- The Effect of Balancing Methods on Model Behavior in Imbalanced
Classification Problems [4.370097023410272]
Imbalanced data poses a challenge in classification as model performance is affected by insufficient learning from minority classes.
This study addresses a more challenging aspect of balancing methods: their impact on model behavior.
To capture these changes, Explainable Artificial Intelligence tools are used to compare models trained on datasets before and after balancing.
arXiv Detail & Related papers (2023-06-30T22:25:01Z)
- Stubborn Lexical Bias in Data and Models [50.79738900885665]
We use a new statistical method to examine whether spurious patterns in data appear in models trained on the data.
We apply an optimization approach to *reweight* the training data, reducing thousands of spurious correlations.
Surprisingly, though this method can successfully reduce lexical biases in the training data, we still find strong evidence of corresponding bias in the trained models.
arXiv Detail & Related papers (2023-06-03T20:12:27Z)
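The *reweight* step in the entry above can be made concrete with a toy
sketch: the snippet below rebalances example weights so that one binary
lexical feature becomes independent of the label. This covers the
single-feature case on synthetic data only; the paper's optimization handles
thousands of such correlations jointly.

```python
import torch

# Synthetic binary lexical feature f and label y; in the spurious case
# these would be correlated.
f = torch.randint(0, 2, (1000,))   # 1 if the token appears in the example
y = torch.randint(0, 2, (1000,))   # class label

# Weight each (f, y) cell by (independent target mass) / (actual mass),
# so the weighted joint distribution factorizes over f and y.
weights = torch.ones(1000)
for fv in (0, 1):
    for yv in (0, 1):
        cell = (f == fv) & (y == yv)
        target = (f == fv).float().mean() * (y == yv).float().mean()
        actual = cell.float().mean()
        if actual > 0:
            weights[cell] = target / actual

# Sanity check: under the new weights, P_w(f=1, y=1) == P_w(f=1) * P_w(y=1).
w = weights / weights.sum()
pfy = w[(f == 1) & (y == 1)].sum()
print(pfy.item(), (w[f == 1].sum() * w[y == 1].sum()).item())
```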
- Towards Understanding How Data Augmentation Works with Imbalanced Data [17.478900028887537]
We study the effect of data augmentation on three different classifiers, convolutional neural networks, support vector machines, and logistic regression models.
Our research indicates that data augmentation (DA), when applied to imbalanced data, produces substantial changes in model weights, support vectors, and feature selection.
We hypothesize that DA works by introducing variance into the data, so that machine learning models can associate changes in the data with labels.
arXiv Detail & Related papers (2023-04-12T15:01:22Z)
- Equivariance Allows Handling Multiple Nuisance Variables When Analyzing
Pooled Neuroimaging Datasets [53.34152466646884]
In this paper, we show how bringing together recent results on equivariant representation learning over structured spaces with a simple use of classical results on causal inference provides an effective practical solution.
We demonstrate how our model can handle more than one nuisance variable under some assumptions, enabling the analysis of pooled scientific datasets in scenarios that would otherwise require removing a large portion of the samples.
arXiv Detail & Related papers (2022-03-29T04:54:06Z)
- Data-driven emergence of convolutional structure in neural networks [83.4920717252233]
We show how fully-connected neural networks solving a discrimination task can learn a convolutional structure directly from their inputs.
By carefully designing data models, we show that the emergence of this pattern is triggered by the non-Gaussian, higher-order local structure of the inputs.
arXiv Detail & Related papers (2022-02-01T17:11:13Z)
- Learning Neural Causal Models with Active Interventions [83.44636110899742]
We introduce an active intervention-targeting mechanism which enables a quick identification of the underlying causal structure of the data-generating process.
Our method significantly reduces the required number of interactions compared with random intervention targeting.
We demonstrate superior performance on multiple benchmarks from simulated to real-world data.
arXiv Detail & Related papers (2021-09-06T13:10:37Z)