Comparative Analysis of Imbalanced Malware Byteplot Image Classification
using Transfer Learning
- URL: http://arxiv.org/abs/2310.02742v1
- Date: Wed, 4 Oct 2023 11:33:36 GMT
- Title: Comparative Analysis of Imbalanced Malware Byteplot Image Classification
using Transfer Learning
- Authors: Jayasudha M, Ayesha Shaik, Gaurav Pendharkar, Soham Kumar, Muhesh
Kumar B, Sudharshanan Balaji
- Abstract summary: Malware detectors help mitigate cyber-attacks by comparing malware signatures.
In this paper, the performance of six multiclass classification models is compared.
It is observed that the greater the class imbalance, the fewer epochs are required for convergence.
- Score: 0.873811641236639
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Cybersecurity is a major concern due to the increasing reliance on technology
and interconnected systems. Malware detectors help mitigate cyber-attacks by
comparing malware signatures. Machine learning can improve these detectors by
automating feature extraction, identifying patterns, and enhancing dynamic
analysis. In this paper, the performance of six multiclass classification
models is compared on the Malimg dataset, Blended dataset, and Malevis dataset
to gain insights into the effect of class imbalance on model performance and
convergence. It is observed that the greater the class imbalance, the fewer
epochs are required for convergence, and that there is high variance in
performance across the different models. Moreover, it is also observed that for malware detectors
ResNet50, EfficientNetB0, and DenseNet169 can handle imbalanced and balanced
data well. A maximum precision of 97% is obtained for the imbalanced dataset, a
maximum precision of 95% is obtained on the intermediate imbalance dataset, and
a maximum precision of 95% is obtained for the perfectly balanced dataset.
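The abstract reports results in terms of class imbalance and multiclass precision without defining either. The sketch below shows one common way to quantify both: the ratio of the largest to the smallest class count, and macro-averaged precision. Both choices are assumptions made for illustration; the abstract does not say which imbalance measure or which precision averaging the authors used. The helper names `imbalance_ratio` and `macro_precision` and the toy labels are hypothetical and are not taken from the Malimg, Blended, or Malevis datasets.

```python
from collections import Counter


def imbalance_ratio(labels):
    """Largest class count divided by smallest class count (1.0 = balanced)."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def macro_precision(y_true, y_pred):
    """Unweighted mean of per-class precision over all classes seen."""
    classes = sorted(set(y_true) | set(y_pred))
    per_class = []
    for c in classes:
        predicted_c = sum(1 for p in y_pred if p == c)
        true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        # Assumed convention: a class that is never predicted scores 0.0.
        per_class.append(true_pos / predicted_c if predicted_c else 0.0)
    return sum(per_class) / len(per_class)


# Toy labels invented for illustration only (assumption, not paper data).
y_true = ["A"] * 8 + ["B"] * 2
y_pred = ["A"] * 8 + ["A", "B"]

print(imbalance_ratio(y_true))
print(macro_precision(y_true, y_pred))
```

Macro averaging weights every class equally, so a poorly handled minority class lowers the score as much as a majority class would. That makes it a reasonable lens for imbalanced data, though other averaging schemes are equally plausible here.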
Related papers
- Electroencephalogram Emotion Recognition via AUC Maximization [0.0]
Imbalanced datasets pose significant challenges in areas including neuroscience, cognitive science, and medical diagnostics.
This study addresses the issue of class imbalance, using the 'Liking' label in the DEAP dataset as an example.
arXiv Detail & Related papers (2024-08-16T19:08:27Z)
- Efficient Network Traffic Feature Sets for IoT Intrusion Detection [0.0]
This work evaluates the feature sets provided by a combination of different feature selection methods, namely Information Gain, Chi-Squared Test, Recursive Feature Elimination, Mean Absolute Deviation, and Dispersion Ratio, in multiple IoT network datasets.
The influence of the smaller feature sets on both the classification performance and the training time of ML models is compared, with the aim of increasing the computational efficiency of IoT intrusion detection.
arXiv Detail & Related papers (2024-06-12T09:51:29Z)
- PrivFED -- A Framework for Privacy-Preserving Federated Learning in Enhanced Breast Cancer Diagnosis [0.0]
This study introduces a federated learning framework, trained on the Wisconsin dataset, to mitigate challenges such as data scarcity and imbalance.
The model exhibits an average accuracy of 99.95% on edge devices and 98% on the central server.
arXiv Detail & Related papers (2024-05-13T18:01:57Z)
- Class Imbalance in Object Detection: An Experimental Diagnosis and Study of Mitigation Strategies [0.5439020425818999]
This study introduces a benchmarking framework utilizing the YOLOv5 single-stage detector to address the problem of foreground-foreground class imbalance.
We scrutinized three established techniques: sampling, loss weighing, and data augmentation.
Our comparative analysis reveals that sampling and loss reweighing methods, while shown to be beneficial in two-stage detector settings, do not translate as effectively in improving YOLOv5's performance.
arXiv Detail & Related papers (2024-03-11T19:06:04Z)
- CLIP the Bias: How Useful is Balancing Data in Multimodal Learning? [72.19502317793133]
We study the effectiveness of data-balancing for mitigating biases in contrastive language-image pretraining (CLIP).
We present a novel algorithm, called Multi-Modal Moment Matching (M4), designed to reduce both representation and association biases.
arXiv Detail & Related papers (2024-03-07T14:43:17Z)
- Machine learning-based network intrusion detection for big and imbalanced data using oversampling, stacking feature embedding and feature extraction [6.374540518226326]
Intrusion Detection Systems (IDS) play a critical role in protecting interconnected networks by detecting malicious actors and activities.
This paper introduces a novel ML-based network intrusion detection model that uses Random Oversampling (RO) to address data imbalance and Stacking Feature Embedding (PCA) for dimension reduction.
Using the CIC-IDS 2017 dataset, DT, RF, and ET models reach 99.99% accuracy, while DT and RF models obtain 99.94% accuracy on CIC-IDS 2018 dataset.
arXiv Detail & Related papers (2024-01-22T05:49:41Z)
- Few-shot learning for COVID-19 Chest X-Ray Classification with Imbalanced Data: An Inter vs. Intra Domain Study [49.5374512525016]
Medical image datasets are essential for training models used in computer-aided diagnosis, treatment planning, and medical research.
Some challenges are associated with these datasets, including variability in data distribution, data scarcity, and transfer learning issues when using models pre-trained from generic images.
We propose a methodology based on Siamese neural networks in which a series of techniques are integrated to mitigate the effects of data scarcity and distribution imbalance.
arXiv Detail & Related papers (2024-01-18T16:59:27Z)
- Analyzing the Effects of Handling Data Imbalance on Learned Features from Medical Images by Looking Into the Models [50.537859423741644]
Training a model on an imbalanced dataset can introduce unique challenges to the learning problem.
We look deeper into the internal units of neural networks to observe how handling data imbalance affects the learned features.
arXiv Detail & Related papers (2022-04-04T09:38:38Z)
- Adversarial Self-Supervised Contrastive Learning [62.17538130778111]
Existing adversarial learning approaches mostly use class labels to generate adversarial samples that lead to incorrect predictions.
We propose a novel adversarial attack for unlabeled data, which makes the model confuse the instance-level identities of the perturbed data samples.
We present a self-supervised contrastive learning framework to adversarially train a robust neural network without labeled data.
arXiv Detail & Related papers (2020-06-13T08:24:33Z)
- Long-Tailed Recognition Using Class-Balanced Experts [128.73438243408393]
We propose an ensemble of class-balanced experts that combines the strength of diverse classifiers.
Our ensemble of class-balanced experts reaches results close to state-of-the-art and an extended ensemble establishes a new state-of-the-art on two benchmarks for long-tailed recognition.
arXiv Detail & Related papers (2020-04-07T20:57:44Z)
- Diversity inducing Information Bottleneck in Model Ensembles [73.80615604822435]
In this paper, we target the problem of generating effective ensembles of neural networks by encouraging diversity in prediction.
We explicitly optimize a diversity inducing adversarial loss for learning latent variables and thereby obtain diversity in the output predictions necessary for modeling multi-modal data.
Compared to the most competitive baselines, we show significant improvements in classification accuracy, under a shift in the data distribution.
arXiv Detail & Related papers (2020-03-10T03:10:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.