Data Augmentation for Opcode Sequence Based Malware Detection
- URL: http://arxiv.org/abs/2106.11821v1
- Date: Tue, 22 Jun 2021 14:36:35 GMT
- Title: Data Augmentation for Opcode Sequence Based Malware Detection
- Authors: Niall McLaughlin, Jesus Martinez del Rincon
- Abstract summary: We study different methods of data augmentation starting with basic methods using fixed transformations and moving to methods that adapt to the data.
We propose a novel data augmentation method based on using an opcode embedding layer within the network and its corresponding opcode embedding matrix.
To the best of our knowledge, this is the first paper to carry out a systematic study of different augmentation methods applied to opcode-sequence-based malware classification.
- Score: 2.335152769484957
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Data augmentation has been successfully used in many areas of deep learning
to significantly improve model performance. Typically, data augmentation
simulates realistic variations in data in order to increase the apparent
diversity of the training set. However, for opcode-based malware analysis,
where deep learning methods are already achieving state-of-the-art performance,
it is not immediately clear how to apply data augmentation. In this paper, we
study different methods of data augmentation, starting with basic methods using
fixed transformations and moving to methods that adapt to the data. We propose
a novel data augmentation method based on using an opcode embedding layer
within the network and its corresponding opcode embedding matrix to perform
adaptive data augmentation during training. To the best of our knowledge, this
is the first paper to carry out a systematic study of different augmentation
methods applied to opcode-sequence-based malware classification.
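The abstract contrasts basic augmentation with fixed transformations against an adaptive method that uses the network's opcode embedding matrix, but it does not spell out the exact operations. The sketch below is therefore an assumption, not the authors' procedure: the fixed transformations are taken to be uniform random substitution and random deletion of opcode IDs, and the adaptive step is assumed to swap an opcode for one of its nearest neighbours (by cosine similarity) in the embedding matrix. All function names (`random_substitute`, `random_delete`, `nearest_neighbours`, `embedding_substitute`) are hypothetical.

```python
import math
import random
from typing import List, Sequence

# Illustrative sketch only: the function names and the specific transformations
# (uniform substitution, deletion, nearest-neighbour substitution) are assumptions,
# not the procedure published in the paper. Opcodes are represented as integer IDs.


def random_substitute(seq: Sequence[int], vocab_size: int, p: float,
                      rng: random.Random) -> List[int]:
    """Fixed transformation: with probability p, replace an opcode by a uniformly random ID."""
    return [rng.randrange(vocab_size) if rng.random() < p else op for op in seq]


def random_delete(seq: Sequence[int], p: float, rng: random.Random) -> List[int]:
    """Fixed transformation: drop each opcode with probability p, keeping at least one."""
    kept = [op for op in seq if rng.random() >= p]
    if not kept and seq:
        kept = [rng.choice(list(seq))]
    return kept


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two embedding rows (0.0 if either is a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def nearest_neighbours(embedding: Sequence[Sequence[float]], k: int) -> List[List[int]]:
    """For every opcode ID (row of the embedding matrix), list its k most similar other IDs."""
    result = []
    for i, row in enumerate(embedding):
        scored = [(cosine(row, other), j) for j, other in enumerate(embedding) if j != i]
        scored.sort(reverse=True)  # highest similarity first
        result.append([j for _, j in scored[:k]])
    return result


def embedding_substitute(seq: Sequence[int], neighbours: Sequence[Sequence[int]],
                         p: float, rng: random.Random) -> List[int]:
    """Adaptive transformation: with probability p, swap an opcode for an embedding-space neighbour."""
    return [rng.choice(neighbours[op]) if rng.random() < p and neighbours[op] else op
            for op in seq]


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy 4-opcode embedding matrix; during training this would be the
    # (hypothetically exported) weights of the network's embedding layer.
    emb = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    nbrs = nearest_neighbours(emb, k=1)
    print(nbrs)  # [[1], [0], [3], [2]]
    print(embedding_substitute([0, 1, 2, 3], nbrs, p=1.0, rng=rng))  # [1, 0, 3, 2]
    print(random_delete([0, 1, 2, 3], p=0.5, rng=rng))
```

In this sketch the fixed transformations ignore the data, whereas `embedding_substitute` only proposes replacements that the model itself currently places close together, which is one plausible reading of "adaptive". Because embedding weights change as the network trains, the neighbour lists would presumably need to be recomputed periodically (for example, once per epoch); that schedule is also an assumption.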
Related papers
- Augmenting Radio Signals with Wavelet Transform for Deep Learning-Based Modulation Recognition [6.793444383222236]
Deep learning for radio modulation recognition has become prevalent in recent years.
In real-world scenarios, it may not be feasible to gather sufficient training data in advance.
Data augmentation is a method used to increase the diversity and quantity of training dataset.
(2023-11-07T06:55:39Z)
- Incorporating Supervised Domain Generalization into Data Augmentation [4.14360329494344]
We propose a method, contrastive semantic alignment (CSA) loss, to improve the robustness and training efficiency of data augmentation.
Experiments on the CIFAR-100 and CUB datasets show that the proposed method improves the robustness and training efficiency of typical data augmentations.
(2023-10-02T09:20:12Z)
- Exploring Representation-Level Augmentation for Code Search [50.94201167562845]
We explore augmentation methods that augment data (both code and query) at the representation level, which requires no additional data processing or training.
We experimentally evaluate the proposed representation-level augmentation methods with state-of-the-art code search models on a large-scale public dataset.
(2022-10-21T22:47:37Z)
- Automatic Data Augmentation via Invariance-Constrained Learning [94.27081585149836]
Underlying data structures are often exploited to improve the solution of learning tasks.
Data augmentation induces these symmetries during training by applying multiple transformations to the input data.
This work tackles these issues by automatically adapting the data augmentation while solving the learning task.
(2022-09-29T18:11:01Z)
- Data Augmentation Strategies for Improving Sequential Recommender Systems [7.986899327513767]
Sequential recommender systems have recently achieved significant performance improvements with the exploitation of deep learning (DL) based methods.
We propose a set of data augmentation strategies, all of which transform the original item sequences through direct corruption.
Experiments on the latest DL-based model show that applying data augmentation can help the model generalize better.
(2022-03-26T09:58:14Z)
- Invariance Learning in Deep Neural Networks with Differentiable Laplace Approximations [76.82124752950148]
We develop a convenient gradient-based method for selecting the data augmentation.
We use a differentiable Kronecker-factored Laplace approximation to the marginal likelihood as our objective.
(2022-02-22T02:51:11Z)
- Deep invariant networks with differentiable augmentation layers [87.22033101185201]
Methods for learning data augmentation policies require held-out data and are based on bilevel optimization problems.
We show that our approach is easier and faster to train than modern automatic data augmentation techniques.
(2022-02-04T14:12:31Z)
- Adaptive Weighting Scheme for Automatic Time-Series Data Augmentation [79.47771259100674]
We present two sample-adaptive automatic weighting schemes for data augmentation.
We validate our proposed methods on a large, noisy financial dataset and on time-series datasets from the UCR archive.
On the financial dataset, we show that the methods, combined with a trading strategy, lead to improvements in annualized returns of over 50%. On the time-series data, we outperform state-of-the-art models on over half of the datasets and achieve similar accuracy on the others.
(2021-02-16T17:50:51Z)
- Generalization in Reinforcement Learning by Soft Data Augmentation [11.752595047069505]
SOft Data Augmentation (SODA) is a method that decouples augmentation from policy learning.
We find SODA to significantly advance sample efficiency, generalization, and stability in training over state-of-the-art vision-based RL methods.
(2020-11-26T17:00:34Z)
- Automatic Data Augmentation via Deep Reinforcement Learning for Effective Kidney Tumor Segmentation [57.78765460295249]
We develop a novel automatic learning-based data augmentation method for medical image segmentation.
In our method, we train the data augmentation module and the subsequent segmentation module jointly, end to end, with a consistent loss.
We extensively evaluated our method on CT kidney tumor segmentation, and the results were promising.
(2020-02-22T14:10:13Z)