Analysis of Machine Learning Approaches to Packing Detection
- URL: http://arxiv.org/abs/2105.00473v1
- Date: Sun, 2 May 2021 13:37:15 GMT
- Title: Analysis of Machine Learning Approaches to Packing Detection
- Authors: Charles-Henry Bertrand Van Ouytsel, Thomas Given-Wilson, Jeremy Minet,
Julian Roussieau, Axel Legay
- Abstract summary: Packing is an obfuscation technique widely used by malware to hide the content and behavior of a program.
No robust results have indicated which algorithms perform best, or which features are most significant.
This work explores eleven different machine learning approaches using 119 features to understand: which features are most significant for packing detection; which algorithms offer the best performance; and which algorithms are most economical.
- Score: 2.4450414803989475
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Packing is an obfuscation technique widely used by malware to hide the
content and behavior of a program. Much prior research has explored how to
detect whether a program is packed. This research includes a broad variety of
approaches such as entropy analysis, syntactic signatures and more recently
machine learning classifiers using various features. However, no robust results
have indicated which algorithms perform best, or which features are most
significant. This is complicated by considering how to evaluate the results
since accuracy, cost, generalization capabilities, and other measures are all
reasonable. This work explores eleven different machine learning approaches
using 119 features to understand: which features are most significant for
packing detection; which algorithms offer the best performance; and which
algorithms are most economical.
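The abstract mentions entropy analysis as a classic packing-detection approach, and byte-entropy features of this kind are typical members of feature sets like the 119 used here. As a minimal, hedged illustration (not the paper's actual feature pipeline; the threshold value is an assumption), Shannon entropy over a section's bytes can flag likely-packed content, since compressed or encrypted data approaches 8 bits per byte:

```python
import math
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte buffer, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    n = len(data)
    # Sum -p * log2(p) over the observed byte-value distribution.
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())

def looks_packed(section: bytes, threshold: float = 7.0) -> bool:
    """Heuristic: packed/encrypted sections tend toward high entropy
    (above ~7.0), while plain native code usually sits lower.
    The 7.0 cutoff is an illustrative assumption, not a value from the paper."""
    return shannon_entropy(section) >= threshold
```

In a learned detector, a score like this would be one feature among many (section names, import counts, header anomalies, and so on) rather than a standalone decision rule.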
Related papers
- Learning-Augmented Algorithms with Explicit Predictors [67.02156211760415]
Recent advances in algorithmic design show how to utilize predictions obtained by machine learning models from past and present data.
Prior research in this context was focused on a paradigm where the predictor is pre-trained on past data and then used as a black box.
In this work, we unpack the predictor and integrate the learning problem it gives rise to into the algorithmic challenge.

arXiv Detail & Related papers (2024-03-12T08:40:21Z)
- Exploratory Landscape Analysis for Mixed-Variable Problems [0.7252027234425334]
We provide the means to compute exploratory landscape features for mixed-variable problems where the decision space is a mixture of continuous, binary, integer, and categorical variables.
To further highlight their merit for practical applications, we design and conduct an automated algorithm selection study.
Our trained algorithm selector is able to close the gap between the single best and the virtual best solver by 57.5% over all benchmark problems.
arXiv Detail & Related papers (2024-02-26T10:19:23Z)
- Practical considerations for variable screening in the Super Learner [2.9337734440124232]
The Super Learner ensemble has desirable theoretical properties and has been used successfully in many applications.
Dimension reduction can be accomplished by using variable screening algorithms, including the lasso, within the ensemble prior to fitting other prediction algorithms.
We provide empirical results that suggest that a diverse set of candidate screening algorithms should be used to protect against poor performance of any one screen.
arXiv Detail & Related papers (2023-11-06T18:04:39Z)
- Relation-aware Ensemble Learning for Knowledge Graph Embedding [68.94900786314666]
We propose to learn an ensemble by leveraging existing methods in a relation-aware manner.
However, exploring these semantics with a relation-aware ensemble leads to a much larger search space than general ensemble methods.
We propose a divide-search-combine algorithm, RelEns-DSC, that searches the relation-wise ensemble weights independently.
arXiv Detail & Related papers (2023-10-13T07:40:12Z)
- Multi-Dimensional Ability Diagnosis for Machine Learning Algorithms [88.93372675846123]
We propose Camilla, a task-agnostic framework for evaluating machine learning algorithms.
We use cognitive diagnosis assumptions and neural networks to learn the complex interactions among algorithms, samples, and the skills of each sample.
In our experiments, Camilla outperforms state-of-the-art baselines in metric reliability, rank consistency, and rank stability.
arXiv Detail & Related papers (2023-07-14T03:15:56Z)
- Provably Efficient Representation Learning with Tractable Planning in Low-Rank POMDP [81.00800920928621]
We study representation learning in partially observable Markov Decision Processes (POMDPs).
We first present an algorithm for decodable POMDPs that combines maximum likelihood estimation (MLE) and optimism in the face of uncertainty (OFU).
We then show how to adapt this algorithm to the broader class of $\gamma$-observable POMDPs.
arXiv Detail & Related papers (2023-06-21T16:04:03Z)
- Algorithmic failure as a humanities methodology: machine learning's mispredictions identify rich cases for qualitative analysis [0.0]
I trained a simple machine learning algorithm to predict whether an action was active or passive using only information about fictional characters.
The results thus support Munk et al.'s theory that failed predictions can be productively used to identify rich cases for qualitative analysis.
Further research is needed to develop an understanding of what kinds of data the method is useful for and which kinds of machine learning are most generative.
arXiv Detail & Related papers (2023-05-19T13:24:32Z)
- Representation Learning with Multi-Step Inverse Kinematics: An Efficient and Optimal Approach to Rich-Observation RL [106.82295532402335]
Existing reinforcement learning algorithms suffer from computational intractability, strong statistical assumptions, and suboptimal sample complexity.
We provide the first computationally efficient algorithm that attains rate-optimal sample complexity with respect to the desired accuracy level.
Our algorithm, MusIK, combines systematic exploration with representation learning based on multi-step inverse kinematics.
arXiv Detail & Related papers (2023-04-12T14:51:47Z)
- Compactness Score: A Fast Filter Method for Unsupervised Feature Selection [66.84571085643928]
We propose a fast unsupervised feature selection method, named Compactness Score (CSUFS), to select desired features.
Our proposed algorithm appears more accurate and efficient than existing algorithms.
arXiv Detail & Related papers (2022-01-31T13:01:37Z)
- Towards Understanding the Behaviors of Optimal Deep Active Learning Algorithms [19.65665942630067]
Active learning (AL) algorithms may achieve better performance with fewer data because the model guides the data selection process.
There has been little study of what the optimal AL strategy looks like, which would help researchers understand where their models fall short.
We present a simulated annealing algorithm to search for this optimal oracle and analyze it for several tasks.
arXiv Detail & Related papers (2020-12-29T22:56:42Z)
- SWAG: A Wrapper Method for Sparse Learning [0.13854111346209866]
We propose a procedure to find a library of sparse learners with consequent low data collection and storage costs.
This new method delivers a low-dimensional network of attributes that can be easily interpreted.
We call this algorithm the "Sparse Wrapper AlGorithm" (SWAG).
arXiv Detail & Related papers (2020-06-23T08:53:41Z)
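The SWAG entry above describes a wrapper method for sparse learning. As a hedged sketch of the general wrapper idea (a generic greedy forward-selection loop, not the authors' exact randomized procedure; the `evaluate` callback is a hypothetical user-supplied scoring function), feature subsets can be grown one attribute at a time, keeping only additions that improve the score:

```python
from typing import Callable, FrozenSet, Sequence

def greedy_wrapper(features: Sequence[str],
                   evaluate: Callable[[FrozenSet[str]], float],
                   max_size: int) -> FrozenSet[str]:
    """Forward-selection wrapper: starting from the empty set, repeatedly
    add the single feature that most improves the evaluation score,
    stopping when no addition helps or the size budget is reached."""
    selected: FrozenSet[str] = frozenset()
    best_score = evaluate(selected)
    improved = True
    while improved and len(selected) < max_size:
        improved = False
        for f in features:
            if f in selected:
                continue
            candidate = selected | {f}
            score = evaluate(candidate)
            if score > best_score:
                best_score, best_candidate = score, candidate
                improved = True
        if improved:
            selected = best_candidate
    return selected
```

In a real wrapper method, `evaluate` would typically be cross-validated predictive performance of a learner refit on each candidate subset, which is what makes wrappers accurate but expensive compared with filter methods such as the Compactness Score entry above.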
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.