kNN Classification of Malware Data Dependency Graph Features
- URL: http://arxiv.org/abs/2406.02654v2
- Date: Sat, 6 Jul 2024 02:06:49 GMT
- Title: kNN Classification of Malware Data Dependency Graph Features
- Authors: John Musgrave, Anca Ralescu,
- Abstract summary: This study obtains accurate classification from the use of features tied to structure and semantics.
By training an accurate model using labeled data, this feature representation of semantics is shown to be correlated with ground truth labels.
Our results provide evidence that data dependency graphs accurately capture both semantic and structural information for increased explainability in classification results.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Explainability in classification results are dependent upon the features used for classification. Data dependency graph features representing data movement are directly correlated with operational semantics, and subject to fine grained analysis. This study obtains accurate classification from the use of features tied to structure and semantics. By training an accurate model using labeled data, this feature representation of semantics is shown to be correlated with ground truth labels. This was performed using non-parametric learning with a novel feature representation on a large scale dataset, the Kaggle 2015 Malware dataset. The features used enable fine grained analysis, increase in resolution, and explainable inferences. This allows for the body of the term frequency distribution to be further analyzed and to provide an increase in feature resolution over term frequency features. This method obtains high accuracy from analysis of a single instruction, a method that can be repeated for additional instructions to obtain further increases in accuracy. This study evaluates the hypothesis that the semantic representation and analysis of structure are able to make accurate predications and are also correlated to ground truth labels. Additionally, similarity in the metric space can be calculated directly without prior training. Our results provide evidence that data dependency graphs accurately capture both semantic and structural information for increased explainability in classification results.
Related papers
- PAC Learnability under Explanation-Preserving Graph Perturbations [15.83659369727204]
Graph neural networks (GNNs) operate over graphs, enabling the model to leverage the complex relationships and dependencies in graph-structured data.
A graph explanation is a subgraph which is an almost sufficient' statistic of the input graph with respect to its classification label.
This work considers two methods for leveraging such perturbation invariances in the design and training of GNNs.
arXiv Detail & Related papers (2024-02-07T17:23:15Z) - Bures-Wasserstein Means of Graphs [60.42414991820453]
We propose a novel framework for defining a graph mean via embeddings in the space of smooth graph signal distributions.
By finding a mean in this embedding space, we can recover a mean graph that preserves structural information.
We establish the existence and uniqueness of the novel graph mean, and provide an iterative algorithm for computing it.
arXiv Detail & Related papers (2023-05-31T11:04:53Z) - Supervised Feature Compression based on Counterfactual Analysis [3.2458225810390284]
This work aims to leverage Counterfactual Explanations to detect the important decision boundaries of a pre-trained black-box model.
Using the discretized dataset, an optimal Decision Tree can be trained that resembles the black-box model, but that is interpretable and compact.
arXiv Detail & Related papers (2022-11-17T21:16:14Z) - Metric Distribution to Vector: Constructing Data Representation via
Broad-Scale Discrepancies [15.40538348604094]
We present a novel embedding strategy named $mathbfMetricDistribution2vec$ to extract distribution characteristics into the vectorial representation for each data.
We demonstrate the application and effectiveness of our representation method in the supervised prediction tasks on extensive real-world structural graph datasets.
arXiv Detail & Related papers (2022-10-02T03:18:30Z) - Beyond Separability: Analyzing the Linear Transferability of Contrastive
Representations to Related Subpopulations [50.33975968859988]
Contrastive learning is a highly effective method which uses unlabeled data to produce representations which are linearly separable for downstream classification tasks.
Recent works have shown that contrastive representations are not only useful when data come from a single domain, but are also effective for transferring across domains.
arXiv Detail & Related papers (2022-04-06T09:10:23Z) - Graph-in-Graph (GiG): Learning interpretable latent graphs in
non-Euclidean domain for biological and healthcare applications [52.65389473899139]
Graphs are a powerful tool for representing and analyzing unstructured, non-Euclidean data ubiquitous in the healthcare domain.
Recent works have shown that considering relationships between input data samples have a positive regularizing effect for the downstream task.
We propose Graph-in-Graph (GiG), a neural network architecture for protein classification and brain imaging applications.
arXiv Detail & Related papers (2022-04-01T10:01:37Z) - Learning Debiased and Disentangled Representations for Semantic
Segmentation [52.35766945827972]
We propose a model-agnostic and training scheme for semantic segmentation.
By randomly eliminating certain class information in each training iteration, we effectively reduce feature dependencies among classes.
Models trained with our approach demonstrate strong results on multiple semantic segmentation benchmarks.
arXiv Detail & Related papers (2021-10-31T16:15:09Z) - Graph Embedding with Data Uncertainty [113.39838145450007]
spectral-based subspace learning is a common data preprocessing step in many machine learning pipelines.
Most subspace learning methods do not take into consideration possible measurement inaccuracies or artifacts that can lead to data with high uncertainty.
arXiv Detail & Related papers (2020-09-01T15:08:23Z) - Semantic Sentiment Analysis Based on Probabilistic Graphical Models and
Recurrent Neural Network [0.0]
The purpose of this study is to investigate the use of semantics to perform sentiment analysis based on probabilistic graphical models and recurrent neural networks.
The datasets used for the experiments were IMDB movie reviews, Amazon Consumer Product reviews, and Twitter Review datasets.
arXiv Detail & Related papers (2020-08-06T11:59:00Z) - Out-of-distribution Generalization via Partial Feature Decorrelation [72.96261704851683]
We present a novel Partial Feature Decorrelation Learning (PFDL) algorithm, which jointly optimize a feature decomposition network and the target image classification model.
The experiments on real-world datasets demonstrate that our method can improve the backbone model's accuracy on OOD image classification datasets.
arXiv Detail & Related papers (2020-07-30T05:48:48Z) - Geometric graphs from data to aid classification tasks with graph
convolutional networks [0.0]
We show that, even if additional relational information is not available in the data set, one can improve classification by constructing geometric graphs from the features themselves.
The improvement in classification accuracy is maximized by graphs that capture sample similarity with relatively low edge density.
arXiv Detail & Related papers (2020-05-08T15:00:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.