Better Call Graphs: A New Dataset of Function Call Graphs for Malware Classification
- URL: http://arxiv.org/abs/2512.20872v1
- Date: Wed, 24 Dec 2025 01:21:38 GMT
- Title: Better Call Graphs: A New Dataset of Function Call Graphs for Malware Classification
- Authors: Jakir Hossain, Gurvinder Singh, Lukasz Ziarek, Ahmet Erdem Sarıyüce
- Abstract summary: We introduce Better Call Graphs (BCG), a comprehensive dataset of large and unique Function Call Graphs (FCGs) extracted from recent Android application packages (APKs). BCG includes both benign and malicious samples spanning various families and types, along with graph-level features for each APK.
- Score: 1.201622168415522
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Function call graphs (FCGs) have emerged as a powerful abstraction for malware detection, capturing the behavioral structure of applications beyond surface-level signatures. Their utility in traditional program analysis has been well established, enabling effective classification and analysis of malicious software. In the mobile domain, especially in the Android ecosystem, FCG-based malware classification is particularly critical due to the platform's widespread adoption and the complex, component-based structure of Android apps. However, progress in this direction is hindered by the lack of large-scale, high-quality Android-specific FCG datasets. Existing datasets are often outdated, dominated by small or redundant graphs resulting from app repackaging, and fail to reflect the diversity of real-world malware. These limitations lead to overfitting and unreliable evaluation of graph-based classification methods. To address this gap, we introduce Better Call Graphs (BCG), a comprehensive dataset of large and unique FCGs extracted from recent Android application packages (APKs). BCG includes both benign and malicious samples spanning various families and types, along with graph-level features for each APK. Through extensive experiments using baseline classifiers, we demonstrate the necessity and value of BCG compared to existing datasets. BCG is publicly available at https://erdemub.github.io/BCG-dataset.
Related papers
- IoT-based Android Malware Detection Using Graph Neural Network With Adversarial Defense [3.2846283642465077]
We show the effectiveness of graph-based classification using a Graph Neural Network (GNN)-based classifier to generate API graph embeddings. We propose a Generative Adversarial Network (GAN)-based attack algorithm named VGAE-MalGAN targeting graph-based GNN Android malware classifiers.
arXiv Detail & Related papers (2025-12-23T02:57:33Z)
- Unlocking Advanced Graph Machine Learning Insights through Knowledge Completion on Neo4j Graph Database [1.1059590443280725]
This paper proposes an innovative architecture that integrates a Knowledge Completion phase into GDB-GML applications. We show how revealing hidden knowledge can heavily impact datasets' behavior and metrics. Experimental results demonstrate that our intuition radically reshapes both topology and overall dataset dynamics.
arXiv Detail & Related papers (2025-11-14T15:27:31Z)
- HiGraph: A Large-Scale Hierarchical Graph Dataset for Malware Analysis [28.52072763032641]
We introduce HiGraph, the largest public hierarchical graph dataset for malware analysis, comprising over 200M Control Flow Graphs (CFGs) nested within 595K Call Graphs (FCGs). This two-level representation preserves structural semantics essential for building robust detectors resilient to code obfuscation and malware evolution. We demonstrate HiGraph's utility through a large-scale analysis that reveals distinct structural properties of benign and malicious software, establishing it as a foundational benchmark for the community.
arXiv Detail & Related papers (2025-09-02T09:10:52Z)
- G-OSR: A Comprehensive Benchmark for Graph Open-Set Recognition [54.45837774534411]
We introduce G-OSR, a benchmark for evaluating Graph Open-Set Recognition (GOSR) methods at both the node and graph levels. Results offer critical insights into the generalizability and limitations of current GOSR methods.
arXiv Detail & Related papers (2025-03-01T13:02:47Z)
- Beyond Message Passing: Neural Graph Pattern Machine [50.78679002846741]
We introduce the Neural Graph Pattern Machine (GPM), a novel framework that bypasses message passing by learning directly from graph substructures. GPM efficiently extracts, encodes, and prioritizes task-relevant graph patterns, offering greater expressivity and improved ability to capture long-range dependencies.
arXiv Detail & Related papers (2025-01-30T20:37:47Z)
- Cluster Aware Graph Anomaly Detection [32.791460110557104]
We propose a cluster aware multi-view graph anomaly detection method, called CARE. Our approach captures both local and global node affinities by augmenting the graph's adjacency matrix with the pseudo-label. We show that the proposed similarity-guided loss is a variant of contrastive learning loss.
arXiv Detail & Related papers (2024-09-15T15:41:59Z)
- Graph Augmentation for Recommendation [30.77695833436189]
Graph augmentation with contrastive learning has gained significant attention in the field of recommendation systems.
We propose a principled framework called GraphAug that generates denoised self-supervised signals, enhancing recommender systems.
The GraphAug framework incorporates a graph information bottleneck (GIB)-regularized augmentation paradigm, which automatically distills informative self-supervision information.
arXiv Detail & Related papers (2024-03-25T11:47:53Z)
- Challenging the Myth of Graph Collaborative Filtering: a Reasoned and Reproducibility-driven Analysis [50.972595036856035]
We present code that successfully replicates results from six popular and recent graph recommendation models.
We compare these graph models with traditional collaborative filtering models that historically performed well in offline evaluations.
By investigating the information flow from users' neighborhoods, we aim to identify which models are influenced by intrinsic features in the dataset structure.
arXiv Detail & Related papers (2023-08-01T09:31:44Z)
- Model Inversion Attacks against Graph Neural Networks [65.35955643325038]
We study model inversion attacks against Graph Neural Networks (GNNs).
In this paper, we present GraphMI to infer the private training graph data.
Our experimental results show that such defenses are not sufficiently effective and call for more advanced defenses against privacy attacks.
arXiv Detail & Related papers (2022-09-16T09:13:43Z)
- Towards Unsupervised Deep Graph Structure Learning [67.58720734177325]
We propose an unsupervised graph structure learning paradigm, where the learned graph topology is optimized by data itself without any external guidance.
Specifically, we generate a learning target from the original data as an "anchor graph", and use a contrastive loss to maximize the agreement between the anchor graph and the learned graph.
arXiv Detail & Related papers (2022-01-17T11:57:29Z)
- Inverse Graph Identification: Can We Identify Node Labels Given Graph Labels? [89.13567439679709]
Graph Identification (GI) has long been researched in graph learning and is essential in certain applications.
This paper defines a novel problem dubbed Inverse Graph Identification (IGI).
We propose a simple yet effective method that makes the node-level message passing process using Graph Attention Network (GAT) under the protocol of GI.
arXiv Detail & Related papers (2020-07-12T12:06:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.