DMLDroid: Deep Multimodal Fusion Framework for Android Malware Detection with Resilience to Code Obfuscation and Adversarial Perturbations
- URL: http://arxiv.org/abs/2509.11187v1
- Date: Sun, 14 Sep 2025 09:32:27 GMT
- Title: DMLDroid: Deep Multimodal Fusion Framework for Android Malware Detection with Resilience to Code Obfuscation and Adversarial Perturbations
- Authors: Doan Minh Trung, Tien Duc Anh Hao, Luong Hoang Minh, Nghi Hoang Khoa, Nguyen Tan Cam, Van-Hau Pham, Phan The Duy
- Abstract summary: We propose DMLDroid, an Android malware detection framework based on multimodal fusion. We conduct exhaustive experiments on each feature independently, as well as in combination, using different fusion strategies. Our findings highlight the benefits of multimodal fusion in improving both detection accuracy and robustness against evolving Android malware threats.
- Score: 1.3792857294744785
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In recent years, learning-based Android malware detection has seen significant advancements, with detectors generally falling into three categories: string-based, image-based, and graph-based approaches. While these methods have shown strong detection performance, they often struggle to sustain robustness in real-world settings, particularly when facing code obfuscation and adversarial examples (AEs). Deep multimodal learning has emerged as a promising solution, leveraging the strengths of multiple feature types to enhance robustness and generalization. However, a systematic investigation of multimodal fusion for both accuracy and resilience remains underexplored. In this study, we propose DMLDroid, an Android malware detection framework based on multimodal fusion that leverages three different representations of malware features, including permissions & intents (tabular-based), DEX file representations (image-based), and API calls (graph-derived sequence-based). We conduct exhaustive experiments on each feature independently, as well as in combination, using different fusion strategies. Experimental results on the CICMalDroid 2020 dataset demonstrate that our multimodal approach with the dynamic weighted fusion mechanism achieves high performance, reaching 97.98% accuracy and 98.67% F1-score on original malware detection. Notably, the proposed method maintains strong robustness, sustaining over 98% accuracy and 98% F1-score under both obfuscation and adversarial attack scenarios. Our findings highlight the benefits of multimodal fusion in improving both detection accuracy and robustness against evolving Android malware threats.
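The abstract names a dynamic weighted fusion mechanism over three modality branches but does not spell it out. As an illustrative sketch only (the function names, confidence scores, and softmax weighting are assumptions, not the paper's exact method), a late-fusion scheme might weight each branch's malware probability by a per-branch confidence:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def dynamic_weighted_fusion(branch_probs, branch_confidences):
    """Fuse per-modality malware probabilities, weighting each branch by a
    softmax-normalized confidence score so that more reliable modalities
    dominate the fused decision."""
    w = softmax(np.asarray(branch_confidences, dtype=float))
    return float(np.dot(w, np.asarray(branch_probs, dtype=float)))

# Three branches: permissions/intents (tabular), DEX image, API-call sequence.
probs = [0.91, 0.64, 0.88]   # hypothetical per-branch P(malware)
conf = [2.0, 0.5, 1.5]       # hypothetical unnormalized confidence logits
fused = dynamic_weighted_fusion(probs, conf)
```

Here the fused score (about 0.867) sits above the unweighted mean because the two high-confidence branches agree on a high malware probability; a static average would let the weak image branch drag the score down.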
Related papers
- Detecting Deepfakes with Multivariate Soft Blending and CLIP-based Image-Text Alignment [4.34685509565816]
The proliferation of highly realistic facial forgeries necessitates robust detection methods. Existing approaches often suffer from limited accuracy and poor generalization due to significant distribution shifts among samples generated by diverse forgery techniques. Our method leverages the multimodal alignment capabilities of CLIP to capture subtle forgery traces.
arXiv Detail & Related papers (2026-02-14T09:53:35Z) - DeepAgent: A Dual Stream Multi Agent Fusion for Robust Multimodal Deepfake Detection [1.7024685699333262]
DeepAgent is a framework that simultaneously incorporates both visual and audio modalities for the effective detection of deepfakes. Agent-1 examines each video with a streamlined AlexNet-based CNN to identify signs of deepfake manipulation. Agent-2 detects audio-visual inconsistencies by combining acoustic features, audio transcriptions from Whisper, and text read from frame sequences with EasyOCR.
arXiv Detail & Related papers (2025-12-08T09:43:30Z) - BIDO: A Unified Approach to Address Obfuscation and Concept Drift Challenges in Image-based Malware Detection [15.388728305777908]
BIDO is a hybrid image-based malware detector designed to enhance robustness against both obfuscation and concept drift simultaneously. Specifically, to improve the discriminative power of image features, we introduce a local feature selection module. Further, to ensure feature compactness, we design a learnable metric that pulls samples with identical labels closer.
arXiv Detail & Related papers (2025-09-04T01:48:03Z) - AgentDroid: A Multi-Agent Framework for Detecting Fraudulent Android Applications [8.108572518706566]
AgentDroid is a novel framework for Android fraudulent application detection based on multi-modal analysis and multi-agent systems. It processes Android applications and extracts a series of multi-modal data for analysis. Our framework achieves an accuracy of 91.7% and an F1-Score of 91.68%, showing improved detection accuracy over the baseline methods.
arXiv Detail & Related papers (2025-03-15T15:07:43Z) - CorrNetDroid: Android Malware Detector leveraging a Correlation-based Feature Selection for Network Traffic features [2.9069289358935073]
This work proposes a dynamic analysis-based Android malware detection system, CorrNetDroid, that works over network traffic flows. Many traffic features exhibit overlapping ranges in normal and malware datasets. Our model effectively reduces the feature set while detecting Android malware with 99.50% accuracy when considering only two network traffic features.
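Correlation-based feature selection of the kind CorrNetDroid describes can be sketched as a greedy filter that drops any feature too correlated with one already kept. The feature names, the synthetic data, and the 0.9 threshold below are illustrative assumptions, not the paper's actual procedure:

```python
import numpy as np

def select_uncorrelated(X, names, threshold=0.9):
    """Greedy correlation filter: keep a feature only if its absolute
    Pearson correlation with every already-kept feature stays below
    the threshold."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    for j in range(X.shape[1]):
        if all(corr[j, k] < threshold for k in kept):
            kept.append(j)
    return [names[k] for k in kept]

rng = np.random.default_rng(1)
pkt_count = rng.random(100)
# byte_count is a near-deterministic function of pkt_count -> redundant.
byte_count = pkt_count * 1500 + rng.normal(0, 1e-3, 100)
duration = rng.random(100)

X = np.column_stack([pkt_count, byte_count, duration])
kept = select_uncorrelated(X, ["pkt_count", "byte_count", "duration"])
```

The redundant `byte_count` column is filtered out while the two genuinely independent features survive, mirroring how a correlation filter can shrink a traffic-feature set without losing discriminative signal.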
arXiv Detail & Related papers (2025-03-03T10:52:34Z) - RADAR: Robust Two-stage Modality-incomplete Industrial Anomaly Detection [61.71770293720491]
We propose a novel two-stage Robust modAlity-incomplete fusing and Detecting frAmewoRk, abbreviated as RADAR.
Our bootstrapping philosophy is to enhance two stages in MIIAD, improving the robustness of the Multimodal Transformer.
Our experimental results demonstrate that the proposed RADAR significantly surpasses conventional MIAD methods in terms of effectiveness and robustness.
arXiv Detail & Related papers (2024-10-02T16:47:55Z) - MASKDROID: Robust Android Malware Detection with Masked Graph Representations [56.09270390096083]
We propose MASKDROID, a powerful detector with a strong discriminative ability to identify malware.
We introduce a masking mechanism into the Graph Neural Network based framework, forcing MASKDROID to recover the whole input graph.
This strategy enables the model to understand the malicious semantics and learn more stable representations, enhancing its robustness against adversarial attacks.
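The masked-reconstruction objective behind MASKDROID can be illustrated with a minimal numpy sketch. This is a simplification under stated assumptions: the real system uses a Graph Neural Network over program graphs, whereas here an identity "predictor" on the corrupted input stands in, purely to show the masking step and the loss computed on hidden nodes:

```python
import numpy as np

rng = np.random.default_rng(0)

def mask_node_features(X, mask_rate=0.3, rng=rng):
    """Randomly zero out a fraction of node feature rows; return the
    corrupted matrix and the boolean mask of hidden nodes."""
    hidden = rng.random(X.shape[0]) < mask_rate
    X_masked = X.copy()
    X_masked[hidden] = 0.0
    return X_masked, hidden

def reconstruction_loss(X_true, X_pred, hidden):
    """Mean squared error on the masked nodes only: the model is trained
    to recover exactly what was hidden from it."""
    if not hidden.any():
        return 0.0
    return float(np.mean((X_true[hidden] - X_pred[hidden]) ** 2))

X = rng.random((10, 4))  # 10 graph nodes, 4 features each (toy data)
X_masked, hidden = mask_node_features(X)
# Identity predictor: feeds the corrupted input straight back as the guess.
loss = reconstruction_loss(X, X_masked, hidden)
```

Training a real encoder to drive this loss down forces it to infer hidden node semantics from graph context, which is the stability property the abstract credits for adversarial robustness.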
arXiv Detail & Related papers (2024-09-29T07:22:47Z) - Confidence-aware multi-modality learning for eye disease screening [58.861421804458395]
We propose a novel multi-modality evidential fusion pipeline for eye disease screening.
It provides a measure of confidence for each modality and elegantly integrates the multi-modality information.
Experimental results on both public and internal datasets demonstrate that our model excels in robustness.
arXiv Detail & Related papers (2024-05-28T13:27:30Z) - Deep Learning Fusion For Effective Malware Detection: Leveraging Visual Features [12.431734971186673]
We investigate the power of fusing Convolutional Neural Network models trained on different modalities of a malware executable.
We are proposing a novel multimodal fusion algorithm, leveraging three different visual malware features.
The proposed strategy has a detection rate of 1.00 (on a scale of 0-1) in identifying malware in the given dataset.
arXiv Detail & Related papers (2024-05-23T08:32:40Z) - Multimodal Industrial Anomaly Detection via Hybrid Fusion [59.16333340582885]
We propose a novel multimodal anomaly detection method with hybrid fusion scheme.
Our model outperforms the state-of-the-art (SOTA) methods in both detection and segmentation precision on the MVTec 3D-AD dataset.
arXiv Detail & Related papers (2023-03-01T15:48:27Z) - M2TR: Multi-modal Multi-scale Transformers for Deepfake Detection [74.19291916812921]
Forged images generated by Deepfake techniques pose a serious threat to the trustworthiness of digital information.
In this paper, we aim to capture the subtle manipulation artifacts at different scales for Deepfake detection.
We introduce a high-quality Deepfake dataset, SR-DF, which consists of 4,000 DeepFake videos generated by state-of-the-art face swapping and facial reenactment methods.
arXiv Detail & Related papers (2021-04-20T05:43:44Z) - M2Net: Multi-modal Multi-channel Network for Overall Survival Time Prediction of Brain Tumor Patients [151.4352001822956]
Early and accurate prediction of overall survival (OS) time can help to obtain better treatment planning for brain tumor patients.
Existing prediction methods rely on radiomic features at the local lesion area of a magnetic resonance (MR) volume.
We propose an end-to-end OS time prediction model, namely the Multi-modal Multi-channel Network (M2Net).
arXiv Detail & Related papers (2020-06-01T05:21:37Z) - Investigating Vulnerability to Adversarial Examples on Multimodal Data Fusion in Deep Learning [32.125310341415755]
We investigated whether current multimodal fusion models exploit complementary information across modalities to defend against adversarial attacks.
We verified that a multimodal fusion model optimized for better prediction remains vulnerable to adversarial attack even when only one of its sensors is attacked.
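The single-sensor vulnerability can be demonstrated on a toy late-fusion linear classifier. Everything below (the weights, inputs, FGSM-style sign step, and the decision threshold of 0) is an illustrative assumption, not the paper's experimental setup: perturbing only one modality's input is enough to flip the fused decision.

```python
import numpy as np

def fused_score(x_a, x_b, w_a, w_b):
    """Toy late fusion: a weighted sum of two modality feature vectors,
    classified by the sign of the result."""
    return float(w_a @ x_a + w_b @ x_b)

def attack_one_modality(x, w, eps):
    """FGSM-style perturbation of a single modality: step each feature in
    the sign direction that decreases the fused score."""
    return x - eps * np.sign(w)

w_a = np.array([0.8, 0.6])  # weights for modality A (attacked)
w_b = np.array([0.5, 0.9])  # weights for modality B (left untouched)
x_a = np.array([1.0, 1.0])
x_b = np.array([1.0, 1.0])

clean = fused_score(x_a, x_b, w_a, w_b)      # positive: "malicious" class
adv_a = attack_one_modality(x_a, w_a, eps=2.5)
attacked = fused_score(adv_a, x_b, w_a, w_b)  # sign flips despite clean x_b
```

Even though modality B's input is untouched, a large enough perturbation on modality A alone drives the fused score across the decision boundary, matching the paper's observation that fusion optimized purely for accuracy does not automatically confer robustness.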
arXiv Detail & Related papers (2020-05-22T03:45:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.