Related papers: Static Semantics Reconstruction for Enhancing JavaScript-WebAssembly Multilingual Malware Detection

URL: http://arxiv.org/abs/2310.17304v2
Date: Fri, 19 Apr 2024 08:55:03 GMT
Title: Static Semantics Reconstruction for Enhancing JavaScript-WebAssembly Multilingual Malware Detection
Authors: Yifan Xia, Ping He, Xuhong Zhang, Peiyu Liu, Shouling Ji, Wenhai Wang
Abstract summary: WebAssembly allows attackers to hide the malicious functionalities of JavaScript malware in cross-language interoperations. The detection of JavaScript-WebAssembly multilingual malware (JWMM) is challenging due to the complex interoperations and semantic diversity between JavaScript and WebAssembly. We present JWBinder, the first technique aimed at enhancing the static detection of JWMM.
Score: 51.15122099046214
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: The emergence of WebAssembly allows attackers to hide the malicious functionalities of JavaScript malware in cross-language interoperations, termed JavaScript-WebAssembly multilingual malware (JWMM). However, existing anti-virus solutions based on static program analysis are still limited to monolingual code. As a result, their detection effectiveness decreases significantly against JWMM. The detection of JWMM is challenging due to the complex interoperations and semantic diversity between JavaScript and WebAssembly. To bridge this gap, we present JWBinder, the first technique aimed at enhancing the static detection of JWMM. JWBinder performs a language-specific data-flow analysis to capture the cross-language interoperations and then characterizes the functionalities of JWMM through a unified high-level structure called Inter-language Program Dependency Graph. The extensive evaluation on one of the most representative real-world anti-virus platforms, VirusTotal, shows that JWBinder effectively enhances anti-virus systems from various vendors and increases the overall successful detection rate against JWMM from 49.1% to 86.2%. Additionally, we assess the side effects and runtime overhead of JWBinder, corroborating its practical viability in real-world applications.

Related papers

Lost in Multilinguality: Dissecting Cross-lingual Factual Inconsistency in Transformer Language Models [49.16690802656554]
We find that multilingual factual models struggle to provide consistent responses to semantically equivalent prompts in different languages. We propose a linear shortcut method that bypasses computations in the final layers, enhancing both prediction accuracy and cross-lingual consistency.
arXiv Detail & Related papers (2025-04-05T19:43:10Z)
Multilingual Retrieval-Augmented Generation for Knowledge-Intensive Task [73.35882908048423]
Retrieval-augmented generation (RAG) has become a cornerstone of contemporary NLP. This paper investigates the effectiveness of RAG across multiple languages by proposing novel approaches for multilingual open-domain question-answering.
arXiv Detail & Related papers (2025-04-04T17:35:43Z)
AlignXIE: Improving Multilingual Information Extraction by Cross-Lingual Alignment [62.69772800910482]
AlignXIE formulates IE across different languages, especially non-English ones, as code generation tasks. It incorporates an IE cross-lingual alignment phase through a translated instance prediction task. It surpasses ChatGPT by 30.17% and SoTA by 20.03%, thereby demonstrating superior cross-lingual IE capabilities.
arXiv Detail & Related papers (2024-11-07T15:36:05Z)
Interaction2Code: Benchmarking MLLM-based Interactive Webpage Code Generation from Interactive Prototyping [57.024913536420264]
Multimodal Large Language Models (MLLMs) have demonstrated remarkable performance on the design-to-code task. We present the first systematic investigation of MLLMs in generating interactive webpages.
arXiv Detail & Related papers (2024-11-05T17:40:03Z)
Fakeium: A Dynamic Execution Environment for JavaScript Program Analysis [3.7980955101286322]
Fakeium is a novel, open source, and lightweight execution environment designed for efficient, large-scale dynamic analysis of JavaScript programs. Fakeium complements traditional static analysis by providing additional API calls and string literals. Fakeium's flexibility and ability to detect hidden API calls, especially in obfuscated sources, highlight its potential as a valuable tool for security analysts to detect malicious behavior.
arXiv Detail & Related papers (2024-10-28T09:27:26Z)
EMMA: Efficient Visual Alignment in Multi-Modal LLMs [56.03417732498859]
EMMA is a lightweight cross-modality module designed to efficiently fuse visual and textual encodings. EMMA boosts performance across multiple tasks by up to 9.3% while significantly improving robustness against hallucinations.
arXiv Detail & Related papers (2024-10-02T23:00:31Z)
MMJ-Bench: A Comprehensive Study on Jailbreak Attacks and Defenses for Multimodal Large Language Models [11.02754617539271]
We introduce MMJ-Bench, a unified pipeline for evaluating jailbreak attacks and defense techniques for MLLMs. We assess the effectiveness of various attack methods against SoTA MLLMs and evaluate the impact of defense mechanisms on both defense effectiveness and model utility.
arXiv Detail & Related papers (2024-08-16T00:18:23Z)
Transfer Learning in Pre-Trained Large Language Models for Malware Detection Based on System Calls [3.5698678013121334]
This work presents a novel framework leveraging large language models (LLMs) to classify malware based on system call data. Experiments with a dataset of over 1TB of system calls demonstrate that models with larger context sizes, such as BigBird and Longformer, achieve superior accuracy and F1-Score of approximately 0.86. This approach shows significant potential for real-time detection in high-stakes environments, offering a robust solution to evolving cyber threats.
arXiv Detail & Related papers (2024-05-15T13:19:43Z)
Backdoor Attack on Multilingual Machine Translation [53.28390057407576]
Multilingual machine translation (MNMT) systems have security vulnerabilities. An attacker injects poisoned data into a low-resource language pair to cause malicious translations in other languages. This type of attack is of particular concern, given the larger attack surface of languages inherent to low-resource settings.
arXiv Detail & Related papers (2024-04-03T01:32:31Z)
AIM: Automated Input Set Minimization for Metamorphic Security Testing [9.232277700524786]
We propose AIM, an approach that automatically selects inputs to reduce testing costs while preserving vulnerability detection capabilities. AIM includes a clustering-based black-box approach to identify similar inputs based on their security properties. It also relies on a novel genetic algorithm to efficiently select diverse inputs while minimizing their total cost.
arXiv Detail & Related papers (2024-02-16T15:54:58Z)
DIALIGHT: Lightweight Multilingual Development and Evaluation of Task-Oriented Dialogue Systems with Large Language Models [76.79929883963275]
DIALIGHT is a toolkit for developing and evaluating multilingual Task-Oriented Dialogue (ToD) systems. It features a secure, user-friendly web interface for fine-grained human evaluation at both local utterance level and global dialogue level. Our evaluations reveal that while PLM fine-tuning leads to higher accuracy and coherence, LLM-based systems excel in producing diverse and likeable responses.
arXiv Detail & Related papers (2024-01-04T11:27:48Z)
On the Effectiveness of Adversarial Samples against Ensemble Learning-based Windows PE Malware Detectors [0.0]
We propose a mutation system to counteract ensemble learning-based detectors by combining GANs and an RL model. In the FeaGAN model, ensemble learning is utilized to enhance the malware detector's evasion ability, with the generated adversarial patterns.
arXiv Detail & Related papers (2023-09-25T02:57:27Z)
Adversarial EXEmples: A Survey and Experimental Evaluation of Practical Attacks on Machine Learning for Windows Malware Detection [67.53296659361598]
Adversarial EXEmples can bypass machine learning-based detection by perturbing relatively few input bytes. We develop a unifying framework that not only encompasses and generalizes previous attacks against machine-learning models, but also includes three novel attacks. These attacks, named Full DOS, Extend and Shift, inject the adversarial payload by respectively manipulating the DOS header, extending it, and shifting the content of the first section.
arXiv Detail & Related papers (2020-08-17T07:16:57Z)
