SeeWasm: An Efficient and Fully-Functional Symbolic Execution Engine for WebAssembly Binaries
- URL: http://arxiv.org/abs/2408.08537v1
- Date: Fri, 16 Aug 2024 05:42:30 GMT
- Title: SeeWasm: An Efficient and Fully-Functional Symbolic Execution Engine for WebAssembly Binaries
- Authors: Ningyu He, Zhehao Zhao, Hanqin Guan, Jikai Wang, Shuo Peng, Ding Li, Haoyu Wang, Xiangqun Chen, Yao Guo
- Abstract summary: WebAssembly (Wasm) can be compiled from more than 40 high-level programming languages.
Vulnerabilities in Wasm binaries could lead to sensitive data leakage and threaten their hosting environments.
To identify them, symbolic execution is widely adopted due to its soundness and its ability to automatically generate exploits.
- Score: 10.836842574894161
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: WebAssembly (Wasm), as a compact, fast, and isolation-guaranteed binary format, can be compiled from more than 40 high-level programming languages. However, vulnerabilities in Wasm binaries could lead to sensitive data leakage and even threaten their hosting environments. To identify them, symbolic execution is widely adopted due to its soundness and its ability to automatically generate exploits. Existing symbolic executors for Wasm binaries, however, are typically platform-specific, which means that they cannot support all Wasm features. They may also require significant manual intervention to complete the analysis and suffer from efficiency issues. In this paper, we propose an efficient and fully-functional symbolic execution engine, named SeeWasm. Compared with existing tools, we demonstrate that SeeWasm supports full-featured Wasm binaries without further manual intervention, while accelerating the analysis by 2 to 6 times. SeeWasm has been adopted by existing works to identify more than 30 0-day vulnerabilities or security issues in well-known C, Go, and SGX applications after compiling them to Wasm binaries.
Related papers
- Debugging WebAssembly? Put some Whamm on it! [35.842707498062765]
We present Whamm, a declarative instrumentation DSL for WebAssembly.
Whamm solves three problems: 1) tooling fragmentation, 2) prohibitive instrumentation overhead of general-purpose frameworks, and 3) tedium of tailoring low-level high-performance mechanisms.
arXiv Detail & Related papers (2025-04-28T18:52:04Z)
- ReF Decompile: Relabeling and Function Call Enhanced Decompile [50.86228893636785]
The goal of decompilation is to convert compiled low-level code (e.g., assembly code) back into high-level programming languages.
This task supports various reverse engineering applications, such as vulnerability identification, malware analysis, and legacy software migration.
arXiv Detail & Related papers (2025-02-17T12:38:57Z)
- CveBinarySheet: A Comprehensive Pre-built Binaries Database for IoT Vulnerability Analysis [0.0]
CveBinarySheet is a database containing 1033 CVE entries spanning from 1999 to 2024.
Our dataset encompasses 16 essential third-party components, including busybox and curl.
Each precompiled binary is available at two compiler optimization levels (O0 and O3), facilitating comprehensive vulnerability analysis under different compilation scenarios.
arXiv Detail & Related papers (2025-01-15T14:50:46Z)
- Enhancing Reverse Engineering: Investigating and Benchmarking Large Language Models for Vulnerability Analysis in Decompiled Binaries [2.696054049278301]
We introduce DeBinVul, a novel decompiled binary code vulnerability dataset.
We fine-tune state-of-the-art LLMs using DeBinVul and report on a performance increase of 19%, 24%, and 21% in detecting binary code vulnerabilities.
arXiv Detail & Related papers (2024-11-07T18:54:31Z)
- B-Side: Binary-Level Static System Call Identification [1.6050948947693462]
We present B-Side, a static binary analysis tool able to identify a superset of the system calls that an x86-64 static/dynamic executable may invoke at runtime.
B-Side assumes no access to program/libraries sources, and shows a good degree of precision by leveraging symbolic execution.
arXiv Detail & Related papers (2024-10-23T17:26:52Z)
- Assemblage: Automatic Binary Dataset Construction for Machine Learning [35.674339346299654]
Assemblage is a cloud-based distributed system that crawls, configures, and builds Windows PE binaries.
We have run Assemblage on AWS over the past year, producing 890k Windows PE and 428k Linux ELF binaries across 29 configurations.
arXiv Detail & Related papers (2024-05-07T04:10:01Z)
- How Far Have We Gone in Binary Code Understanding Using Large Language Models [51.527805834378974]
We propose a benchmark to evaluate the effectiveness of Large Language Models (LLMs) in binary code understanding.
Our evaluations reveal that existing LLMs can understand binary code to a certain extent, thereby improving the efficiency of binary code analysis.
arXiv Detail & Related papers (2024-04-15T14:44:08Z)
- FoC: Figure out the Cryptographic Functions in Stripped Binaries with LLMs [54.27040631527217]
We propose a novel framework called FoC to Figure out the Cryptographic functions in stripped binaries.
We first build a binary large language model (FoC-BinLLM) to summarize the semantics of cryptographic functions in natural language.
We then build a binary code similarity model (FoC-Sim) upon the FoC-BinLLM to create change-sensitive representations and use it to retrieve similar implementations of unknown cryptographic functions in a database.
arXiv Detail & Related papers (2024-03-27T09:45:33Z)
- CP-BCS: Binary Code Summarization Guided by Control Flow Graph and Pseudo Code [79.87518649544405]
We present a control flow graph and pseudo code guided binary code summarization framework called CP-BCS.
CP-BCS utilizes a bidirectional instruction-level control flow graph and pseudo code that incorporates expert knowledge to learn the comprehensive binary function execution behavior and logic semantics.
arXiv Detail & Related papers (2023-10-24T14:20:39Z)
- Automatic Bill of Materials [5.14387789987357]
ABOM embeds a hash of each distinct input source code file into the binary emitted by a compiler.
If leveraged across the ecosystem, ABOMs provide a zero-touch, backwards-compatible, drop-in solution for fast supply chain attack detection.
arXiv Detail & Related papers (2023-10-15T05:48:11Z)
- BiBench: Benchmarking and Analyzing Network Binarization [72.59760752906757]
Network binarization emerges as one of the most promising compression approaches offering extraordinary computation and memory savings.
Common challenges of binarization, such as accuracy degradation and efficiency limitation, suggest that its attributes are not fully understood.
We present BiBench, a rigorously designed benchmark with in-depth analysis for network binarization.
arXiv Detail & Related papers (2023-01-26T17:17:16Z)
- BiFSMNv2: Pushing Binary Neural Networks for Keyword Spotting to Real-Network Performance [54.214426436283134]
Deep neural networks, such as the Deep-FSMN, have been widely studied for keyword spotting (KWS) applications.
We present a strong yet efficient binary neural network for KWS, namely BiFSMNv2, pushing it to the real-network accuracy performance.
We highlight that benefiting from the compact architecture and optimized hardware kernel, BiFSMNv2 can achieve an impressive 25.1x speedup and 20.2x storage-saving on edge hardware.
arXiv Detail & Related papers (2022-11-13T18:31:45Z)
- Detecting Security Fixes in Open-Source Repositories using Static Code Analyzers [8.716427214870459]
We study the extent to which the output of off-the-shelf static code analyzers can be used as a source of features to represent commits in Machine Learning (ML) applications.
We investigate how such features can be used to construct embeddings and train ML models to automatically identify source code commits that contain vulnerability fixes.
We find that the combination of our method with commit2vec represents a tangible improvement over the state of the art in the automatic identification of commits that fix vulnerabilities.
arXiv Detail & Related papers (2021-05-07T15:57:17Z)
- Autosploit: A Fully Automated Framework for Evaluating the Exploitability of Security Vulnerabilities [47.748732208602355]
Autosploit is an automated framework for evaluating the exploitability of vulnerabilities.
It automatically tests the exploits on different configurations of the environment.
It is able to identify the system properties that affect the ability to exploit a vulnerability in both noiseless and noisy environments.
arXiv Detail & Related papers (2020-06-30T18:49:18Z)
This list is automatically generated from the titles and abstracts of the papers on this site.