cozy: Comparative Symbolic Execution for Binary Programs
- URL: http://arxiv.org/abs/2504.00151v1
- Date: Mon, 31 Mar 2025 18:59:30 GMT
- Title: cozy: Comparative Symbolic Execution for Binary Programs
- Authors: Caleb Helbling, Graham Leach-Krouse, Sam Lasser, Greg Sullivan,
- Abstract summary: cozy is a tool for analyzing and visualizing differences between two versions of a software binary. cozy comes with a web-based visual interface for viewing comparison results.
- Score: 0.6999740786886538
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: This paper introduces cozy, a tool for analyzing and visualizing differences between two versions of a software binary. The primary use case for cozy is validating "micropatches": small binary or assembly-level patches inserted into existing compiled binaries. To perform this task, cozy leverages the Python-based angr symbolic execution framework. Our tool analyzes the output of symbolic execution to find end states for the pre- and post-patched binaries that are compatible (reachable from the same input). The tool then compares compatible states for observable differences in registers, memory, and side effects. To aid in usability, cozy comes with a web-based visual interface for viewing comparison results. This interface provides a rich set of operations for pruning, filtering, and exploring different types of program data.
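The comparison step described in the abstract (pair up pre- and post-patch end states that are reachable from the same input, then diff their observable values) can be illustrated with a small toy model. This is only a sketch under stated assumptions: it is not cozy's or angr's API, every name in it (`EndState`, `compatible_inputs`, `compare`) is hypothetical, and it enumerates a small integer input domain by brute force where a real tool would ask an SMT solver whether two path constraints are jointly satisfiable.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class EndState:
    """Toy stand-in for a symbolic end state (hypothetical, not cozy's type)."""
    label: str
    constraint: Callable[[int], bool]         # path condition over the input x
    outputs: Callable[[int], Dict[str, int]]  # observable values as a function of x


def compatible_inputs(pre: EndState, post: EndState,
                      domain: Iterable[int]) -> List[int]:
    """Inputs that reach both end states; the pair is compatible if non-empty."""
    return [x for x in domain if pre.constraint(x) and post.constraint(x)]


def compare(pre_states: List[EndState], post_states: List[EndState],
            domain: Iterable[int]) -> List[Tuple[str, str, int, Dict[str, tuple]]]:
    """Report one witness input per compatible pair whose observables differ."""
    domain = list(domain)
    diffs = []
    for pre in pre_states:
        for post in post_states:
            for x in compatible_inputs(pre, post, domain):
                a, b = pre.outputs(x), post.outputs(x)
                changed = {k: (a.get(k), b.get(k))
                           for k in sorted(a.keys() | b.keys())
                           if a.get(k) != b.get(k)}
                if changed:
                    diffs.append((pre.label, post.label, x, changed))
                    break  # one witness input per state pair is enough
    return diffs


# Hypothetical example: the patch adds a bounds check (0 <= x < 4).
PRE = [EndState("pre:read", lambda x: True, lambda x: {"ret": x})]
POST = [
    EndState("post:read", lambda x: 0 <= x < 4, lambda x: {"ret": x}),
    EndState("post:reject", lambda x: not (0 <= x < 4), lambda x: {"ret": -1}),
]

if __name__ == "__main__":
    for pre_label, post_label, witness, changed in compare(PRE, POST, range(-2, 8)):
        print(pre_label, post_label, witness, changed)
```

In this toy, the pair (`pre:read`, `post:read`) is compatible but shows no difference, while (`pre:read`, `post:reject`) is compatible only on out-of-range inputs and differs in the returned value, which is the kind of behavior change one would expect a bounds-check micropatch to introduce.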
Related papers
- ReF Decompile: Relabeling and Function Call Enhanced Decompile [50.86228893636785]
The goal of decompilation is to convert compiled low-level code (e.g., assembly code) back into high-level programming languages. This task supports various reverse engineering applications, such as vulnerability identification, malware analysis, and legacy software migration.
arXiv Detail & Related papers (2025-02-17T12:38:57Z) - StrTune: Data Dependence-based Code Slicing for Binary Similarity Detection with Fine-tuned Representation [5.41477941455399]
BCSD can address binary tasks such as malicious code snippets identification and binary patch analysis by comparing code patterns.
Because binaries are compiled with different compilation configurations, existing approaches still face notable limitations when comparing binary similarity.
We propose StrTune, which slices binary code based on data dependence and performs slice-level fine-tuning.
arXiv Detail & Related papers (2024-11-19T12:20:08Z) - Image2Struct: Benchmarking Structure Extraction for Vision-Language Models [57.531922659664296]
Image2Struct is a benchmark to evaluate vision-language models (VLMs) on extracting structure from images.
In Image2Struct, VLMs are prompted to generate the underlying structure from an input image.
The structure is then rendered to produce an output image, which is compared against the input image to produce a similarity score.
arXiv Detail & Related papers (2024-10-29T18:44:59Z) - Contextual Document Embeddings [77.22328616983417]
We propose two complementary methods for contextualized document embeddings.
First, an alternative contrastive learning objective that explicitly incorporates the document neighbors into the intra-batch contextual loss.
Second, a new contextual architecture that explicitly encodes neighbor document information into the encoded representation.
arXiv Detail & Related papers (2024-10-03T14:33:34Z) - Parallel Program Analysis on Path Ranges [3.018638214344819]
Ranged symbolic execution performs symbolic execution on program parts, so called path ranges, in parallel.
We present a verification approach that splits programs into path ranges and then runs arbitrary analyses on the ranges in parallel.
arXiv Detail & Related papers (2024-02-19T08:26:52Z) - BinGo: Identifying Security Patches in Binary Code with Graph Representation Learning [19.22004583230725]
We propose BinGo, a new security patch detection system for binary code.
BinGo consists of four phases, namely, patch data pre-processing, graph extraction, embedding generation, and graph representation learning.
Our experimental results show BinGo can achieve up to 80.77% accuracy in identifying security patches between two neighboring versions of binary code.
arXiv Detail & Related papers (2023-12-13T06:35:39Z) - VEXIR2Vec: An Architecture-Neutral Embedding Framework for Binary Similarity [36.341893383865745]
VexIR2Vec is an approach for binary similarity using VEX-IR, an architecture-neutral Intermediate Representation (IR).
We learn the vocabulary of representations at the entity level of the IR using the knowledge graph embedding techniques in an unsupervised manner.
VexIR2Vec is $3.1$-$3.5\times$ faster than the closest baselines and orders-of-magnitude faster than other tools.
arXiv Detail & Related papers (2023-12-01T11:22:10Z) - BiBench: Benchmarking and Analyzing Network Binarization [72.59760752906757]
Network binarization emerges as one of the most promising compression approaches offering extraordinary computation and memory savings.
Common challenges of binarization, such as accuracy degradation and efficiency limitation, suggest that its attributes are not fully understood.
We present BiBench, a rigorously designed benchmark with in-depth analysis for network binarization.
arXiv Detail & Related papers (2023-01-26T17:17:16Z) - Learning Tracking Representations via Dual-Branch Fully Transformer Networks [82.21771581817937]
We present a Siamese-like Dual-branch network based on solely Transformers for tracking.
We extract a feature vector for each patch based on its matching results with others within an attention window.
The method achieves results better than or comparable to those of the best-performing methods.
arXiv Detail & Related papers (2021-12-05T13:44:33Z) - Semantic-aware Binary Code Representation with BERT [27.908093567605484]
A wide range of binary analysis applications, such as bug discovery, malware analysis and code clone detection, require recovering contextual meaning from binary code.
Recently, binary analysis techniques based on machine learning have been proposed to automatically reconstruct the code representation of a binary.
In this paper, we propose DeepSemantic, which uses BERT to produce a semantic-aware code representation of binary code.
arXiv Detail & Related papers (2021-06-10T03:31:29Z) - Auto-Encoding Twin-Bottleneck Hashing [141.5378966676885]
This paper proposes an efficient and adaptive code-driven graph.
It is updated by decoding in the context of an auto-encoder.
Experiments on benchmarked datasets clearly show the superiority of our framework over the state-of-the-art hashing methods.
arXiv Detail & Related papers (2020-02-27T05:58:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.