Related papers: ASMA-Tune: Unlocking LLMs' Assembly Code Comprehension via Structural-Semantic Instruction Tuning

ASMA-Tune: Unlocking LLMs' Assembly Code Comprehension via Structural-Semantic Instruction Tuning

URL: http://arxiv.org/abs/2503.11617v2
Date: Thu, 22 May 2025 09:43:27 GMT
Title: ASMA-Tune: Unlocking LLMs' Assembly Code Comprehension via Structural-Semantic Instruction Tuning
Authors: Xinyi Wang, Jiashui Wang, Jinbo Su, Ke Wang, Peng Chen, Yanming Liu, Long Liu, Xiang Li, Yangdong Wang, Qiyuan Chen, Rongze Chen, Chunfu Jia,
Abstract summary: Assembly code analysis and comprehension play critical roles in applications like reverse engineering.<n>Traditional masked language modeling approaches do not explicitly focus on natural language interaction.<n>We present Assembly Augmented Tuning, an end-to-end structural-semantic instruction tuning framework.
Score: 33.53059396922164
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Assembly code analysis and comprehension play critical roles in applications like reverse engineering, yet they face substantial challenges due to low information density and a lack of explicit syntactic structures. While traditional masked language modeling (MLM) approaches do not explicitly focus on natural language interaction, emerging decoder-focused large language models (LLMs) demonstrate partial success in binary analysis yet remain underexplored for holistic comprehension. We present Assembly Augmented Tuning, an end-to-end structural-semantic instruction tuning framework that synergizes encoder architecture with decoder-based LLMs through a projector module, where the assembly encoder extracts hardware-level structural features, the projector bridges representations with the semantic space, and the instruction-tuned LLM preserves natural language capabilities. Experimental results demonstrate three key advantages: (1) State-of-the-art performance in assembly comprehension with +39.7% Recall@1 and +17.8% MRR improvements over GPT-4-Turbo, (2) Consistent enhancements across base models (24.6-107.4% Recall@1 and 15.2-106.3% MRR on Qwen2.5-Coder, Deepseek-Coder and CodeLlama variants), and (3) Superior instruction-following capabilities (41.5%-118% improvements) with controlled code generation degradation (-8.9% to -35% across architectures).

Related papers

SAFT: Structure-Aware Fine-Tuning of LLMs for AMR-to-Text Generation [50.277959544420455]
SAFT is a structure-aware fine-tuning approach that injects graph topology into pretrained language models.<n>We compute direction-sensitive positional encodings from the magnetic Laplacian of transformed AMRs.<n> SAFT sets a new state-of-the-art on AMR 3.0 with a 3.5 BLEU improvement over baselines.
arXiv Detail & Related papers (2025-07-15T18:12:57Z)
Teaching LLM to Reason: Reinforcement Learning from Algorithmic Problems without Code [76.80306464249217]
We propose TeaR, which aims at teaching LLMs to reason better.<n>TeaR leverages careful data curation and reinforcement learning to guide models in discovering optimal reasoning paths through code-related tasks.<n>We conduct extensive experiments using two base models and three long-CoT distillation models, with model sizes ranging from 1.5 billion to 32 billion parameters, and across 17 benchmarks spanning Math, Knowledge, Code, and Logical Reasoning.
arXiv Detail & Related papers (2025-07-10T07:34:05Z)
QwenLong-CPRS: Towards $\infty$-LLMs with Dynamic Context Optimization [70.3105638352827]
QwenLong-CPRS is a context compression framework designed for explicit long-context optimization.<n>QwenLong-CPRS achieves 21.59$times$ context compression alongside 19.15-point average performance gains.
arXiv Detail & Related papers (2025-05-23T16:47:00Z)
Improving Assembly Code Performance with Large Language Models via Reinforcement Learning [9.20863636863631]
Large language models (LLMs) have demonstrated strong performance across a wide range of programming tasks.<n>We present a reinforcement learning framework that trains LLMs using Proximal Policy Optimization (PPO)<n>Our model, Qwen2.5-Coder-7B-PPO, achieves 96.4% test pass rates and an average speedup of 1.47x over the gcc -O3 baseline.
arXiv Detail & Related papers (2025-05-16T17:40:45Z)
ComplexVCoder: An LLM-Driven Framework for Systematic Generation of Complex Verilog Code [9.68747119462712]
We present ComplexVCoder, an open-source framework that enhances the generation quality and efficiency of complex Verilog code.<n>Specifically, we introduce a two-stage generation mechanism, which leverages an intermediate representation to enable a more structured transition from natural language descriptions to intricate Verilog designs.<n>In addition, we introduce a rule-based alignment method and a domain-specific retrieval-augmented generation (RAG) to further improve the correctness of the synthesized code.
arXiv Detail & Related papers (2025-04-29T11:22:06Z)
The Code Barrier: What LLMs Actually Understand? [7.407441962359689]
This research uses code obfuscation as a structured testing framework to evaluate semantic understanding capabilities of language models. Findings show a statistically significant performance decline as obfuscation complexity increases. This research introduces a new evaluation approach for assessing code comprehension in language models.
arXiv Detail & Related papers (2025-04-14T14:11:26Z)
OpenCodeInstruct: A Large-scale Instruction Tuning Dataset for Code LLMs [62.68905180014956]
We introduce OpenCodeInstruct, the largest open-access instruction tuning dataset, comprising 5 million diverse samples. Each sample includes a programming question, solution, test cases, execution feedback, and LLM-generated quality assessments. We fine-tune various base models, including LLaMA and Qwen, across multiple scales (1B+, 3B+, and 7B+) using our dataset.
arXiv Detail & Related papers (2025-04-05T02:52:16Z)
Post-Incorporating Code Structural Knowledge into LLMs via In-Context Learning for Code Translation [10.77747590700758]
Large language models (LLMs) have achieved significant advancements in software mining. handling the syntactic structure of source code remains a challenge. This paper employs incontext learning (ICL) to integrate code structural knowledge into pre-trained LLMs.
arXiv Detail & Related papers (2025-03-28T10:59:42Z)
Quantizing Large Language Models for Code Generation: A Differentiated Replication [51.85505914274633]
Large Language Models (LLMs) have shown an impressive capability in code generation and, specifically, to automatically implement requirements described in natural language.<n>LLMs pose significant challenges related to their memory (and, consequently, carbon) footprint.<n>New frontier for LLM quantization is 4-bit precision, resulting in an average memory footprint reduction of 70%.
arXiv Detail & Related papers (2025-03-10T09:26:08Z)
Trim My View: An LLM-Based Code Query System for Module Retrieval in Robotic Firmware [7.308605205194037]
We show how a combination of binary decomposition techniques, decompilation passes, and LLM-powered function summarization can be used to build an economical engine.<n>We instantiated this technique with three underlying open-source LLMs and measured its effectiveness in identifying modules in robotics firmware.
arXiv Detail & Related papers (2025-03-05T23:40:17Z)
EquiBench: Benchmarking Large Language Models' Understanding of Program Semantics via Equivalence Checking [55.81461218284736]
EquiBench is a new benchmark for evaluating large language models (LLMs)<n>It determines whether two programs produce identical outputs for all possible inputs.<n>We evaluate 19 state-of-the-art LLMs and find that the best accuracies are 63.8% and 76.2%, only modestly above the 50% random baseline.
arXiv Detail & Related papers (2025-02-18T02:54:25Z)
Large Language Models as Code Executors: An Exploratory Study [29.545321608864295]
This paper pioneers the exploration of Large Language Models (LLMs) as code executors. We are the first to examine this feasibility across various LLMs, including OpenAI's o1, GPT-4o, GPT-3.5, DeepSeek, and Qwen-Coder. We introduce an Iterative Instruction Prompting (IIP) technique that processes code snippets line by line, enhancing the accuracy of weaker models by an average of 7.22%.
arXiv Detail & Related papers (2024-10-09T08:23:22Z)
Decoding at the Speed of Thought: Harnessing Parallel Decoding of Lexical Units for LLMs [57.27982780697922]
Large language models have demonstrated exceptional capability in natural language understanding and generation. However, their generation speed is limited by the inherently sequential nature of their decoding process. This paper introduces Lexical Unit Decoding, a novel decoding methodology implemented in a data-driven manner.
arXiv Detail & Related papers (2024-05-24T04:35:13Z)
Bridging Code Semantic and LLMs: Semantic Chain-of-Thought Prompting for Code Generation [22.219645213202178]
This paper proposes the "Semantic Chain-of-Thought" approach to intruduce semantic information of code, named SeCoT. We show that SeCoT can achieves state-of-the-art performance, greatly improving the potential for large models and code generation.
arXiv Detail & Related papers (2023-10-16T05:09:58Z)
CodeChain: Towards Modular Code Generation Through Chain of Self-revisions with Representative Sub-modules [51.82044734879657]
We propose CodeChain, a novel framework for inference that elicits modularized code generation through a chain of self-revisions. We find that CodeChain can significantly boost both modularity as well as correctness of the generated solutions, achieving relative pass@1 improvements of 35% on APPS and 76% on CodeContests.
arXiv Detail & Related papers (2023-10-13T10:17:48Z)
Fixing Large Language Models' Specification Misunderstanding for Better Code Generation [13.494822086550604]
muFiX is a novel prompting technique to improve the code generation performance of large language models (LLMs)<n>It first exploits test case analysis to obtain specification understanding and enables a self-improvement process.<n>muFiX further fixes the specification understanding towards the direction reducing the gap between the provided understanding and the actual understanding.
arXiv Detail & Related papers (2023-09-28T02:58:07Z)
Guess & Sketch: Language Model Guided Transpilation [59.02147255276078]
Learned transpilation offers an alternative to manual re-writing and engineering efforts. Probabilistic neural language models (LMs) produce plausible outputs for every input, but do so at the cost of guaranteed correctness. Guess & Sketch extracts alignment and confidence information from features of the LM then passes it to a symbolic solver to resolve semantic equivalence.
arXiv Detail & Related papers (2023-09-25T15:42:18Z)
LMs: Understanding Code Syntax and Semantics for Code Analysis [25.508254718438636]
We evaluate the capabilities of large language models (LLMs) and their limitations for code analysis in software engineering. We employ four state-of-the-art foundational models, GPT4, GPT3.5, StarCoder and CodeLlama-13b-instruct.
arXiv Detail & Related papers (2023-05-20T08:43:49Z)
CodeT5+: Open Code Large Language Models for Code Understanding and Generation [72.1638273937025]
Large language models (LLMs) pretrained on vast source code have achieved prominent progress in code intelligence. CodeT5+ is a family of encoder-decoder LLMs for code in which component modules can be flexibly combined to suit a wide range of downstream code tasks. We extensively evaluate CodeT5+ on over 20 code-related benchmarks in different settings, including zero-shot, finetuning, and instruction-tuning.
arXiv Detail & Related papers (2023-05-13T14:23:07Z)
Learning Label Modular Prompts for Text Classification in the Wild [56.66187728534808]
We propose text classification in-the-wild, which introduces different non-stationary training/testing stages. Decomposing a complex task into modular components can enable robust generalisation under such non-stationary environment. We propose MODULARPROMPT, a label-modular prompt tuning framework for text classification tasks.
arXiv Detail & Related papers (2022-11-30T16:26:38Z)

This list is automatically generated from the titles and abstracts of the papers in this site.