CASCADE: LLM-Powered JavaScript Deobfuscator at Google
- URL: http://arxiv.org/abs/2507.17691v1
- Date: Wed, 23 Jul 2025 16:57:32 GMT
- Title: CASCADE: LLM-Powered JavaScript Deobfuscator at Google
- Authors: Shan Jiang, Pranoy Kovuri, David Tao, Zhixun Tan
- Abstract summary: Software obfuscation, particularly prevalent in JavaScript, hinders code comprehension and analysis. This paper introduces CASCADE, a novel hybrid approach that integrates the advanced coding capabilities of Gemini with the deterministic transformation capabilities of a compiler. CASCADE is already deployed in Google's production environment, demonstrating substantial improvements in JavaScript deobfuscation efficiency.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Software obfuscation, particularly prevalent in JavaScript, hinders code comprehension and analysis, posing significant challenges to software testing, static analysis, and malware detection. This paper introduces CASCADE, a novel hybrid approach that integrates the advanced coding capabilities of Gemini with the deterministic transformation capabilities of a compiler Intermediate Representation (IR), specifically JavaScript IR (JSIR). By employing Gemini to identify critical prelude functions, the foundational components underlying the most prevalent obfuscation techniques, and leveraging JSIR for subsequent code transformations, CASCADE effectively recovers semantic elements like original strings and API names, and reveals original program behaviors. This method overcomes limitations of existing static and dynamic deobfuscation techniques, eliminating hundreds to thousands of hardcoded rules while achieving reliability and flexibility. CASCADE is already deployed in Google's production environment, demonstrating substantial improvements in JavaScript deobfuscation efficiency and reducing reverse engineering efforts.
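To illustrate the kind of "prelude function" the abstract refers to, here is a hedged sketch (not taken from the paper) of the widespread string-array pattern: the obfuscator hoists all string literals into one encoded array and routes every use through a decoder function. The names `_0xdata` and `_0x1a2b` are invented for illustration; once such a decoder is identified, a deterministic IR pass can fold each call site back into its original string.

```javascript
// Encoded string table produced by the obfuscator (base64 here for simplicity;
// real obfuscators often add rotation, RC4, or index shuffling on top).
const _0xdata = ["ZmV0Y2g=", "dG9VcHBlckNhc2U="];

// Prelude function: every obfuscated call site goes through this decoder.
function _0x1a2b(index) {
  return Buffer.from(_0xdata[index], "base64").toString("utf8");
}

// Obfuscated call site: the original API name is hidden behind the decoder.
// A constant-folding pass over the IR can rewrite this to the literal "fetch".
const apiName = _0x1a2b(0);

console.log(apiName); // "fetch"
```

In the hybrid scheme the abstract describes, an LLM would locate `_0x1a2b` as the prelude decoder, and the compiler-IR side would perform the actual (and reliable) call-site rewriting.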
Related papers
- JsDeObsBench: Measuring and Benchmarking LLMs for JavaScript Deobfuscation [34.88009582470047]
Large Language Models (LLMs) have recently shown promise in automating the deobfuscation process. We present JsDeObsBench, a benchmark designed to rigorously evaluate the effectiveness of LLMs in the context of JS deobfuscation.
arXiv Detail & Related papers (2025-06-25T06:50:13Z) - Decompiling Smart Contracts with a Large Language Model [51.49197239479266]
Despite the 78,047,845 smart contracts deployed on Ethereum (as of May 26, 2025), a mere 767,520 (under 1%) are open source. This opacity necessitates the automated semantic analysis of on-chain smart contract bytecode. We introduce a pioneering decompilation pipeline that transforms bytecode into human-readable and semantically faithful Solidity code.
arXiv Detail & Related papers (2025-06-24T13:42:59Z) - Training Language Models to Generate Quality Code with Program Analysis Feedback [66.0854002147103]
Code generation with large language models (LLMs) is increasingly adopted in production but fails to ensure code quality. We propose REAL, a reinforcement learning framework that incentivizes LLMs to generate production-quality code.
arXiv Detail & Related papers (2025-05-28T17:57:47Z) - Simplicity by Obfuscation: Evaluating LLM-Driven Code Transformation with Semantic Elasticity [4.458584890504334]
Code obfuscation aims to prevent reverse engineering and intellectual property theft. The recent development of large language models paves the way for practical applications in different domains. This work performs an empirical study on the ability of LLMs to obfuscate Python source code.
arXiv Detail & Related papers (2025-04-18T18:29:23Z) - The Code Barrier: What LLMs Actually Understand? [7.407441962359689]
This research uses code obfuscation as a structured testing framework to evaluate semantic understanding capabilities of language models. Findings show a statistically significant performance decline as obfuscation complexity increases. This research introduces a new evaluation approach for assessing code comprehension in language models.
arXiv Detail & Related papers (2025-04-14T14:11:26Z) - ObscuraCoder: Powering Efficient Code LM Pre-Training Via Obfuscation Grounding [60.37988508851391]
Language models (LMs) have become a staple of the code-writing toolbox. Research exploring modifications to Code-LMs' pre-training objectives, geared towards improving data efficiency and better disentangling syntax and semantics, has been noticeably sparse. In this work, we examine grounding on obfuscated code as a means of helping Code-LMs look beyond the surface-form syntax and enhance their pre-training sample efficiency.
arXiv Detail & Related papers (2025-03-27T23:08:53Z) - ReF Decompile: Relabeling and Function Call Enhanced Decompile [50.86228893636785]
The goal of decompilation is to convert compiled low-level code (e.g., assembly code) back into high-level programming languages. This task supports various reverse engineering applications, such as vulnerability identification, malware analysis, and legacy software migration.
arXiv Detail & Related papers (2025-02-17T12:38:57Z) - JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large Language Models [123.66104233291065]
Jailbreak attacks cause large language models (LLMs) to generate harmful, unethical, or otherwise objectionable content.
Evaluating these attacks presents a number of challenges, which the current collection of benchmarks and evaluation techniques do not adequately address.
JailbreakBench is an open-sourced benchmark with the following components.
arXiv Detail & Related papers (2024-03-28T02:44:02Z) - Cryptic Bytes: WebAssembly Obfuscation for Evading Cryptojacking Detection [0.0]
We present the most comprehensive evaluation of code obfuscation techniques for WebAssembly to date.
We obfuscate a diverse set of applications, including utilities, games, and crypto miners, using state-of-the-art obfuscation tools like Tigress and wasm-mutate.
Our dataset of over 20,000 obfuscated WebAssembly binaries and the emcc-obf tool are publicly available to stimulate further research.
arXiv Detail & Related papers (2024-03-22T13:32:08Z) - JAMDEC: Unsupervised Authorship Obfuscation using Constrained Decoding over Small Language Models [53.83273575102087]
We propose an unsupervised inference-time approach to authorship obfuscation.
We introduce JAMDEC, a user-controlled, inference-time algorithm for authorship obfuscation.
Our approach builds on small language models such as GPT2-XL to avoid disclosing the original content to proprietary LLM APIs.
arXiv Detail & Related papers (2024-02-13T19:54:29Z) - Contrastive Code Representation Learning [95.86686147053958]
We show that the popular reconstruction-based BERT model is sensitive to source code edits, even when the edits preserve semantics.
We propose ContraCode: a contrastive pre-training task that learns code functionality, not form.
arXiv Detail & Related papers (2020-07-09T17:59:06Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.