Unbundle-Rewrite-Rebundle: Runtime Detection and Rewriting of Privacy-Harming Code in JavaScript Bundles
- URL: http://arxiv.org/abs/2405.00596v2
- Date: Tue, 7 May 2024 15:38:20 GMT
- Title: Unbundle-Rewrite-Rebundle: Runtime Detection and Rewriting of Privacy-Harming Code in JavaScript Bundles
- Authors: Mir Masood Ali, Peter Snyder, Chris Kanich, Hamed Haddadi
- Abstract summary: Unbundle-Rewrite-Rebundle (URR) is a system for detecting privacy-harming portions of bundled JavaScript code.
URR rewrites that code at runtime to remove the privacy-harming behavior without breaking the surrounding code or overall application.
- Score: 11.832746335723437
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This work presents Unbundle-Rewrite-Rebundle (URR), a system for detecting privacy-harming portions of bundled JavaScript code and rewriting that code at runtime to remove the privacy-harming behavior without breaking the surrounding code or overall application. URR is a novel solution to the problem of JavaScript bundles, where websites pre-compile multiple code units into a single file, making it impossible for content filters and ad-blockers to differentiate between desired and unwanted resources. Where traditional content filtering tools rely on URLs, URR analyzes the code at the AST level and replaces harmful AST sub-trees with alternatives that preserve both privacy and functionality. We present an open-sourced implementation of URR as a Firefox extension, and evaluate it against JavaScript bundles generated by the most popular bundling system (Webpack) deployed on the Tranco 10k. We measure URR's performance in terms of precision (1.00), recall (0.95), and speed (0.43s per script) when detecting and rewriting three representative privacy-harming libraries often included in JavaScript bundles, and find URR to be an effective approach to a large and growing blind spot unaddressed by current privacy tools.
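The AST-level detect-and-replace idea described in the abstract can be illustrated with a small sketch. This is not URR's implementation: it assumes toy ESTree-style nodes written as Python dicts (a real system would obtain them from a JavaScript parser), and it assumes a hypothetical name-insensitive structural fingerprint so that a minified copy of a known function still matches. The helper names (`shape`, `fingerprint`, `rewrite`, `ident`, `call`, `func`) are all illustrative.

```python
import hashlib
import json


def shape(node):
    """Structural summary of a node with identifier names blanked out (assumption:
    minifiers rename identifiers but keep the tree structure)."""
    if isinstance(node, dict):
        is_ident = node.get("type") == "Identifier"
        return {k: ("_" if (is_ident and k == "name") else shape(v))
                for k, v in node.items()}
    if isinstance(node, list):
        return [shape(child) for child in node]
    return node


def fingerprint(node):
    """Hash of the name-insensitive shape of a sub-tree."""
    blob = json.dumps(shape(node), sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


def rewrite(node, replacements):
    """Return a copy of the tree in which every sub-tree whose fingerprint is
    known is replaced by the output of its replacement function."""
    if isinstance(node, dict):
        make_stub = replacements.get(fingerprint(node))
        if make_stub is not None:
            return make_stub(node)
        return {k: rewrite(v, replacements) for k, v in node.items()}
    if isinstance(node, list):
        return [rewrite(child, replacements) for child in node]
    return node


# Tiny builders for toy ESTree-style nodes (hypothetical helpers).
def ident(name):
    return {"type": "Identifier", "name": name}


def call(callee, *args):
    return {"type": "CallExpression", "callee": ident(callee),
            "arguments": [ident(a) for a in args]}


def func(name, params, body):
    return {"type": "FunctionDeclaration", "id": ident(name),
            "params": [ident(p) for p in params], "body": body}


# Known privacy-harming function: function track(user) { send(user); }
tracker = func("track", ["user"], [call("send", "user")])

# Replacement keeps the original name and parameters but empties the body,
# so callers elsewhere in the bundle still find a callable function.
replacements = {fingerprint(tracker): lambda n: {**n, "body": []}}

# A "bundle" holding a minified copy of the tracker plus unrelated code.
bundle = {"type": "Program", "body": [
    func("a", ["b"], [call("c", "b")]),
    func("render", ["x", "y"], [call("draw", "x", "y")]),
]}

cleaned = rewrite(bundle, replacements)
```

Here `cleaned` keeps `render` unchanged and turns the minified tracker `a` into a no-op with the same name and parameters. Blanking every identifier is deliberately crude: any function with the same shape would match, so a practical fingerprint would need more context to avoid false positives.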
Related papers
- GHunter: Universal Prototype Pollution Gadgets in JavaScript Runtimes [5.852467142337343]
Prototype pollution is a recent vulnerability that affects JavaScript code.
It is rooted in JavaScript's prototype-based inheritance, enabling attackers to inject arbitrary properties into an object's prototype at runtime.
We study gadgets in V8-based JavaScript runtimes with a primary focus on Node.js and Deno.
arXiv Detail & Related papers (2024-07-15T15:30:00Z)
- Blocking Tracking JavaScript at the Function Granularity [15.86649576818013]
Not.js is a fine-grained JavaScript blocking tool that operates at function-level granularity.
Not.js trains a supervised machine learning classifier on a webpage's graph representation to first detect tracking at the JavaScript function level.
Not.js then automatically generates surrogate scripts that preserve functionality while removing tracking.
arXiv Detail & Related papers (2024-05-28T17:26:57Z)
- FV8: A Forced Execution JavaScript Engine for Detecting Evasive Techniques [53.288368877654705]
FV8 is a modified V8 JavaScript engine designed to identify evasion techniques in JavaScript code.
It selectively enforces code execution on APIs that conditionally inject dynamic code.
It identifies 1,443 npm packages and 164 (82%) extensions containing at least one type of evasion.
arXiv Detail & Related papers (2024-05-21T19:54:19Z)
- The Good and The Bad: Exploring Privacy Issues in Retrieval-Augmented Generation (RAG) [56.67603627046346]
Retrieval-augmented generation (RAG) is a powerful technique for augmenting language models with proprietary and private data.
In this work, we conduct empirical studies with novel attack methods, which demonstrate the vulnerability of RAG systems to leaking the private retrieval database.
arXiv Detail & Related papers (2024-02-23T18:35:15Z)
- Code-Based Single-Server Private Information Retrieval: Circumventing the Sub-Query Attack [9.054540533394928]
The paper studies a modified version of the first code-based single-server computational PIR scheme proposed by Holzbaur, Hollanti, and Wachter-Zeh.
In the case of retrieving multiple files, the rate of the modified scheme is largely unaffected and on par with the original scheme.
arXiv Detail & Related papers (2024-02-05T10:37:26Z)
- Zero-Shot Detection of Machine-Generated Codes [83.0342513054389]
This work proposes a training-free approach for detecting LLM-generated code.
We find that existing training-based or zero-shot text detectors are ineffective in detecting code.
Our method exhibits robustness against revision attacks and generalizes well to Java code.
arXiv Detail & Related papers (2023-10-08T10:08:21Z)
- JavaScript Dead Code Identification, Elimination, and Empirical Assessment [13.566269406958966]
We present Lacuna, an approach for automatically detecting and eliminating JavaScript dead code from web apps.
We conduct an experiment to empirically evaluate the run-time overhead of JavaScript dead code in terms of energy consumption, performance, network usage, and resource usage in the context of mobile web apps.
arXiv Detail & Related papers (2023-08-31T13:48:39Z)
- ByzSecAgg: A Byzantine-Resistant Secure Aggregation Scheme for Federated Learning Based on Coded Computing and Vector Commitment [90.60126724503662]
ByzSecAgg is an efficient secure aggregation scheme for federated learning.
ByzSecAgg is protected against Byzantine attacks and privacy leakages.
arXiv Detail & Related papers (2023-02-20T11:15:18Z)
- SPAct: Self-supervised Privacy Preservation for Action Recognition [73.79886509500409]
Existing approaches for mitigating privacy leakage in action recognition require privacy labels along with the action labels from the video dataset.
Recent developments in self-supervised learning (SSL) have unleashed the untapped potential of unlabeled data.
We present a novel training framework which removes privacy information from input video in a self-supervised manner without requiring privacy labels.
arXiv Detail & Related papers (2022-03-29T02:56:40Z)
- Contrastive Code Representation Learning [95.86686147053958]
We show that the popular reconstruction-based BERT model is sensitive to source code edits, even when the edits preserve semantics.
We propose ContraCode: a contrastive pre-training task that learns code functionality, not form.
arXiv Detail & Related papers (2020-07-09T17:59:06Z)
This list is automatically generated from the titles and abstracts of the papers on this site.