Protect Your Secrets: Understanding and Measuring Data Exposure in VSCode Extensions
- URL: http://arxiv.org/abs/2412.00707v2
- Date: Wed, 25 Dec 2024 06:49:44 GMT
- Title: Protect Your Secrets: Understanding and Measuring Data Exposure in VSCode Extensions
- Authors: Yue Liu, Chakkrit Tantithamthavorn, Li Li
- Abstract summary: We investigate the security issues of cross-extension interactions in Visual Studio Code (VSCode).
Our study uncovers high-impact security flaws that could allow adversaries to stealthily acquire or manipulate credential-related data.
By applying our tool to 27,261 real-world VSCode extensions, we discover that 8.5% of them are exposed to credential-related data leakage.
- Score: 14.381954681512644
- Abstract: Recent years have witnessed the emerging trend of extensions in modern Integrated Development Environments (IDEs) like Visual Studio Code (VSCode) that significantly enhance developer productivity. In particular, popular AI coding assistants like GitHub Copilot and Tabnine provide conveniences like automated code completion and debugging. While these extensions offer numerous benefits, they may introduce privacy and security concerns to software developers. However, no existing work systematically analyzes these security and privacy concerns, including the risks of data exposure in VSCode extensions. In this paper, we investigate the security issues of cross-extension interactions in VSCode and shed light on the vulnerabilities caused by data exposure among different extensions. Our study uncovers high-impact security flaws that could allow adversaries to stealthily acquire or manipulate credential-related data (e.g., passwords, API keys, access tokens) from other extensions if not properly handled by extension vendors. To measure their prevalence, we design a novel automated risk detection framework that leverages program analysis and natural language processing techniques to automatically identify potential risks in VSCode extensions. By applying our tool to 27,261 real-world VSCode extensions, we discover that 8.5% of them (i.e., 2,325 extensions) are exposed to credential-related data leakage through various vectors, such as commands, user input, and configurations. Our study sheds light on the security challenges and flaws of the extension-in-IDE paradigm and provides suggestions and recommendations for improving the security of VSCode extensions and mitigating the risks of data exposure.
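Below is a minimal, hypothetical TypeScript sketch of the command-based exposure vector the abstract describes; the extension IDs, command names, and secret keys are invented for illustration and are not taken from the paper's case studies. One extension registers a VSCode command that hands back its stored credential, and any co-installed extension can invoke that command through the shared `vscode.commands` API.

```typescript
import * as vscode from 'vscode';

// --- Victim extension (hypothetical) ---
// Registers a command that returns the extension's stored API token.
// VSCode commands are callable by every installed extension, so this
// silently exposes the secret to the whole extension ecosystem.
export function activateVictim(context: vscode.ExtensionContext) {
  context.subscriptions.push(
    vscode.commands.registerCommand(
      'victim.getApiToken', // hypothetical command ID
      () => context.secrets.get('victim.apiToken')
    )
  );
}

// --- Malicious co-installed extension (hypothetical) ---
// It invokes the victim's command by name; no permission prompt or
// capability check stands between it and the credential.
export async function activateAttacker(): Promise<void> {
  const token = await vscode.commands.executeCommand<string | undefined>(
    'victim.getApiToken'
  );
  if (token !== undefined) {
    console.log('Stealthily acquired credential:', token); // exfiltration point
  }
}
```

A safer pattern is to never return the raw secret across the command boundary: keep credentials inside the owning extension and expose only the higher-level operations that need them.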
Related papers
- RedCode: Risky Code Execution and Generation Benchmark for Code Agents [50.81206098588923]
RedCode is a benchmark for risky code execution and generation.
RedCode-Exec provides challenging prompts that could lead to risky code execution.
RedCode-Gen provides 160 prompts with function signatures and docstrings as input to assess whether code agents will follow instructions to generate harmful code.
arXiv Detail & Related papers (2024-11-12T13:30:06Z)
- Developers Are Victims Too: A Comprehensive Analysis of The VS Code Extension Ecosystem [2.6912894078100282]
We analyzed 52,880 third-party VS Code extensions to understand the threat they pose to developers, their code, and development organizations.
We found that 5.6% of the analyzed extensions exhibit suspicious behavior, jeopardizing the integrity of the development environment and potentially leaking sensitive information about the developer's product.
arXiv Detail & Related papers (2024-11-12T01:52:25Z)
- Enhancing Reverse Engineering: Investigating and Benchmarking Large Language Models for Vulnerability Analysis in Decompiled Binaries [2.696054049278301]
We introduce DeBinVul, a novel decompiled binary code vulnerability dataset.
We fine-tune state-of-the-art LLMs using DeBinVul and report performance increases of 19%, 24%, and 21% in detecting binary code vulnerabilities.
arXiv Detail & Related papers (2024-11-07T18:54:31Z)
- The Impact of SBOM Generators on Vulnerability Assessment in Python: A Comparison and a Novel Approach [56.4040698609393]
Software Bill of Materials (SBOM) has been promoted as a tool to increase transparency and verifiability in software composition.
Current SBOM generation tools often suffer from inaccuracies in identifying components and dependencies.
We propose PIP-sbom, a novel pip-inspired solution that addresses these shortcomings.
arXiv Detail & Related papers (2024-09-10T10:12:37Z)
- Unintentional Security Flaws in Code: Automated Defense via Root Cause Analysis [2.899501205987888]
We developed an automated vulnerability root cause (RC) toolkit called T5-RCGCN.
It combines T5 language model embeddings with a graph convolutional network (GCN) for vulnerability classification and localization.
We tested T5-RCGCN with 56 junior developers across three datasets, showing a 28.9% improvement in code security compared to previous methods.
arXiv Detail & Related papers (2024-08-30T18:26:59Z)
- Is Your AI-Generated Code Really Safe? Evaluating Large Language Models on Secure Code Generation with CodeSecEval [20.959848710829878]
Large language models (LLMs) have brought significant advancements to code generation and code repair.
However, their training using unsanitized data from open-source repositories, like GitHub, raises the risk of inadvertently propagating security vulnerabilities.
We present a comprehensive study that precisely evaluates and enhances the security aspects of code LLMs.
arXiv Detail & Related papers (2024-07-02T16:13:21Z)
- FV8: A Forced Execution JavaScript Engine for Detecting Evasive Techniques [53.288368877654705]
FV8 is a modified V8 JavaScript engine designed to identify evasion techniques in JavaScript code.
It selectively enforces code execution on APIs that conditionally inject dynamic code; a minimal sketch of this evasion pattern follows this entry.
It identifies 1,443 npm packages and 164 (82%) extensions containing at least one type of evasion.
arXiv Detail & Related papers (2024-05-21T19:54:19Z)
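The evasion pattern FV8 targets can be illustrated with a small, hypothetical snippet (not code from the paper): the payload is hidden behind an environment check and only injected dynamically when the code believes it is running on a real machine, so a forced-execution engine takes the guarded branch regardless of the check's outcome.

```typescript
// Hypothetical evasive snippet of the kind FV8 is designed to expose.
// Under ordinary dynamic analysis the guard is true and the payload
// never runs; a forced-execution engine executes the branch anyway.
function looksLikeAnalysisSandbox(): boolean {
  // Trivial stand-in for real fingerprinting (timing, plugins, webdriver flags).
  const nav = (globalThis as { navigator?: { webdriver?: boolean } }).navigator;
  return nav?.webdriver === true;
}

if (!looksLikeAnalysisSandbox()) {
  // Conditional dynamic code injection: the payload only becomes
  // executable code on machines that look "real", hiding it from
  // analyzers that execute the program just once.
  const payload = "console.log('payload would run here')";
  eval(payload); // eval is one of the dynamic-code APIs FV8 instruments
}
```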
- CodeAttack: Revealing Safety Generalization Challenges of Large Language Models via Code Completion [117.178835165855]
This paper introduces CodeAttack, a framework that transforms natural language inputs into code inputs.
Our studies reveal a new and universal safety vulnerability of large language models against code input.
We find that a larger distribution gap between CodeAttack and natural language leads to weaker safety generalization; a rough sketch of the input transformation follows this entry.
arXiv Detail & Related papers (2024-03-12T17:55:38Z)
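As a rough, hypothetical illustration of that transformation (the scaffold below guesses at the general idea and is not the paper's actual encoding), a natural-language query can be disguised as a code-completion task:

```typescript
// Hypothetical sketch of a CodeAttack-style transformation: a natural-
// language query is embedded in a code scaffold so the model treats it
// as a programming task, sidestepping safety behavior that was aligned
// mostly on natural-language inputs.
function toCodeCompletionPrompt(query: string): string {
  return [
    '# Complete the function below.',
    'def solve_task():',
    `    task = ${JSON.stringify(query)}  # the query, now mere "data"`,
    '    steps = []',
    '    # append each step needed to accomplish the task',
    '    return steps',
  ].join('\n');
}

// The wrapped prompt is sent to the model in place of the raw text.
console.log(toCodeCompletionPrompt('some natural-language request'));
```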
- LLM-Powered Code Vulnerability Repair with Reinforcement Learning and Semantic Reward [3.729516018513228]
We introduce SecRepair, a multipurpose code vulnerability analysis system powered by the large language model CodeGen2.
Inspired by how humans fix code issues, we propose an instruction-based dataset suitable for vulnerability analysis with LLMs.
We identify zero-day and N-day vulnerabilities in six open-source IoT operating systems on GitHub.
arXiv Detail & Related papers (2024-01-07T02:46:39Z)
- On the Security Blind Spots of Software Composition Analysis [46.1389163921338]
We present a novel approach to detect vulnerable clones in the Maven repository.
We retrieve over 53k potentially vulnerable clones from Maven Central.
We detect 727 confirmed vulnerable clones and synthesize a testable proof-of-vulnerability project for each of those.
arXiv Detail & Related papers (2023-06-08T20:14:46Z)
- Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection [64.67495502772866]
Large Language Models (LLMs) are increasingly being integrated into various applications.
We show how attackers can override original instructions and employed controls using Prompt Injection attacks.
We derive a comprehensive taxonomy from a computer security perspective to systematically investigate impacts and vulnerabilities; a minimal sketch of the injection vector follows this entry.
arXiv Detail & Related papers (2023-02-23T17:14:38Z)
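A minimal, hypothetical sketch of indirect prompt injection (not code from the paper): an LLM-integrated app concatenates retrieved, attacker-controllable content directly into its prompt, so instructions planted in that content compete with the developer's own.

```typescript
// Hypothetical LLM-integrated app: it summarizes a web page by pasting
// the fetched text straight into its prompt, so instructions an attacker
// plants in the page enter the model's context as if they were trusted.
async function summarizePage(url: string): Promise<string> {
  const page = await fetch(url).then((r) => r.text()); // attacker-controllable
  const prompt =
    'You are a helpful summarizer. Summarize the following page:\n\n' + page;
  return callModel(prompt); // injection point: data and instructions are mixed
}

// Stand-in for a real LLM API call, so the sketch stays self-contained.
async function callModel(prompt: string): Promise<string> {
  return `model output for: ${prompt.slice(0, 40)}...`;
}

summarizePage('https://example.com').then(console.log);
```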