Related papers: Can You Trust Your Copilot? A Privacy Scorecard for AI Coding Assistants

URL: http://arxiv.org/abs/2509.20388v1
Date: Mon, 22 Sep 2025 21:45:45 GMT
Title: Can You Trust Your Copilot? A Privacy Scorecard for AI Coding Assistants
Authors: Amir AL-Maamari
Abstract summary: This paper introduces and applies a novel, expert-validated privacy scorecard. The methodology involves a detailed analysis of four document types, from legal policies to external audits. The results reveal a distinct hierarchy of privacy protections, with a 20-point gap between the highest- and lowest-ranked tools.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The rapid integration of AI-powered coding assistants into developer workflows has raised significant privacy and trust concerns. As developers entrust proprietary code to services like OpenAI's GPT, Google's Gemini, and GitHub Copilot, the unclear data handling practices of these tools create security and compliance risks. This paper addresses this challenge by introducing and applying a novel, expert-validated privacy scorecard. The methodology involves a detailed analysis of four document types, from legal policies to external audits, to score five leading assistants against 14 weighted criteria. A legal expert and a data protection officer refined these criteria and their weighting. The results reveal a distinct hierarchy of privacy protections, with a 20-point gap between the highest- and lowest-ranked tools. The analysis uncovers common industry weaknesses, including the pervasive use of opt-out consent for model training and a near-universal failure to proactively filter secrets from user prompts. The resulting scorecard provides actionable guidance for developers and organizations, enabling evidence-based tool selection. This work establishes a new benchmark for transparency and advocates for a shift towards more user-centric privacy standards in the AI industry.

Related papers

Frontier AI Auditing: Toward Rigorous Third-Party Assessment of Safety and Security Practices at Leading AI Companies [57.521647436515785]
We define frontier AI auditing as rigorous third-party verification of frontier AI developers' safety and security claims. We introduce AI Assurance Levels (AAL-1 to AAL-4), ranging from time-bounded system audits to continuous, deception-resilient verification.
arXiv Detail & Related papers (2026-01-16T18:44:09Z)
DRBench: A Realistic Benchmark for Enterprise Deep Research [81.49694432639406]
DRBench is a benchmark for evaluating AI agents on complex, open-ended deep research tasks in enterprise settings. We release 15 deep research tasks across 10 domains, such as Sales, Cybersecurity, and Compliance.
arXiv Detail & Related papers (2025-09-30T18:47:20Z)
GitHub's Copilot Code Review: Can AI Spot Security Flaws Before You Commit? [0.0]
This study evaluates the effectiveness of GitHub Copilot's recently introduced code review feature in detecting security vulnerabilities. Contrary to expectations, our results reveal that Copilot's code review frequently fails to detect critical vulnerabilities. Our results highlight the continued necessity of dedicated security tools and manual code audits to ensure robust software security.
arXiv Detail & Related papers (2025-09-17T02:56:21Z)
CoCoNUTS: Concentrating on Content while Neglecting Uninformative Textual Styles for AI-Generated Peer Review Detection [60.52240468810558]
We introduce CoCoNUTS, a content-oriented benchmark built upon a fine-grained dataset of AI-generated peer reviews. We also develop CoCoDet, an AI review detector via a multi-task learning framework, to achieve more accurate and robust detection of AI involvement in review content.
arXiv Detail & Related papers (2025-08-28T06:03:11Z)
Generating Privacy Stories From Software Documentation [1.2094859111770522]
We develop a novel approach based on chain-of-thought prompting (CoT), in-context-learning (ICL), and Large Language Models (LLMs). Our results show that most commonly used LLMs, such as GPT-4o and Llama 3, can identify privacy behaviors and generate privacy user stories with F1 scores exceeding 0.8.
arXiv Detail & Related papers (2025-06-28T20:55:21Z)
GOD model: Privacy Preserved AI School for Personal Assistant [3.3015224434662396]
We introduce the Guardian of Data (GOD), a secure, privacy-preserving framework for training and evaluating AI assistants on-device. GOD measures how well assistants can anticipate user needs, such as suggesting gifts, while protecting user data and autonomy.
arXiv Detail & Related papers (2025-02-24T20:30:17Z)
Enhancing Security of AI-Based Code Synthesis with GitHub Copilot via Cheap and Efficient Prompt-Engineering [1.7702475609045947]
One of the reasons developers and companies avoid harnessing the full potential of AI coding assistants is the questionable security of the generated code. This paper first reviews the current state-of-the-art and identifies areas for improvement on this issue. We propose a systematic approach based on prompt-altering methods to achieve better code security of AI-based code generators such as GitHub Copilot.
arXiv Detail & Related papers (2024-03-19T12:13:33Z)
Finding Privacy-relevant Source Code [0.0]
We introduce the concept of privacy-relevant methods - specific methods in code that are directly involved in the processing of personal data. We then present an automated approach to assist in code review by identifying and categorizing these privacy-relevant methods in source code. For our evaluation, we examined 100 open-source applications and found that our approach identifies fewer than 5% of the methods as privacy-relevant for personal data processing.
arXiv Detail & Related papers (2024-01-14T15:38:29Z)
DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models [92.6951708781736]
This work proposes a comprehensive trustworthiness evaluation for large language models with a focus on GPT-4 and GPT-3.5. We find that GPT models can be easily misled to generate toxic and biased outputs and leak private information. Our work illustrates a comprehensive trustworthiness evaluation of GPT models and sheds light on the trustworthiness gaps.
arXiv Detail & Related papers (2023-06-20T17:24:23Z)
Tight Auditing of Differentially Private Machine Learning [77.38590306275877]
For private machine learning, existing auditing mechanisms are tight, but they only give tight estimates under implausible worst-case assumptions. We design an improved auditing scheme that yields tight privacy estimates for natural (not adversarially crafted) datasets.
arXiv Detail & Related papers (2023-02-15T21:40:33Z)
Generation Probabilities Are Not Enough: Uncertainty Highlighting in AI Code Completions [54.55334589363247]
We study whether conveying information about uncertainty enables programmers to more quickly and accurately produce code. We find that highlighting tokens with the highest predicted likelihood of being edited leads to faster task completion and more targeted edits.
arXiv Detail & Related papers (2023-02-14T18:43:34Z)

This list is automatically generated from the titles and abstracts of the papers on this site.