CIPHER: Cryptographic Insecurity Profiling via Hybrid Evaluation of Responses
- URL: http://arxiv.org/abs/2602.01438v2
- Date: Thu, 05 Feb 2026 23:00:34 GMT
- Title: CIPHER: Cryptographic Insecurity Profiling via Hybrid Evaluation of Responses
- Authors: Max Manolov, Tony Gao, Siddharth Shukla, Cheng-Ting Chou, Ryan Lagasse,
- Abstract summary: We introduce CIPHER, a benchmark for measuring cryptographic vulnerability incidence in Python code. CIPHER uses insecure/neutral/secure prompt variants per task, a cryptography-specific vulnerability taxonomy, and line-level attribution. We find that explicit secure prompting reduces some targeted issues but does not reliably eliminate cryptographic vulnerabilities overall.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) are increasingly used to assist developers with code, yet their implementations of cryptographic functionality often contain exploitable flaws. Minor design choices (e.g., static initialization vectors or missing authentication) can silently invalidate security guarantees. We introduce CIPHER (Cryptographic Insecurity Profiling via Hybrid Evaluation of Responses), a benchmark for measuring cryptographic vulnerability incidence in LLM-generated Python code under controlled security-guidance conditions. CIPHER uses insecure/neutral/secure prompt variants per task, a cryptography-specific vulnerability taxonomy, and line-level attribution via an automated scoring pipeline. Across a diverse set of widely used LLMs, we find that explicit secure prompting reduces some targeted issues but does not reliably eliminate cryptographic vulnerabilities overall. The benchmark and reproducible scoring pipeline will be publicly released upon publication.
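To make the flaw classes concrete, below is a minimal sketch (not taken from the CIPHER paper or its scoring pipeline; the helper names and the choice of the `cryptography` package are assumptions) contrasting the kind of completion such a benchmark would flag, a deterministic, unauthenticated AES-CBC encryption with a static IV, against authenticated AES-GCM with a fresh random nonce per message.

```python
# Illustrative sketch only (not from the CIPHER paper); uses the `cryptography` package.
import os
from cryptography.hazmat.primitives import padding
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

STATIC_IV = b"\x00" * 16  # INSECURE: a fixed IV makes CBC deterministic


def encrypt_insecure(key: bytes, plaintext: bytes) -> bytes:
    """AES-CBC with a static IV and no authentication: the kind of silent flaw
    the abstract describes (deterministic ciphertexts, no integrity tag)."""
    padder = padding.PKCS7(128).padder()
    padded = padder.update(plaintext) + padder.finalize()
    enc = Cipher(algorithms.AES(key), modes.CBC(STATIC_IV)).encryptor()
    return enc.update(padded) + enc.finalize()  # no MAC -> ciphertext is malleable


def encrypt_secure(key: bytes, plaintext: bytes, aad: bytes = b"") -> bytes:
    """AES-GCM with a fresh random 12-byte nonce; output is authenticated."""
    nonce = os.urandom(12)  # unique per message
    return nonce + AESGCM(key).encrypt(nonce, plaintext, aad)


if __name__ == "__main__":
    key = AESGCM.generate_key(bit_length=256)
    # The static IV leaks message equality: identical plaintexts give identical ciphertexts.
    print(encrypt_insecure(key, b"transfer $100") == encrypt_insecure(key, b"transfer $100"))  # True
    print(encrypt_secure(key, b"transfer $100") == encrypt_secure(key, b"transfer $100"))      # False
```

The contrast is deliberately small: both functions "work", but only the second preserves confidentiality and integrity, which is why line-level attribution rather than pass/fail testing is needed to score such outputs.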
Related papers
- RealSec-bench: A Benchmark for Evaluating Secure Code Generation in Real-World Repositories [58.32028251925354]
Large Language Models (LLMs) have demonstrated remarkable capabilities in code generation, but their proficiency in producing secure code remains a critical, under-explored area. We introduce RealSec-bench, a new benchmark for secure code generation meticulously constructed from real-world, high-risk Java repositories.
arXiv Detail & Related papers (2026-01-30T08:29:01Z) - AgenticSCR: An Autonomous Agentic Secure Code Review for Immature Vulnerabilities Detection [8.909533914802669]
We introduce AgenticSCR, an agentic AI for secure code review that detects immature vulnerabilities during the pre-commit stage. We empirically evaluate how accurately AgenticSCR localizes, detects, and explains immature vulnerabilities.
arXiv Detail & Related papers (2026-01-27T03:10:12Z) - "MCP Does Not Stand for Misuse Cryptography Protocol": Uncovering Cryptographic Misuse in Model Context Protocol at Scale [27.85822797774986]
The Model Context Protocol (MCP) is emerging as the interface for tool integration. MCP provides no guarantees of authenticity or confidentiality, forcing developers to implement cryptography themselves. We present YSCOPE, the first domain-specific framework for detecting cryptographic misuses in MCP implementations. Our study establishes the first ecosystem-wide view of cryptographic misuse in MCP and provides both tools and insights to strengthen the security foundations of this rapidly growing protocol.
arXiv Detail & Related papers (2025-12-03T13:25:59Z) - Can AI Keep a Secret? Contextual Integrity Verification: A Provable Security Architecture for LLMs [0.0]
We present Contextual Integrity Verification (CIV), an inference-time security architecture that attaches cryptographically signed provenance labels to every token. CIV provides provenance tracking and per-token non-interference guarantees on frozen models. We demonstrate drop-in protection for Llama-3-8B and Mistral-7B.
arXiv Detail & Related papers (2025-08-12T18:47:30Z) - Cryptanalysis of LC-MUME: A Lightweight Certificateless Multi-User Matchmaking Encryption for Mobile Devices [0.0]
We show that a Type-I adversary can successfully forge a valid ciphertext without possessing the complete private key of the sender. We propose a strategy to strengthen the security of matchmaking encryption schemes in mobile computing environments.
arXiv Detail & Related papers (2025-07-30T13:36:52Z) - Guiding AI to Fix Its Own Flaws: An Empirical Study on LLM-Driven Secure Code Generation [16.29310628754089]
Large Language Models (LLMs) have become powerful tools for automated code generation. LLMs often overlook critical security practices, which can result in the generation of insecure code. This paper examines their inherent tendencies to produce insecure code, their capability to generate secure code when guided by self-generated vulnerability hints, and their effectiveness in repairing vulnerabilities when provided with different levels of feedback.
arXiv Detail & Related papers (2025-06-28T23:24:33Z) - One Trigger Token Is Enough: A Defense Strategy for Balancing Safety and Usability in Large Language Models [20.42976162135529]
Large Language Models (LLMs) have been extensively used across diverse domains, including virtual assistants, automated code generation, and scientific research. We propose D-STT, a simple yet effective defense algorithm that identifies and explicitly decodes safety trigger tokens of the given safety-aligned LLM.
arXiv Detail & Related papers (2025-05-12T01:26:50Z) - Benchmarking LLMs and LLM-based Agents in Practical Vulnerability Detection for Code Repositories [8.583591493627276]
We introduce JitVul, a vulnerability detection benchmark linking each function to its vulnerability-introducing and fixing commits. We show that ReAct Agents, leveraging thought-action-observation and interprocedural context, perform better than LLMs in distinguishing vulnerable from benign code.
arXiv Detail & Related papers (2025-03-05T15:22:24Z) - Towards Copyright Protection for Knowledge Bases of Retrieval-augmented Language Models via Reasoning [58.57194301645823]
Large language models (LLMs) are increasingly integrated into real-world personalized applications. The valuable and often proprietary nature of the knowledge bases used in retrieval-augmented generation (RAG) introduces the risk of unauthorized usage by adversaries. Existing methods that can be generalized as watermarking techniques to protect these knowledge bases typically involve poisoning or backdoor attacks. We propose an approach for harmless copyright protection of knowledge bases.
arXiv Detail & Related papers (2025-02-10T09:15:56Z) - Cryptanalysis via Machine Learning Based Information Theoretic Metrics [58.96805474751668]
We propose two novel applications of machine learning (ML) algorithms to perform cryptanalysis on any cryptosystem. These algorithms can be readily applied in an audit setting to evaluate the robustness of a cryptosystem. We show that our classification model correctly identifies the encryption schemes that are not IND-CPA secure, such as DES, RSA, and AES ECB, with high accuracy (a minimal illustration of the ECB leakage involved appears after this list).
arXiv Detail & Related papers (2025-01-25T04:53:36Z) - CodeChameleon: Personalized Encryption Framework for Jailbreaking Large
Language Models [49.60006012946767]
We propose CodeChameleon, a novel jailbreak framework based on personalized encryption tactics.
We conduct extensive experiments on 7 Large Language Models, achieving a state-of-the-art average Attack Success Rate (ASR).
Remarkably, our method achieves an 86.6% ASR on GPT-4-1106.
arXiv Detail & Related papers (2024-02-26T16:35:59Z) - Defending Large Language Models against Jailbreak Attacks via Semantic Smoothing [107.97160023681184]
Aligned large language models (LLMs) are vulnerable to jailbreaking attacks.
We propose SEMANTICSMOOTH, a smoothing-based defense that aggregates predictions of semantically transformed copies of a given input prompt.
arXiv Detail & Related papers (2024-02-25T20:36:03Z) - Coding-Based Hybrid Post-Quantum Cryptosystem for Non-Uniform Information [53.85237314348328]
We introduce a novel hybrid universal network coding cryptosystem (NU-HUNCC) for non-uniform messages.
We show that NU-HUNCC is information-theoretically individually secure against an eavesdropper with access to any subset of the links.
arXiv Detail & Related papers (2024-02-13T12:12:39Z) - Certifying LLM Safety against Adversarial Prompting [70.96868018621167]
Large language models (LLMs) are vulnerable to adversarial attacks that add malicious tokens to an input prompt. We introduce erase-and-check, the first framework for defending against adversarial prompts with certifiable safety guarantees.
arXiv Detail & Related papers (2023-09-06T04:37:20Z)
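As referenced in the cryptanalysis-via-ML entry above, AES in ECB mode is among the schemes reported as not IND-CPA secure. The sketch below (an assumed illustration using the `cryptography` package, not that paper's ML pipeline) shows the underlying leakage: ECB encrypts equal 16-byte plaintext blocks to equal ciphertext blocks, so even a trivial repeated-block check separates it from a properly randomized mode.

```python
# Illustration of why AES-ECB fails IND-CPA: equal plaintext blocks -> equal ciphertext blocks.
import os
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes


def ecb_encrypt(key: bytes, plaintext: bytes) -> bytes:
    """Raw AES-ECB (plaintext length must be a multiple of 16 bytes)."""
    enc = Cipher(algorithms.AES(key), modes.ECB()).encryptor()
    return enc.update(plaintext) + enc.finalize()


def looks_like_ecb(ciphertext: bytes, block_size: int = 16) -> bool:
    """Flag ciphertexts whose blocks repeat: strong evidence of a deterministic block mode."""
    blocks = [ciphertext[i:i + block_size] for i in range(0, len(ciphertext), block_size)]
    return len(blocks) != len(set(blocks))


if __name__ == "__main__":
    key = os.urandom(32)
    repetitive = b"ATTACK AT DAWN!!" * 4  # four identical 16-byte blocks
    print(looks_like_ecb(ecb_encrypt(key, repetitive)))  # True: ECB leaks plaintext structure
```

An IND-CPA distinguisher exploits exactly this determinism; a learned classifier can pick up the same kind of signal from ciphertext statistics without a hand-written check.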
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.