Related papers: Automatic Generation of a Cryptography Misuse Taxonomy Using Large Language Models

URL: http://arxiv.org/abs/2509.10814v1
Date: Sat, 13 Sep 2025 14:28:20 GMT
Title: Automatic Generation of a Cryptography Misuse Taxonomy Using Large Language Models
Authors: Yang Zhang, Wenyi Ouyang, Yi Zhang, Liang Cheng, Chen Wu, Wenxin Hu,
Abstract summary: Cryptographic API misuse (CAM) is compromising the effectiveness of cryptography. Despite extensive efforts to develop CAM detection tools, these tools typically rely on a limited set of predefined rules from human-curated knowledge. We propose leveraging large language models (LLMs), trained on publicly available cryptography-related data, to automatically detect and classify CAMs in real-world code.
Score: 9.931896344576465
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: The prevalence of cryptographic API misuse (CAM) is compromising the effectiveness of cryptography and in turn the security of modern systems and applications. Despite extensive efforts to develop CAM detection tools, these tools typically rely on a limited set of predefined rules from human-curated knowledge. This rigid, rule-based approach hinders adaptation to evolving CAM patterns in real practices. We propose leveraging large language models (LLMs), trained on publicly available cryptography-related data, to automatically detect and classify CAMs in real-world code to address this limitation. Our method enables the development and continuous expansion of a CAM taxonomy, supporting developers and detection tools in tracking and understanding emerging CAM patterns. Specifically, we develop an LLM-agnostic prompt engineering method to guide LLMs in detecting CAM instances from C/C++, Java, Python, and Go code, and then classifying them into a hierarchical taxonomy. Using a data set of 3,492 real-world software programs, we demonstrate the effectiveness of our approach with mainstream LLMs, including GPT, Llama, Gemini, and Claude. It also allows us to quantitatively measure and compare the performance of these LLMs in analyzing CAM in realistic code. Our evaluation produced a taxonomy with 279 base CAM categories, 36 of which are not addressed by existing taxonomies. To validate its practical value, we encode 11 newly identified CAM types into detection rules and integrate them into existing tools. Experiments show that such integration expands the tools' detection capabilities.
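
To make the abstract's terms concrete, the following is a minimal sketch of one predefined, rule-based CAM check of the kind the abstract contrasts with LLM-based detection. It is not the paper's method or any existing tool's implementation. The rule (flagging calls to hashlib.md5 and hashlib.sha1), the function name find_weak_hash_calls, and the two-level taxonomy labels are all assumptions made for illustration.

```python
import ast

# Hypothetical two-level taxonomy: rule id -> (parent category, base category).
# These labels are illustrative and are not taken from the paper.
TAXONOMY = {
    "weak-hash": ("Insecure algorithm choice", "Broken hash function"),
}

# Hash functions this single rule treats as weak (an assumption of the sketch).
WEAK_HASHES = {"md5", "sha1"}


def find_weak_hash_calls(source: str):
    """Return (line, function name, taxonomy path) for each hashlib.md5/sha1 call."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Match only the literal pattern `hashlib.<name>(...)`.
        if (
            isinstance(func, ast.Attribute)
            and isinstance(func.value, ast.Name)
            and func.value.id == "hashlib"
            and func.attr in WEAK_HASHES
        ):
            findings.append((node.lineno, func.attr, TAXONOMY["weak-hash"]))
    return sorted(findings)


SAMPLE = """import hashlib
digest = hashlib.md5(b"password").hexdigest()
safe = hashlib.sha256(b"data").hexdigest()
"""

if __name__ == "__main__":
    for line, name, path in find_weak_hash_calls(SAMPLE):
        print(f"line {line}: hashlib.{name} -> {' / '.join(path)}")
```

A fixed rule like this one only matches the exact call pattern it was written for, so an aliased import such as `from hashlib import md5` would go unnoticed. That is the rigidity the abstract attributes to rule-based tools.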

Related papers

Identifying and Mitigating API Misuse in Large Language Models [26.4403427473915]
API misuse in code generated by large language models (LLMs) represents a serious emerging challenge in software development. This paper presents the first comprehensive study of API misuse patterns in LLM-generated code, analyzing both method selection and parameter usage across Python and Java. We propose Dr.Fix, a novel LLM-based automatic program repair approach for API misuse based on the aforementioned taxonomy.
arXiv Detail & Related papers (2025-03-28T18:43:12Z)
CodeVision: Detecting LLM-Generated Code Using 2D Token Probability Maps and Vision Models [28.711745671275477]
The rise of large language models (LLMs) has significantly improved automated code generation, enhancing software development efficiency. Existing detection methods, such as pre-trained models and watermarking, face limitations in adaptability and computational efficiency. We propose a novel detection method using 2D token probability maps combined with vision models, preserving spatial code structures.
arXiv Detail & Related papers (2025-01-06T06:15:10Z)
Training of Scaffolded Language Models with Language Supervision: A Survey [62.59629932720519]
This survey organizes the literature on the design and optimization of emerging structures around post-trained LMs. We refer to this overarching structure as scaffolded LMs and focus on LMs that are integrated into multi-step processes with tools.
arXiv Detail & Related papers (2024-10-21T18:06:25Z)
GLARE: Low Light Image Enhancement via Generative Latent Feature based Codebook Retrieval [80.96706764868898]
We present a new Low-light Image Enhancement (LLIE) network via Generative LAtent feature based codebook REtrieval (GLARE). We develop a generative Invertible Latent Normalizing Flow (I-LNF) module to align the LL feature distribution to NL latent representations, guaranteeing the correct code retrieval in the codebook. Experiments confirm the superior performance of GLARE on various benchmark datasets and real-world data.
arXiv Detail & Related papers (2024-07-17T09:40:15Z)
CodeIP: A Grammar-Guided Multi-Bit Watermark for Large Language Models of Code [56.019447113206006]
Large Language Models (LLMs) have achieved remarkable progress in code generation. CodeIP is a novel multi-bit watermarking technique that inserts additional information to preserve provenance details. Experiments conducted on a real-world dataset across five programming languages demonstrate the effectiveness of CodeIP.
arXiv Detail & Related papers (2024-04-24T04:25:04Z)
LM-Polygraph: Uncertainty Estimation for Language Models [71.21409522341482]
Uncertainty estimation (UE) methods are one path to safer, more responsible, and more effective use of large language models (LLMs). We introduce LM-Polygraph, a framework with implementations of a battery of state-of-the-art UE methods for LLMs in text generation tasks, with unified program interfaces in Python. It introduces an extendable benchmark for consistent evaluation of UE techniques by researchers, and a demo web application that enriches the standard chat dialog with confidence scores.
arXiv Detail & Related papers (2023-11-13T15:08:59Z)
BroadCAM: Outcome-agnostic Class Activation Mapping for Small-scale Weakly Supervised Applications [69.22739434619531]
We propose an outcome-agnostic CAM approach, called BroadCAM, for small-scale weakly supervised applications. Evaluated on VOC2012 and BCSS-WSSS for WSSS and on OpenImages30k for WSOL, BroadCAM demonstrates superior performance.
arXiv Detail & Related papers (2023-09-07T06:45:43Z)
Exploit CAM by itself: Complementary Learning System for Weakly Supervised Semantic Segmentation [59.24824050194334]
This paper turns to an interesting working mechanism in agent learning named Complementary Learning System (CLS). Motivated by this simple but effective learning pattern, we propose a General-Specific Learning Mechanism (GSLM). GSLM develops a General Learning Module (GLM) and a Specific Learning Module (SLM).
arXiv Detail & Related papers (2023-03-04T16:16:47Z)
TCAM: Temporal Class Activation Maps for Object Localization in Weakly-Labeled Unconstrained Videos [22.271760669551817]
Weakly supervised video object localization (WSVOL) allows locating objects in videos using only global video tags such as object class. In this paper, we leverage the successful class activation mapping (CAM) methods, designed for WSOL based on still images. A new Temporal CAM (TCAM) method is introduced to train a discriminant deep learning (DL) model to exploit temporal information in videos.
arXiv Detail & Related papers (2022-08-30T21:20:34Z)
F-CAM: Full Resolution CAM via Guided Parametric Upscaling [20.609010268320013]
Class Activation Mapping (CAM) methods have recently gained much attention for weakly-supervised object localization (WSOL) tasks. CAM methods are typically integrated within off-the-shelf CNN backbones, such as ResNet50. We introduce a generic method for parametric upscaling of CAMs that allows constructing accurate full resolution CAMs.
arXiv Detail & Related papers (2021-09-15T04:45:20Z)

This list is automatically generated from the titles and abstracts of the papers in this site.