From base cases to backdoors: An Empirical Study of Unnatural Crypto-API Misuse
- URL: http://arxiv.org/abs/2510.13102v1
- Date: Wed, 15 Oct 2025 02:45:14 GMT
- Title: From base cases to backdoors: An Empirical Study of Unnatural Crypto-API Misuse
- Authors: Victor Olaiya, Adwait Nadkarni
- Abstract summary: This paper presents the first large-scale study that characterizes unnatural crypto-API usage. We develop an intuitive complexity metric to stratify 140,431 crypto-API invocations obtained from 20,508 Android applications. We qualitatively analyze the 5,704 sampled invocations using manual reverse engineering.
- Score: 5.5833263953494665
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Tools focused on cryptographic API misuse often detect the most basic expressions of the vulnerable use, and are unable to detect non-trivial variants. The question of whether tools should be designed to detect such variants can only be answered if we know how developers use and misuse cryptographic APIs in the wild, and in particular, what the unnatural usage of such APIs looks like. This paper presents the first large-scale study that characterizes unnatural crypto-API usage through a qualitative analysis of 5,704 representative API invocations. We develop an intuitive complexity metric to stratify 140,431 crypto-API invocations obtained from 20,508 Android applications, allowing us to sample 5,704 invocations that are representative of all strata, with each stratum consisting of invocations with similar complexity/naturalness. We qualitatively analyze the 5,704 sampled invocations using manual reverse engineering, through an in-depth investigation that involves the development of minimal examples and exploration of native code. Our study results in two detailed taxonomies of unnatural crypto-API misuse, along with 17 key findings that show the presence of highly unusual misuse, evasive code, and the inability of popular tools to reason about even mildly unconventional usage. Our findings lead to four key takeaways that inform future work focused on detecting unnatural crypto-API misuse.
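The stratify-then-sample step described in the abstract can be illustrated with a minimal sketch. The abstract does not define the complexity metric or the per-stratum allocation, so the integer score, the `complexity_of` callback, the `per_stratum` cap, and the toy data below are all hypothetical stand-ins rather than the authors' method.

```python
import random
from collections import defaultdict


def stratified_sample(invocations, complexity_of, per_stratum, seed=0):
    """Group invocations by complexity score, then draw up to
    `per_stratum` items from each group without replacement."""
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible

    # Build the strata: one bucket per distinct complexity score.
    strata = defaultdict(list)
    for inv in invocations:
        strata[complexity_of(inv)].append(inv)

    # Sample from every stratum so that rare, high-complexity
    # invocations are represented alongside the common simple ones.
    sample = []
    for score in sorted(strata):
        members = strata[score]
        k = min(per_stratum, len(members))
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical data: (invocation id, assumed complexity score 0..3).
invocations = [(i, i % 4) for i in range(100)]
picked = stratified_sample(invocations, lambda inv: inv[1], per_stratum=3)
print(len(picked), "invocations sampled across",
      len({inv[1] for inv in picked}), "strata")
```

Sampling per stratum, rather than uniformly over all invocations, is what keeps the unusual cases from being drowned out by the far more numerous simple ones.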
Related papers
- Framework-Aware Code Generation with API Knowledge Graph-Constructed Data: A Study on HarmonyOS [52.483888557864326]
APIKG4SYN is a framework designed to exploit API knowledge graphs for the construction of API-oriented question-code pairs. We build the first benchmark for HarmonyOS code generation using APIKG4SYN.
arXiv Detail & Related papers (2025-11-29T08:13:54Z) - RINSER: Accurate API Prediction Using Masked Language Models [21.081906052711172]
We present RINSER, an automated framework for predicting Windows API (WinAPI) function names. RINSER relies on BERT's masked language model (LM) to predict API names at scale. We evaluate RINSER on a large dataset of 4.7M API codeprints from 11,098 malware binaries, covering 4,123 unique Windows APIs.
arXiv Detail & Related papers (2025-09-05T08:08:11Z) - Your Fix Is My Exploit: Enabling Comprehensive DL Library API Fuzzing with Large Language Models [49.214291813478695]
Deep learning (DL) libraries, widely used in AI applications, often contain vulnerabilities like overflows and use-after-free errors. Traditional fuzzing struggles with the complexity and API diversity of DL libraries. We propose DFUZZ, an LLM-driven fuzzing approach for DL libraries.
arXiv Detail & Related papers (2025-01-08T07:07:22Z) - ExploraCoder: Advancing code generation for multiple unseen APIs via planning and chained exploration [70.26807758443675]
ExploraCoder is a training-free framework that empowers large language models to invoke unseen APIs in code solutions. Experimental results demonstrate that ExploraCoder significantly improves performance for models lacking prior API knowledge.
arXiv Detail & Related papers (2024-12-06T19:00:15Z) - RoBERTa-Augmented Synthesis for Detecting Malicious API Requests [9.035212370386846]
We introduce a GAN-inspired learning framework that extends limited API traffic datasets through targeted, domain-aware synthesis. We evaluate our framework on two benchmark datasets, CSIC 2010 and ATRDF 2023, and compare it with a previous data augmentation technique. Our method achieves up to a 4.94% increase in F1 score on CSIC 2010 and up to 21.10% on ATRDF 2023.
arXiv Detail & Related papers (2024-05-18T11:10:45Z) - A Classification-by-Retrieval Framework for Few-Shot Anomaly Detection to Detect API Injection Attacks [9.693391036125908]
We propose a novel unsupervised few-shot anomaly detection framework composed of two main parts.
First, we train a dedicated generic language model for API based on FastText embedding.
Next, we use Approximate Nearest Neighbor search in a classification-by-retrieval approach.
arXiv Detail & Related papers (2024-05-18T10:15:31Z) - RESTRuler: Towards Automatically Identifying Violations of RESTful
Design Rules in Web APIs [3.4711214580685557]
We present RESTRuler, a Java-based open-source tool that uses static analysis to detect design rule violations in OpenAPI descriptions.
For robustness, RESTRuler successfully analyzed 99% of the used real-world OpenAPI definitions.
For performance efficiency, the tool performed well for the majority of files and could analyze 84% in less than 23 seconds with low CPU and RAM usage.
arXiv Detail & Related papers (2024-02-21T11:25:22Z) - MASC: A Tool for Mutation-Based Evaluation of Static Crypto-API Misuse
Detectors [16.62222783321419]
This demo paper presents the technical details and usage scenarios of our tool, namely Mutation Analysis for evaluating Static Crypto-API misuse detectors (MASC)
We developed $12$ generalizable, usage based mutation operators and three mutation scopes, namely Main Scope, Similarity Scope, and Exhaustive Scope, which can be used to expressively instantiate compilable variants of the crypto-API misuse cases.
MASC comes with both Command Line Interface and Web-based front-end, making it practical for users of different levels of expertise.
arXiv Detail & Related papers (2023-08-04T13:22:22Z) - Private-Library-Oriented Code Generation with Large Language Models [52.73999698194344]
This paper focuses on utilizing large language models (LLMs) for code generation in private libraries.
We propose a novel framework that emulates the process of programmers writing private code.
We create four private library benchmarks, including TorchDataEval, TorchDataComplexEval, MonkeyEval, and BeatNumEval.
arXiv Detail & Related papers (2023-07-28T07:43:13Z) - Evaluating Embedding APIs for Information Retrieval [51.24236853841468]
We evaluate the capabilities of existing semantic embedding APIs on domain generalization and multilingual retrieval.
We find that re-ranking BM25 results using the APIs is a budget-friendly approach and is most effective in English.
For non-English retrieval, re-ranking still improves the results, but a hybrid model with BM25 works best, albeit at a higher cost.
arXiv Detail & Related papers (2023-05-10T16:40:52Z) - Deepfake audio detection by speaker verification [79.99653758293277]
We propose a new detection approach that leverages only the biometric characteristics of the speaker, with no reference to specific manipulations.
The proposed approach can be implemented based on off-the-shelf speaker verification tools.
We test several such solutions on three popular test sets, obtaining good performance, high generalization ability, and high robustness to audio impairment.
arXiv Detail & Related papers (2022-09-28T13:46:29Z) - Simple Transparent Adversarial Examples [65.65977217108659]
We introduce secret embedding and transparent adversarial examples as a simpler way to evaluate robustness.
As a result, they pose a serious threat where APIs are used for high-stakes applications.
arXiv Detail & Related papers (2021-05-20T11:54:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.