Boosting Pointer Analysis With Large Language Model-Enhanced Allocation Function Detection
- URL: http://arxiv.org/abs/2509.22530v1
- Date: Fri, 26 Sep 2025 16:08:58 GMT
- Title: Boosting Pointer Analysis With Large Language Model-Enhanced Allocation Function Detection
- Authors: Baijun Cheng, Kailong Wang, Ling Shi, Haoyu Wang, Peng Di, Yao Guo, Ding Li, Xiangqun Chen,
- Abstract summary: Existing approaches largely overlook custom allocators, leading to coarse aliasing and reduced analysis precision.<n>We present AFD, a novel technique that enhances pointer analysis by automatically identifying and modeling custom allocation functions.<n>We evaluate AFD on 15 real-world C projects, identifying over 600 custom AFs.
- Score: 17.94389997355635
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pointer analysis is foundational for many static analysis tasks, yet its effectiveness is often hindered by imprecise modeling of heap allocations, particularly in C/C++ programs where user-defined allocation functions (AFs) are pervasive. Existing approaches largely overlook these custom allocators, leading to coarse aliasing and reduced analysis precision. In this paper, we present AFD, a novel technique that enhances pointer analysis by automatically identifying and modeling custom allocation functions. AFD employs a hybrid approach: it uses value-flow analysis to detect straightforward wrappers and leverages Large Language Models (LLMs) to reason about more complex allocation patterns with side effects. This targeted enhancement enables precise modeling of heap objects at each call site, achieving context-sensitivity-like benefits without the associated overhead. We evaluate AFD on 15 real-world C projects, identifying over 600 custom AFs. Integrating AFD into a baseline pointer analysis yields a 26x increase in modeled heap objects and a 39% reduction in alias set sizes, with only 1.4x runtime overhead. Furthermore, our enhanced analysis improves indirect call resolution and uncovers 17 previously undetected memory bugs. These results demonstrate that precise modeling of custom allocation functions offers a scalable and practical path to improving pointer analysis in large software systems.
Related papers
- Depth Completion as Parameter-Efficient Test-Time Adaptation [66.72360181325877]
CAPA is a parameter-efficient test-time optimization framework that adapts pre-trained 3D foundation models (FMs) for depth completion.<n>For videos, CAPA introduces sequence-level parameter sharing, jointly adapting all frames to exploit temporal correlations, improve robustness, and enforce multi-frame consistency.
arXiv Detail & Related papers (2026-02-16T13:53:23Z) - Every Step Counts: Decoding Trajectories as Authorship Fingerprints of dLLMs [63.82840470917859]
We show that the decoding mechanism of dLLMs can be used as a powerful tool for model attribution.<n>We propose a novel information extraction scheme called the Directed Decoding Map (DDM), which captures structural relationships between decoding steps and better reveals model-specific behaviors.
arXiv Detail & Related papers (2025-10-02T06:25:10Z) - Enhancing Semantic Understanding in Pointer Analysis using Large Language Models [15.896543462798276]
We propose LMPA (LLM-enhanced Pointer Analysis), a vision that integrates LLMs into pointer analysis to enhance both precision and scalability.<n>LMPA identifies user-defined functions that resemble system APIs and models them accordingly, thereby mitigating erroneous cross-calling-context propagation.
arXiv Detail & Related papers (2025-08-29T09:37:42Z) - How Can Input Reformulation Improve Tool Usage Accuracy in a Complex Dynamic Environment? A Study on $τ$-bench [58.114899897566964]
In a multi-turn conversational environment, large language models (LLMs) often struggle with consistent reasoning and adherence to domain-specific policies.<n>We propose the Input-Reformulation Multi-Agent (IRMA) framework, which automatically reformulates user queries augmented with relevant domain rules.<n>IRMA significantly outperforms ReAct, Function Calling, and Self-Reflection by 16.1%, 12.7%, and 19.1%, respectively.
arXiv Detail & Related papers (2025-08-28T15:57:33Z) - Flow Sensitivity without Control Flow Graph: An Efficient Andersen-Style Flow-Sensitive Pointer Analysis [4.513381877696149]
Flow-sensitive pointer analysis is extensively used in alias analysis, taint analysis, program understanding, compiler optimization, etc.<n>Existing flow-sensitive pointer analysis approaches, which are conducted based on control flow graphs, have significantly advanced the precision of pointer analysis.<n>We present CG-FSPTA, a flow-sensitive pointer analysis to overcome the inefficiency of control-flow-graph-based analysis.
arXiv Detail & Related papers (2025-08-04T01:20:54Z) - PARF: An Adaptive Abstraction-Strategy Tuner for Static Analysis [12.761648473972873]
Parf is a toolkit for adaptively tuning abstraction strategies of static program analyzers.<n>It is implemented on top of Frama-C/Eva - an off-the-shelf open-source static analyzer for C programs.
arXiv Detail & Related papers (2025-05-19T15:13:34Z) - Orthogonal Subspace Decomposition for Generalizable AI-Generated Image Detection [58.87142367781417]
A naively trained detector tends to favor overfitting to the limited and monotonous fake patterns, causing the feature space to become highly constrained and low-ranked.<n>One potential remedy is incorporating the pre-trained knowledge within the vision foundation models to expand the feature space.<n>By freezing the principal components and adapting only the remained components, we preserve the pre-trained knowledge while learning fake patterns.
arXiv Detail & Related papers (2024-11-23T19:10:32Z) - Semantic-Enhanced Indirect Call Analysis with Large Language Models [14.517268546437917]
This paper proposes Semantic-Enhanced Analysis (SEA) to enhance the effectiveness of indirect call analysis.
For common programming practices, indirect calls often exhibit semantic similarity with their invoked targets.
SEA generates natural language summaries of both indirect calls and target functions from multiple perspectives.
arXiv Detail & Related papers (2024-08-08T10:04:50Z) - LLMDFA: Analyzing Dataflow in Code with Large Language Models [8.92611389987991]
This paper presents LLMDFA, a compilation-free and customizable dataflow analysis framework.
We decompose the problem into several subtasks and introduce a series of novel strategies.
On average, LLMDFA achieves 87.10% precision and 80.77% recall, surpassing existing techniques with F1 score improvements of up to 0.35.
arXiv Detail & Related papers (2024-02-16T15:21:35Z) - Innovative Horizons in Aerial Imagery: LSKNet Meets DiffusionDet for
Advanced Object Detection [55.2480439325792]
We present an in-depth evaluation of an object detection model that integrates the LSKNet backbone with the DiffusionDet head.
The proposed model achieves a mean average precision (MAP) of approximately 45.7%, which is a significant improvement.
This advancement underscores the effectiveness of the proposed modifications and sets a new benchmark in aerial image analysis.
arXiv Detail & Related papers (2023-11-21T19:49:13Z) - The Hitchhiker's Guide to Program Analysis: A Journey with Large
Language Models [18.026567399243]
Large Language Models (LLMs) offer a promising alternative to static analysis.
In this paper, we take a deep dive into the open space of LLM-assisted static analysis.
We develop LLift, a fully automated framework that interfaces with both a static analysis tool and an LLM.
arXiv Detail & Related papers (2023-08-01T02:57:43Z) - Learning Dynamic Compact Memory Embedding for Deformable Visual Object
Tracking [82.34356879078955]
We propose a compact memory embedding to enhance the discrimination of the segmentation-based deformable visual tracking method.
Our method outperforms the excellent segmentation-based trackers, i.e., D3S and SiamMask on DAVIS 2017 benchmark.
arXiv Detail & Related papers (2021-11-23T03:07:12Z) - Target-Aware Object Discovery and Association for Unsupervised Video
Multi-Object Segmentation [79.6596425920849]
This paper addresses the task of unsupervised video multi-object segmentation.
We introduce a novel approach for more accurate and efficient unseen-temporal segmentation.
We evaluate the proposed approach on DAVIS$_17$ and YouTube-VIS, and the results demonstrate that it outperforms state-of-the-art methods both in segmentation accuracy and inference speed.
arXiv Detail & Related papers (2021-04-10T14:39:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.