FastCode: Fast and Cost-Efficient Code Understanding and Reasoning
- URL: http://arxiv.org/abs/2603.01012v2
- Date: Tue, 03 Mar 2026 10:18:10 GMT
- Title: FastCode: Fast and Cost-Efficient Code Understanding and Reasoning
- Authors: Zhonghang Li, Zongwei Li, Yuxuan Chen, Han Shi, Jiawei Li, Jierun Chen, Haoli Bai, Chao Huang
- Abstract summary: Repository-scale code reasoning is a cornerstone of modern AI-assisted software engineering. FastCode is a framework that decouples repository exploration from content consumption.
- Score: 32.264145740214616
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Repository-scale code reasoning is a cornerstone of modern AI-assisted software engineering, enabling Large Language Models (LLMs) to handle complex workflows from program comprehension to complex debugging. However, balancing accuracy with context cost remains a significant bottleneck, as existing agentic approaches often waste computational resources through inefficient, iterative full-text exploration. To address this, we introduce FastCode, a framework that decouples repository exploration from content consumption. FastCode utilizes a structural scouting mechanism to navigate a lightweight semantic-structural map of the codebase, allowing the system to trace dependencies and pinpoint relevant targets without the overhead of full-text ingestion. By leveraging structure-aware navigation tools regulated by a cost-aware policy, the framework constructs high-value contexts in a single, optimized step. Extensive evaluations on the SWE-QA, LongCodeQA, LOC-BENCH, and GitTaskBench benchmarks demonstrate that FastCode consistently outperforms state-of-the-art baselines in reasoning accuracy while significantly reducing token consumption, validating the efficiency of scouting-first strategies for large-scale code reasoning. Source code is available at https://github.com/HKUDS/FastCode.
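The scouting-first idea in the abstract (navigate a lightweight structural map first, read source text only for the chosen targets) can be illustrated with a minimal sketch. This is not FastCode's implementation: the `Symbol` record, the `build_map`, `scout`, and `load_context` helpers, the line-count cost model, and the toy `FILES` repository are all hypothetical names and assumptions chosen for illustration, built on Python's standard `ast` module.

```python
import ast
from dataclasses import dataclass, field


@dataclass
class Symbol:
    """One entry in the structural map: metadata only, no source text."""
    name: str
    file: str
    start: int  # first line of the definition (1-indexed)
    end: int    # last line of the definition
    calls: set = field(default_factory=set)  # names of functions it calls

    @property
    def cost(self) -> int:
        # Assumed cost model: lines of code as a stand-in for tokens.
        return self.end - self.start + 1


def build_map(files: dict) -> dict:
    """Parse each file once and index its top-level functions by name."""
    symbols = {}
    for path, source in files.items():
        for node in ast.parse(source).body:
            if isinstance(node, ast.FunctionDef):
                calls = {
                    n.func.id
                    for n in ast.walk(node)
                    if isinstance(n, ast.Call) and isinstance(n.func, ast.Name)
                }
                symbols[node.name] = Symbol(
                    node.name, path, node.lineno, node.end_lineno, calls
                )
    return symbols


def scout(symbols: dict, keyword: str, budget: int) -> list:
    """Follow call edges from keyword matches, keeping targets within budget."""
    frontier = sorted(name for name in symbols if keyword in name)
    selected, seen, spent = [], set(), 0
    while frontier:
        name = frontier.pop(0)
        if name in seen or name not in symbols:
            continue  # already visited, or not defined in this repository
        seen.add(name)
        sym = symbols[name]
        if spent + sym.cost > budget:
            continue  # too expensive: skip it instead of reading it
        spent += sym.cost
        selected.append(sym)
        frontier.extend(sorted(sym.calls))
    return selected


def load_context(files: dict, selected: list) -> str:
    """Only now touch source text, and only for the selected line spans."""
    chunks = []
    for sym in selected:
        lines = files[sym.file].splitlines()[sym.start - 1:sym.end]
        chunks.append(f"# {sym.file}:{sym.start}-{sym.end}\n" + "\n".join(lines))
    return "\n\n".join(chunks)


# Toy in-memory repository (hypothetical content).
FILES = {
    "auth.py": (
        "def hash_password(pw):\n"
        "    return pw[::-1]\n"
        "\n"
        "def login(user, pw):\n"
        "    token = hash_password(pw)\n"
        "    return audit(user, token)\n"
    ),
    "log.py": (
        "def audit(user, token):\n"
        "    record = (user, token)\n"
        "    print(record)\n"
        "    print('done')\n"
        "    return record\n"
    ),
}

symbols = build_map(FILES)
picked = scout(symbols, "login", budget=6)
context = load_context(FILES, picked)
print([s.name for s in picked])
print(context)
```

In this sketch the budget check happens on map metadata alone, so a definition that would exceed the budget is never read. A real system would presumably use a richer map (classes, imports, embeddings) and a learned or tuned policy rather than a fixed line budget.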
Related papers
- Multi-CoLoR: Context-Aware Localization and Reasoning across Multi-Language Codebases [1.4216413758677147]
We present Multi-CoLoR, a framework for Context-aware Localization and Reasoning across Multi-Language codebases. It integrates organizational knowledge retrieval with graph-based reasoning to traverse complex software ecosystems.
arXiv Detail & Related papers (2026-02-23T00:54:59Z)
- LogitsCoder: Towards Efficient Chain-of-Thought Path Search via Logits Preference Decoding for Code Generation [86.08600027874662]
We propose LogitsCoder, a novel framework that enhances chain-of-thought reasoning through lightweight, logit-level control mechanisms for code generation. We show that LogitsCoder produces more efficient and higher-quality reasoning chains, leading to superior code generation performance compared to baseline methods.
arXiv Detail & Related papers (2026-02-15T08:52:19Z)
- AlignCoder: Aligning Retrieval with Target Intent for Repository-Level Code Completion [55.21541958868449]
We propose AlignCoder, a repository-level code completion framework. Our framework generates an enhanced query that bridges the semantic gap between the initial query and the target code. We employ reinforcement learning to train an AlignRetriever that learns to leverage inference information in the enhanced query for more accurate retrieval.
arXiv Detail & Related papers (2026-01-27T15:23:14Z)
- SpecMap: Hierarchical LLM Agent for Datasheet-to-Code Traceability Link Recovery in Systems Engineering [8.235446273226277]
Traceability between embedded systems datasheets and their corresponding code implementations is a fundamental challenge in systems engineering. Existing Traceability Link Recovery approaches rely on lexical similarity and information retrieval techniques. We present a hierarchical datasheet-to-code mapping methodology that employs large language models for semantic analysis.
arXiv Detail & Related papers (2026-01-16T11:50:18Z)
- ABC-Bench: Benchmarking Agentic Backend Coding in Real-World Development [72.4729759618632]
We introduce ABC-Bench, a benchmark to evaluate agentic backend coding within a realistic, executable workflow. We curated 224 practical tasks spanning 8 languages and 19 frameworks from open-source repositories. Our evaluation reveals that even state-of-the-art models struggle to deliver reliable performance on these holistic tasks.
arXiv Detail & Related papers (2026-01-16T08:23:52Z)
- Completion by Comprehension: Guiding Code Generation with Multi-Granularity Understanding [37.78627994991325]
CoCo is a novel framework that enables code Completion by Comprehension of multi-granularity context from large-scale code repositories. Experiments on CrossCodeEval and RepoEval benchmarks demonstrate that CoCo consistently surpasses state-of-the-art baselines.
arXiv Detail & Related papers (2025-12-04T07:37:59Z)
- Fast Thinking for Large Language Models [67.7238685892317]
We introduce Latent Codebooks for Fast Thinking, a framework that uses concise CoT sketches only during training to learn a codebook of discrete strategy priors. At inference, the model conditions on a handful of continuous thinking switches distilled from the codebook in a single pass, enabling strategy-level guidance without producing explicit reasoning tokens.
arXiv Detail & Related papers (2025-09-28T04:19:48Z)
- ReCode: Improving LLM-based Code Repair with Fine-Grained Retrieval-Augmented Generation [26.836612605244596]
We propose ReCode, a fine-grained retrieval-augmented in-context learning framework for accurate and efficient code repair. ReCode introduces two key innovations: (1) an algorithm-aware retrieval strategy that narrows the search space using preliminary algorithm type predictions; and (2) a modular dual-encoder architecture that separately processes code and textual inputs. Experimental results on RACodeBench and competitive programming datasets demonstrate that ReCode achieves higher repair accuracy with significantly reduced inference cost.
arXiv Detail & Related papers (2025-09-02T13:58:48Z)
- CoRe: Benchmarking LLMs Code Reasoning Capabilities through Static Analysis Tasks [14.408364047538578]
Large language models (LLMs) have been widely adopted across diverse domains of software engineering. This work presents CoRe, a benchmark designed to evaluate LLMs on fundamental static analysis tasks.
arXiv Detail & Related papers (2025-07-03T01:35:58Z)
- Code to Think, Think to Code: A Survey on Code-Enhanced Reasoning and Reasoning-Driven Code Intelligence in LLMs [53.00384299879513]
In large language models (LLMs), code and reasoning reinforce each other. Code provides verifiable execution paths, enforces logical decomposition, and enables runtime validation. We identify key challenges and propose future research directions to strengthen this synergy.
arXiv Detail & Related papers (2025-02-26T18:55:42Z)
- Codev-Bench: How Do LLMs Understand Developer-Centric Code Completion? [60.84912551069379]
We present the Code-Development Benchmark (Codev-Bench), a fine-grained, real-world, repository-level, and developer-centric evaluation framework.
Codev-Agent is an agent-based system that automates repository crawling, constructs execution environments, extracts dynamic calling chains from existing unit tests, and generates new test samples to avoid data leakage.
arXiv Detail & Related papers (2024-10-02T09:11:10Z)
- On the Impacts of Contexts on Repository-Level Code Generation [5.641402231731082]
We present RepoExec, a novel benchmark designed to evaluate repository-level code generation. We focus on three key aspects: executability, functional correctness through comprehensive test case generation, and accurate utilization of cross-file contexts.
arXiv Detail & Related papers (2024-06-17T10:45:22Z)
- When Do Program-of-Thoughts Work for Reasoning? [51.2699797837818]
We propose complexity-impacted reasoning score (CIRS) to measure correlation between code and reasoning abilities.
Specifically, we use the abstract syntax tree to encode the structural information and calculate logical complexity.
Code will be integrated into the EasyInstruct framework at https://github.com/zjunlp/EasyInstruct.
arXiv Detail & Related papers (2023-08-29T17:22:39Z)
- Understanding Long Programming Languages with Structure-Aware Sparse Attention [32.21325784213584]
We present SASA, a Structure-Aware Sparse Attention mechanism, which reduces the complexity and improves performance for long code understanding tasks.
The key components in SASA are top-$k$ sparse attention and Abstract Syntax Tree (AST)-based structure-aware attention.
Experiments on CodeXGLUE tasks show that SASA achieves better performance than the competing baselines.
arXiv Detail & Related papers (2022-05-27T02:50:57Z)
- Precise Learning of Source Code Contextual Semantics via Hierarchical Dependence Structure and Graph Attention Networks [28.212889828892664]
We propose a novel source code model embedded with hierarchical dependencies.
We introduce the syntactic structure of the basic block, i.e., its corresponding AST, into the source code model to provide sufficient information.
The results show that our model reduces the scale of parameters by 50% and achieves a 4% improvement in accuracy on the program classification task.
arXiv Detail & Related papers (2021-11-20T04:03:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.