Related papers: CodeCompass: Navigating the Navigation Paradox in Agentic Code Intelligence

URL: http://arxiv.org/abs/2602.20048v1
Date: Mon, 23 Feb 2026 16:58:37 GMT
Title: CodeCompass: Navigating the Navigation Paradox in Agentic Code Intelligence
Authors: Tarakanath Paipuru
Abstract summary: We identify the Navigation Paradox: agents perform poorly because navigation and retrieval are fundamentally distinct problems. We demonstrate that graph-based structural navigation via CodeCompass, a Model Context Protocol server exposing dependency graphs, achieves 99.4% task completion on hidden-dependency tasks.
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Modern code intelligence agents operate in contexts exceeding 1 million tokens, far beyond the scale at which humans manually locate relevant files. Yet agents consistently fail to discover architecturally critical files when solving real-world coding tasks. We identify the Navigation Paradox: agents perform poorly not because of context limits but because navigation and retrieval are fundamentally distinct problems. Through 258 automated trials across 30 benchmark tasks on a production FastAPI repository, we demonstrate that graph-based structural navigation via CodeCompass (a Model Context Protocol server exposing dependency graphs) achieves 99.4% task completion on hidden-dependency tasks, a 23.2 percentage-point improvement over vanilla agents (76.2%) and 21.2 points over BM25 retrieval (78.2%). However, we uncover a critical adoption gap: 58% of trials with graph access made zero tool calls, and agents required explicit prompt engineering to adopt the tool consistently. Our findings reveal that the bottleneck is not tool availability but behavioral alignment: agents must be explicitly guided to leverage structural context over lexical heuristics. We contribute: (1) a task taxonomy distinguishing semantic-search, structural, and hidden-dependency scenarios; (2) empirical evidence that graph navigation outperforms retrieval when dependencies lack lexical overlap; and (3) open-source infrastructure for reproducible evaluation of navigation tools.
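The abstract's core distinction is that retrieval ranks files by lexical overlap with the task description, while navigation follows structural edges between files. The toy sketch below contrasts the two under stated assumptions: the file paths, contents, and import edges are hypothetical; BM25 is the standard Okapi formula written from scratch with common default parameters (k1 = 1.5, b = 0.75), not values taken from the paper; and the breadth-first walk over an undirected import graph is a generic stand-in for graph navigation, not CodeCompass's implementation or the tool interface its Model Context Protocol server exposes.

```python
# Sketch only: a toy contrast of lexical retrieval vs. dependency-graph
# navigation. Not CodeCompass's implementation; all data is hypothetical.
import math
from collections import Counter, deque

# Hypothetical toy repository: file path -> tokenized file contents.
DOCS = {
    "routes/auth.py": "login endpoint handler returns response".split(),
    "schemas/token.py": "pydantic model jwt claims expiry fields".split(),
    "core/config.py": "settings secret key algorithm minutes".split(),
    "utils/strings.py": "string helpers slugify truncate".split(),
}

# Hypothetical import edges: importer -> modules it imports.
IMPORTS = {
    "routes/auth.py": ["schemas/token.py"],
    "schemas/token.py": ["core/config.py"],
    "core/config.py": [],
    "utils/strings.py": [],
}


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of every document for a whitespace-tokenized query."""
    n = len(docs)
    avgdl = sum(len(toks) for toks in docs.values()) / n
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in docs.values() for term in set(toks))
    scores = {}
    for path, toks in docs.items():
        tf = Counter(toks)
        score = 0.0
        for term in query.split():
            if tf[term] == 0:
                continue  # no lexical overlap, so this term contributes nothing
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1)
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores[path] = score
    return scores


def undirected(graph):
    """Link each file to the files it imports and to the files that import it."""
    adj = {f: set() for f in graph}
    for src, dsts in graph.items():
        for dst in dsts:
            adj.setdefault(src, set()).add(dst)
            adj.setdefault(dst, set()).add(src)
    return adj


def navigate(seed, graph, max_hops=2):
    """Breadth-first walk from seed; returns {file: hop distance} within max_hops."""
    adj = undirected(graph)
    dist = {seed: 0}
    queue = deque([seed])
    while queue:
        cur = queue.popleft()
        if dist[cur] == max_hops:
            continue  # do not expand past the hop budget
        for nxt in sorted(adj.get(cur, ())):
            if nxt not in dist:
                dist[nxt] = dist[cur] + 1
                queue.append(nxt)
    return dist


query = "change the login endpoint response"
scores = bm25_scores(query, DOCS)
print(scores["schemas/token.py"], scores["core/config.py"])  # 0.0 0.0
seed = max(scores, key=scores.get)  # the only file with lexical overlap
print(seed)  # routes/auth.py
print(navigate(seed, IMPORTS))
# {'routes/auth.py': 0, 'schemas/token.py': 1, 'core/config.py': 2}
```

In this toy setup the dependencies `schemas/token.py` and `core/config.py` score exactly zero under BM25 because they share no terms with the query, yet the walk reaches both within two hops of the top lexical hit. Seeding the walk from the best BM25 result is an assumption of this sketch, chosen only to show how a structural signal can surface files that a purely lexical ranking never reaches.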

Related papers

Beyond Blame: Rethinking SZZ with Knowledge Graph Search
We present AgenticSZZ, the first approach to apply Temporal Knowledge Graphs (TKGs) to software evolution analysis. We show that AgenticSZZ achieves F1-scores of 0.48 to 0.74, with statistically significant improvements of up to 27% over the state of the art.
arXiv (2026-02-03T00:10:48Z)
OctoBench: Benchmarking Scaffold-Aware Instruction Following in Repository-Grounded Agentic Coding
We introduce OctoBench, which benchmarks scaffold-aware instruction following in repository-grounded agentic coding. OctoBench includes 34 environments and 217 tasks instantiated under three scaffold types, and is paired with 7,098 objective checklist items. Experiments reveal a systematic gap between task-solving and scaffold-aware compliance, underscoring the need for training and evaluation.
arXiv (2026-01-15T12:36:08Z)
Simple Agents Outperform Experts in Biomedical Imaging Workflow Optimization
Adapting production-level computer vision tools to bespoke scientific datasets is a critical "last mile" bottleneck. We consider using AI agents to automate the manual coding this adaptation requires, and focus on the open question of optimal agent design. We demonstrate that a simple agent framework consistently generates adaptation code that outperforms human-expert solutions.
arXiv (2025-12-02T18:42:26Z)
Agent READMEs: An Empirical Study of Context Files for Agentic Coding
We study 2,303 agent context files from 1,925 repositories to characterize their structure, maintenance, and content. We find that these files are not static documentation but complex, difficult-to-read artifacts that evolve like configuration code, maintained through frequent, small additions. These findings indicate that while developers use context files to make agents functional, they provide few guardrails to ensure that agent-written code is secure or performant, highlighting the need for improved tooling and practices.
arXiv (2025-11-17T02:18:55Z)
InteractComp: Evaluating Search Agents With Ambiguous Queries
We introduce InteractComp, a benchmark designed to evaluate whether search agents can recognize query ambiguity and actively interact to resolve it during search. Evaluation of 17 models reveals a striking failure: the best model achieves only 13.73% accuracy, despite reaching 71.50% when given complete context. This stagnation, coupled with the immediate feedback inherent to search tasks, makes InteractComp a valuable resource for both evaluating and training interaction capabilities in search agents.
arXiv (2025-10-28T17:35:54Z)
InfoAgent: Advancing Autonomous Information-Seeking Agents
We introduce InfoAgent, a deep research agent powered by an innovative data synthesis pipeline and orchestrated web search tools. With our methods, InfoAgent achieves 15.3% accuracy on BrowseComp, 29.2% on BrowseComp-ZH, and 40.4% on Xbench-DS.
arXiv (2025-09-29T17:59:57Z)
Eigen-1: Adaptive Multi-Agent Refinement with Monitor-Based RAG for Scientific Reasoning
We develop a unified framework that combines implicit retrieval and structured collaboration. On Humanity's Last Exam (HLE) Bio/Chem Gold, our framework achieves 48.3% accuracy. Results on SuperGPQA and TRQA confirm robustness across domains.
arXiv (2025-09-25T14:05:55Z)
Benchmarking Deep Search over Heterogeneous Enterprise Data
We present a new benchmark for evaluating a form of retrieval-augmented generation (RAG) that requires source-aware, multi-hop reasoning over diverse, sparse, but related sources. We build it using a synthetic data pipeline that simulates business workflows across product planning, development, and support stages.
arXiv (2025-06-29T08:34:59Z)
AgentSwift: Efficient LLM Agent Design via Value-guided Hierarchical Search
Large language model (LLM) agents have demonstrated strong capabilities across diverse domains. Existing agent search methods suffer from three major limitations. We introduce a comprehensive framework to address these limitations.
arXiv (2025-06-06T12:07:23Z)

This list is automatically generated from the titles and abstracts of the papers on this site.
