Related papers: Repository Intelligence Graph: Deterministic Architectural Map for LLM Code Assistants

Repository Intelligence Graph: Deterministic Architectural Map for LLM Code Assistants

URL: http://arxiv.org/abs/2601.10112v1
Date: Thu, 15 Jan 2026 06:42:45 GMT
Title: Repository Intelligence Graph: Deterministic Architectural Map for LLM Code Assistants
Authors: Tsvi Cherny-Shahar, Amiram Yehudai,
Abstract summary: Repository aware coding agents often struggle to recover build and test structure.<n>We introduce the Repository Intelligence Graph (RIG), a deterministic, evidence backed architectural map that represents buildable components, aggregators, runners, tests, external packages, and package managers.<n>We evaluate three commercial agents (Claude Code, Cursor, Codex) on eight repositories spanning low to high build complexity, including the real world MetaFFI project.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Repository aware coding agents often struggle to recover build and test structure, especially in multilingual projects where cross language dependencies are encoded across heterogeneous build systems and tooling. We introduce the Repository Intelligence Graph (RIG), a deterministic, evidence backed architectural map that represents buildable components, aggregators, runners, tests, external packages, and package managers, connected by explicit dependency and coverage edges that trace back to concrete build and test definitions. We also present SPADE, a deterministic extractor that constructs RIG from build and test artifacts (currently with an automatic CMake plugin based on the CMake File API and CTest metadata), and exposes RIG as an LLM friendly JSON view that agents can treat as the authoritative description of repository structure. We evaluate three commercial agents (Claude Code, Cursor, Codex) on eight repositories spanning low to high build oriented complexity, including the real world MetaFFI project. Each agent answers thirty structured questions per repository with and without RIG in context, and we measure accuracy, wall clock completion time, and efficiency (seconds per correct answer). Across repositories and agents, providing RIG improves mean accuracy by 12.2\% and reduces completion time by 53.9\%, yielding a mean 57.8\% reduction in seconds per correct answer. Gains are larger in multilingual repositories, which improve by 17.7\% in accuracy and 69.5\% in efficiency on average, compared to 6.6\% and 46.1\% in single language repositories. Qualitative analysis suggests that RIG shifts failures from structural misunderstandings toward reasoning mistakes over a correct structure, while rare regressions highlight that graph based reasoning quality remains a key factor.

Related papers

GREPO: A Benchmark for Graph Neural Networks on Repository-Level Bug Localization [50.009407518866965]
Repository-level bug localization is a critical software engineering challenge.<n>GNNs offer a promising alternative due to their ability to model complex, repository-wide dependencies.<n>We introduce GREPO, the first GNN benchmark for repository-scale bug localization tasks.
arXiv Detail & Related papers (2026-02-14T23:22:15Z)
ProjDevBench: Benchmarking AI Coding Agents on End-to-End Project Development [49.63491095660809]
ProjDevBench is an end-to-end benchmark that provides project requirements to coding agents and evaluates the resulting repositories.<n>We curate 20 programming problems across 8 categories, covering both concept-oriented tasks and real-world application scenarios.<n>Our evaluation reports an overall acceptance rate of 27.38%: agents handle basic functionality but struggle with complex system design, time optimization, and resource management.
arXiv Detail & Related papers (2026-02-02T05:17:23Z)
ArchAgent: Scalable Legacy Software Architecture Recovery with LLMs [44.137226823695066]
ArchAgent is a scalable agent-based framework that combines static analysis, adaptive code segmentation, and LLM-powered synthesis.<n>It reconstructs multiview, business-aligned architectures from cross-repositorys.<n>ArchAgent introduces scalable diagram generation with contextual pruning and integrates cross-repository data to identify business-critical modules.
arXiv Detail & Related papers (2026-01-19T12:39:05Z)
NL2Repo-Bench: Towards Long-Horizon Repository Generation Evaluation of Coding Agents [79.29376673236142]
Existing benchmarks fail to rigorously evaluate the long-horizon capabilities required to build complete software systems.<n>We present NL2Repo Bench, a benchmark explicitly designed to evaluate the long-horizon repository generation ability of coding agents.
arXiv Detail & Related papers (2025-12-14T15:12:13Z)
TeaRAG: A Token-Efficient Agentic Retrieval-Augmented Generation Framework [62.66056331998838]
TeaRAG is a token-efficient agentic RAG framework capable of compressing both retrieval content and reasoning steps.<n>Our reward function evaluates the knowledge sufficiency by a knowledge matching mechanism, while penalizing excessive reasoning steps.
arXiv Detail & Related papers (2025-11-07T16:08:34Z)
Eigen-1: Adaptive Multi-Agent Refinement with Monitor-Based RAG for Scientific Reasoning [53.45095336430027]
We develop a unified framework that combines implicit retrieval and structured collaboration.<n>On Humanity's Last Exam (HLE) Bio/Chem Gold, our framework achieves 48.3% accuracy.<n>Results on SuperGPQA and TRQA confirm robustness across domains.
arXiv Detail & Related papers (2025-09-25T14:05:55Z)
Git Context Controller: Manage the Context of LLM-based Agents like Git [6.521644491529639]
Large language model (LLM) based agents have shown impressive capabilities by interleaving internal reasoning with external tool use.<n>We introduce Git-Context-Controller (GCC), a structured context management framework inspired by software version control systems.<n>In a self-replication case study, a GCC-augmented agent builds a new CLI agent from scratch, achieving 40.7 task resolution, compared to only 11.7 without GCC.
arXiv Detail & Related papers (2025-07-30T08:01:45Z)
Enhancing repository-level software repair via repository-aware knowledge graphs [13.747293341707563]
Repository-level software repair faces challenges in bridging semantic gaps between issue descriptions and code patches.<n>Existing approaches, which rely on large language models (LLMs), are hindered by semantic ambiguities, limited understanding of structural context, and insufficient reasoning capabilities.<n>We propose a novel repository-aware knowledge graph (KG) that accurately links repository artifacts (issues and pull requests) and entities (files, classes, and functions)<n>A path-guided repair mechanism that leverages KG-mined paths, tracing through which allows us to augment contextual information along with explanations.
arXiv Detail & Related papers (2025-03-27T17:21:47Z)
DependEval: Benchmarking LLMs for Repository Dependency Understanding [16.19185341217556]
Large language models (LLMs) have shown considerable promise in code generation, real-world software development demands advanced repository-level reasoning.<n>We introduce a hierarchical benchmark designed to evaluate repository dependency understanding (DependEval)<n> Benchmark is based on 15,576 repositories collected from real-world websites.
arXiv Detail & Related papers (2025-03-09T16:45:22Z)
RustRepoTrans: Repository-level Code Translation Benchmark Targeting Rust [50.65321080814249]
RustRepoTrans is the first repository-level context code translation benchmark targeting incremental translation.<n>We evaluate seven representative LLMs, analyzing their errors to assess limitations in complex translation scenarios.
arXiv Detail & Related papers (2024-11-21T10:00:52Z)
On the Impacts of Contexts on Repository-Level Code Generation [5.641402231731082]
We present RepoExec, a novel benchmark designed to evaluate repository-level code generation.<n>We focus on three key aspects: executability, functional correctness through comprehensive test case generation, and accurate utilization of cross-file contexts.
arXiv Detail & Related papers (2024-06-17T10:45:22Z)
Alibaba LingmaAgent: Improving Automated Issue Resolution via Comprehensive Repository Exploration [64.19431011897515]
This paper presents Alibaba LingmaAgent, a novel Automated Software Engineering method designed to comprehensively understand and utilize whole software repositories for issue resolution.<n>Our approach introduces a top-down method to condense critical repository information into a knowledge graph, reducing complexity, and employs a Monte Carlo tree search based strategy.<n>In production deployment and evaluation at Alibaba Cloud, LingmaAgent automatically resolved 16.9% of in-house issues faced by development engineers, and solved 43.3% of problems after manual intervention.
arXiv Detail & Related papers (2024-06-03T15:20:06Z)
Class-Level Code Generation from Natural Language Using Iterative, Tool-Enhanced Reasoning over Repository [4.767858874370881]
We introduce RepoClassBench, a benchmark designed to rigorously evaluate LLMs in generating class-level code within real-world repositories. RepoClassBench includes "Natural Language to Class generation" tasks across Java, Python & C# from a selection of repositories. We introduce Retrieve-Repotools-Reflect (RRR), a novel approach that equips LLMs with static analysis tools to iteratively navigate & reason about repository-level context.
arXiv Detail & Related papers (2024-04-22T03:52:54Z)
RepoCoder: Repository-Level Code Completion Through Iterative Retrieval and Generation [96.75695811963242]
RepoCoder is a framework to streamline the repository-level code completion process. It incorporates a similarity-based retriever and a pre-trained code language model. It consistently outperforms the vanilla retrieval-augmented code completion approach.
arXiv Detail & Related papers (2023-03-22T13:54:46Z)

This list is automatically generated from the titles and abstracts of the papers in this site.