Tokenomics: Quantifying Where Tokens Are Used in Agentic Software Engineering
- URL: http://arxiv.org/abs/2601.14470v1
- Date: Tue, 20 Jan 2026 20:52:14 GMT
- Title: Tokenomics: Quantifying Where Tokens Are Used in Agentic Software Engineering
- Authors: Mohamad Salim, Jasmine Latendresse, SayedHassan Khatoonabadi, Emad Shihab
- Abstract summary: We conduct an analysis of token consumption patterns in an LLM-MA system within the Software Development Life Cycle (SDLC). We analyze execution traces from 30 software development tasks performed by the ChatDev framework using a GPT-5 reasoning model. Our preliminary findings show that the iterative Code Review stage accounts for the majority of token consumption, averaging 59.4% of tokens.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: LLM-based Multi-Agent (LLM-MA) systems are increasingly applied to automate complex software engineering tasks such as requirements engineering, code generation, and testing. However, their operational efficiency and resource consumption remain poorly understood, hindering practical adoption due to unpredictable costs and environmental impact. To address this, we conduct an analysis of token consumption patterns in an LLM-MA system within the Software Development Life Cycle (SDLC), aiming to understand where tokens are consumed across distinct software engineering activities. We analyze execution traces from 30 software development tasks performed by the ChatDev framework using a GPT-5 reasoning model, mapping its internal phases to distinct development stages (Design, Coding, Code Completion, Code Review, Testing, and Documentation) to create a standardized evaluation framework. We then quantify and compare token distribution (input, output, reasoning) across these stages. Our preliminary findings show that the iterative Code Review stage accounts for the majority of token consumption, averaging 59.4% of all tokens. Furthermore, we observe that input tokens consistently constitute the largest share of consumption, averaging 53.9%, providing empirical evidence for potentially significant inefficiencies in agentic collaboration. Our results suggest that the primary cost of agentic software engineering lies not in initial code generation but in automated refinement and verification. Our novel methodology can help practitioners predict expenses and optimize workflows, and it directs future research toward developing more token-efficient agent collaboration protocols.
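The stage-wise accounting the abstract describes can be sketched as a simple aggregation over execution-trace records. The stage names follow the paper's SDLC mapping; the trace record layout (stage name plus input/output/reasoning token counts) is an assumption for illustration, not the paper's actual trace schema.

```python
from collections import defaultdict

# Hypothetical record layout: (stage, input_tokens, output_tokens, reasoning_tokens).
# Stage names follow the paper's SDLC mapping; the layout itself is assumed.
STAGES = ["Design", "Coding", "Code Completion", "Code Review", "Testing", "Documentation"]

def token_distribution(trace):
    """Aggregate token counts per SDLC stage and compute each stage's share of the total."""
    totals = defaultdict(lambda: {"input": 0, "output": 0, "reasoning": 0})
    for stage, inp, out, reasoning in trace:
        totals[stage]["input"] += inp
        totals[stage]["output"] += out
        totals[stage]["reasoning"] += reasoning
    grand_total = sum(sum(counts.values()) for counts in totals.values())
    return {
        stage: {**counts,
                "share": sum(counts.values()) / grand_total if grand_total else 0.0}
        for stage, counts in totals.items()
    }

# Illustrative trace with two Code Review turns, mimicking iterative refinement.
trace = [
    ("Design",      1200, 300, 150),
    ("Code Review", 5400, 900, 700),
    ("Code Review", 4800, 850, 600),
    ("Testing",     1000, 200, 100),
]
dist = token_distribution(trace)
```

With such an aggregation, the paper's headline figures (e.g. Code Review's 59.4% share) fall out directly as the `share` field, and the input/output/reasoning split per stage is preserved for finer comparison.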
Related papers
- From Code Foundation Models to Agents and Applications: A Practical Guide to Code Intelligence [150.3696990310269]
Large language models (LLMs) have transformed automated software development by enabling direct translation of natural language descriptions into functional code. We provide a comprehensive synthesis and practical guide, built on a series of analytic and probing experiments, on code LLMs. We analyze the code capability of general LLMs (GPT-4, Claude, LLaMA) and code-specialized LLMs (StarCoder, Code LLaMA, DeepSeek-Coder, and QwenCoder).
arXiv Detail & Related papers (2025-11-23T17:09:34Z) - LoCoBench-Agent: An Interactive Benchmark for LLM Agents in Long-Context Software Engineering [90.84806758077536]
We introduce LoCoBench-Agent, a comprehensive evaluation framework specifically designed to assess large language model (LLM) agents in realistic, long-context software engineering. Our framework extends LoCoBench's 8,000 scenarios into interactive agent environments, enabling systematic evaluation of multi-turn conversations. It provides agents with 8 specialized tools (file operations, search, code analysis) and evaluates them across context lengths ranging from 10K to 1M tokens.
arXiv Detail & Related papers (2025-11-17T23:57:24Z) - A Process Mining-Based System For The Analysis and Prediction of Software Development Workflows [33.72751145910978]
CodeSight is an end-to-end system designed to anticipate deadline compliance in software development. It captures development and deployment data directly from GitHub, transforming it into process mining logs for detailed analysis. CodeSight employs an LSTM model that predicts remaining PR resolution times based on sequential activity traces and static features.
arXiv Detail & Related papers (2025-10-29T20:13:46Z) - How can we assess human-agent interactions? Case studies in software agent design [52.953425368394306]
We take two major steps towards the rigorous assessment of human-agent interactions. We propose PULSE, a framework for more efficient human-centric evaluation of agent designs. We deploy the framework on a large-scale web platform built around the open-source software agent OpenHands.
arXiv Detail & Related papers (2025-10-10T19:04:28Z) - A Survey on Code Generation with LLM-based Agents [61.474191493322415]
Code generation agents powered by large language models (LLMs) are revolutionizing the software development paradigm. These agents are characterized by three core features. This paper presents a systematic survey of the field of LLM-based code generation agents.
arXiv Detail & Related papers (2025-07-31T18:17:36Z) - Software Development Life Cycle Perspective: A Survey of Benchmarks for Code Large Language Models and Agents [23.476042888072293]
Code large language models (CodeLLMs) and agents have shown great promise in tackling complex software engineering tasks. This paper provides a comprehensive review of existing benchmarks for CodeLLMs and agents, studying and analyzing 181 benchmarks from 461 relevant papers.
arXiv Detail & Related papers (2025-05-08T14:27:45Z) - Chain of Draft for Software Engineering: Challenges in Applying Concise Reasoning to Code Tasks [0.0]
This research extends the Chain of Draft (CoD) method to software engineering. All CoD variants used significantly fewer tokens than Chain of Thought (CoT). CoD variants maintain over 90% of CoT's code quality across key metrics including correctness, compatibility, and maintainability.
arXiv Detail & Related papers (2025-03-12T07:44:18Z) - Codev-Bench: How Do LLMs Understand Developer-Centric Code Completion? [60.84912551069379]
We present the Code-Development Benchmark (Codev-Bench), a fine-grained, real-world, repository-level, and developer-centric evaluation framework.
Codev-Agent is an agent-based system that automates repository crawling, constructs execution environments, extracts dynamic calling chains from existing unit tests, and generates new test samples to avoid data leakage.
arXiv Detail & Related papers (2024-10-02T09:11:10Z) - Benchmarking End-To-End Performance of AI-Based Chip Placement Algorithms [77.71341200638416]
ChiPBench is a benchmark designed to evaluate the effectiveness of AI-based chip placement algorithms. We have gathered 20 circuits from various domains (e.g., CPU, GPU, and microcontrollers) for evaluation. Results show that even when a single-point algorithm dominates on an intermediate metric, the final PPA results are unsatisfactory.
arXiv Detail & Related papers (2024-07-03T03:29:23Z) - CodePori: Large-Scale System for Autonomous Software Development Using Multi-Agent Technology [4.2990995991059275]
Large Language Models (LLMs) and Generative Pre-trained Transformers (GPTs) have transformed the field of Software Engineering.
We introduce CodePori, a novel system designed to automate code generation for large and complex software projects.
Results: CodePori is able to generate running code for large-scale projects, aligned with the typical software development process.
arXiv Detail & Related papers (2024-02-02T13:42:50Z) - Static Code Analysis in the AI Era: An In-depth Exploration of the
Concept, Function, and Potential of Intelligent Code Analysis Agents [2.8686437689115363]
We introduce the Intelligent Code Analysis Agent (ICAA), a novel concept combining AI models, engineering process designs, and traditional non-AI components.
We observed a substantial improvement in bug detection accuracy, reducing the false-positive rate to 66% from the baseline's 85%, and a promising recall rate of 60.8%.
Despite this challenge, our findings suggest that the ICAA holds considerable potential to revolutionize software quality assurance.
arXiv Detail & Related papers (2023-10-13T03:16:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.