What Challenges Do Developers Face in AI Agent Systems? An Empirical Study on Stack Overflow
- URL: http://arxiv.org/abs/2510.25423v1
- Date: Wed, 29 Oct 2025 11:44:21 GMT
- Title: What Challenges Do Developers Face in AI Agent Systems? An Empirical Study on Stack Overflow
- Authors: Ali Asgari, Annibale Panichella, Pouria Derakhshanfar, Mitchell Olsthoorn
- Abstract summary: We study developer discussions on Stack Overflow, the world's largest developer-focused Q&A platform. We construct a taxonomy of developer challenges through tag expansion and filtering, apply LDA-MALLET for topic modeling, and manually validate and label the resulting themes. Our analysis reveals seven major areas of recurring issues encompassing 77 distinct technical challenges related to runtime integration, dependency management, orchestration complexity, and evaluation reliability.
- Score: 12.179548969182571
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: AI agents have rapidly gained popularity across research and industry as systems that extend large language models with additional capabilities to plan, use tools, remember, and act toward specific goals. Yet despite their promise, developers face persistent and often underexplored challenges when building, deploying, and maintaining these emerging systems. To identify these challenges, we study developer discussions on Stack Overflow, the world's largest developer-focused Q&A platform with about 60 million questions and answers and 30 million users. We construct a taxonomy of developer challenges through tag expansion and filtering, apply LDA-MALLET for topic modeling, and manually validate and label the resulting themes. Our analysis reveals seven major areas of recurring issues encompassing 77 distinct technical challenges related to runtime integration, dependency management, orchestration complexity, and evaluation reliability. We further quantify topic popularity and difficulty to identify which issues are most common and hardest to resolve, map the tools and programming languages used in agent development, and track their evolution from 2021 to 2025 in relation to major AI model and framework releases. Finally, we present the implications of our results, offering concrete guidance for practitioners, researchers, and educators on agent reliability and developer support.
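The abstract quantifies topic popularity and difficulty without stating the formulas. A common convention in Stack Overflow mining studies (an assumption here, not necessarily this paper's exact definition) is to proxy popularity by average views per topic and difficulty by the share of questions lacking an accepted answer and the median time to acceptance. A minimal sketch of such metrics:

```python
from dataclasses import dataclass
from statistics import median
from typing import Optional

@dataclass
class Question:
    topic: str                                # topic label, e.g. from an LDA model
    views: int                                # view count on Stack Overflow
    accepted: bool                            # question has an accepted answer
    hours_to_accept: Optional[float] = None   # None if never accepted

def topic_metrics(questions):
    """Per-topic popularity/difficulty proxies (hypothetical definitions)."""
    by_topic = {}
    for q in questions:
        by_topic.setdefault(q.topic, []).append(q)
    metrics = {}
    for topic, qs in by_topic.items():
        times = [q.hours_to_accept for q in qs if q.hours_to_accept is not None]
        metrics[topic] = {
            "avg_views": sum(q.views for q in qs) / len(qs),
            "pct_unresolved": 100.0 * sum(not q.accepted for q in qs) / len(qs),
            "median_hours_to_accept": median(times) if times else None,
        }
    return metrics

questions = [
    Question("orchestration", views=1200, accepted=True, hours_to_accept=6.0),
    Question("orchestration", views=800, accepted=False),
    Question("dependencies", views=300, accepted=True, hours_to_accept=2.0),
]
print(topic_metrics(questions)["orchestration"])
# {'avg_views': 1000.0, 'pct_unresolved': 50.0, 'median_hours_to_accept': 6.0}
```

Under these hypothetical definitions, a topic can be simultaneously popular (high average views) and difficult (high unresolved share), which is the combination the paper flags as most in need of tooling and documentation support.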
Related papers
- Advances and Frontiers of LLM-based Issue Resolution in Software Engineering: A Comprehensive Survey [59.3507264893654]
Issue resolution is a complex Software Engineering task integral to real-world development. Benchmarks like SWE-bench have revealed this task as profoundly difficult for large language models. This paper presents a systematic survey of this emerging domain.
arXiv Detail & Related papers (2026-01-15T18:55:03Z) - LLM-Based Agentic Systems for Software Engineering: Challenges and Opportunities [0.03437656066916039]
This concept paper systematically reviews the emerging paradigm of LLM-based multi-agent systems. We delve into a wide range of topics such as language model selection, SE evaluation benchmarks, state-of-the-art agentic frameworks, and communication protocols.
arXiv Detail & Related papers (2026-01-14T19:28:30Z) - Prompting in Practice: Investigating Software Developers' Use of Generative AI Tools [17.926187565860232]
The integration of generative artificial intelligence (GenAI) tools has fundamentally transformed software development. This study presents a systematic investigation of how software engineers integrate GenAI tools into their professional practice. We surveyed 91 software engineers, including 72 active GenAI users, to understand AI usage patterns throughout the development process.
arXiv Detail & Related papers (2025-10-07T15:02:22Z) - AI Agentic Programming: A Survey of Techniques, Challenges, and Opportunities [8.086360127362815]
Large language model (LLM)-based coding agents autonomously plan, execute, and interact with tools such as compilers, debuggers, and version control systems. Unlike conventional code generation, these agents decompose goals, coordinate multi-step processes, and adapt based on feedback, reshaping software development practices.
arXiv Detail & Related papers (2025-08-15T00:14:31Z) - Deep Research Agents: A Systematic Examination And Roadmap [109.53237992384872]
Deep Research (DR) agents are designed to tackle complex, multi-turn informational research tasks. In this paper, we conduct a detailed analysis of the foundational technologies and architectural components that constitute DR agents.
arXiv Detail & Related papers (2025-06-22T16:52:48Z) - From LLM Reasoning to Autonomous AI Agents: A Comprehensive Review [1.4929298667651645]
We present a comparison of benchmarks developed between 2019 and 2025 that evaluate large language models and autonomous AI agents. We propose a taxonomy of approximately 60 benchmarks that cover knowledge reasoning, mathematical problem-solving, code generation and software engineering, factual grounding and retrieval, domain-specific evaluations, multimodal and embodied tasks, task orchestration, and interactive assessments. We present real-world applications of autonomous AI agents in materials science, biomedical research, academic ideation, software engineering, synthetic data generation, mathematical problem-solving, geographic information systems, multimedia, healthcare, and finance.
arXiv Detail & Related papers (2025-04-28T11:08:22Z) - Developer Challenges on Large Language Models: A Study of Stack Overflow and OpenAI Developer Forum Posts [2.704899832646869]
Large Language Models (LLMs) have gained widespread popularity due to their exceptional capabilities across various domains.
This study investigates developers' challenges by analyzing community interactions on Stack Overflow and OpenAI Developer Forum.
arXiv Detail & Related papers (2024-11-16T19:38:27Z) - SUPER: Evaluating Agents on Setting Up and Executing Tasks from Research Repositories [55.161075901665946]
SUPER aims to capture the realistic challenges faced by researchers working with Machine Learning (ML) and Natural Language Processing (NLP) research repositories.
Our benchmark comprises three distinct problem sets: 45 end-to-end problems with annotated expert solutions, 152 sub-problems derived from the expert set that focus on specific challenges, and 602 automatically generated problems for larger-scale development.
We show that state-of-the-art approaches struggle to solve these problems, with the best model (GPT-4o) solving only 16.3% of the end-to-end set and 46.1% of the scenarios.
arXiv Detail & Related papers (2024-09-11T17:37:48Z) - Voices from the Frontier: A Comprehensive Analysis of the OpenAI Developer Forum [5.667013605202579]
OpenAI's advanced large language models (LLMs) have revolutionized natural language processing and enabled developers to create innovative applications.
This paper presents a comprehensive analysis of the OpenAI Developer Forum.
We focus on (1) popularity trends and user engagement patterns, and (2) a taxonomy of challenges and concerns faced by developers.
arXiv Detail & Related papers (2024-08-03T06:57:43Z) - OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI [73.75520820608232]
We introduce OlympicArena, which includes 11,163 bilingual problems across both text-only and interleaved text-image modalities. These challenges encompass a wide range of disciplines spanning seven fields and 62 international Olympic competitions, rigorously examined for data leakage. Our evaluations reveal that even advanced models like GPT-4o achieve only 39.97% overall accuracy, illustrating current AI limitations in complex reasoning and multimodal integration.
arXiv Detail & Related papers (2024-06-18T16:20:53Z) - A Survey of Neural Code Intelligence: Paradigms, Advances and Beyond [84.95530356322621]
This survey presents a systematic review of the advancements in code intelligence. It covers over 50 representative models and their variants, more than 20 categories of tasks, and an extensive coverage of over 680 related works. Building on our examination of the developmental trajectories, we further investigate the emerging synergies between code intelligence and broader machine intelligence.
arXiv Detail & Related papers (2024-03-21T08:54:56Z) - On the Challenges and Opportunities in Generative AI [155.030542942979]
We argue that current large-scale generative AI models exhibit several fundamental shortcomings that hinder their widespread adoption across domains. We aim to provide researchers with insights for exploring fruitful research directions, thus fostering the development of more robust and accessible generative AI solutions.
arXiv Detail & Related papers (2024-02-28T15:19:33Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences arising from its use.