Related papers: Agyn: A Multi-Agent System for Team-Based Autonomous Software Engineering

URL: http://arxiv.org/abs/2602.01465v2
Date: Sat, 07 Feb 2026 20:47:07 GMT
Title: Agyn: A Multi-Agent System for Team-Based Autonomous Software Engineering
Authors: Nikita Benkovich, Vitalii Valkov
Abstract summary: Real-world software development is organized as a collaborative activity carried out by teams following shared methodologies. We present a fully automated multi-agent system that explicitly models software engineering as an organizational process. Our results suggest that replicating team structure, methodology, and communication is a powerful paradigm for autonomous software engineering.
Score: 0.09046463333989574
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Large language models have demonstrated strong capabilities in individual software engineering tasks, yet most autonomous systems still treat issue resolution as a monolithic or pipeline-based process. In contrast, real-world software development is organized as a collaborative activity carried out by teams following shared methodologies, with clear role separation, communication, and review. In this work, we present a fully automated multi-agent system that explicitly models software engineering as an organizational process, replicating the structure of an engineering team. Built on top of agyn, an open-source platform for configuring agent teams, our system assigns specialized agents to roles such as coordination, research, implementation, and review, provides them with isolated sandboxes for experimentation, and enables structured communication. The system follows a defined development methodology for working on issues, including analysis, task specification, pull request creation, and iterative review, and operates without any human intervention. Importantly, the system was designed for real production use and was not tuned for SWE-bench. When evaluated post hoc on SWE-bench 500, it resolves 72.2% of tasks, outperforming single-agent baselines using comparable language models. Our results suggest that replicating team structure, methodology, and communication is a powerful paradigm for autonomous software engineering, and that future progress may depend as much on organizational design and agent infrastructure as on model improvements.

Related papers

Agentic Artificial Intelligence (AI): Architectures, Taxonomies, and Evaluation of Large Language Model Agents [14.448267395835721]
We propose a unified taxonomy that breaks agents into Perception, Brain, Planning, Action, Tool Use, and Collaboration. We also group the environments in which these agents operate, including digital operating systems, embodied robotics, and other specialized domains.
arXiv Detail & Related papers (2026-01-18T19:51:16Z)
Advances and Frontiers of LLM-based Issue Resolution in Software Engineering: A Comprehensive Survey [59.3507264893654]
Issue resolution is a complex software engineering task integral to real-world development. Benchmarks like SWE-bench revealed this task as profoundly difficult for large language models. This paper presents a systematic survey of this emerging domain.
arXiv Detail & Related papers (2026-01-15T18:55:03Z)
An Agentic Framework for Autonomous Materials Computation [70.24472585135929]
Large Language Models (LLMs) have emerged as powerful tools for accelerating scientific discovery. Recent advances integrate LLMs into agentic frameworks, enabling retrieval, reasoning, and tool use for complex scientific experiments. Here, we present a domain-specialized agent designed for reliable automation of first-principles materials computations.
arXiv Detail & Related papers (2025-12-22T15:03:57Z)
Designing Domain-Specific Agents via Hierarchical Task Abstraction Mechanism [61.01709143437043]
We introduce a novel agent design framework centered on a Hierarchical Task Abstraction Mechanism (HTAM). Specifically, HTAM moves beyond emulating social roles, instead structuring multi-agent systems into a logical hierarchy that mirrors the intrinsic task-dependency graph of a given domain. We instantiate this framework as EarthAgent, a multi-agent system tailored for complex geospatial analysis.
arXiv Detail & Related papers (2025-11-21T12:25:47Z)
LoCoBench-Agent: An Interactive Benchmark for LLM Agents in Long-Context Software Engineering [90.84806758077536]
We introduce LoCoBench-Agent, a comprehensive evaluation framework specifically designed to assess large language model (LLM) agents in realistic, long-context software engineering. Our framework extends LoCoBench's 8,000 scenarios into interactive agent environments, enabling systematic evaluation of multi-turn conversations. It provides agents with 8 specialized tools (file operations, search, code analysis) and evaluates them across context lengths ranging from 10K to 1M tokens.
arXiv Detail & Related papers (2025-11-17T23:57:24Z)
Engineering.ai: A Platform for Teams of AI Engineers in Computational Design [8.217119500224284]
We present Engineering.ai, a platform for teams of AI engineers in computational design. The framework employs a hierarchical multi-agent architecture where a Chief Engineer coordinates specialized agents. The system integrates FreeCAD, Gmsh, OpenFOAM, CalculiX, and BPM acoustic analysis, enabling parallel multidisciplinary simulations.
arXiv Detail & Related papers (2025-10-31T08:00:48Z)
Agentic Systems in Radiology: Design, Applications, Evaluation, and Challenges [13.53016942028838]
Large language models (LLMs) are capable of using natural language to integrate information, follow instructions, and perform forms of "reasoning" and planning. With its multimodal data streams and orchestrated workflows spanning multiple systems, radiology is uniquely suited to benefit from agents that can adapt to context and automate repetitive yet complex tasks. This review examines the design of such LLM agentic systems, highlights key applications, discusses evaluation methods for planning and tool use, and outlines challenges such as error cascades, tool-use efficiency, and health IT integration.
arXiv Detail & Related papers (2025-10-10T13:56:27Z)
A Comprehensive Survey on Benchmarks and Solutions in Software Engineering of LLM-Empowered Agentic System [56.40989626804489]
This survey provides the first holistic analysis of Large Language Model-powered software engineering. We review over 150 recent papers and propose a taxonomy along two key dimensions: (1) Solutions, categorized into prompt-based, fine-tuning-based, and agent-based paradigms, and (2) Benchmarks, including tasks such as code generation, translation, and repair.
arXiv Detail & Related papers (2025-10-10T06:56:50Z)
JoyAgent-JDGenie: Technical Report on the GAIA [27.025464023889853]
Large Language Models are increasingly deployed as autonomous agents for complex real-world tasks. We propose a generalist agent architecture that integrates planning and execution agents with critic model voting, a hierarchical memory system spanning working, semantic, and procedural layers, and a refined tool suite for search, code execution, and multimodal parsing.
arXiv Detail & Related papers (2025-10-01T04:41:58Z)
Vibe Coding vs. Agentic Coding: Fundamentals and Practical Implications of Agentic AI [0.36868085124383626]
This review presents a comprehensive analysis of two emerging paradigms in AI-assisted software development: vibe coding and agentic coding. Vibe coding emphasizes intuitive, human-in-the-loop interaction through prompt-based, conversational interaction. Agentic coding enables autonomous software development through goal-driven agents capable of planning, executing, testing, and iterating tasks with minimal human intervention.
arXiv Detail & Related papers (2025-05-26T03:00:21Z)
An LLM-enabled Multi-Agent Autonomous Mechatronics Design Framework [49.633199780510864]
This work proposes a multi-agent autonomous mechatronics design framework, integrating expertise across mechanical design, optimization, electronics, and software engineering. Operating primarily through a language-driven workflow, the framework incorporates structured human feedback to ensure robust performance under real-world constraints. A fully functional autonomous vessel was developed with optimized propulsion, cost-effective electronics, and advanced control.
arXiv Detail & Related papers (2025-04-20T16:57:45Z)
SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering [79.07755560048388]
SWE-agent is a system that enables LM agents to autonomously use computers to solve software engineering tasks. SWE-agent's custom agent-computer interface (ACI) significantly enhances an agent's ability to create and edit code files, navigate entire repositories, and execute tests and other programs. We evaluate SWE-agent on SWE-bench and HumanEvalFix, achieving state-of-the-art performance on both with pass@1 rates of 12.5% and 87.7%, respectively.
arXiv Detail & Related papers (2024-05-06T17:41:33Z)

This list is automatically generated from the titles and abstracts of the papers on this site.