Related papers: Mind the Metrics: Patterns for Telemetry-Aware In-IDE AI Application Development using the Model Context Protocol (MCP)

Related papers

PymooLab: An Open-Source Visual Analytics Framework for Multi-Objective Optimization using LLM-Based Code Generation and MCDM [0.0]
PymooLab is an open-source visual analytics environment built on top of textitpymoo.<n>It unifies configuration, execution monitoring, and formal decision support in a single reproducible workflow.<n>For computationally intensive studies, PymooLab relies on the native textitpymoo acceleration pathway through JAX.
arXiv Detail & Related papers (2026-03-02T00:56:32Z)
ABC-Bench: Benchmarking Agentic Backend Coding in Real-World Development [72.4729759618632]
We introduce ABC-Bench, a benchmark to evaluate agentic backend coding within a realistic, executable workflow.<n>We curated 224 practical tasks spanning 8 languages and 19 frameworks from open-source repositories.<n>Our evaluation reveals that even state-of-the-art models struggle to deliver reliable performance on these holistic tasks.
arXiv Detail & Related papers (2026-01-16T08:23:52Z)
SmartMLOps Studio: Design of an LLM-Integrated IDE with Automated MLOps Pipelines for Model Development and Monitoring [0.3823356975862005]
Traditional Integrated Development Environments (IDEs) focus primarily on code, lacking intelligent support for the full machine learning lifecycle.<n>This study proposes the design of an LLM- Integrated IDE with automated MLOps pipelines that enables continuous model development and monitoring within a single environment.<n>The backend incorporates automated data validation, feature storage, drift detection, retraining triggers, and CI/CD deployment orchestration.<n> Experimental results demonstrate that SmartMLOps Studio reduces pipeline configuration time by 61%, improves experiment by 45%, and increases drift detection accuracy by 14% compared to traditional.
arXiv Detail & Related papers (2025-11-03T18:56:59Z)
A Comprehensive Survey on Benchmarks and Solutions in Software Engineering of LLM-Empowered Agentic System [56.40989626804489]
This survey provides the first holistic analysis of Large Language Models-powered software engineering.<n>We review over 150 recent papers and propose a taxonomy along two key dimensions: (1) Solutions, categorized into prompt-based, fine-tuning-based, and agent-based paradigms, and (2) Benchmarks, including tasks such as code generation, translation, and repair.
arXiv Detail & Related papers (2025-10-10T06:56:50Z)
Rethinking Testing for LLM Applications: Characteristics, Challenges, and a Lightweight Interaction Protocol [83.83217247686402]
Large Language Models (LLMs) have evolved from simple text generators into complex software systems that integrate retrieval augmentation, tool invocation, and multi-turn interactions.<n>Their inherent non-determinism, dynamism, and context dependence pose fundamental challenges for quality assurance.<n>This paper decomposes LLM applications into a three-layer architecture: textbftextitSystem Shell Layer, textbftextitPrompt Orchestration Layer, and textbftextitLLM Inference Core.
arXiv Detail & Related papers (2025-08-28T13:00:28Z)
MCP-Universe: Benchmarking Large Language Models with Real-World Model Context Protocol Servers [86.00932417210477]
We introduce MCP-Universe, the first comprehensive benchmark specifically designed to evaluate LLMs in realistic and hard tasks through interaction with real-world MCP servers.<n>Our benchmark encompasses 6 core domains spanning 11 different MCP servers: Location Navigation, Repository Management, Financial Analysis, 3D Design, Browser Automation, and Web Searching.<n>We find that even SOTA models such as GPT-5 (43.72%), Grok-4 (33.33%) and Claude-4.0-Sonnet (29.44%) exhibit significant performance limitations.
arXiv Detail & Related papers (2025-08-20T13:28:58Z)
OS Agents: A Survey on MLLM-based Agents for General Computing Devices Use [101.57043903478257]
The dream to create AI assistants as capable and versatile as the fictional J.A.R.V.I.S from Iron Man has long captivated imaginations.<n>With the evolution of (multi-modal) large language models ((M)LLMs), this dream is closer to reality.<n>This survey aims to consolidate the state of OS Agents research, providing insights to guide both academic inquiry and industrial development.
arXiv Detail & Related papers (2025-08-06T14:33:45Z)
Deep Research Agents: A Systematic Examination And Roadmap [79.04813794804377]
Deep Research (DR) agents are designed to tackle complex, multi-turn informational research tasks.<n>In this paper, we conduct a detailed analysis of the foundational technologies and architectural components that constitute DR agents.
arXiv Detail & Related papers (2025-06-22T16:52:48Z)
Pangu Embedded: An Efficient Dual-system LLM Reasoner with Metacognition [95.54406667705999]
Pangu Embedded is an efficient Large Language Model (LLM) reasoner developed on Ascend Neural Processing Units (NPUs)<n>It addresses the significant computational costs and inference latency challenges prevalent in existing reasoning-optimized LLMs.<n>It delivers rapid responses and state-of-the-art reasoning quality within a single, unified model architecture.
arXiv Detail & Related papers (2025-05-28T14:03:02Z)
From Large AI Models to Agentic AI: A Tutorial on Future Intelligent Communications [57.38526350775472]
This tutorial provides a systematic introduction to the principles, design, and applications of Large Artificial Intelligence Models (LAMs) and Agentic AI technologies.<n>We outline the background of 6G communications, review the technological evolution from LAMs to Agentic AI, and clarify the tutorial's motivation and main contributions.
arXiv Detail & Related papers (2025-05-28T12:54:07Z)
Vibe Coding vs. Agentic Coding: Fundamentals and Practical Implications of Agentic AI [0.36868085124383626]
Review presents a comprehensive analysis of two emerging paradigms in AI-assisted software development: vibe coding and agentic coding.<n> Vibe coding emphasizes intuitive, human-in-the-loop interaction through prompt-based, conversational interaction.<n>Agentic coding enables autonomous software development through goal-driven agents capable of planning, executing, testing, and iterating tasks with minimal human intervention.
arXiv Detail & Related papers (2025-05-26T03:00:21Z)
MLE-Dojo: Interactive Environments for Empowering LLM Agents in Machine Learning Engineering [57.156093929365255]
Gym-style framework for systematically reinforcement learning, evaluating, and improving autonomous large language model (LLM) agents.<n>MLE-Dojo covers diverse, open-ended MLE tasks carefully curated to reflect realistic engineering scenarios.<n>Its fully executable environment supports comprehensive agent training via both supervised fine-tuning and reinforcement learning.
arXiv Detail & Related papers (2025-05-12T17:35:43Z)
Edge-Cloud Collaborative Computing on Distributed Intelligence and Model Optimization: A Survey [59.52058740470727]
Edge-cloud collaborative computing (ECCC) has emerged as a pivotal paradigm for addressing the computational demands of modern intelligent applications.<n>Recent advancements in AI, particularly deep learning and large language models (LLMs), have dramatically enhanced the capabilities of these distributed systems.<n>This survey provides a structured tutorial on fundamental architectures, enabling technologies, and emerging applications.
arXiv Detail & Related papers (2025-05-03T13:55:38Z)
Multi-Agent Systems Powered by Large Language Models: Applications in Swarm Intelligence [0.8602553195689513]
This work examines the integration of large language models (LLMs) into multi-agent simulations by replacing the hard-coded programs of agents with LLM-driven prompts.<n>The proposed approach is showcased in the context of two examples of complex systems from the field of swarm intelligence: ant colony foraging and bird flocking.
arXiv Detail & Related papers (2025-03-05T17:13:27Z)
LLM-Generated Microservice Implementations from RESTful API Definitions [3.740584607001637]
This paper presents a system that uses Large Language Models (LLMs) to automate the API-first development of software.<n>The system generates OpenAPI specification, generating server code from it, and refining the code through a feedback loop that analyzes execution logs and error messages.<n>The system has the potential to benefit software developers, architects, and organizations to speed up software development cycles.
arXiv Detail & Related papers (2025-02-13T20:50:33Z)
Think-on-Process: Dynamic Process Generation for Collaborative Development of Multi-Agent System [13.65717444483291]
ToP (Think-on-Process) is a dynamic process generation framework for software development. Our framework significantly enhances the dynamic process generation capability of the GPT-3.5 and GPT-4.
arXiv Detail & Related papers (2024-09-10T15:02:34Z)
Towards Single-System Illusion in Software-Defined Vehicles -- Automated, AI-Powered Workflow [3.2821049498759094]
We propose a novel model- and feature-based approach to development of vehicle software systems. One of the key points of the presented approach is the inclusion of modern generative AI, specifically Large Language Models (LLMs) The resulting pipeline is automated to a large extent, with feedback being generated at each step.
arXiv Detail & Related papers (2024-03-21T15:07:57Z)

This list is automatically generated from the titles and abstracts of the papers in this site.