Fugu-MT Paper Translation (Abstract): Do Autonomous Agents Contribute Test Code? A Study of Tests in Agentic Pull Requests

arxiv url: http://arxiv.org/abs/2601.03556v1
Date: Wed, 07 Jan 2026 03:52:13 GMT
Status: Translation complete
Last updated in system: 2026-01-09 02:15:23.185163
Title: Do Autonomous Agents Contribute Test Code? A Study of Tests in Agentic Pull Requests
Authors: Sabrina Haque, Sarvesh Ingale, Christoph Csallner
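Abstract summary: We conduct an empirical study of test inclusion in agentic pull requests using the AIDev dataset. Test-containing PRs become more common over time and tend to be larger and take longer to complete. We also observe variation across agents in both test adoption and the balance between test and production code within test PRs.
Reference score (site-computed attention metric): 1.2043574473965317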
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Testing is a critical practice for ensuring software correctness and long-term maintainability. As agentic coding tools increasingly submit pull requests (PRs), it becomes essential to understand how testing appears in these agent-driven workflows. Using the AIDev dataset, we present an empirical study of test inclusion in agentic pull requests. We examine how often tests are included, when they are introduced during the PR lifecycle and how test-containing PRs differ from non-test PRs in terms of size, turnaround time, and merge outcomes. Across agents, test-containing PRs are more common over time and tend to be larger and take longer to complete, while merge rates remain largely similar. We also observe variation across agents in both test adoption and the balance between test and production code within test PRs. Our findings provide a descriptive view of testing behavior in agentic pull requests and offer empirical grounding for future studies of autonomous software development.

Related papers

Security in the Age of AI Teammates: An Empirical Study of Agentic Pull Requests on GitHub [4.409447722044799]
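This study aims to characterize how autonomous coding agents contribute to software security in practice. Using the AIDev dataset, we conduct a large-scale analysis of agent-authored PRs. We then analyze frequency and acceptance outcomes, and review them across autonomous agents, programming ecosystems, and types of code change.
Paper reference translation (metadata) (2026-01-01T21:14:11Z)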
Impatient Users Confuse AI Agents: High-fidelity Simulations of Human Traits for Testing Agents [58.00130492861884]
TraitBasis is a lightweight, model-agnostic method for systematically stress-testing AI agents. TraitBasis learns directions in activation space that correspond to steerable user traits. We observed on average a 2%-30% performance degradation on $\tau$-Trait across frontier models.
Paper reference translation (metadata) (2025-10-06T05:03:57Z)
Agent-Testing Agent: A Meta-Agent for Automated Testing and Evaluation of Conversational AI Agents [2.3429263075112288]
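We propose the Agent-Testing Agent (ATA), a meta-agent that combines static code analysis, designer interrogation, literature mining, and persona-driven adversarial test generation. Each dialogue is scored with an LLM-as-a-Judge (LAAJ) rubric, and the scores are used to steer subsequent tests toward the agent's weakest capabilities.
Paper reference translation (metadata) (2025-08-24T15:02:13Z)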
From Reproduction to Replication: Evaluating Research Agents with Progressive Code Masking [48.90371827091671]
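AutoExperiment is a benchmark that evaluates the ability of AI agents to implement and run machine learning experiments. We evaluate state-of-the-art agents and find that performance degrades rapidly as $n$ increases. Our findings highlight key challenges in long-horizon code generation, context retrieval, and autonomous experiment execution.
Paper reference translation (metadata) (2025-06-24T15:39:20Z)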
Validation of massively-parallel adaptive testing using dynamic control matching [0.0]
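Modern businesses often run many A/B/n tests at the same time, packaging many content variations into the same messages. This paper presents a method to disentangle the causal effects of the various tests under conditions of continuous test adaptation.
Paper reference translation (metadata) (2023-05-02T11:28:12Z)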
Sequential Kernelized Independence Testing [77.237958592189]
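We design sequential kernelized independence tests inspired by kernelized dependence measures. We demonstrate the power of our approaches on both simulated and real data.
Paper reference translation (metadata) (2022-12-14T18:08:42Z)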
A Search-Based Testing Approach for Deep Reinforcement Learning Agents [1.1580916951856255]
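This paper proposes a Search-based Testing Approach for Reinforcement Learning Agents (STARLA) to test the policy of a DRL agent. We use machine learning models and a dedicated genetic algorithm to narrow the search toward faulty episodes.
Paper reference translation (metadata) (2022-06-15T20:51:33Z)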
Dynamic Causal Effects Evaluation in A/B Testing with a Reinforcement Learning Framework [68.96770035057716]
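A/B testing is a business strategy for comparing a new product with an old one in the pharmaceutical, technology, and traditional industries. This paper proposes a reinforcement learning framework for carrying out A/B testing in online experiments.
Paper reference translation (metadata) (2020-02-05T10:25:02Z)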

The related papers list is generated automatically from the titles and abstracts of papers on this site.
