Fugu-MT Paper Translation (Summary): Dr-DCI: Scaling Direct Corpus Interaction via Dynamic Workspace Expansion

Paper summary: Dr-DCI: Scaling Direct Corpus Interaction via Dynamic Workspace Expansion

arxiv url: http://arxiv.org/abs/2606.14885v1
Date: Fri, 12 Jun 2026 18:46:18 GMT
Status: Translation complete
Last updated in system: 2026-06-16 16:21:32.440382
Title: Dr-DCI: Scaling Direct Corpus Interaction via Dynamic Workspace Expansion
Authors: Yi Lu, Zhuofeng Li, Ping Nie, Haoxiang Zhang, Yuyu Zhang, Kai Zou, Wenhu Chen, Jimmy Lin, Dongfu Jiang, Yu Zhang
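Abstract summary: DR-DCI is a retriever-steered DCI framework that treats retrieval as an agent-callable action for expanding a local workspace. DR-DCI is shown to be both effective and efficient across scales: in corpus-scaling experiments it remains effective from 100K to 10M documents, whereas raw DCI becomes unstable and BM25 performs substantially worse.
Reference score (attention score computed by the site): 76.02214947070392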
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Agentic search over large corpora relies on retriever-mediated interfaces (e.g., BM25 or ColBERT) for scalable candidate discovery. While effective at ranking relevant documents, these interfaces expose evidence only as ranked results or bounded document views, limiting agents' ability to reorganize material and verify constraints across documents. Direct Corpus Interaction (DCI) addresses this limitation by exposing shell-executable corpus operations for flexible search, filtering, comparison, and verification. However, full-corpus terminal commands become slow and unstable as the corpus grows, degrading performance and efficiency. We introduce DR-DCI, a retriever-steered DCI framework that treats retrieval as an agent-callable action for expanding a local workspace. Rather than operating directly over the full corpus, the agent dynamically pulls relevant documents into an evolving workspace and conducts DCI operations within it. This design combines retriever-level recall with DCI-style precision: retrieval keeps exploration scalable, while DCI preserves the local operations needed for effective evidence resolution. Experiments show that DR-DCI is both effective and efficient across scales. On Browsecomp-Plus, DR-DCI reaches 71.2% accuracy, improving over raw DCI and ablated variants by up to 8.3 points while reducing tool usage, wall time, and estimated cost. With workspace-preserving context reset, accuracy further improves to 73.3%. In corpus-scaling experiments, DR-DCI remains effective from 100K to 10M documents, whereas raw DCI becomes unstable and BM25 performs substantially worse. DR-DCI also scales to a 20M-scale file-per-document Wiki-18 QA setting, achieving an average score of 63.0 across six benchmarks and outperforming retrieval-based and trained search-agent baselines. Ablation analysis further shows that ranked previews and inter-document DCI are key to performance.
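The abstract's core loop is that a retriever is exposed as an action that pulls documents into a local workspace, and shell-style DCI operations then run over that workspace instead of the full corpus. The Python sketch below illustrates that idea under stated assumptions; it is not the authors' implementation. The in-memory `BM25` class stands in for the retriever, and the `Workspace` class with its `expand`, `grep`, and `grep_all` methods, the `preview_chars` parameter, and the toy corpus are hypothetical names chosen for illustration.

```python
import math
import re
import tempfile
from collections import Counter
from pathlib import Path


def tokenize(text):
    # Lowercased word tokens; a deliberately simple analyzer.
    return re.findall(r"\w+", text.lower())


class BM25:
    """Minimal Okapi BM25 over an in-memory corpus (stand-in retriever)."""

    def __init__(self, docs, k1=1.5, b=0.75):
        self.ids = sorted(docs)
        self.tfs = {d: Counter(tokenize(docs[d])) for d in self.ids}
        self.lens = {d: sum(tf.values()) for d, tf in self.tfs.items()}
        self.avgdl = sum(self.lens.values()) / len(self.ids)
        # Document frequency: iterating a Counter yields each distinct term once.
        df = Counter(t for tf in self.tfs.values() for t in tf)
        n = len(self.ids)
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}
        self.k1, self.b = k1, b

    def score(self, query, d):
        tf, s = self.tfs[d], 0.0
        norm = self.k1 * (1 - self.b + self.b * self.lens[d] / self.avgdl)
        for t in tokenize(query):
            if tf[t]:
                s += self.idf[t] * tf[t] * (self.k1 + 1) / (tf[t] + norm)
        return s

    def search(self, query, k):
        # Rank by descending score; ties broken by doc id for determinism.
        scored = [(self.score(query, d), d) for d in self.ids]
        ranked = sorted((p for p in scored if p[0] > 0), key=lambda p: (-p[0], p[1]))
        return [d for _, d in ranked[:k]]


class Workspace:
    """Local directory holding only the documents the agent has pulled in."""

    def __init__(self, corpus, retriever):
        self.corpus, self.retriever = corpus, retriever
        self.root = Path(tempfile.mkdtemp(prefix="workspace_"))

    def expand(self, query, k=3, preview_chars=40):
        """Agent-callable retrieval action: copy the top-k hits into the
        workspace and return (doc_id, preview) pairs in rank order."""
        hits = self.retriever.search(query, k)
        for doc_id in hits:
            (self.root / f"{doc_id}.txt").write_text(self.corpus[doc_id])
        return [(d, self.corpus[d][:preview_chars]) for d in hits]

    def grep(self, pattern):
        """Like `grep -il PATTERN *.txt`, but scoped to the workspace."""
        rx = re.compile(pattern, re.IGNORECASE)
        return sorted(p.stem for p in self.root.glob("*.txt") if rx.search(p.read_text()))

    def grep_all(self, *patterns):
        """Cross-document filter: workspace docs matching every pattern."""
        matches = [set(self.grep(p)) for p in patterns]
        return sorted(set.intersection(*matches))


corpus = {
    "d1": "Marie Curie won the Nobel Prize in Physics in 1903.",
    "d2": "Marie Curie won the Nobel Prize in Chemistry in 1911.",
    "d3": "The Eiffel Tower was completed in 1889 in Paris.",
    "d4": "Pierre Curie shared the 1903 Nobel Prize in Physics.",
}
ws = Workspace(corpus, BM25(corpus))
hits = ws.expand("Curie Nobel Prize", k=3)
print(hits)
print(ws.grep_all("Curie", "1903"))  # → ['d1', 'd4']
```

In this sketch, `grep` and `grep_all` only read files under `ws.root`, so the cost of each DCI-style operation grows with the workspace rather than with the full corpus; `d3` is never pulled in, so no workspace command ever scans it.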
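The abstract reports that workspace-preserving context reset raises accuracy to 73.3% but does not say how the reset is triggered or what exactly survives it. Assuming a reading in which the agent's message history is cleared while every pulled document is kept, the idea can be sketched as follows. `AgentState`, `pull`, `max_context`, and the message-count budget are all hypothetical; a real agent would more likely budget by tokens.

```python
from dataclasses import dataclass, field


@dataclass
class AgentState:
    """Hypothetical agent state: a bounded chat context plus a workspace
    (doc_id -> text) that outlives context resets."""

    max_context: int = 6  # assumed budget, counted in messages for simplicity
    context: list = field(default_factory=list)
    workspace: dict = field(default_factory=dict)

    def pull(self, doc_id: str, text: str) -> None:
        # Stand-in for a retrieval action that adds a document to the workspace.
        self.workspace[doc_id] = text

    def add_message(self, msg: str) -> None:
        self.context.append(msg)
        if len(self.context) > self.max_context:
            self.reset_context()

    def reset_context(self) -> None:
        # Drop the conversation history but keep every pulled document, and
        # seed the fresh context with a short index of what the workspace holds.
        self.context = ["Workspace holds: " + ", ".join(sorted(self.workspace))]


state = AgentState(max_context=3)
state.pull("d4", "Pierre Curie shared the 1903 Nobel Prize in Physics.")
state.pull("d1", "Marie Curie won the Nobel Prize in Physics in 1903.")
for step in range(4):
    state.add_message(f"tool call {step}")
print(state.context)  # → ['Workspace holds: d1, d4']
print(len(state.workspace))  # → 2
```

The design choice being illustrated is that evidence gathered before the reset stays on disk (here, in a dict), so the agent can resume DCI-style operations on it without re-retrieving.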

Related papers