Fugu-MT Paper Translation (Abstract): ContextNav: Towards Agentic Multimodal In-Context Learning

Paper overview: ContextNav: Towards Agentic Multimodal In-Context Learning

arxiv url: http://arxiv.org/abs/2510.04560v1
Date: Mon, 06 Oct 2025 07:49:52 GMT
Status: Translation complete
Last updated in system: 2025-10-07 16:52:59.737103
Title: ContextNav: Towards Agentic Multimodal In-Context Learning
Title (reference translation): ContextNav: Toward Agentic Multimodal In-Context Learning
Authors: Honghao Fu, Yuan Ouyang, Kai-Wei Chang, Yiwei Wang, Zi Huang, Yujun Cai
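Abstract summary: ContextNav is an agentic framework that integrates the scalability of automated retrieval with the quality and adaptiveness of human-like curation. It builds a resource-aware multimodal embedding pipeline, maintains a retrievable vector database, and applies agentic retrieval and structural alignment to construct noise-resilient contexts. Experimental results show that ContextNav achieves state-of-the-art performance across various datasets.
Reference score (independently computed attention score): 85.05420047017513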
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recent advances demonstrate that multimodal large language models (MLLMs) exhibit strong multimodal in-context learning (ICL) capabilities, enabling them to adapt to novel vision-language tasks from a few contextual examples. However, existing ICL approaches face challenges in reconciling scalability with robustness across diverse tasks and noisy contextual examples: manually selecting examples produces clean contexts but is labor-intensive and task-specific, while similarity-based retrieval improves scalability but could introduce irrelevant or structurally inconsistent samples that degrade ICL performance. To address these limitations, we propose ContextNav, the first agentic framework that integrates the scalability of automated retrieval with the quality and adaptiveness of human-like curation, enabling noise-robust and dynamically optimized contextualization for multimodal ICL. ContextNav unifies context management and noise-robust contextualization within a closed-loop workflow driven by graph-based orchestration. Specifically, it builds a resource-aware multimodal embedding pipeline, maintains a retrievable vector database, and applies agentic retrieval and structural alignment to construct noise-resilient contexts. An Operational Grammar Graph (OGG) further supports adaptive workflow planning and optimization, enabling the agent to refine its operational strategies based on downstream ICL feedback. Experimental results demonstrate that ContextNav achieves state-of-the-art performance across various datasets, underscoring the promise of agentic workflows for advancing scalable and robust contextualization in multimodal ICL.
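Abstract (reference translation): Recent work has shown that multimodal large language models (MLLMs) exhibit strong multimodal in-context learning (ICL) capabilities and can adapt to new vision-language tasks from a few contextual examples. However, existing ICL approaches struggle to reconcile scalability with robustness across diverse tasks and noisy contextual examples: manually selecting examples yields clean contexts but is labor-intensive and task-specific, whereas similarity-based retrieval improves scalability but can introduce irrelevant or structurally inconsistent samples that degrade ICL performance. To address these limitations, we propose ContextNav, the first agentic framework that integrates the scalability of automated retrieval with the quality and adaptiveness of human-like curation, enabling noise-robust and dynamically optimized contextualization for multimodal ICL. ContextNav unifies context management and noise-robust contextualization within a closed-loop workflow driven by graph-based orchestration. Specifically, it builds a resource-aware multimodal embedding pipeline, maintains a retrievable vector database, and applies agentic retrieval and structural alignment to construct noise-resilient contexts. An Operational Grammar Graph (OGG) further supports adaptive workflow planning and optimization, allowing the agent to refine its operational strategies based on downstream ICL feedback. Experimental results show that ContextNav achieves state-of-the-art performance across various datasets, underscoring the promise of agentic workflows for scalable and robust contextualization in multimodal ICL.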
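
The abstract contrasts plain similarity-based retrieval, which can pull in structurally inconsistent examples, with a curated context. The sketch below is only a toy illustration of that contrast and is not the ContextNav method: the `Example` fields, the `answer_format` tag, and the `build_context` helper are all hypothetical, the two-dimensional vectors stand in for real multimodal embeddings, and a simple format-equality check stands in for whatever structural alignment the paper actually performs.

```python
import math
from dataclasses import dataclass


@dataclass
class Example:
    """One candidate in-context example (all fields are hypothetical)."""
    embedding: list[float]   # stand-in for a multimodal embedding
    question: str
    answer: str
    answer_format: str       # e.g. "yes/no" or "free-form"


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(store, query_embedding, k):
    """Plain similarity-based retrieval: top-k by cosine similarity."""
    ranked = sorted(
        store,
        key=lambda ex: cosine(ex.embedding, query_embedding),
        reverse=True,
    )
    return ranked[:k]


def build_context(store, query_embedding, query_format, k=2, pool=4):
    """Retrieve a larger pool, then drop candidates whose answer format
    differs from the query's (a toy proxy for structural consistency)."""
    candidates = retrieve(store, query_embedding, pool)
    consistent = [ex for ex in candidates if ex.answer_format == query_format]
    return consistent[:k]


# A tiny in-memory "vector database" with made-up entries.
store = [
    Example([1.0, 0.0], "Is the cat black?", "yes", "yes/no"),
    Example([0.9, 0.1], "What colour is the cat?", "black", "free-form"),
    Example([0.8, 0.2], "Is the dog asleep?", "no", "yes/no"),
    Example([0.0, 1.0], "Is the sky green?", "no", "yes/no"),
]

naive = retrieve(store, [1.0, 0.0], k=2)
filtered = build_context(store, [1.0, 0.0], "yes/no", k=2, pool=4)
print([ex.question for ex in naive])
print([ex.question for ex in filtered])
```

With a yes/no query, the naive top-2 includes the free-form "What colour is the cat?" example because it is close in embedding space, while the filtered context replaces it with the next most similar yes/no example. The agentic retrieval, Operational Grammar Graph, and feedback loop named in the abstract are not modelled here.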

Related papers