Fugu-MT Paper Translation (Abstract): LUMA-RAG: Lifelong Multimodal Agents with Provably Stable Streaming Alignment

Paper summary: LUMA-RAG: Lifelong Multimodal Agents with Provably Stable Streaming Alignment

arxiv url: http://arxiv.org/abs/2511.02371v1
Date: Tue, 04 Nov 2025 08:47:12 GMT
Status: Translation complete
System update date: 2025-11-05 18:47:05.858979
Title: LUMA-RAG: Lifelong Multimodal Agents with Provably Stable Streaming Alignment
Title (reference translation): LUMA-RAG: Long-Lived Multimodal Agents with Stable Streaming Alignment
Authors: Rohan Wandre, Yash Gajewar, Namrata Patel, Vivek Dhalkari
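Abstract summary: Retrieval-Augmented Generation has emerged as the dominant paradigm for grounding large language model outputs in verifiable evidence. LUMA-RAG is a lifelong multimodal agent architecture featuring three key innovations. Experiments show robust text-to-image retrieval (Recall@10 = 0.94), graceful performance degradation under product quantization offloading, and stable audio-to-image rankings (Safe@1 = 1.0).
Reference score (independently computed attention score): 0.0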
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Retrieval-Augmented Generation (RAG) has emerged as the dominant paradigm for grounding large language model outputs in verifiable evidence. However, as modern AI agents transition from static knowledge bases to continuous multimodal streams encompassing text, images, video, and audio, two critical challenges arise: maintaining index freshness without prohibitive re-indexing costs, and preserving cross-modal semantic consistency across heterogeneous embedding spaces. We present LUMA-RAG, a lifelong multimodal agent architecture featuring three key innovations: (i) a streaming, multi-tier memory system that dynamically spills embeddings from a hot HNSW tier to a compressed IVFPQ tier under strict memory budgets; (ii) a streaming CLAP->CLIP alignment bridge that maintains cross-modal consistency through incremental orthogonal Procrustes updates; and (iii) stability-aware retrieval telemetry providing Safe@k guarantees by jointly bounding alignment drift and quantization error. Experiments demonstrate robust text-to-image retrieval (Recall@10 = 0.94), graceful performance degradation under product quantization offloading, and provably stable audio-to-image rankings (Safe@1 = 1.0), establishing LUMA-RAG as a practical framework for production multimodal RAG systems.
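Abstract (reference translation): Retrieval-Augmented Generation (RAG) has emerged as the dominant paradigm for grounding large language model outputs in verifiable evidence. However, as modern AI agents move from static knowledge bases to continuous multimodal streams of text, images, video, and audio, two critical challenges arise: keeping the index fresh without prohibitive re-indexing costs, and preserving cross-modal semantic consistency across heterogeneous embedding spaces. LUMA-RAG is a lifelong multimodal agent architecture featuring three key innovations: (i) a streaming, multi-tier memory system that dynamically spills embeddings from a hot HNSW tier to a compressed IVFPQ tier under strict memory budgets; (ii) a streaming CLAP->CLIP alignment bridge that maintains cross-modal consistency through incremental orthogonal Procrustes updates; and (iii) stability-aware retrieval telemetry that provides Safe@k guarantees by jointly bounding alignment drift and quantization error. Experiments demonstrate robust text-to-image retrieval (Recall@10 = 0.94), graceful performance degradation under product quantization offloading, and stable audio-to-image rankings (Safe@1 = 1.0), establishing LUMA-RAG as a practical framework for multimodal RAG systems.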
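
The abstract names "incremental orthogonal Procrustes updates" for the CLAP->CLIP bridge but does not give the update rule. The following is a minimal sketch of one common way to do this, not the authors' implementation: it accumulates the cross-covariance of paired embeddings and re-solves the orthogonal Procrustes problem by SVD after each batch. The class name `StreamingProcrustes`, the `decay` parameter, the `drift` measure, and the assumption that both embedding spaces share the same dimension are all hypothetical choices made for illustration.

```python
import numpy as np


class StreamingProcrustes:
    """Running orthogonal map R that minimises ||src @ R - dst||_F.

    Hypothetical helper: the paper's actual update rule is not stated
    in the abstract. Assumes src and dst have the same dimension.
    """

    def __init__(self, dim: int, decay: float = 1.0):
        self.cross = np.zeros((dim, dim))  # running (decayed) sum of src^T dst
        self.decay = decay                 # 1.0 = never forget old pairs (assumed knob)
        self.R = np.eye(dim)               # current alignment, starts as identity

    def update(self, src: np.ndarray, dst: np.ndarray) -> np.ndarray:
        """Fold in a batch of paired rows, where src[i] should map to dst[i]."""
        self.cross = self.decay * self.cross + src.T @ dst
        # Classical closed form: if src^T dst = U S V^T, the optimum is R = U V^T.
        u, _, vt = np.linalg.svd(self.cross)
        self.R = u @ vt
        return self.R

    def drift(self, previous: np.ndarray) -> float:
        """Spectral norm of the change in R (one possible drift measure)."""
        return float(np.linalg.norm(self.R - previous, ord=2))
```

With `decay=1.0` the accumulated matrix equals the full-batch `src^T dst`, so the streaming result matches the batch Procrustes solution over all pairs seen so far. A caller could compare `drift(previous)` against a threshold before trusting cross-modal rankings, which is in the spirit of the drift bounding the abstract mentions, though the exact Safe@k criterion is not specified there.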

Related papers