Fugu-MT Paper Translation (Abstract): RAG-Anything: All-in-One RAG Framework

Paper overview: RAG-Anything: All-in-One RAG Framework

arxiv url: http://arxiv.org/abs/2510.12323v1
Date: Tue, 14 Oct 2025 09:25:35 GMT
Status: Translation complete
Last updated in system: 2025-10-15 19:02:32.268395
Title: RAG-Anything: All-in-One RAG Framework
Title (reference translation): RAG-Anything: An All-in-One RAG Framework
Authors: Zirui Guo, Xubin Ren, Lingrui Xu, Jiahao Zhang, Chao Huang
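Abstract summary: RAG-Anything is a unified framework that enables comprehensive knowledge retrieval across all modalities. The approach reconceptualizes multimodal content as interconnected knowledge entities rather than isolated data types.
Reference score (independently computed attention score): 10.858282833070726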
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Retrieval-Augmented Generation (RAG) has emerged as a fundamental paradigm for expanding Large Language Models beyond their static training limitations. However, a critical misalignment exists between current RAG capabilities and real-world information environments. Modern knowledge repositories are inherently multimodal, containing rich combinations of textual content, visual elements, structured tables, and mathematical expressions. Yet existing RAG frameworks are limited to textual content, creating fundamental gaps when processing multimodal documents. We present RAG-Anything, a unified framework that enables comprehensive knowledge retrieval across all modalities. Our approach reconceptualizes multimodal content as interconnected knowledge entities rather than isolated data types. The framework introduces dual-graph construction to capture both cross-modal relationships and textual semantics within a unified representation. We develop cross-modal hybrid retrieval that combines structural knowledge navigation with semantic matching. This enables effective reasoning over heterogeneous content where relevant evidence spans multiple modalities. RAG-Anything demonstrates superior performance on challenging multimodal benchmarks, achieving significant improvements over state-of-the-art methods. Performance gains become particularly pronounced on long documents where traditional approaches fail. Our framework establishes a new paradigm for multimodal knowledge access, eliminating the architectural fragmentation that constrains current systems. Our framework is open-sourced at: https://github.com/HKUDS/RAG-Anything.
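Abstract (reference translation): Retrieval-Augmented Generation (RAG) has emerged as a fundamental paradigm for extending large language models beyond the limits of their static training. However, there is a serious mismatch between current RAG capabilities and real-world information environments. Modern knowledge repositories are inherently multimodal, containing rich combinations of textual content, visual elements, structured tables, and mathematical expressions. Yet existing RAG frameworks are limited to textual content, which leaves fundamental gaps when processing multimodal documents. RAG-Anything is a unified framework that enables comprehensive knowledge retrieval across all modalities. The approach reconceptualizes multimodal content as interconnected knowledge entities rather than isolated data types. The framework introduces dual-graph construction that captures both cross-modal relationships and textual semantics within a unified representation. We develop cross-modal hybrid retrieval that combines structural knowledge navigation with semantic matching. This enables effective reasoning over heterogeneous content where the relevant evidence spans multiple modalities. RAG-Anything shows superior performance on challenging multimodal benchmarks, with significant improvements over state-of-the-art methods. The performance gains are particularly pronounced on long documents where traditional approaches fail. Our framework establishes a new paradigm for multimodal knowledge access, eliminating the architectural fragmentation that constrains current systems. Our framework is open-sourced at https://github.com/HKUDS/RAG-Anything.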
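
Illustrative sketch (not from the paper): the abstract describes cross-modal hybrid retrieval only at a high level, as a combination of structural knowledge navigation and semantic matching. The toy Python below shows one possible reading of that idea. Every item id, vector, edge, and the `neighbor_weight` parameter is a hypothetical chosen for illustration; none of it is taken from the RAG-Anything codebase, and the scoring rule is an assumption rather than the authors' method.

```python
import math
from collections import defaultdict

# Hypothetical knowledge items of different modalities with made-up embeddings.
ITEMS = {
    "text:intro": {"modality": "text", "vec": [0.9, 0.1, 0.0]},
    "table:results": {"modality": "table", "vec": [0.1, 0.9, 0.1]},
    "figure:arch": {"modality": "image", "vec": [0.2, 0.1, 0.9]},
    "eq:loss": {"modality": "equation", "vec": [0.0, 0.7, 0.6]},
}

# Hypothetical cross-modal links, e.g. a paragraph that refers to a figure.
EDGES = [("text:intro", "figure:arch"), ("table:results", "eq:loss")]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_graph(edges):
    """Build an undirected adjacency map from a list of (u, v) pairs."""
    graph = defaultdict(set)
    for u, v in edges:
        graph[u].add(v)
        graph[v].add(u)
    return graph


def hybrid_retrieve(query_vec, items, graph, top_k=1, neighbor_weight=0.5):
    """Rank items by semantic similarity, then pull in graph neighbors.

    Assumed scoring rule: a neighbor of a seed item receives the larger of
    its own similarity and a discounted copy of the seed's similarity.
    """
    # Semantic matching: score every item against the query embedding.
    semantic = {key: cosine(query_vec, item["vec"]) for key, item in items.items()}
    seeds = sorted(semantic, key=semantic.get, reverse=True)[:top_k]

    # Structural navigation: expand each seed to its linked items.
    scores = {seed: semantic[seed] for seed in seeds}
    for seed in seeds:
        for neighbor in graph.get(seed, ()):
            candidate = max(semantic[neighbor], neighbor_weight * semantic[seed])
            scores[neighbor] = max(scores.get(neighbor, 0.0), candidate)

    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    ranked = hybrid_retrieve([1.0, 0.0, 0.0], ITEMS, build_graph(EDGES))
    print([key for key, _ in ranked])
```

In this sketch a text-like query retrieves the linked figure even though the figure's own embedding is a weak match, which is the kind of evidence spanning multiple modalities that the abstract refers to.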

Related papers