Fugu-MT Paper Translation (Abstract): A Unified Model and Document Representation for On-Device Retrieval-Augmented Generation

Paper overview: A Unified Model and Document Representation for On-Device Retrieval-Augmented Generation

arxiv url: http://arxiv.org/abs/2604.14403v1
Date: Wed, 15 Apr 2026 20:34:10 GMT
Status: Translation complete
Last updated in system: 2026-04-17 21:29:31.602231
Title: A Unified Model and Document Representation for On-Device Retrieval-Augmented Generation
Title (reference translation): A Unified Model and Document Representation for On-Device Retrieval-Augmented Generation
Authors: Julian Killingback, Ofer Meshi, Henry Li, Hamed Zamani, Maryam Karimzadehgan
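Abstract summary: We propose a unified model that compresses the RAG context and uses the same representations for retrieval. With an average of 1/10 of the context, our model matches the performance of a traditional RAG reader without increasing storage requirements.
Reference score (independently computed attention score): 31.59984397397994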
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Traditional Retrieval-Augmented Generation (RAG) approaches generally assume that retrieval and generation occur on powerful servers removed from the end user. While this reduces local hardware constraints, it introduces significant drawbacks: privacy concerns regarding data access, recurring maintenance and storage costs, increased latency, and the necessity of an internet connection. On-device RAG addresses these challenges by executing the entire pipeline locally, making it ideal for querying sensitive personal information such as financial documents, contact details, and medical history. However, on-device deployment necessitates a delicate balance between limited memory and disk space. Specifically, the context size provided to the generative model must be restricted to manage KV cache and attention memory usage, while the size of stored embeddings must be minimized to preserve disk space. In this work, we propose a unified model that compresses the RAG context and utilizes the same representations for retrieval. This approach minimizes disk utilization compared to using separate representations, while significantly reducing the context size required for generation. With an average of 1/10 of the context, our model matches the performance of a traditional RAG reader without increasing storage requirements compared to a multi-vector retrieval model. This approach represents the first model to unify retrieval and context compression using a shared model and representation. We believe this work will inspire further consolidation of distinct models to optimize on-device performance.
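Abstract (reference translation): Traditional Retrieval-Augmented Generation (RAG) approaches generally assume that retrieval and generation take place on powerful servers remote from the end user. This eases local hardware constraints but brings significant drawbacks: privacy concerns over data access, recurring maintenance and storage costs, increased latency, and the need for an internet connection. On-device RAG addresses these challenges by running the entire pipeline locally. However, on-device deployment requires a delicate balance between limited memory and disk space. Specifically, the context size given to the generative model must be restricted to manage KV cache and attention memory usage, while the size of the stored embeddings must be minimized to save disk space. In this work, we propose a unified model that compresses the RAG context and uses the same representations for retrieval. This approach minimizes disk usage compared with using separate representations while significantly reducing the context size needed for generation. With an average of 1/10 of the context, our model matches the performance of a traditional RAG reader without increasing storage requirements compared with a multi-vector retrieval model. This is the first model to unify retrieval and context compression using a shared model and representation. We believe this work will lead to further consolidation of distinct models to optimize on-device performance.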
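
As a rough illustration of why the abstract ties context size to memory, the sketch below estimates KV cache size for a decoder-only transformer using the usual per-token accounting (one key vector and one value vector per layer and per KV head). The function name `kv_cache_bytes`, the layer count, head count, head dimension, and token counts are hypothetical values chosen for illustration and do not come from the paper. Only the 1/10 context ratio is taken from the abstract.

```python
def kv_cache_bytes(context_tokens, num_layers, num_kv_heads, head_dim, bytes_per_value=2):
    """Approximate KV cache size in bytes.

    Each token stores one key and one value vector (the factor of 2)
    for every layer and every KV head; bytes_per_value=2 assumes fp16.
    """
    return 2 * num_layers * num_kv_heads * head_dim * context_tokens * bytes_per_value


# Hypothetical small on-device decoder (assumed, not from the paper).
NUM_LAYERS = 24
NUM_KV_HEADS = 8
HEAD_DIM = 64

# Hypothetical retrieved context length, and the same context at 1/10 size.
full_context_tokens = 4000
compressed_context_tokens = full_context_tokens // 10

full_bytes = kv_cache_bytes(full_context_tokens, NUM_LAYERS, NUM_KV_HEADS, HEAD_DIM)
compressed_bytes = kv_cache_bytes(compressed_context_tokens, NUM_LAYERS, NUM_KV_HEADS, HEAD_DIM)

print(f"full context:       {full_bytes / 2**20:.1f} MiB")
print(f"compressed context: {compressed_bytes / 2**20:.1f} MiB")
print(f"reduction factor:   {full_bytes / compressed_bytes:.0f}x")
```

Because the estimate is linear in the number of context tokens, a 10x shorter context gives a 10x smaller KV cache under these assumptions. Actual savings depend on the model architecture and on how the compressed representations are fed to the generator, which the abstract does not specify.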
