Fugu-MT Paper Translation (Abstract): Diverse And Private Synthetic Datasets Generation for RAG evaluation: A multi-agent framework

Paper overview: Diverse And Private Synthetic Datasets Generation for RAG evaluation: A multi-agent framework

arxiv url: http://arxiv.org/abs/2508.18929v1
Date: Tue, 26 Aug 2025 11:16:14 GMT
Status: Translation complete
Last updated in system: 2025-08-27 17:42:38.814042
Title: Diverse And Private Synthetic Datasets Generation for RAG evaluation: A multi-agent framework
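Title (reference translation): Diverse and Private Synthetic Dataset Generation for RAG Evaluation: A Multi-Agent Framework
Authors: Ilias Driouich, Hongliu Cao, Eoin Thomas
Abstract summary: Retrieval-augmented generation (RAG) systems improve large language model outputs by incorporating external knowledge. This work introduces a novel multi-agent framework for generating synthetic QA datasets for RAG evaluation that prioritizes semantic diversity and privacy preservation.
Reference score (independently computed attention level): 2.102846336724103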
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Retrieval-augmented generation (RAG) systems improve large language model outputs by incorporating external knowledge, enabling more informed and context-aware responses. However, the effectiveness and trustworthiness of these systems critically depend on how they are evaluated, particularly on whether the evaluation process captures real-world constraints like protecting sensitive information. While current evaluation efforts for RAG systems have primarily focused on the development of performance metrics, far less attention has been given to the design and quality of the underlying evaluation datasets, despite their pivotal role in enabling meaningful, reliable assessments. In this work, we introduce a novel multi-agent framework for generating synthetic QA datasets for RAG evaluation that prioritize semantic diversity and privacy preservation. Our approach involves: (1) a Diversity agent leveraging clustering techniques to maximize topical coverage and semantic variability, (2) a Privacy Agent that detects and masks sensitive information across multiple domains and (3) a QA curation agent that synthesizes private and diverse QA pairs suitable as ground truth for RAG evaluation. Extensive experiments demonstrate that our evaluation sets outperform baseline methods in diversity and achieve robust privacy masking on domain-specific datasets. This work offers a practical and ethically aligned pathway toward safer, more comprehensive RAG system evaluation, laying the foundation for future enhancements aligned with evolving AI regulations and compliance standards.
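Abstract (reference translation): Retrieval-augmented generation (RAG) systems improve the outputs of large language models by incorporating external knowledge. However, the effectiveness and trustworthiness of these systems depend heavily on how they are evaluated, in particular on whether the evaluation process captures real-world constraints such as protecting sensitive information. Current evaluation efforts for RAG systems have focused mainly on developing performance metrics, while far less attention has been paid to the design and quality of the underlying evaluation datasets, despite their important role in enabling meaningful and reliable assessment. This work introduces a novel multi-agent framework for generating synthetic QA datasets for RAG evaluation that prioritizes semantic diversity and privacy preservation. The approach consists of (1) a diversity agent that uses clustering techniques to maximize topical coverage and semantic variability, (2) a privacy agent that detects and masks sensitive information across multiple domains, and (3) a QA curation agent that synthesizes private and diverse QA pairs suitable for RAG evaluation. Extensive experiments show that the resulting evaluation sets outperform baseline methods in diversity and achieve robust privacy masking on domain-specific datasets. This work offers a practical and ethically aligned path toward safer, more comprehensive RAG system evaluation, and lays the foundation for future enhancements aligned with evolving AI regulations and compliance standards.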
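
The three-stage flow described in the abstract (select diverse passages, mask sensitive information, then build QA pairs) can be illustrated with a toy sketch. The abstract gives no implementation details, so everything here is an assumption: greedy farthest-point selection over bag-of-words vectors stands in for the clustering step, two regular expressions stand in for the privacy agent, and a fixed template stands in for the QA curation agent. All function names are hypothetical and are not taken from the paper.

```python
import re
from collections import Counter
from math import sqrt


def bow(text):
    """Lowercased bag-of-words vector (assumed stand-in for an embedding)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two Counter vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def select_diverse(chunks, k):
    """Greedy farthest-point selection, a simple substitute for clustering."""
    if not chunks:
        return []
    vecs = [bow(c) for c in chunks]
    chosen = [0]  # start from the first chunk
    while len(chosen) < min(k, len(chunks)):
        # Pick the chunk least similar to everything already chosen.
        best = min(
            (i for i in range(len(chunks)) if i not in chosen),
            key=lambda i: max(cosine(vecs[i], vecs[j]) for j in chosen),
        )
        chosen.append(best)
    return [chunks[i] for i in chosen]


# Illustrative patterns only; real PII detection needs far broader coverage.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE = re.compile(r"\+?\d[\d\s-]{7,}\d")


def mask_sensitive(text):
    """Replace e-mail addresses and phone-like numbers with placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)


def make_qa(chunk):
    """Template QA pair; a real curation step would likely use an LLM."""
    return {
        "question": "What does the following passage state?",
        "context": chunk,
        "answer": chunk,
    }


def build_eval_set(chunks, k):
    """Diversity selection, then masking, then QA construction."""
    return [make_qa(mask_sensitive(c)) for c in select_diverse(chunks, k)]


if __name__ == "__main__":
    docs = [
        "Contact alice@example.com about the refund policy.",
        "Contact bob@example.com about the refund policy.",
        "The flight departs from Nice at noon.",
    ]
    for item in build_eval_set(docs, 2):
        print(item["context"])
```

In this sketch masking runs before QA construction, so no unmasked text can reach the generated pairs. Whether the paper orders its agents this way is not stated in the abstract.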

Related papers