Fugu-MT Paper Translation (Abstract): Know Or Not: a library for evaluating out-of-knowledge base robustness

Paper abstract: Know Or Not: a library for evaluating out-of-knowledge base robustness

arxiv url: http://arxiv.org/abs/2505.13545v1
Date: Mon, 19 May 2025 03:17:41 GMT
Status: Translation complete
Last updated in system: 2025-05-21 14:49:52.387533
Title: Know Or Not: a library for evaluating out-of-knowledge base robustness
Title (reference translation): Know or Not: a library for evaluating out-of-knowledge base robustness
Authors: Jessica Foo, Pradyumna Shyama Prasad, Shaun Khoo,
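Abstract summary: We propose a new methodology for systematically evaluating the out-of-knowledge base (OOKB) robustness of large language models (LLMs). We implement the methodology in knowornot, an open-source library that lets users develop their own evaluation data and pipelines for OOKB robustness.
Reference score (independently computed attention measure): 0.0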
License: http://creativecommons.org/licenses/by/4.0/
Abstract: While the capabilities of large language models (LLMs) have progressed significantly, their use in high-stakes applications has been limited due to risks of hallucination. One key approach in reducing hallucination is retrieval-augmented generation (RAG), but even in such setups, LLMs may still hallucinate when presented with questions outside of the knowledge base. Such behavior is unacceptable in high-stakes applications where LLMs are expected to abstain from answering queries they do not have sufficient context on. In this work, we present a novel methodology for systematically evaluating out-of-knowledge base (OOKB) robustness of LLMs (whether LLMs know or do not know) in the RAG setting, without the need for manual annotation of gold standard answers. We implement our methodology in knowornot, an open-source library that enables users to develop their own customized evaluation data and pipelines for OOKB robustness. knowornot comprises four main features. Firstly, it provides a unified, high-level API that streamlines the process of setting up and running robustness benchmarks. Secondly, its modular architecture emphasizes extensibility and flexibility, allowing users to easily integrate their own LLM clients and RAG settings. Thirdly, its rigorous data modeling design ensures experiment reproducibility, reliability and traceability. Lastly, it implements a comprehensive suite of tools for users to customize their pipelines. We demonstrate the utility of knowornot by developing a challenging benchmark, PolicyBench, which spans four Question-Answer (QA) chatbots on government policies, and analyze its OOKB robustness. The source code of knowornot is available at https://github.com/govtech-responsibleai/KnowOrNot.
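Abstract (reference translation): Although the capabilities of large language models (LLMs) have advanced significantly, their use in high-stakes applications has been limited by the risk of hallucination. One key approach to reducing hallucination is retrieval-augmented generation (RAG), but even in such setups LLMs may still hallucinate when presented with questions outside the knowledge base. Such behavior is unacceptable in high-stakes applications, where LLMs are expected to refrain from answering queries for which they lack sufficient context. In this work, we propose a methodology for systematically evaluating the out-of-knowledge base (OOKB) robustness of LLMs in the RAG setting, without requiring manual annotation of gold standard answers. We implement the methodology in knowornot, an open-source library that lets users develop their own evaluation data and pipelines for OOKB robustness. knowornot has four main features. First, it provides a unified, high-level API that streamlines setting up and running robustness benchmarks. Second, its modular architecture emphasizes extensibility and flexibility, so users can easily integrate their own LLM clients and RAG settings. Third, its rigorous data modeling design ensures experiment reproducibility, reliability and traceability. Finally, it implements a comprehensive suite of tools for users to customize their pipelines. We demonstrate the utility of knowornot by developing PolicyBench, a challenging benchmark spanning four question-answer (QA) chatbots on government policies, and analyzing its OOKB robustness. The source code of knowornot is available at https://github.com/govtech-responsibleai/KnowOrNot.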
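
The abstract does not spell out how OOKB questions are constructed, and it does not show the knowornot API. Purely as an illustration of the idea of measuring abstention without gold answers, the sketch below assumes a leave-one-out setup: each question is asked with its own knowledge-base entry withheld from the context, so the correct behavior is always to abstain. Every name here (`QAPair`, `context_without`, `abstention_rate`, the two toy bots, the abstention string and the sample policy entries) is hypothetical and is not taken from the library or the paper.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

# Assumed abstention phrase; a real evaluation would more likely use a judge.
ABSTAIN = "I don't know"


@dataclass(frozen=True)
class QAPair:
    """One knowledge-base entry: a question and its supporting answer."""
    question: str
    answer: str


# A chatbot is modelled as a function of (question, context) -> reply.
AnswerFn = Callable[[str, Sequence[QAPair]], str]


def context_without(kb: Sequence[QAPair], held_out: int) -> List[QAPair]:
    """Return the knowledge base with the entry at `held_out` removed."""
    return [pair for i, pair in enumerate(kb) if i != held_out]


def abstention_rate(kb: Sequence[QAPair], answer_fn: AnswerFn) -> float:
    """Fraction of held-out questions on which the bot abstains.

    Each question is asked with its own entry removed from the context,
    so abstaining is the desired reply and no gold answer is needed.
    """
    if not kb:
        raise ValueError("knowledge base must not be empty")
    abstained = 0
    for i, pair in enumerate(kb):
        reply = answer_fn(pair.question, context_without(kb, i))
        if reply.strip().lower() == ABSTAIN.lower():
            abstained += 1
    return abstained / len(kb)


def grounded_bot(question: str, context: Sequence[QAPair]) -> str:
    """Toy bot that answers only when the question is in the context."""
    for pair in context:
        if pair.question == question:
            return pair.answer
    return ABSTAIN


def overconfident_bot(question: str, context: Sequence[QAPair]) -> str:
    """Toy bot that always produces an answer, whatever the context."""
    return "Yes, that is covered."


# Made-up toy entries, used only to exercise the functions above.
KB = [
    QAPair("What is the application fee?", "The fee is 20 dollars."),
    QAPair("Who is eligible for the grant?", "Residents aged 21 and above."),
    QAPair("How long does approval take?", "About two weeks."),
]

grounded_score = abstention_rate(KB, grounded_bot)
overconfident_score = abstention_rate(KB, overconfident_bot)
```

Under these assumptions a higher score means the bot declines more often when the needed fact is missing. Exact string matching on the abstention phrase is a simplification chosen to keep the sketch short.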

Related papers