Fugu-MT Paper Translation (Summary): Beyond Semantic Similarity: A Two-Phase Non-Parametric Retrieval Workflow for Corporate Credit Underwriting

Paper summary: Beyond Semantic Similarity: A Two-Phase Non-Parametric Retrieval Workflow for Corporate Credit Underwriting

arxiv url: http://arxiv.org/abs/2605.20684v1
Date: Wed, 20 May 2026 04:23:06 GMT
Status: Translation complete
Last updated in system: 2026-05-21 19:19:56.474037
Title: Beyond Semantic Similarity: A Two-Phase Non-Parametric Retrieval Workflow for Corporate Credit Underwriting
Title (reference translation): A Two-Phase Non-Parametric Retrieval Workflow Beyond Semantic Similarity
Authors: Linus Ng Junjia, Ezekiel Tee Kongquan, Kelvin Heng, Kenneth Zhu Ke, Zhao Jing Yuan
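Abstract summary: Corporate credit underwriting requires analysts to extract actionable evidence from long, heterogeneous financial documents. The paper proposes a two-phase non-parametric retrieval architecture that separates high-recall candidate retrieval from high-precision utility ranking. In production deployment across more than 800 credit analysts, document review time was reduced from several hours to approximately three minutes.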
Reference score (independently computed attention metric): 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Corporate credit underwriting requires analysts to extract actionable evidence from long, heterogeneous financial documents spanning hundreds of pages and multiple languages. Standard Retrieval-Augmented Generation (RAG) pipelines optimize for semantic similarity, which frequently surfaces passages that are topically related but lack decision utility, a problem we term the similarity-utility gap. We propose a two-phase non-parametric retrieval architecture that separates high-recall candidate retrieval from high-precision utility ranking. The first phase combines lexical and dense multilingual retrieval to construct a broad candidate pool. The second phase applies an adaptive retrieval controller that filters candidates using query intent and document structure signals, followed by an LLM-as-a-Judge utility scoring mechanism that ranks passages by analytical usefulness rather than semantic proximity. A context-aware extraction module preserves structural fidelity across narrative text and complex financial tables. The system is deployed entirely on-premise to satisfy enterprise data governance requirements. Evaluated on a multilingual corpus of proprietary financial documents with analyst-curated relevance labels, the system significantly outperforms naive retrieval baselines. In production deployment across more than 800 credit analysts, document review time was reduced from several hours to approximately three minutes, demonstrating the practical value of utility-aware RAG architectures for document-intensive decision-support workflows.
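Abstract (reference translation): Corporate credit underwriting requires analysts to extract actionable evidence from long, heterogeneous financial documents spanning hundreds of pages and multiple languages. Standard Retrieval-Augmented Generation (RAG) pipelines optimize for semantic similarity. The paper proposes a two-phase non-parametric retrieval architecture that separates high-recall candidate retrieval from high-precision utility ranking. The first phase combines lexical and dense multilingual retrieval to build a broad candidate pool. The second phase applies an adaptive retrieval controller that filters candidates using query intent and document structure signals, followed by an LLM-as-a-Judge utility scoring mechanism that ranks passages by analytical usefulness rather than semantic proximity. A context-aware extraction module preserves structural fidelity across narrative text and complex financial tables. The system is deployed entirely on-premise to satisfy enterprise data governance requirements. Evaluated on a multilingual corpus of financial documents with analyst-curated relevance labels, the system significantly outperforms naive retrieval baselines. In production deployment across more than 800 credit analysts, document review time was reduced from several hours to approximately three minutes, demonstrating the practical value of utility-aware RAG architectures for document-intensive decision-support workflows.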
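
The two-phase flow described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The abstract does not say how lexical and dense results are merged, so reciprocal rank fusion is swapped in here as an assumption. The names Passage, fuse_rankings, keep_section, toy_judge and rank_by_utility are hypothetical, toy_judge is a keyword heuristic standing in for the LLM-as-a-Judge scorer, and the passages are made-up toy data.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass(frozen=True)
class Passage:
    text: str
    section: str  # hypothetical document-structure signal


def fuse_rankings(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Phase 1 (assumed): merge ranked id lists with reciprocal rank fusion."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, pid in enumerate(ranking, start=1):
            scores[pid] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the order is deterministic.
    return sorted(scores, key=lambda pid: (-scores[pid], pid))


def keep_section(query: str, passage: Passage) -> bool:
    """Toy stand-in for the retrieval controller: drop overview sections."""
    return passage.section != "company_overview"


def toy_judge(query: str, passage: Passage) -> int:
    """Toy stand-in for an LLM judge: reward figures and covenant mentions."""
    score = 0
    if any(ch.isdigit() for ch in passage.text):
        score += 1
    if "covenant" in passage.text.lower():
        score += 1
    return score


def rank_by_utility(
    candidates: Sequence[str],
    passages: Dict[str, Passage],
    query: str,
    keep: Callable[[str, Passage], bool],
    judge: Callable[[str, Passage], int],
    top_n: int = 2,
) -> List[str]:
    """Phase 2: filter the candidate pool, then order it by judged utility."""
    kept = [pid for pid in candidates if keep(query, passages[pid])]
    scored = sorted(kept, key=lambda pid: (-judge(query, passages[pid]), pid))
    return scored[:top_n]


passages = {
    "p1": Passage("The company operates in the shipping sector.", "company_overview"),
    "p2": Passage("Net debt to EBITDA rose to 4.2x in FY2024, breaching the 3.5x covenant.",
                  "financial_covenants"),
    "p3": Passage("Management discusses leverage trends in general terms.",
                  "management_discussion"),
    "p4": Passage("Table 7: total borrowings 1,200; cash 300; EBITDA 214.",
                  "financial_tables"),
}
query = "Is leverage within covenant limits?"

# Hypothetical outputs of a lexical retriever and a dense retriever.
lexical_ranking = ["p3", "p2", "p4"]
dense_ranking = ["p3", "p2", "p1"]

candidates = fuse_rankings([lexical_ranking, dense_ranking])
top = rank_by_utility(candidates, passages, query, keep_section, toy_judge)
print(candidates)
print(top)
```

In this toy run the fused similarity order places the topically related p3 first, while the utility ranking returns p2 and p4, which mirrors the similarity-utility gap the abstract describes.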

Related papers