Fugu-MT Paper Translation (Abstract): PortBench: A Correlation-Aware, Full-Pipeline Benchmark for LLM-Driven Portfolio Management

Paper overview: PortBench: A Correlation-Aware, Full-Pipeline Benchmark for LLM-Driven Portfolio Management

arxiv url: http://arxiv.org/abs/2605.27887v2
Date: Thu, 04 Jun 2026 13:29:42 GMT
Status: Translation complete
System update date: 2026-06-06 06:55:34.576964
Title: PortBench: A Correlation-Aware, Full-Pipeline Benchmark for LLM-Driven Portfolio Management
Authors: Yuxuan Zhao, Sijia Chen, Ningxin Su,
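Abstract summary: Large language models (LLMs) have shown strong performance across diverse financial tasks, yet portfolio management (PM) remains poorly benchmarked. PortBench is a benchmark spanning six heterogeneous asset classes over ten years. Despite strong performance on static financial QA, 90% of model-profile combinations fail to outperform a basic equal-weight allocation.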
Reference score (independently computed attention score): 15.684384084836223
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large language models (LLMs) have shown strong performance across diverse financial tasks, yet portfolio management (PM), a critical financial decision-making task, remains poorly benchmarked. Existing benchmarks exhibit two main gaps: they ignore cross-asset correlation structures, thereby failing to distinguish genuinely diversified portfolios from concentrated ones, and fail to evaluate the complete PM decision pipeline in real-world scenarios. We introduce PortBench, a benchmark spanning six heterogeneous asset classes over ten years. PortBench consists of two complementary layers: a static QA dataset of 6,269 correlation-based questions across seven task templates, and a dynamic five-stage allocation pipeline that mirrors the full PM decision cycle. To evaluate these layers, we introduce two dedicated metrics: a dual-layer correlation score that measures whether proposed portfolios exploit inter-class hedging and avoid intra-class concentration, and CEPS, a metric that quantifies how reasoning errors compound across pipeline stages. We further assess strategy robustness and investor alignment under three historical stress regimes and risk profiles. Evaluating ten frontier LLMs, we find that despite strong performance on static financial QA, 90% of model-profile combinations fail to outperform a basic equal-weight allocation, and models that satisfy every procedural constraint still suffer catastrophic drawdowns under stress. Our source code is available at https://github.com/AgenticFinLab/portbench.
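The abstract compares models against a basic equal-weight allocation and reports drawdowns under stress, but does not say how PortBench computes either. The following is a minimal sketch under common conventions, not the benchmark's implementation: weights of 1/N held constant (that is, rebalanced every period), simple per-period returns, and maximum drawdown measured as the largest peak-to-trough loss in cumulative wealth. The function names and the sample returns are hypothetical.

```python
def equal_weight(n_assets):
    """Return the 1/N weight vector for n_assets assets."""
    if n_assets <= 0:
        raise ValueError("n_assets must be positive")
    return [1.0 / n_assets] * n_assets


def portfolio_returns(weights, asset_returns):
    """Per-period portfolio return for fixed weights.

    asset_returns is a list of periods; each period is a list of simple
    returns, one per asset. Fixed weights imply rebalancing every period
    (an assumption of this sketch).
    """
    return [sum(w * r for w, r in zip(weights, period))
            for period in asset_returns]


def max_drawdown(returns):
    """Largest peak-to-trough loss of cumulative wealth, as a fraction."""
    wealth = 1.0
    peak = 1.0
    worst = 0.0
    for r in returns:
        wealth *= 1.0 + r
        peak = max(peak, wealth)
        worst = max(worst, 1.0 - wealth / peak)
    return worst


if __name__ == "__main__":
    # Illustrative, made-up returns for three assets over four periods.
    sample = [
        [0.02, -0.01, 0.00],
        [-0.10, 0.03, 0.01],
        [-0.05, 0.02, 0.00],
        [0.04, -0.01, 0.01],
    ]
    baseline = portfolio_returns(equal_weight(3), sample)
    print("equal-weight returns:", baseline)
    print("max drawdown:", max_drawdown(baseline))
```

A model-proposed allocation could be passed through the same two functions to compare its returns and drawdown against the 1/N baseline over the same periods.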


Related papers