Fugu-MT paper translation (abstract): Hubble: An LLM-Driven Agentic Framework for Safe, Diverse, and Reproducible Alpha Factor Discovery

Paper abstract: Hubble: An LLM-Driven Agentic Framework for Safe, Diverse, and Reproducible Alpha Factor Discovery

arxiv url: http://arxiv.org/abs/2604.09601v2
Date: Tue, 14 Apr 2026 05:41:22 GMT
Status: translation complete
Last updated in system: 2026-04-19 19:09:11.539806
Title: Hubble: An LLM-Driven Agentic Framework for Safe, Diverse, and Reproducible Alpha Factor Discovery
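Title (reference translation): Hubble: An LLM-driven agentic framework for safe, diverse, and reproducible alpha factor discovery
Authors: Runze Shi, Shengyu Yan, Yuecheng Cai, Chengxi Lv
Abstract summary: This paper introduces Hubble, an agentic factor mining framework that combines large language models (LLMs) with a domain-specific operator language. On a U.S. equity universe of roughly 500 stocks, the main run evaluates 104 valid candidates across three rounds with zero runtime crashes. The resulting top-5 factors are then fixed and validated on a held-out period from 2025-06-01 to 2026-03-13.
Reference score (independently computed attention score): 0.0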
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Automated alpha discovery is difficult because the search space of formulaic factors is combinatorial, the signal-to-noise ratio in daily equity data is low, and unconstrained program generation is operationally unsafe. We present Hubble, an agentic factor mining framework that combines large language models (LLMs) with a domain-specific operator language, an abstract syntax tree (AST) execution sandbox, a dual-channel retrieval-augmented generation (RAG) module, and a family-aware selection mechanism. Instead of treating the LLM as an unconstrained code generator, Hubble restricts generation to interpretable operator trees, evaluates every candidate through a deterministic cross-sectional pipeline, and feeds back both top formulas and structured family-level diagnostics to subsequent rounds. The current system additionally introduces positive/negative RAG, formula-similarity penalties, standardized multi-metric scoring, dual reporting of RankIC and Pearson IC, and persistent diagnostics artifacts for post-hoc research analysis. On a U.S. equity universe of roughly 500 stocks, our main run evaluates 104 valid candidates across three rounds with zero runtime crashes and discovers a top set dominated by range, volatility, and trend families rather than crowded volume-only motifs. We then fix the resulting top-5 factors and validate them on a held-out period from 2025-06-01 to 2026-03-13. In this out-of-sample window, the two range factors and two volatility factors remain positive and several achieve HAC-significant Pearson IC and long-short evidence, whereas the weakest in-sample trend factor decays materially. These results suggest that safe LLM-guided search can be upgraded from a syntax-compliant generator into a reproducible alpha-research workflow that jointly optimizes validity, diversity, interpretability, and family-level generalization.
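Abstract (reference translation): Automated alpha discovery is hard because the search space of formulaic factors is combinatorial, the signal-to-noise ratio of daily equity data is low, and unconstrained program generation is operationally unsafe. This paper introduces Hubble, an agentic factor mining framework that combines large language models (LLMs) with a domain-specific operator language, an abstract syntax tree (AST) execution sandbox, a dual-channel retrieval-augmented generation (RAG) module, and a family-aware selection mechanism. Rather than treating the LLM as an unconstrained code generator, Hubble restricts generation to interpretable operator trees, evaluates every candidate through a deterministic cross-sectional pipeline, and feeds both the top formulas and structured family-level diagnostics back into subsequent rounds. The current system also introduces positive/negative RAG, formula-similarity penalties, standardized multi-metric scoring, dual reporting of RankIC and Pearson IC, and persistent diagnostics artifacts for post-hoc research analysis. On a U.S. equity universe of roughly 500 stocks, the main run evaluates 104 valid candidates across three rounds with zero runtime crashes and discovers a top set dominated by range, volatility, and trend families rather than crowded volume-only motifs. The resulting top-5 factors are then fixed and validated on a held-out period from 2025-06-01 to 2026-03-13. In this out-of-sample window, the two range factors and the two volatility factors remain positive, and several achieve HAC-significant Pearson IC and long-short evidence, whereas the weakest in-sample trend factor decays materially. These results suggest that safe LLM-guided search can be upgraded from a syntax-compliant generator into a reproducible alpha-research workflow that jointly optimizes validity, diversity, interpretability, and family-level generalization.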
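
The abstract describes restricting generation to operator trees that are executed through an AST sandbox and scored with both RankIC and Pearson IC. The following is a minimal sketch of that idea, not the authors' implementation: the operator names (`sub`, `div`, `neg`), the field names (`high`, `low`, `close`), the helper names, and the sample numbers are all hypothetical, and a single cross-section stands in for the full pipeline.

```python
import ast
import math

# Hypothetical operator set; the abstract does not list Hubble's actual operators.
OPS = {
    "sub": lambda a, b: [x - y for x, y in zip(a, b)],
    "div": lambda a, b: [x / y if y else math.nan for x, y in zip(a, b)],
    "neg": lambda a: [-x for x in a],
}
FIELDS = {"high", "low", "close"}  # hypothetical data fields


def evaluate(node, data):
    """Walk the AST directly; anything outside the whitelist raises ValueError."""
    if isinstance(node, ast.Expression):
        return evaluate(node.body, data)
    if isinstance(node, ast.Name) and node.id in FIELDS:
        return data[node.id]
    if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
            and node.func.id in OPS and not node.keywords):
        return OPS[node.func.id](*[evaluate(arg, data) for arg in node.args])
    raise ValueError(f"disallowed syntax: {ast.dump(node)}")


def run_formula(formula, data):
    """Parse a formula string and evaluate it without ever calling eval()."""
    return evaluate(ast.parse(formula, mode="eval"), data)


def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def ranks(x):
    """1-based ranks, with ties given their average rank."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    out = [0.0] * len(x)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and x[order[j + 1]] == x[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out


def rank_ic(factor, returns):
    """RankIC: Pearson correlation of the ranks (Spearman correlation)."""
    return pearson(ranks(factor), ranks(returns))


# Made-up cross-section of five stocks on one day.
data = {
    "high": [10.0, 20.0, 30.0, 40.0, 50.0],
    "low": [9.0, 18.0, 27.0, 36.0, 45.0],
    "close": [9.5, 19.0, 28.0, 39.0, 47.0],
}
next_returns = [0.01, 0.02, 0.03, 0.05, 0.04]

factor = run_formula("sub(high, low)", data)  # a simple range-style factor
report = {"RankIC": rank_ic(factor, next_returns),
          "PearsonIC": pearson(factor, next_returns)}
```

Because the evaluator only recognizes whitelisted names and calls, a string such as `__import__('os')` is rejected with a `ValueError` before anything executes, which is the property a sandbox of this kind is meant to provide.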
