Fugu-MT paper translation (abstract): LeJOT-AutoML: LLM-Driven Feature Engineering for Job Execution Time Prediction in Databricks Cost Optimization

Paper overview: LeJOT-AutoML: LLM-Driven Feature Engineering for Job Execution Time Prediction in Databricks Cost Optimization

arxiv url: http://arxiv.org/abs/2603.07897v1
Date: Mon, 09 Mar 2026 02:31:50 GMT
Status: Translation complete
Last updated in system: 2026-03-10 15:13:15.348437
Title: LeJOT-AutoML: LLM-Driven Feature Engineering for Job Execution Time Prediction in Databricks Cost Optimization
Authors: Lizhi Ma, Yi-Xiang Hu, Yihui Ren, Feng Wu, Xiang-Yang Li
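Abstract summary: Databricks job orchestration systems (e.g., LeJOT) reduce cloud costs by selecting low-priced compute while meeting latency and dependency constraints. Existing pipelines rely on static, manually engineered features that under-capture runtime effects. The paper presents LeJOT-AutoML, an agent-driven AutoML framework that embeds large language model agents throughout the ML lifecycle.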
Reference score (independently computed attention score): 27.72622904072875
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Databricks job orchestration systems (e.g., LeJOT) reduce cloud costs by selecting low-priced compute configurations while meeting latency and dependency constraints. Accurate execution-time prediction under heterogeneous instance types and non-stationary runtime conditions is therefore critical. Existing pipelines rely on static, manually engineered features that under-capture runtime effects (e.g., partition pruning, data skew, and shuffle amplification), and predictive signals are scattered across logs, metadata, and job scripts, which lengthens update cycles and increases engineering overhead. We present LeJOT-AutoML, an agent-driven AutoML framework that embeds large language model agents throughout the ML lifecycle. LeJOT-AutoML combines retrieval-augmented generation over a domain knowledge base with a Model Context Protocol toolchain (log parsers, metadata queries, and a read-only SQL sandbox) to analyze job artifacts, synthesize and validate feature-extraction code via safety gates, and train/select predictors. This design materializes runtime-derived features that are difficult to obtain through static analysis alone. On enterprise Databricks workloads, LeJOT-AutoML generates over 200 features and reduces the feature-engineering and evaluation loop from weeks to 20-30 minutes, while maintaining competitive prediction accuracy. Integrated into the LeJOT pipeline, it enables automated continuous model updates and achieves 19.01% cost savings in our deployment setting through improved orchestration.
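The abstract's opening claim, that execution-time prediction drives the choice of a low-priced compute configuration under a latency constraint, can be illustrated with a minimal sketch. This is not the LeJOT orchestrator: `ComputeConfig`, `pick_cheapest_config`, the per-hour pricing model, and the example numbers are all hypothetical, and the sketch ignores dependency constraints entirely.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class ComputeConfig:
    name: str              # hypothetical configuration label
    price_per_hour: float  # assumed price of the whole cluster per hour


def pick_cheapest_config(
    configs: Sequence[ComputeConfig],
    predict_seconds: Callable[[ComputeConfig], float],
    deadline_seconds: float,
) -> Optional[ComputeConfig]:
    """Return the lowest-cost config whose predicted runtime meets the deadline."""
    best = None
    best_cost = float("inf")
    for cfg in configs:
        runtime = predict_seconds(cfg)
        if runtime > deadline_seconds:
            continue  # predicted to violate the latency constraint
        cost = cfg.price_per_hour * runtime / 3600.0
        if cost < best_cost:
            best, best_cost = cfg, cost
    return best


# Hypothetical predictions standing in for a trained runtime model.
configs = [ComputeConfig("small", 0.5), ComputeConfig("large", 4.0)]
predicted = {"small": 7200.0, "large": 1200.0}


def predict(cfg: ComputeConfig) -> float:
    return predicted[cfg.name]


tight = pick_cheapest_config(configs, predict, deadline_seconds=3600)
loose = pick_cheapest_config(configs, predict, deadline_seconds=8000)
```

With the tight deadline only the large configuration is feasible; with the loose one the small configuration wins on cost. The sketch also shows why prediction accuracy matters here: an underestimated runtime can make an infeasible configuration look both feasible and cheap.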
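The abstract says generated feature-extraction code is validated "via safety gates" but does not say what those gates check. As one plausible illustration only, the sketch below runs a static check over generated Python source using the standard `ast` module. The allowlist, the forbidden-call list, and the function name `gate_violations` are assumptions made for this example, not details from the paper.

```python
import ast
from typing import List

# Assumed policy for illustration; a real gate would define its own.
ALLOWED_IMPORTS = {"math", "re", "json", "statistics"}
FORBIDDEN_CALLS = {"eval", "exec", "open", "compile", "__import__"}


def gate_violations(source: str) -> List[str]:
    """Return reasons to reject `source`; an empty list means it passed."""
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return [f"syntax error: {exc.msg}"]

    problems = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name.split(".")[0] not in ALLOWED_IMPORTS:
                    problems.append(f"import not allowed: {alias.name}")
        elif isinstance(node, ast.ImportFrom):
            root = (node.module or "").split(".")[0]
            # Relative imports (level > 0) are rejected outright.
            if node.level > 0 or root not in ALLOWED_IMPORTS:
                problems.append(f"import not allowed: {node.module}")
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in FORBIDDEN_CALLS:
                problems.append(f"call not allowed: {node.func.id}")
    return problems


ok_code = "import math\ndef f(x):\n    return math.log1p(x)\n"
bad_import = "import os\nos.remove('x')\n"
bad_call = "data = open('/etc/passwd').read()\n"
```

A static check like this is not a sandbox: it only inspects syntax and can be evaded by code that reaches forbidden functionality indirectly, so it would normally complement isolated execution rather than replace it.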
