Fugu-MT paper translation (abstract): LLM-Driven Performance-Space Augmentation for Meta-Learning-Based Algorithm Selection

Paper abstract: LLM-Driven Performance-Space Augmentation for Meta-Learning-Based Algorithm Selection

arxiv url: http://arxiv.org/abs/2605.09518v1
Date: Sun, 10 May 2026 13:00:43 GMT
Status: translation complete
Last updated in system: 2026-05-12 23:28:50.292385
Title: LLM-Driven Performance-Space Augmentation for Meta-Learning-Based Algorithm Selection
Title (reference translation): LLM-driven performance-space augmentation for meta-learning-based algorithm selection
Authors: Darren Zhu, Daren Ler
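Abstract summary: A persistent limitation is that the number of curated real-world datasets is small, which yields sparse meta-datasets. The paper addresses this by augmenting the meta-dataset with synthetic regression datasets generated using a large language model (LLM). It compares two strategies: (1) uniform sampling, which distributes synthetic datasets across the performance space, and (2) margin-based sampling, which concentrates them near the decision boundary.
Reference score (independently computed attention score): 0.0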
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Meta-learning for algorithm selection relies on a meta-dataset in which each row corresponds to a supervised learning dataset described by meta-features and labelled with a target value that is associated with algorithm choice (typically, some function of algorithm performance). A persistent limitation is that the number of curated real-world datasets is small, resulting in sparse meta-datasets that constrain meta-learner generalisation. In this paper, we address this problem by augmenting the meta-dataset with synthetic regression datasets produced via a large language model (LLM), with generation steered toward target regions of a low-dimensionality performance space. In our experiments, we adopt a two-dimensional geometric setting defined by the cross-validated $R^2$ scores of two anchor algorithms, known as landmarkers. We compare two augmentation strategies: (1) uniform sampling, which distributes synthetic datasets across the performance space; and (2) margin-based sampling, which concentrates them near the decision boundary where landmarker preference is most ambiguous. Across 42 real-world UCI regression datasets and 730 synthetic datasets, both strategies substantially improve meta-learner performance over the unaugmented baseline under regression and multi-label evaluation formulations. However, uniform augmentation consistently outperforms margin-based augmentation, achieving a 17.47% relative reduction in Hamming loss, a 100.41% relative improvement in subset accuracy, and a +6.09% relative gain in pooled out-of-fold $R^2$. These results lead us to postulate a central thesis: the performance of algorithms resides on a low-dimensional performance manifold, whose reconstruction bias may be minimised by user-guided LLMs that seek to maximise uniform $ε$-cover, and consequently, lead to improved meta-learning for algorithm selection.
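Abstract (reference translation): Meta-learning for algorithm selection depends on a meta-dataset in which each row corresponds to a supervised learning dataset, described by meta-features and labelled with a target value related to algorithm choice (typically some function of algorithm performance). A persistent limitation is that few curated real-world datasets exist, so the resulting meta-datasets are sparse and constrain how well the meta-learner generalises. This paper addresses the problem by augmenting the meta-dataset with synthetic regression datasets generated by a large language model (LLM), steering generation toward target regions of a low-dimensional performance space. The experiments adopt a two-dimensional geometric setting defined by the cross-validated $R^2$ scores of two anchor algorithms, known as landmarkers. Two augmentation strategies are compared: (1) uniform sampling, which spreads synthetic datasets across the performance space, and (2) margin-based sampling, which concentrates them near the decision boundary where landmarker preference is most ambiguous. Across 42 real-world UCI regression datasets and 730 synthetic datasets, both strategies substantially improve meta-learner performance over the unaugmented baseline under regression and multi-label evaluation formulations. However, uniform augmentation consistently outperforms margin-based augmentation, achieving a 17.47% relative reduction in Hamming loss, a 100.41% relative improvement in subset accuracy, and a +6.09% relative gain in pooled out-of-fold $R^2$. These results lead the authors to postulate a central thesis: algorithm performance resides on a low-dimensional performance manifold whose reconstruction bias may be minimised by user-guided LLMs that seek to maximise uniform $ε$-cover, which in turn leads to improved meta-learning for algorithm selection.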
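
The abstract contrasts two ways of choosing target regions in the two-dimensional performance space but does not say how the targets are drawn. The following Python sketch is only an illustration under stated assumptions: both landmarker $R^2$ scores are taken to lie in the unit square, the decision boundary is taken to be the diagonal where the two scores are equal, and the margin width of 0.05 is arbitrary. The function names are hypothetical and are not taken from the paper.

```python
import random


def uniform_targets(n, low=0.0, high=1.0, seed=0):
    """Draw n target points uniformly over the square [low, high]^2.

    Each point is (r2_a, r2_b): assumed target R^2 scores for landmarkers A and B.
    """
    rng = random.Random(seed)
    return [(rng.uniform(low, high), rng.uniform(low, high)) for _ in range(n)]


def margin_targets(n, width=0.05, low=0.0, high=1.0, seed=0):
    """Draw n target points within `width` of the diagonal r2_a == r2_b.

    Assumption: the diagonal is where preference between the two
    landmarkers is most ambiguous. Out-of-range points are rejected.
    """
    rng = random.Random(seed)
    points = []
    while len(points) < n:
        a = rng.uniform(low, high)
        b = a + rng.uniform(-width, width)
        if low <= b <= high:
            points.append((a, b))
    return points


def preferred_landmarker(point):
    """Return which landmarker a target point favours (ties go to A)."""
    r2_a, r2_b = point
    return "A" if r2_a >= r2_b else "B"
```

In a pipeline of this kind, each sampled point would presumably serve as the performance target that a generated synthetic dataset is steered toward. The sketch covers only the target-selection step, not the LLM-driven generation itself.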

Related papers