Fugu-MT Paper Translation (Abstract): ERBench: A Benchmark and Testsuite for Equation Discovery Algorithms

Paper overview: ERBench: A Benchmark and Testsuite for Equation Discovery Algorithms

arxiv url: http://arxiv.org/abs/2606.09276v1
Date: Mon, 08 Jun 2026 09:43:59 GMT
Status: Translation complete
Last updated in system: 2026-06-09 14:42:06.906382
Title: ERBench: A Benchmark and Testsuite for Equation Discovery Algorithms
Title (reference translation): ERBench: A Benchmark and Test Suite for Equation Discovery Algorithms
Authors: Paul Kahlmeyer, Henrik Voigt, Michael Habeck, Joachim Giesen
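Abstract summary: Equation discovery aims to automate the discovery of scientific models in the form of mathematical equations from data. The performance of symbolic regression for equation discovery is measured along two dimensions: prediction accuracy on test data and recovery of known ground-truth formulas. The Equation Recovery Benchmark (ERBench) is a new evaluation framework designed to rigorously assess algorithms that target the task of equation discovery.
Reference score (independently computed attention level): 15.861690390576433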
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Equation discovery aims to automate the discovery of scientific models in the form of mathematical equations from data. Technically, equation discovery is implemented by symbolic regression algorithms. Performance of symbolic regression for equation discovery is measured along two dimensions: Prediction accuracy on test data, and recovery of known groundtruth formulas. For standard regression, accuracy is typically measured on in-domain test data, for instance, by splitting a data set randomly into training and test data. While this makes sense for in-domain interpolation, which is the common goal in ordinary regression, it can be a misleading proxy for true model discovery and generalization. The obvious alternative is to measure out-of-domain accuracy. However, obtaining challenging out-of-domain test data is a non-trivial problem. Therefore, we focus on equation recovery for evaluating symbolic regression algorithms for equation discovery. The rationale is that symbolic regression algorithms that perform well in recovering known groundtruth formulas are good candidates to perform well in unknown equation discovery. Existing benchmarks for symbolic regression include equation recovery tasks, however, with only a small number of groundtruth formulas that are publicly known. Moreover, these benchmarks place less emphasis on evaluating the robustness of algorithms in terms of their behavior under changing dimensionality, sampling size, sampling distribution and sampling domain. This, however, is of central importance to practitioners wanting to discover equations for modeling natural phenomena, since data is almost certainly noisy and comes from diverse domains, distributions, and sample sizes. To fill this gap, we introduce the Equation Recovery Benchmark (ERBench), a new evaluation framework designed to rigorously assess algorithms explicitly targeting the task of equation discovery.
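Abstract (reference translation): Equation discovery aims to automate the discovery of scientific models, expressed as mathematical equations, from data. Technically, equation discovery is implemented by symbolic regression algorithms. The performance of symbolic regression for equation discovery is measured along two dimensions: prediction accuracy on test data and recovery of known ground-truth formulas. In standard regression, accuracy is typically measured on in-domain test data, for example by randomly splitting a data set into training and test data. This is reasonable for in-domain interpolation, the usual goal of ordinary regression, but it can be a misleading proxy for true model discovery and generalization. The obvious alternative is to measure out-of-domain accuracy; however, obtaining challenging out-of-domain test data is a non-trivial problem. We therefore focus on equation recovery when evaluating symbolic regression algorithms for equation discovery. The rationale is that symbolic regression algorithms that recover known ground-truth formulas well are good candidates for discovering unknown equations. Existing symbolic regression benchmarks include equation recovery tasks, but only with a small number of publicly known ground-truth formulas. Moreover, these benchmarks place less emphasis on evaluating the robustness of algorithms, that is, their behavior under changing dimensionality, sample size, sampling distribution, and sampling domain. This robustness is of central importance to practitioners who want to discover equations for modeling natural phenomena, since data is almost certainly noisy and comes from diverse domains, distributions, and sample sizes. To fill this gap, we introduce the Equation Recovery Benchmark (ERBench), a new evaluation framework designed to rigorously assess algorithms that explicitly target the task of equation discovery.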
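
The abstract's point that in-domain test accuracy can be a misleading proxy for model discovery can be illustrated with a small sketch. Everything in it is an assumption made for illustration and is not taken from ERBench: the ground-truth formula sin(x), the training domain [-1, 1], the out-of-domain interval [3, 5], and a cubic polynomial standing in for a regression model that fits the data well without recovering the formula.

```python
import numpy as np

rng = np.random.default_rng(0)


def ground_truth(x):
    # Hypothetical ground-truth formula, chosen only for illustration.
    return np.sin(x)


# Sample the training domain [-1, 1] and split randomly into train and test,
# which is the usual in-domain evaluation setup.
x = rng.uniform(-1.0, 1.0, size=200)
y = ground_truth(x)
perm = rng.permutation(x.size)
train, test = perm[:150], perm[150:]

# Stand-in model: a cubic polynomial fitted by least squares. It approximates
# sin(x) closely on [-1, 1] but is not the ground-truth formula.
coeffs = np.polyfit(x[train], y[train], deg=3)


def rmse(x_eval):
    # Root-mean-square error of the fitted polynomial against the ground truth.
    residual = np.polyval(coeffs, x_eval) - ground_truth(x_eval)
    return float(np.sqrt(np.mean(residual ** 2)))


in_domain_rmse = rmse(x[test])
# Out-of-domain points from [3, 5], an interval the model never saw.
out_of_domain_rmse = rmse(np.linspace(3.0, 5.0, 50))

print(f"in-domain RMSE:     {in_domain_rmse:.2e}")
print(f"out-of-domain RMSE: {out_of_domain_rmse:.2e}")
```

Under these assumptions the in-domain error is small while the out-of-domain error is orders of magnitude larger, even though the random split alone would suggest a good model. A recovered formula identical to the ground truth would have zero error on both sets, which is the motivation the abstract gives for evaluating equation recovery directly.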
