Fugu-MT paper translation (abstract): Beyond Formula Complexity: Effective Information Criterion Improves Performance and Interpretability for Symbolic Regression

Paper overview: Beyond Formula Complexity: Effective Information Criterion Improves Performance and Interpretability for Symbolic Regression

arxiv url: http://arxiv.org/abs/2509.21780v1
Date: Fri, 26 Sep 2025 02:32:43 GMT
Status: Translation complete
Last updated in system: 2025-09-29 20:57:54.129367
Title: Beyond Formula Complexity: Effective Information Criterion Improves Performance and Interpretability for Symbolic Regression
Title (reference translation): Beyond Formula Complexity: An Effective Information Criterion Improves the Performance and Interpretability of Symbolic Regression
Authors: Zihan Yu, Guanren Wang, Jingtao Ding, Huandong Wang, Yong Li
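Abstract summary: Symbolic regression discovers accurate and interpretable formulas to describe given data. The Effective Information Criterion (EIC) treats formulas as information processing systems with specific internal structures. EIC shows 70.2% agreement with 108 human experts' preferences for formula interpretability, demonstrating that, by measuring the unreasonable structures in formulas, it actually reflects formula interpretability.
Reference score (attention score computed by this site): 28.292981389559372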
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Symbolic regression discovers accurate and interpretable formulas to describe given data, thereby providing scientific insights for domain experts and promoting scientific discovery. However, existing symbolic regression methods often use complexity metrics as a proxy for interpretability; these metrics consider only the size of a formula and ignore its internal mathematical structure. As a result, while such methods can discover formulas with compact forms, the discovered formulas often have structures that are difficult to analyze or interpret mathematically. In this work, inspired by the observation that physical formulas are typically numerically stable under limited calculation precision, we propose the Effective Information Criterion (EIC). It treats formulas as information processing systems with specific internal structures and identifies unreasonable structures in them by the loss of significant digits or the amplification of rounding noise as data flows through the system. We find that this criterion reveals the gap between the structural rationality of models discovered by existing symbolic regression algorithms and that of real-world physical formulas. Combining EIC with various search-based symbolic regression algorithms improves their performance on the Pareto frontier and reduces the irrational structure in the results. Combining EIC with generative-based algorithms reduces the number of samples required for pre-training, improving sample efficiency by 2 to 4 times. Finally, for different formulas with similar accuracy and complexity, EIC shows 70.2% agreement with 108 human experts' preferences for formula interpretability, demonstrating that EIC, by measuring the unreasonable structures in formulas, actually reflects formula interpretability.
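Abstract (reference translation): Symbolic regression discovers accurate and interpretable formulas that describe given data, giving domain experts scientific insight and promoting scientific discovery. However, existing symbolic regression methods often use complexity metrics as a proxy for interpretability, which accounts only for the size of a formula and ignores its internal mathematical structure. Consequently, although they can discover formulas with compact forms, the discovered formulas often have structures that are hard to analyze or interpret mathematically. Inspired by the observation that physical formulas are typically numerically stable when calculation precision is limited, this work proposes the Effective Information Criterion (EIC). It treats a formula as an information processing system with a specific internal structure and identifies unreasonable structures by the loss of significant digits or the amplification of rounding noise as data flows through the system. The criterion reveals the gap in structural rationality between models discovered by existing symbolic regression algorithms and real-world physical formulas. Combining EIC with various search-based symbolic regression algorithms improves their performance on the Pareto frontier and reduces irrational structure in the results. Combining EIC with generative algorithms reduces the number of samples needed for pre-training, improving sample efficiency by 2 to 4 times. Finally, for different formulas with similar accuracy and complexity, EIC shows 70.2% agreement with the formula interpretability preferences of 108 human experts, demonstrating that, by measuring unreasonable structures in formulas, it actually reflects their interpretability.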
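
The abstract does not give the definition of EIC, so the following is only a minimal sketch of the underlying observation: a formula's internal structure determines how many significant digits survive under limited calculation precision. It is not the authors' criterion. The helper names (`round_sig`, `unstable`, `stable`, `relative_error`), the choice of 6 significant digits, and the example formula are all assumptions made for illustration. The sketch evaluates two algebraically equivalent forms of (1 - cos x) / x^2, rounding every intermediate result, and compares their relative errors.

```python
import math


def round_sig(value, digits):
    """Round value to `digits` significant decimal digits (hypothetical helper)."""
    if value == 0.0 or not math.isfinite(value):
        return value
    exponent = math.floor(math.log10(abs(value)))
    return round(value, digits - 1 - exponent)


def unstable(x, digits):
    """(1 - cos x) / x^2 with every intermediate result rounded.

    For small x, cos x is close to 1, so the subtraction cancels
    almost all significant digits.
    """
    c = round_sig(math.cos(x), digits)
    numerator = round_sig(1.0 - c, digits)
    denominator = round_sig(x * x, digits)
    return round_sig(numerator / denominator, digits)


def stable(x, digits):
    """2 sin^2(x/2) / x^2, the same function without the cancelling subtraction."""
    half = round_sig(x / 2.0, digits)
    s = round_sig(math.sin(half), digits)
    numerator = round_sig(2.0 * round_sig(s * s, digits), digits)
    denominator = round_sig(x * x, digits)
    return round_sig(numerator / denominator, digits)


def relative_error(approx, exact):
    return abs(approx - exact) / abs(exact)


x = 1e-3
# Double-precision reference value, computed with the stable form.
exact = 2.0 * math.sin(x / 2.0) ** 2 / (x * x)

err_unstable = relative_error(unstable(x, 6), exact)
err_stable = relative_error(stable(x, 6), exact)

print(f"unstable form, relative error: {err_unstable:.3e}")
print(f"stable form,   relative error: {err_stable:.3e}")
```

Both forms have similar size, so a pure complexity metric would rate them about equally, yet at 6 significant digits the first loses essentially all of its accuracy near x = 0 while the second keeps most of it. This is the kind of structural difference the abstract says complexity metrics miss.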


Related papers