Beyond Formula Complexity: Effective Information Criterion Improves Performance and Interpretability for Symbolic Regression
- URL: http://arxiv.org/abs/2509.21780v1
- Date: Fri, 26 Sep 2025 02:32:43 GMT
- Title: Beyond Formula Complexity: Effective Information Criterion Improves Performance and Interpretability for Symbolic Regression
- Authors: Zihan Yu, Guanren Wang, Jingtao Ding, Huandong Wang, Yong Li,
- Abstract summary: Symbolic regression discovers accurate and interpretable formulas to describe given data.<n>The Effective Information Criterion (EIC) treats formulas as information processing systems with specific internal structures.<n>EIC shows a 70.2% agreement with 108 human experts' preferences for formula interpretability, demonstrating that EIC, by measuring the unreasonable structures in formulas, actually reflects the formula's interpretability.
- Score: 28.292981389559372
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Symbolic regression discovers accurate and interpretable formulas to describe given data, thereby providing scientific insights for domain experts and promoting scientific discovery. However, existing symbolic regression methods often use complexity metrics as a proxy for interoperability, which only considers the size of the formula but ignores its internal mathematical structure. Therefore, while they can discover formulas with compact forms, the discovered formulas often have structures that are difficult to analyze or interpret mathematically. In this work, inspired by the observation that physical formulas are typically numerically stable under limited calculation precision, we propose the Effective Information Criterion (EIC). It treats formulas as information processing systems with specific internal structures and identifies the unreasonable structure in them by the loss of significant digits or the amplification of rounding noise as data flows through the system. We find that this criterion reveals the gap between the structural rationality of models discovered by existing symbolic regression algorithms and real-world physical formulas. Combining EIC with various search-based symbolic regression algorithms improves their performance on the Pareto frontier and reduces the irrational structure in the results. Combining EIC with generative-based algorithms reduces the number of samples required for pre-training, improving sample efficiency by 2~4 times. Finally, for different formulas with similar accuracy and complexity, EIC shows a 70.2% agreement with 108 human experts' preferences for formula interpretability, demonstrating that EIC, by measuring the unreasonable structures in formulas, actually reflects the formula's interpretability.
Related papers
- Super-resolution of turbulent reacting flows on complex meshes using graph neural networks [0.0]
State-of-the-art deep learning models have been extensively utilized to reconstruct small-scale structures from coarse-grained data in turbulent flows.<n>In this study, we leverage the inherent flexibility of graph neural networks (GNNs) to develop a methodology for reconstructing unresolved small-scale structures from low-resolution data on complex meshes.
arXiv Detail & Related papers (2026-03-01T12:53:14Z) - Sparse Interpretable Deep Learning with LIES Networks for Symbolic Regression [22.345828337550575]
Symbolic regression aims to discover closed-form mathematical expressions that accurately describe data.<n>Existing SR methods often rely on population-based search or autoregressive modeling.<n>We introduce LIES (Logarithm, Identity, Exponential, Sine), a fixed neural network architecture with interpretable primitive activations that are optimized to model symbolic expressions.
arXiv Detail & Related papers (2025-06-09T22:05:53Z) - Enhancing Symbolic Regression and Universal Physics-Informed Neural Networks with Dimensional Analysis [0.7864304771129751]
We present a new method for enhancing symbolic regression for differential equations via dimensional analysis.
We integrate Ipsen's method of dimensional analysis with Universal Physics-Informed Neural Networks.
We show that transforming data into a dimensionless form significantly decreases time and improves accuracy of computation.
arXiv Detail & Related papers (2024-11-24T17:07:39Z) - Induced Covariance for Causal Discovery in Linear Sparse Structures [55.2480439325792]
Causal models seek to unravel the cause-effect relationships among variables from observed data.
This paper introduces a novel causal discovery algorithm designed for settings in which variables exhibit linearly sparse relationships.
arXiv Detail & Related papers (2024-10-02T04:01:38Z) - Discovering physical laws with parallel symbolic enumeration [67.36739393470869]
We introduce parallel symbolic enumeration (PSE) to efficiently distill generic mathematical expressions from limited data.<n>Experiments show that PSE achieves higher accuracy and faster computation compared to the state-of-the-art baseline algorithms.<n> PSE represents an advance in accurate and efficient data-driven discovery of symbolic, interpretable models.
arXiv Detail & Related papers (2024-07-05T10:41:15Z) - Assessment of Uncertainty Quantification in Universal Differential Equations [1.374796982212312]
Universal Differential Equations (UDEs) are used to combine prior knowledge in the form of mechanistic formulations with universal function approximators, like neural networks.
We provide a formalisation of uncertainty quantification (UQ) for UDEs and investigate important frequentist and Bayesian methods.
arXiv Detail & Related papers (2024-06-13T06:36:19Z) - Shape Arithmetic Expressions: Advancing Scientific Discovery Beyond Closed-Form Equations [56.78271181959529]
Generalized Additive Models (GAMs) can capture non-linear relationships between variables and targets, but they cannot capture intricate feature interactions.
We propose Shape Expressions Arithmetic ( SHAREs) that fuses GAM's flexible shape functions with the complex feature interactions found in mathematical expressions.
We also design a set of rules for constructing SHAREs that guarantee transparency of the found expressions beyond the standard constraints.
arXiv Detail & Related papers (2024-04-15T13:44:01Z) - Discovering Interpretable Physical Models using Symbolic Regression and
Discrete Exterior Calculus [55.2480439325792]
We propose a framework that combines Symbolic Regression (SR) and Discrete Exterior Calculus (DEC) for the automated discovery of physical models.
DEC provides building blocks for the discrete analogue of field theories, which are beyond the state-of-the-art applications of SR to physical problems.
We prove the effectiveness of our methodology by re-discovering three models of Continuum Physics from synthetic experimental data.
arXiv Detail & Related papers (2023-10-10T13:23:05Z) - Amortized Inference for Causal Structure Learning [72.84105256353801]
Learning causal structure poses a search problem that typically involves evaluating structures using a score or independence test.
We train a variational inference model to predict the causal structure from observational/interventional data.
Our models exhibit robust generalization capabilities under substantial distribution shift.
arXiv Detail & Related papers (2022-05-25T17:37:08Z) - A deep learning driven pseudospectral PCE based FFT homogenization
algorithm for complex microstructures [68.8204255655161]
It is shown that the proposed method is able to predict central moments of interest while being magnitudes faster to evaluate than traditional approaches.
It is shown, that the proposed method is able to predict central moments of interest while being magnitudes faster to evaluate than traditional approaches.
arXiv Detail & Related papers (2021-10-26T07:02:14Z) - Parsimonious Inference [0.0]
Parsimonious inference is an information-theoretic formulation of inference over arbitrary architectures.
Our approaches combine efficient encodings with prudent sampling strategies to construct predictive ensembles without cross-validation.
arXiv Detail & Related papers (2021-03-03T04:13:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.