A Transformer Model for Symbolic Regression towards Scientific Discovery
- URL: http://arxiv.org/abs/2312.04070v2
- Date: Wed, 13 Dec 2023 22:20:14 GMT
- Title: A Transformer Model for Symbolic Regression towards Scientific Discovery
- Authors: Florian Lalande, Yoshitomo Matsubara, Naoya Chiba, Tatsunori Taniai,
Ryo Igarashi, Yoshitaka Ushiku
- Abstract summary: Symbolic Regression (SR) searches for mathematical expressions which best describe numerical datasets.
We propose a new Transformer model for Symbolic Regression, with a particular focus on its application to Scientific Discovery.
Applied to the SRSD datasets, our best model yields state-of-the-art results under the normalized tree-based edit distance.
- Score: 11.827358526480323
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Symbolic Regression (SR) searches for mathematical expressions which best
describe numerical datasets. This makes it possible to circumvent the
interpretation issues inherent to artificial neural networks, but SR algorithms
are often computationally expensive. This work proposes a new Transformer model
for Symbolic Regression, with a particular focus on its application to
Scientific Discovery. We propose three encoder architectures with increasing
flexibility, at the cost of violating column-permutation equivariance. Training
results indicate that the most flexible architecture is required to prevent
overfitting. Once trained, we apply our best model to the SRSD datasets
(Symbolic Regression for Scientific Discovery datasets), which yields
state-of-the-art results under the normalized tree-based edit distance, at no
extra computational cost.
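For readers unfamiliar with the metric, the sketch below computes a normalized tree-based edit distance between a ground-truth and a predicted expression tree using the zss (Zhang-Shasha) library; normalizing by the ground-truth tree size and capping at 1 mirrors the SRSD evaluation convention, though the paper's exact implementation may differ.

```python
# Sketch of the evaluation metric: normalized tree edit distance between
# expression trees, via the zss (Zhang-Shasha) library. Dividing by the
# ground-truth tree size and capping at 1 is an assumption following the
# SRSD evaluation convention, not the paper's exact code.
from zss import Node, simple_distance

def tree_size(node):
    """Count the nodes of an expression tree."""
    return 1 + sum(tree_size(c) for c in node.children)

# Ground truth y = x0 * sin(x1) vs. prediction y = x0 * cos(x1).
truth = Node("mul").addkid(Node("x0")).addkid(Node("sin").addkid(Node("x1")))
pred = Node("mul").addkid(Node("x0")).addkid(Node("cos").addkid(Node("x1")))

dist = simple_distance(truth, pred)      # raw edit distance: one relabel
ned = min(1.0, dist / tree_size(truth))  # normalize by truth size, cap at 1
print(ned)  # 0.25: one relabel among four ground-truth nodes
```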
Related papers
- Discovering symbolic expressions with parallelized tree search [59.92040079807524]
Symbolic regression plays a crucial role in scientific research thanks to its capability of discovering concise and interpretable mathematical expressions from data.
Existing algorithms have long faced a critical bottleneck in accuracy and efficiency when handling complex problems.
We introduce a parallelized tree search (PTS) model to efficiently distill generic mathematical expressions from limited data.
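As a loose illustration of the search-and-score idea (not the PTS algorithm itself; the candidate pool and scoring below are invented for the example), candidate expressions can be evaluated against the data in parallel and the best fit kept:

```python
# Toy illustration of parallel expression search (not the PTS algorithm):
# score random candidate expressions concurrently, keep the best fit.
import numpy as np
from concurrent.futures import ProcessPoolExecutor

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=200)
y = X * np.sin(X)  # hidden ground truth

# A tiny, fixed candidate pool stands in for real tree enumeration.
CANDIDATES = {
    "x*sin(x)": lambda x: x * np.sin(x),
    "x*cos(x)": lambda x: x * np.cos(x),
    "x**2":     lambda x: x ** 2,
    "sin(x)":   lambda x: np.sin(x),
}

def score(name):
    """Mean squared error of one candidate expression on the dataset."""
    mse = float(np.mean((CANDIDATES[name](X) - y) ** 2))
    return mse, name

if __name__ == "__main__":
    with ProcessPoolExecutor() as pool:
        best = min(pool.map(score, CANDIDATES))
    print(best)  # (0.0, 'x*sin(x)')
```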
arXiv Detail & Related papers (2024-07-05T10:41:15Z)
- Robust Capped lp-Norm Support Vector Ordinal Regression [85.84718111830752]
Ordinal regression is a specialized supervised problem where the labels show an inherent order.
Support Vector Ordinal Regression, as an outstanding ordinal regression model, is widely used in many ordinal regression tasks.
We introduce a new model, Capped $\ell_p$-Norm Support Vector Ordinal Regression (CSVOR), that is robust to outliers.
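The "capped" construction bounds each residual's contribution to the loss; a sketch of the general form (an assumption about the idea, not necessarily the paper's exact objective):

```latex
% Capped \ell_p-norm loss: residuals beyond the cap \epsilon contribute
% a constant, so a single outlier cannot dominate the objective.
\[
  L(r_i) = \min\bigl( \lVert r_i \rVert_p^{p},\ \epsilon \bigr)
\]
```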
arXiv Detail & Related papers (2024-04-25T13:56:05Z)
- Deep Generative Symbolic Regression [83.04219479605801]
Symbolic regression aims to discover concise closed-form mathematical equations from data.
Existing methods, ranging from search to reinforcement learning, fail to scale with the number of input variables.
We propose an instantiation of our framework, Deep Generative Symbolic Regression.
arXiv Detail & Related papers (2023-12-30T17:05:31Z)
- Discovering Interpretable Physical Models using Symbolic Regression and Discrete Exterior Calculus [55.2480439325792]
We propose a framework that combines Symbolic Regression (SR) and Discrete Exterior Calculus (DEC) for the automated discovery of physical models.
DEC provides building blocks for the discrete analogue of field theories, which are beyond the state-of-the-art applications of SR to physical problems.
We demonstrate the effectiveness of our methodology by re-discovering three models of Continuum Physics from synthetic experimental data.
arXiv Detail & Related papers (2023-10-10T13:23:05Z)
- Scalable Neural Symbolic Regression using Control Variables [7.725394912527969]
We propose ScaleSR, a scalable symbolic regression model that leverages control variables to enhance both accuracy and scalability.
The proposed method involves a four-step process. First, we learn a data generator from the observed data using deep neural networks (DNNs).
Experimental results demonstrate that the proposed ScaleSR significantly outperforms state-of-the-art baselines in discovering mathematical expressions with multiple variables.
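To make the control-variable idea concrete, the sketch below uses a small MLP as an illustrative stand-in for the learned data generator: holding all variables but one fixed reduces the multivariate problem to a univariate fit. All model choices here are assumptions for the example.

```python
# Sketch of control-variable SR (illustrative, not the ScaleSR code):
# 1) fit a DNN generator to (X, y); 2) query it with x1 held fixed to
# produce controlled data; 3) fit a univariate model in x0.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(2000, 2))
y = 3 * X[:, 0] ** 2 + np.sin(X[:, 1])  # hidden ground truth

gen = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000,
                   random_state=0).fit(X, y)

# Controlled experiment: vary x0, hold x1 at 0.
x0 = np.linspace(-1, 1, 50)
Xc = np.column_stack([x0, np.zeros_like(x0)])
yc = gen.predict(Xc)

# A univariate fit in x0 recovers the 3*x0^2 term.
coeffs = np.polyfit(x0, yc, deg=2)
print(coeffs)  # roughly [3, 0, 0], up to generator approximation error
```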
arXiv Detail & Related papers (2023-06-07T18:30:25Z)
- Understanding Augmentation-based Self-Supervised Representation Learning via RKHS Approximation and Regression [53.15502562048627]
Recent work has built the connection between self-supervised learning and the approximation of the top eigenspace of a graph Laplacian operator.
This work delves into a statistical analysis of augmentation-based pretraining.
arXiv Detail & Related papers (2023-06-01T15:18:55Z)
- Transformer-based Planning for Symbolic Regression [18.90700817248397]
We propose TPSR, a Transformer-based Planning strategy for Symbolic Regression.
Unlike conventional decoding strategies, TPSR enables the integration of non-differentiable feedback, such as fitting accuracy and complexity.
Our approach outperforms state-of-the-art methods, improving the model's fitting-complexity trade-off, symbolic abilities, and robustness to noise.
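A minimal sketch of how such non-differentiable feedback can be folded into a scalar planning reward that trades fitting accuracy against expression complexity (the functional form and weighting are assumptions, not TPSR's exact formula):

```python
# Sketch: a planning reward mixing fit quality and complexity.
# TPSR-style decoding would use such a score to guide lookahead over
# partially decoded expressions; the exact form here is illustrative.
import numpy as np

def reward(y_true, y_pred, n_nodes, lam=0.1):
    """Higher is better: good fit, small expression tree."""
    nmse = np.mean((y_true - y_pred) ** 2) / (np.var(y_true) + 1e-12)
    fit_term = 1.0 / (1.0 + nmse)     # in (0, 1], 1 means a perfect fit
    penalty = np.exp(-lam * n_nodes)  # shrinks as the tree grows
    return fit_term * penalty
```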
arXiv Detail & Related papers (2023-03-13T03:29:58Z)
- Symbolic Expression Transformer: A Computer Vision Approach for Symbolic Regression [9.978824294461196]
Symbolic Regression (SR) is a type of regression analysis that automatically finds the mathematical expression that best fits the data.
Inspired by the fact that human beings can infer a mathematical expression from its curve, we propose the Symbolic Expression Transformer (SET).
SET is a sample-agnostic model from the perspective of computer vision for SR.
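The computer-vision framing can be illustrated by rasterizing sampled (x, y) points into a small binary image that an image encoder could consume; the grid size and normalization below are illustrative choices, not SET's actual preprocessing:

```python
# Sketch: turn a sampled curve into a binary image (illustrative only;
# SET's actual preprocessing may differ).
import numpy as np

def curve_to_image(x, y, size=64):
    """Rasterize (x, y) samples onto a size x size binary grid."""
    img = np.zeros((size, size), dtype=np.uint8)
    # Normalize both axes to pixel coordinates in [0, size-1].
    xi = ((x - x.min()) / (np.ptp(x) + 1e-12) * (size - 1)).astype(int)
    yi = ((y - y.min()) / (np.ptp(y) + 1e-12) * (size - 1)).astype(int)
    img[size - 1 - yi, xi] = 1  # flip the vertical axis so up is up
    return img

x = np.linspace(-3, 3, 500)
img = curve_to_image(x, np.sin(x))
print(img.shape, img.sum())  # (64, 64) and the number of lit pixels
```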
arXiv Detail & Related papers (2022-05-24T05:35:46Z)
- A Hybrid Framework for Sequential Data Prediction with End-to-End Optimization [0.0]
We investigate nonlinear prediction in an online setting and introduce a hybrid model that mitigates the need for hand-designed features and manual model selection.
We employ a recurrent neural network (LSTM) for adaptive feature extraction from sequential data and a gradient boosting machine (soft GBDT) for effective supervised regression.
We demonstrate the learning behavior of our algorithm on synthetic data and show significant performance improvements over conventional methods on various real-life datasets.
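A rough two-stage sketch of the pipeline with standard components follows; note that the paper trains both stages jointly end-to-end via a differentiable soft GBDT, which this separate-stage approximation does not capture:

```python
# Two-stage stand-in for the paper's end-to-end LSTM + soft GBDT:
# stage 1 extracts sequence features, stage 2 does boosted regression.
import numpy as np
import torch
from sklearn.ensemble import GradientBoostingRegressor

torch.manual_seed(0)
rng = np.random.default_rng(0)

# Toy task: predict the next value of a noisy sine wave.
t = np.arange(1000, dtype=np.float32)
series = np.sin(0.1 * t) + 0.1 * rng.standard_normal(1000).astype(np.float32)
win = 20
Xseq = np.stack([series[i:i + win] for i in range(len(series) - win)])
y = series[win:]

# Stage 1: an LSTM (left untrained here) as a fixed feature extractor.
lstm = torch.nn.LSTM(input_size=1, hidden_size=16, batch_first=True)
with torch.no_grad():
    _, (h, _) = lstm(torch.from_numpy(Xseq).unsqueeze(-1))
features = h.squeeze(0).numpy()

# Stage 2: gradient boosting on the extracted features.
gbdt = GradientBoostingRegressor(random_state=0).fit(features, y)
print(gbdt.score(features, y))  # in-sample R^2
```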
arXiv Detail & Related papers (2022-03-25T17:13:08Z)
- A Hypergradient Approach to Robust Regression without Correspondence [85.49775273716503]
We consider a variant of the regression problem in which the correspondence between input and output data is not available.
Most existing methods are only applicable when the sample size is small.
We propose a new computational framework -- ROBOT -- for the shuffled regression problem.
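In shuffled (correspondence-free) regression, the permutation matching inputs to outputs must be estimated along with the model; the canonical linear formulation reads (a sketch of the problem setting, not ROBOT's algorithm):

```latex
% Shuffled linear regression: jointly estimate the coefficients \beta
% and the unknown permutation matrix \Pi over the n samples.
\[
  \min_{\beta \in \mathbb{R}^d,\ \Pi \in \mathcal{P}_n}
    \bigl\lVert \Pi y - X \beta \bigr\rVert_2^2
\]
```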
arXiv Detail & Related papers (2020-11-30T21:47:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information and accepts no responsibility for any consequences arising from its use.