End-to-end symbolic regression with transformers
- URL: http://arxiv.org/abs/2204.10532v1
- Date: Fri, 22 Apr 2022 06:55:43 GMT
- Title: End-to-end symbolic regression with transformers
- Authors: Pierre-Alexandre Kamienny, Stéphane d'Ascoli, Guillaume Lample,
François Charton
- Abstract summary: Symbolic regression, the task of predicting the mathematical expression of a function from its values, usually involves a two-step procedure: predicting the skeleton of the expression, then fitting its constants.
We task a Transformer to directly predict the full expression, constants included, and show that this end-to-end approach approaches the performance of state-of-the-art genetic programming with much faster inference.
- Score: 20.172752966322214
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Symbolic regression, the task of predicting the mathematical expression of a
function from the observation of its values, is a difficult task which usually
involves a two-step procedure: predicting the "skeleton" of the expression up
to the choice of numerical constants, then fitting the constants by optimizing
a non-convex loss function. The dominant approach is genetic programming, which
evolves candidates by iterating this subroutine a large number of times. Neural
networks have recently been tasked to predict the correct skeleton in a single
try, but remain much less powerful. In this paper, we challenge this two-step
procedure, and task a Transformer to directly predict the full mathematical
expression, constants included. One can subsequently refine the predicted
constants by feeding them to the non-convex optimizer as an informed
initialization. We present ablations to show that this end-to-end approach
yields better results, sometimes even without the refinement step. We evaluate
our model on problems from the SRBench benchmark and show that our model
approaches the performance of state-of-the-art genetic programming with several
orders of magnitude faster inference.
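The refinement step described in the abstract is easy to picture in code. The sketch below is a minimal illustration, not the authors' implementation: it assumes the Transformer has already predicted the expression a*sin(b*x) + c with rough constants, and refines (a, b, c) with SciPy's BFGS optimizer, using the predicted values as an informed initialization. All names and numbers here are hypothetical.

```python
# Minimal sketch (not the paper's code) of refining Transformer-predicted
# constants with a non-convex optimizer, started from an informed guess.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Observations of the (unknown) target function.
x = rng.uniform(-3, 3, size=200)
y = 1.7 * np.sin(2.3 * x) + 0.5

# Hypothetical constants read off the end-to-end Transformer prediction.
predicted_constants = np.array([1.5, 2.0, 0.4])  # (a, b, c)

def expression(constants, x):
    a, b, c = constants
    return a * np.sin(b * x) + c

def mse(constants):
    return np.mean((expression(constants, x) - y) ** 2)

# Refinement: non-convex fit started from the informed initialization.
result = minimize(mse, predicted_constants, method="BFGS")
print("refined constants:", result.x)   # should move toward (1.7, 2.3, 0.5)
print("refined MSE:", result.fun)
```

The informed start matters because the loss over the constants is non-convex: a random initialization can land in a poor local minimum, which is why skeleton-then-fit pipelines typically need many restarts.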
Related papers
- Stochastic Gradient Descent for Gaussian Processes Done Right [86.83678041846971]
We show that when done right -- by which we mean using specific insights from the optimisation and kernel communities -- gradient descent is highly effective.
We introduce a stochastic dual descent algorithm, explain its design in an intuitive manner and illustrate the design choices.
Our method places Gaussian process regression on par with state-of-the-art graph neural networks for molecular binding affinity prediction.
arXiv Detail & Related papers (2023-10-31T16:15:13Z) - Series of Hessian-Vector Products for Tractable Saddle-Free Newton
Optimisation of Neural Networks [1.3654846342364308]
We show a first scalable optimisation algorithm that can efficiently use the exact inverse Hessian with absolute-value eigenvalues.
A truncated run of this series provides a new optimisation algorithm which is comparable to other first- and second-order optimisation methods.
arXiv Detail & Related papers (2023-10-23T13:11:30Z) - Uncovering mesa-optimization algorithms in Transformers [61.06055590704677]
Some autoregressive models can learn as an input sequence is processed, without undergoing any parameter changes, and without being explicitly trained to do so.
We show that standard next-token prediction error minimization gives rise to a subsidiary learning algorithm that adjusts the model as new inputs are revealed.
Our findings explain in-context learning as a product of autoregressive loss minimization and inform the design of new optimization-based Transformer layers.
arXiv Detail & Related papers (2023-09-11T22:42:50Z) - Alternating minimization for generalized rank one matrix sensing: Sharp predictions from a random initialization [5.900674344455754]
We analyse a technique for estimating the factors of a rank-one matrix from i.i.d. random measurements.
We show sharp convergence guarantees, with exact recovery in a single step.
Our analysis also exposes several other properties of this problem.
arXiv Detail & Related papers (2022-07-20T05:31:05Z) - Improved Convergence Rate of Stochastic Gradient Langevin Dynamics with
Variance Reduction and its Application to Optimization [50.83356836818667]
Stochastic Gradient Langevin Dynamics is one of the most fundamental algorithms to solve non-convex optimization problems.
In this paper, we study two variants of this kind, namely the Variance Reduced Langevin Dynamics and the Recursive Gradient Langevin Dynamics.
arXiv Detail & Related papers (2022-03-30T11:39:00Z) - Symbolic Regression via Neural-Guided Genetic Programming Population
Seeding [6.9501458586819505]
Symbolic regression is a discrete optimization problem generally believed to be NP-hard.
Prior approaches to solving the problem include neural-guided search and genetic programming.
We propose a neural-guided component used to seed the starting population of a random restart genetic programming component.
arXiv Detail & Related papers (2021-10-29T19:26:41Z) - Recognizing and Verifying Mathematical Equations using Multiplicative
Differential Neural Units [86.9207811656179]
We show that memory-augmented neural networks (NNs) can achieve higher-order extrapolation, stable performance, and faster convergence.
Our models achieve a 1.53% average improvement over current state-of-the-art methods in equation verification and achieve a 2.22% Top-1 average accuracy and 2.96% Top-5 average accuracy for equation completion.
arXiv Detail & Related papers (2021-04-07T03:50:11Z) - Shape-constrained Symbolic Regression -- Improving Extrapolation with
Prior Knowledge [0.0]
The aim is to find models which conform to expected behaviour and which have improved extrapolation capabilities.
The algorithms are tested on a set of 19 synthetic and four real-world regression problems.
Shape-constrained regression produces the best results for the test set but also significantly larger models.
arXiv Detail & Related papers (2021-03-29T14:04:18Z) - Pseudo-Convolutional Policy Gradient for Sequence-to-Sequence
Lip-Reading [96.48553941812366]
Lip-reading aims to infer the speech content from the lip movement sequence.
Traditional learning process of seq2seq models suffers from two problems.
We propose a novel pseudo-convolutional policy gradient (PCPG) based method to address these two problems.
arXiv Detail & Related papers (2020-03-09T09:12:26Z) - Efficiently Sampling Functions from Gaussian Process Posteriors [76.94808614373609]
We propose an easy-to-use and general-purpose approach for fast posterior sampling.
We demonstrate how decoupled sample paths accurately represent Gaussian process posteriors at a fraction of the usual cost (a minimal pathwise-sampling sketch follows this list).
arXiv Detail & Related papers (2020-02-21T14:03:16Z)
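For the "decoupled sample paths" entry above, the following sketch shows the underlying pathwise (Matheron) update: a posterior sample is a prior sample plus a data-driven correction. The kernel, data sizes, and the use of an exact joint prior draw are illustrative assumptions, not necessarily the paper's actual construction.

```python
# Minimal sketch of decoupled GP posterior sampling via Matheron's rule:
# posterior sample = prior sample + K(*, X) (K(X, X) + s^2 I)^{-1} (y - f(X) - eps)
import numpy as np

rng = np.random.default_rng(0)

def rbf(a, b, lengthscale=0.5):
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

# Toy training data and query locations (all values are illustrative).
x_train = rng.uniform(-2, 2, size=20)
y_train = np.sin(3 * x_train) + 0.1 * rng.standard_normal(20)
x_query = np.linspace(-3, 3, 200)
noise = 0.1 ** 2

# One joint prior sample over train and query locations.
x_all = np.concatenate([x_train, x_query])
K_all = rbf(x_all, x_all) + 1e-8 * np.eye(len(x_all))
f_prior = np.linalg.cholesky(K_all) @ rng.standard_normal(len(x_all))
f_train, f_query = f_prior[:20], f_prior[20:]

# Pathwise update: correct the prior sample using the observed data.
K_nn = rbf(x_train, x_train) + noise * np.eye(20)
K_qn = rbf(x_query, x_train)
eps = np.sqrt(noise) * rng.standard_normal(20)
posterior_sample = f_query + K_qn @ np.linalg.solve(K_nn, y_train - f_train - eps)
```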
This list is automatically generated from the titles and abstracts of the papers in this site.