Rethinking Symbolic Regression Datasets and Benchmarks for Scientific
Discovery
- URL: http://arxiv.org/abs/2206.10540v5
- Date: Tue, 5 Mar 2024 07:36:09 GMT
- Title: Rethinking Symbolic Regression Datasets and Benchmarks for Scientific
Discovery
- Authors: Yoshitomo Matsubara, Naoya Chiba, Ryo Igarashi, Yoshitaka Ushiku
- Abstract summary: This paper revisits datasets and evaluation criteria for Symbolic Regression (SR).
We recreate 120 datasets to discuss the performance of symbolic regression for scientific discovery (SRSD).
- Score: 12.496525234064888
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper revisits datasets and evaluation criteria for Symbolic Regression
(SR), with a specific focus on its potential for scientific discovery. Starting from
the set of formulas used in existing datasets based on the Feynman Lectures on
Physics, we recreate 120 datasets to discuss the performance of symbolic
regression for scientific discovery (SRSD). For each of the 120 SRSD datasets,
we carefully review the properties of the formula and its variables to design
reasonably realistic sampling ranges of values so that our new SRSD datasets
can be used for evaluating the potential of SRSD such as whether or not an SR
method can (re)discover physical laws from such datasets. We also create
another 120 datasets that contain dummy variables to examine whether SR methods
can select only the necessary variables. In addition, we propose using the
normalized edit distance (NED) between the predicted and true equation trees to
address a critical issue: existing SR metrics are either binary or measure only
the error between the target values and an SR model's predicted values for a
given input. We conduct benchmark experiments on our new SRSD datasets using various
representative SR methods. The experimental results show that our datasets
provide a more realistic performance evaluation, and our user study shows that
NED correlates with human judgements significantly better than an existing SR metric. We
publish repositories of our code and 240 SRSD datasets.
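To make the proposed NED metric concrete, here is a minimal sketch that computes a tree edit distance between a predicted and a true equation tree and normalizes it by the size of the larger tree. The use of the third-party zss (Zhang-Shasha) package, the hand-built node labels, and normalization by the larger tree's size are assumptions made for this sketch; the authors' released repositories define the actual implementation.

```python
# Sketch of a normalized tree edit distance (NED) between equation trees.
# Assumptions (not from the paper): trees are built by hand as zss.Node objects,
# and the raw edit distance is divided by the size of the larger tree.
from zss import Node, simple_distance


def tree_size(node: Node) -> int:
    """Count the nodes in an equation tree."""
    return 1 + sum(tree_size(child) for child in node.children)


def normalized_edit_distance(pred: Node, true: Node) -> float:
    """Tree edit distance scaled to [0, 1] by the larger tree's size."""
    return simple_distance(pred, true) / max(tree_size(pred), tree_size(true))


# Example: true equation F = m * a, predicted equation F = m * a + c
true_eq = Node("*").addkid(Node("m")).addkid(Node("a"))
pred_eq = Node("+").addkid(Node("*").addkid(Node("m")).addkid(Node("a"))).addkid(Node("c"))
print(normalized_edit_distance(pred_eq, true_eq))
```

In this toy example the prediction F = m * a + c misses the true F = m * a by two tree edits, giving an NED of 0.4 rather than an all-or-nothing score.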
Related papers
- Breaking Determinism: Fuzzy Modeling of Sequential Recommendation Using Discrete State Space Diffusion Model [66.91323540178739]
Sequential recommendation (SR) aims to predict items that users may be interested in based on their historical behavior.
We revisit SR from a novel information-theoretic perspective and find that sequential modeling methods fail to adequately capture the randomness and unpredictability of user behavior.
Inspired by fuzzy information processing theory, this paper introduces the fuzzy sets of interaction sequences to overcome the limitations and better capture the evolution of users' real interests.
arXiv Detail & Related papers (2024-10-31T14:52:01Z) - Rethinking Image Super-Resolution from Training Data Perspectives [54.28824316574355]
We investigate the understudied effect of the training data used for image super-resolution (SR).
With this, we propose an automated image evaluation pipeline.
We find that datasets with (i) low compression artifacts, (ii) high within-image diversity as judged by the number of different objects, and (iii) a large number of images from ImageNet or PASS all positively affect SR performance.
arXiv Detail & Related papers (2024-09-01T16:25:04Z) - A Reproducible Analysis of Sequential Recommender Systems [13.987953631479662]
Sequential Recommender Systems (SRSs) have emerged as a highly efficient approach to recommendation systems.
Existing works exhibit shortcomings in replicability of results, leading to inconsistent statements across papers.
Our work fills these gaps by standardising data pre-processing and model implementations.
arXiv Detail & Related papers (2024-08-07T16:23:29Z) - Systematic Evaluation of Neural Retrieval Models on the Touché 2020 Argument Retrieval Subset of BEIR [99.13855300096925]
We run a study on the Touché 2020 data to explore the potential limits of neural retrieval models.
Our black-box evaluation reveals an inherent bias of neural models towards retrieving short passages.
As many of the short Touché passages are not argumentative and thus non-relevant per se, we denoise the Touché 2020 data by excluding very short passages.
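A hedged sketch of that denoising step, assuming passages are plain dictionaries and using an illustrative 20-word cutoff (the paper's actual threshold and data format may differ):

```python
# Illustrative denoising step: drop very short passages before evaluation.
# The min_words threshold and the passage schema are assumptions, not values
# taken from the paper.
def denoise(passages, min_words=20):
    """Keep only passages long enough to plausibly carry an argument."""
    return [p for p in passages if len(p["text"].split()) >= min_words]


corpus = [
    {"id": "t1", "text": "Yes."},
    {"id": "t2", "text": " ".join(["argument"] * 30)},
]
print([p["id"] for p in denoise(corpus)])  # -> ['t2']
```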
arXiv Detail & Related papers (2024-07-10T16:07:51Z) - Multi-View Symbolic Regression [1.2334534968968969]
We present Multi-View Symbolic Regression (MvSR), which takes into account multiple datasets simultaneously.
MvSR fits the evaluated expression to each independent dataset and returns a parametric family of functions.
We demonstrate the effectiveness of MvSR using data generated from known expressions, as well as real-world data from astronomy, chemistry, and economics.
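A minimal sketch of the multi-view idea summarized above: one candidate parametric form is fitted independently to each dataset (each "view"), and the per-view errors are aggregated into a single score. The exponential candidate, the least-squares fitting via scipy.optimize.curve_fit, and the mean-squared-error aggregation are illustrative assumptions, not the paper's actual operators.

```python
# Multi-view scoring sketch: fit the same parametric family to every dataset
# separately and average the per-view errors. Candidate form and error
# aggregation are assumptions for illustration only.
import numpy as np
from scipy.optimize import curve_fit


def candidate(x, a, b):
    """One candidate parametric family the search might propose."""
    return a * np.exp(b * x)


def multi_view_score(datasets, func=candidate):
    """Fit func to each (x, y) dataset independently; return the mean MSE."""
    errors = []
    for x, y in datasets:
        params, _ = curve_fit(func, x, y, p0=np.ones(2))
        errors.append(np.mean((func(x, *params) - y) ** 2))
    return float(np.mean(errors))


rng = np.random.default_rng(0)
views = []
for a, b in [(1.0, 0.5), (2.0, 0.3)]:  # same functional form, different parameters
    x = np.linspace(0.0, 1.0, 50)
    views.append((x, a * np.exp(b * x) + 0.01 * rng.normal(size=x.size)))
print(multi_view_score(views))
```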
arXiv Detail & Related papers (2024-02-06T15:53:49Z) - A Transformer Model for Symbolic Regression towards Scientific Discovery [11.827358526480323]
Symbolic Regression (SR) searches for mathematical expressions which best describe numerical datasets.
We propose a new Transformer model aiming at Symbolic Regression particularly focused on its application for Scientific Discovery.
We apply our best model to the SRSD datasets, which yields state-of-the-art results in terms of the normalized tree-based edit distance.
arXiv Detail & Related papers (2023-12-07T06:27:48Z) - Soft Random Sampling: A Theoretical and Empirical Analysis [59.719035355483875]
Soft random sampling (SRS) is a simple yet effective approach for efficiently training deep neural networks on massive data.
It selects a subset uniformly at random with replacement from each data set in each epoch.
It is shown to be a powerful and competitive strategy with significant performance gains on real-world, industrial-scale tasks.
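A minimal sketch of that per-epoch sampling scheme, assuming an in-memory dataset and an illustrative 50% sampling ratio (the training step itself is left as a placeholder):

```python
# Soft random sampling sketch: each epoch trains on a subset drawn uniformly
# at random WITH replacement. The ratio and the placeholder training call are
# illustrative assumptions.
import random


def soft_random_sample(dataset, ratio=0.5):
    """Return a per-epoch subset drawn uniformly at random with replacement."""
    k = max(1, int(len(dataset) * ratio))
    return random.choices(dataset, k=k)


data = list(range(10))
for epoch in range(3):
    subset = soft_random_sample(data, ratio=0.5)
    # train_one_epoch(model, subset)  # hypothetical training step
    print(f"epoch {epoch}: sampled {len(subset)} of {len(data)} examples")
```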
arXiv Detail & Related papers (2023-11-21T17:03:21Z) - Active Learning in Symbolic Regression with Physical Constraints [0.4037357056611557]
Evolutionary symbolic regression (SR) fits a symbolic equation to data, which gives a concise interpretable model.
We explore using SR as a method to propose which data to gather in an active learning setting with physical constraints.
arXiv Detail & Related papers (2023-05-17T17:07:25Z) - GSR: A Generalized Symbolic Regression Approach [13.606672419862047]
We present Generalized Symbolic Regression (GSR) in this paper.
We show that our GSR method outperforms several state-of-the-art methods on the well-known Symbolic Regression benchmark problem sets.
We highlight the strengths of GSR by introducing SymSet, a new SR benchmark set which is more challenging relative to the existing benchmarks.
arXiv Detail & Related papers (2022-05-31T07:20:17Z) - Self-Supervised Neural Architecture Search for Imbalanced Datasets [129.3987858787811]
Neural Architecture Search (NAS) provides state-of-the-art results when trained on well-curated datasets with annotated labels.
We propose a NAS-based framework with the following contributions: (a) we focus on the self-supervised scenario, where no labels are required to determine the architecture, and (b) we assume the datasets are imbalanced.
arXiv Detail & Related papers (2021-09-17T14:56:36Z) - Dynamic Refinement Network for Oriented and Densely Packed Object
Detection [75.29088991850958]
We present a dynamic refinement network that consists of two novel components, i.e., a feature selection module (FSM) and a dynamic refinement head (DRH).
Our FSM enables neurons to adjust receptive fields in accordance with the shapes and orientations of target objects, whereas the DRH empowers our model to refine the prediction dynamically in an object-aware manner.
We perform quantitative evaluations on several publicly available benchmarks including DOTA, HRSC2016, SKU110K, and our own SKU110K-R dataset.
arXiv Detail & Related papers (2020-05-20T11:35:50Z)