Symbolic Regression as Feature Engineering Method for Machine and Deep
Learning Regression Tasks
- URL: http://arxiv.org/abs/2311.06028v1
- Date: Fri, 10 Nov 2023 12:34:28 GMT
- Title: Symbolic Regression as Feature Engineering Method for Machine and Deep
Learning Regression Tasks
- Authors: Assaf Shmuel, Oren Glickman, Teddy Lazebnik
- Abstract summary: In this study, we propose to integrate symbolic regression (SR) as an effective feature engineering (FE) process before a machine learning model.
We show, through extensive experimentation on synthetic and real-world physics-related datasets, that the incorporation of SR-derived features significantly enhances the predictive capabilities of both machine and deep learning regression models.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In the realm of machine and deep learning regression tasks, the role of
effective feature engineering (FE) is pivotal in enhancing model performance.
Traditional approaches to FE often rely on domain expertise to manually design
features for machine learning models. In the context of deep learning models,
the FE is embedded in the neural network's architecture, making it hard to
interpret. In this study, we propose to integrate symbolic regression (SR)
as an FE process before a machine learning model to improve its performance. We
show, through extensive experimentation on synthetic and real-world
physics-related datasets, that the incorporation of SR-derived features
significantly enhances the predictive capabilities of both machine and deep
learning regression models with 34-86% root mean square error (RMSE)
improvement in synthetic datasets and 4-11.5% improvement in real-world
datasets. In addition, as a realistic use-case, we show the proposed method
improves the machine learning performance in predicting superconducting
critical temperatures based on Eliashberg theory by more than 20% in terms of
RMSE. These results outline the potential of SR as an FE component in
data-driven models.
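The idea in the abstract (derive a symbolic feature, append it to the inputs, then fit the usual regressor and compare RMSE) can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the brute-force search over a fixed `CANDIDATES` table is a toy stand-in for a real SR engine, the downstream model is a plain least-squares fit, and the synthetic target `y = x1 * x2 + noise`, the helper names, and the train/test split are all assumptions made for the example.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def fit_ols(X, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    Z = [[1.0] + row for row in X]
    k = len(Z[0])
    A = [[sum(z[i] * z[j] for z in Z) for j in range(k)] for i in range(k)]
    b = [sum(z[i] * t for z, t in zip(Z, y)) for i in range(k)]
    return solve(A, b)


def predict(beta, X):
    return [beta[0] + sum(w * v for w, v in zip(beta[1:], row)) for row in X]


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)


# Hypothetical candidate expressions: a toy stand-in for an SR search space.
CANDIDATES = {
    "x1 * x2": lambda r: r[0] * r[1],
    "x1 / x2": lambda r: r[0] / r[1],
    "x2 / x1": lambda r: r[1] / r[0],
    "x1**2 + x2**2": lambda r: r[0] ** 2 + r[1] ** 2,
}


def best_expression(X, y):
    """Pick the candidate most correlated (in absolute value) with the target."""
    return max(CANDIDATES, key=lambda name: abs(pearson([CANDIDATES[name](r) for r in X], y)))


# Synthetic data (assumed for illustration): y depends on the product x1 * x2.
rng = random.Random(0)
X = [[rng.uniform(1, 3), rng.uniform(1, 3)] for _ in range(300)]
y = [r[0] * r[1] + rng.gauss(0, 0.05) for r in X]
X_train, X_test, y_train, y_test = X[:200], X[200:], y[:200], y[200:]

# Baseline: linear model on the raw features only.
rmse_base = rmse(y_test, predict(fit_ols(X_train, y_train), X_test))

# SR-as-FE: choose an expression on the training set, append it as a feature, refit.
expr = best_expression(X_train, y_train)
feature = CANDIDATES[expr]


def augment(rows):
    return [r + [feature(r)] for r in rows]


rmse_sr = rmse(y_test, predict(fit_ols(augment(X_train), y_train), augment(X_test)))
improvement = 100 * (rmse_base - rmse_sr) / rmse_base

print(f"selected expression: {expr}")
print(f"RMSE raw features: {rmse_base:.3f}, with SR feature: {rmse_sr:.3f}")
print(f"relative RMSE improvement: {improvement:.1f}%")
```

In practice the candidate table would be replaced by an actual SR library, and the expression would still be selected on training data only so that the reported RMSE improvement is not inflated by leakage.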
Related papers
- Generalized Factor Neural Network Model for High-dimensional Regression [50.554377879576066]
We tackle the challenges of modeling high-dimensional data sets with latent low-dimensional structures hidden within complex, non-linear, and noisy relationships.
Our approach enables a seamless integration of concepts from non-parametric regression, factor models, and neural networks for high-dimensional regression.
arXiv Detail & Related papers (2025-02-16T23:13:55Z)
- SMPLest-X: Ultimate Scaling for Expressive Human Pose and Shape Estimation [81.36747103102459]
Expressive human pose and shape estimation (EHPS) unifies body, hands, and face motion capture with numerous applications.
Current state-of-the-art methods focus on training innovative architectural designs on confined datasets.
We investigate the impact of scaling up EHPS towards a family of generalist foundation models.
arXiv Detail & Related papers (2025-01-16T18:59:46Z)
- Efficient Frequency Selective Surface Analysis via End-to-End Model-Based Learning [2.66269503676104]
This paper introduces an innovative end-to-end model-based deep learning approach for efficient electromagnetic analysis of high-dimensional frequency selective surfaces (FSS).
Unlike traditional data-driven methods that require large datasets, this approach combines physical insights from equivalent circuit models with deep learning techniques to significantly reduce model complexity and enhance prediction accuracy.
arXiv Detail & Related papers (2024-10-22T07:27:20Z)
- RVRAE: A Dynamic Factor Model Based on Variational Recurrent Autoencoder for Stock Returns Prediction [5.281288833470249]
RVRAE is a probabilistic approach that addresses the temporal dependencies and noise in market data.
It is adept at risk modeling in volatile stock markets, estimating variances from latent space distributions while also predicting returns.
arXiv Detail & Related papers (2024-03-04T21:48:32Z)
- Enhancing Dynamical System Modeling through Interpretable Machine Learning Augmentations: A Case Study in Cathodic Electrophoretic Deposition [0.8796261172196743]
We introduce a comprehensive data-driven framework aimed at enhancing the modeling of physical systems.
As a demonstrative application, we pursue the modeling of cathodic electrophoretic deposition (EPD), commonly known as e-coating.
arXiv Detail & Related papers (2024-01-16T14:58:21Z)
- Enhanced LFTSformer: A Novel Long-Term Financial Time Series Prediction Model Using Advanced Feature Engineering and the DS Encoder Informer Architecture [0.8532753451809455]
This study presents a groundbreaking model for forecasting long-term financial time series, termed the Enhanced LFTSformer.
The model distinguishes itself through several significant innovations.
Systematic experimentation on a range of benchmark stock market datasets demonstrates that the Enhanced LFTSformer outperforms traditional machine learning models.
arXiv Detail & Related papers (2023-10-03T08:37:21Z)
- Understanding Augmentation-based Self-Supervised Representation Learning via RKHS Approximation and Regression [53.15502562048627]
Recent work has built the connection between self-supervised learning and the approximation of the top eigenspace of a graph Laplacian operator.
This work delves into a statistical analysis of augmentation-based pretraining.
arXiv Detail & Related papers (2023-06-01T15:18:55Z)
- End-to-End Meta-Bayesian Optimisation with Transformer Neural Processes [52.818579746354665]
This paper proposes the first end-to-end differentiable meta-BO framework that generalises neural processes to learn acquisition functions via transformer architectures.
We enable this end-to-end framework with reinforcement learning (RL) to tackle the lack of labelled acquisition data.
arXiv Detail & Related papers (2023-05-25T10:58:46Z)
- End-to-End Speech Recognition: A Survey [68.35707678386949]
The goal of this survey is to provide a taxonomy of E2E ASR models and corresponding improvements.
All relevant aspects of E2E ASR are covered in this work, accompanied by discussions of performance and deployment opportunities.
arXiv Detail & Related papers (2023-03-03T01:46:41Z)
- Directed Acyclic Graph Factorization Machines for CTR Prediction via Knowledge Distillation [65.62538699160085]
We propose a Directed Acyclic Graph Factorization Machine (KD-DAGFM) to learn the high-order feature interactions from existing complex interaction models for CTR prediction via Knowledge Distillation.
KD-DAGFM achieves the best performance in both online and offline experiments while using less than 21.5% of the FLOPs of the state-of-the-art method.
arXiv Detail & Related papers (2022-11-21T03:09:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.