FeatGeNN: Improving Model Performance for Tabular Data with
Correlation-based Feature Extraction
- URL: http://arxiv.org/abs/2308.07527v1
- Date: Tue, 15 Aug 2023 01:48:11 GMT
- Authors: Sammuel Ramos Silva and Rodrigo Silva
- Abstract summary: FeatGeNN is a convolutional method that extracts and creates new features using correlation as a pooling function.
We evaluate our method on various benchmark datasets and demonstrate that FeatGeNN outperforms existing AutoFE approaches in terms of model performance.
- Score: 0.22792085593908193
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Automated Feature Engineering (AutoFE) has become an important task for any
machine learning project, as it can help improve model performance and gain
more information for statistical analysis. However, most current approaches for
AutoFE rely on manual feature creation or use methods that can generate a large
number of features, which can be computationally intensive and lead to
overfitting. To address these challenges, we propose a novel convolutional
method called FeatGeNN that extracts and creates new features using correlation
as a pooling function. Unlike traditional pooling functions like max-pooling,
correlation-based pooling considers the linear relationship between the
features in the data matrix, making it more suitable for tabular data. We
evaluate our method on various benchmark datasets and demonstrate that FeatGeNN
outperforms existing AutoFE approaches in terms of model performance. Our results
suggest that correlation-based pooling can be a promising alternative to
max-pooling for AutoFE in tabular data applications.
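The paper itself ships no code in this listing, but the core idea (replacing max-pooling with a pooling step driven by Pearson correlation between feature columns) can be illustrated. The sketch below is a hypothetical NumPy interpretation, not the authors' implementation: the choice to pool the most strongly correlated feature pairs into product features is an assumption made for illustration.

```python
import numpy as np


def correlation_pooling(X: np.ndarray, k: int) -> np.ndarray:
    """Illustrative correlation-based pooling (hypothetical sketch).

    Keeps the k feature pairs with the strongest absolute Pearson
    correlation and emits their element-wise product as new candidate
    features. This is NOT the authors' released code, only a plausible
    reading of "correlation as a pooling function".
    """
    n_samples, n_features = X.shape
    corr = np.corrcoef(X, rowvar=False)        # (n_features, n_features)
    iu = np.triu_indices(n_features, k=1)      # unique (i, j) pairs, i < j
    order = np.argsort(-np.abs(corr[iu]))      # strongest correlation first
    pairs = list(zip(iu[0][order[:k]], iu[1][order[:k]]))
    # Combine each selected pair into one new feature column.
    return np.column_stack([X[:, i] * X[:, j] for i, j in pairs])


rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))
X[:, 1] = X[:, 0] + 0.05 * rng.normal(size=100)  # plant a correlated pair
Z = correlation_pooling(X, k=2)
print(Z.shape)  # (100, 2)
```

Unlike max-pooling, which keeps only the largest activation in a window, this selection step uses the linear relationship between columns, which is the property the abstract argues makes it better suited to tabular data.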
Related papers
- Forewarned is Forearmed: Leveraging LLMs for Data Synthesis through Failure-Inducing Exploration [90.41908331897639]
Large language models (LLMs) have significantly benefited from training on diverse, high-quality task-specific data.
We present a novel approach, ReverseGen, designed to automatically generate effective training samples.
arXiv Detail & Related papers (2024-10-22T06:43:28Z)
- Statistical Test for Auto Feature Engineering by Selective Inference [12.703556860454565]
Auto Feature Engineering (AFE) plays a crucial role in developing practical machine learning pipelines.
We propose a new statistical test for generated features by AFE algorithms based on a framework called selective inference.
The proposed test can quantify the statistical significance of the generated features in the form of $p$-values, enabling theoretically guaranteed control of the risk of false findings.
arXiv Detail & Related papers (2024-10-13T12:26:51Z)
- FeatNavigator: Automatic Feature Augmentation on Tabular Data [29.913561808461612]
FeatNavigator is a framework that explores and integrates high-quality features in relational tables for machine learning (ML) models.
We show that FeatNavigator outperforms state-of-the-art solutions on five public datasets by up to 40.1% in ML model performance.
arXiv Detail & Related papers (2024-06-13T18:44:48Z)
- AutoFT: Learning an Objective for Robust Fine-Tuning [60.641186718253735]
Foundation models encode rich representations that can be adapted to downstream tasks by fine-tuning.
Current approaches to robust fine-tuning use hand-crafted regularization techniques.
We propose AutoFT, a data-driven approach for robust fine-tuning.
arXiv Detail & Related papers (2024-01-18T18:58:49Z)
- Causal Feature Selection via Transfer Entropy [59.999594949050596]
Causal discovery aims to identify causal relationships between features from observational data.
We introduce a new causal feature selection approach that relies on the forward and backward feature selection procedures.
We provide theoretical guarantees on the regression and classification errors for both the exact and the finite-sample cases.
arXiv Detail & Related papers (2023-10-17T08:04:45Z)
- FAStEN: An Efficient Adaptive Method for Feature Selection and Estimation in High-Dimensional Functional Regressions [7.674715791336311]
We propose a new, flexible and ultra-efficient approach to perform feature selection in a sparse function-on-function regression problem.
We show how to extend it to the scalar-on-function framework.
We present an application to brain fMRI data from the AOMIC PIOP1 study.
arXiv Detail & Related papers (2023-03-26T19:41:17Z)
- HyperImpute: Generalized Iterative Imputation with Automatic Model Selection [77.86861638371926]
We propose a generalized iterative imputation framework for adaptively and automatically configuring column-wise models.
We provide a concrete implementation with out-of-the-box learners, simulators, and interfaces.
arXiv Detail & Related papers (2022-06-15T19:10:35Z)
- Data Fusion with Latent Map Gaussian Processes [0.0]
Multi-fidelity modeling and calibration are data fusion tasks that ubiquitously arise in engineering design.
We introduce a novel approach based on latent-map Gaussian processes (LMGPs) that enables efficient and accurate data fusion.
arXiv Detail & Related papers (2021-12-04T00:54:19Z)
- ARM-Net: Adaptive Relation Modeling Network for Structured Data [29.94433633729326]
ARM-Net is an adaptive relation modeling network tailored for structured data; ARMOR is a lightweight framework built on ARM-Net for relational data.
We show that ARM-Net consistently outperforms existing models and provides more interpretable predictions on these datasets.
arXiv Detail & Related papers (2021-07-05T07:37:24Z)
- Efficient Data-specific Model Search for Collaborative Filtering [56.60519991956558]
Collaborative filtering (CF) is a fundamental approach for recommender systems.
In this paper, motivated by the recent advances in automated machine learning (AutoML), we propose to design a data-specific CF model.
Key here is a new framework that unifies state-of-the-art (SOTA) CF methods and splits them into disjoint stages of input encoding, embedding function, interaction and prediction function.
arXiv Detail & Related papers (2021-06-14T14:30:32Z)
- Learning summary features of time series for likelihood free inference [93.08098361687722]
We present a data-driven strategy for automatically learning summary features from time series data.
Our results indicate that summary features learned from data can compete with and even outperform LFI methods based on hand-crafted values.
arXiv Detail & Related papers (2020-12-04T19:21:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.