OpenFE: Automated Feature Generation with Expert-level Performance
- URL: http://arxiv.org/abs/2211.12507v3
- Date: Mon, 5 Jun 2023 13:22:12 GMT
- Title: OpenFE: Automated Feature Generation with Expert-level Performance
- Authors: Tianping Zhang, Zheyu Zhang, Zhiyuan Fan, Haoyan Luo, Fengyuan Liu,
Qian Liu, Wei Cao, Jian Li
- Abstract summary: We present OpenFE, an automated feature generation tool that provides competitive results against machine learning experts.
OpenFE achieves high efficiency and accuracy with two components: 1) a novel feature boosting method for accurately evaluating the incremental performance of candidate features and 2) a two-stage pruning algorithm that performs feature pruning in a coarse-to-fine manner.
- Score: 12.953889090552616
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The goal of automated feature generation is to liberate machine learning
experts from the laborious task of manual feature generation, which is crucial
for improving the learning performance of tabular data. The major challenge in
automated feature generation is to efficiently and accurately identify
effective features from a vast pool of candidate features. In this paper, we
present OpenFE, an automated feature generation tool that provides competitive
results against machine learning experts. OpenFE achieves high efficiency and
accuracy with two components: 1) a novel feature boosting method for accurately
evaluating the incremental performance of candidate features and 2) a two-stage
pruning algorithm that performs feature pruning in a coarse-to-fine manner.
Extensive experiments on ten benchmark datasets show that OpenFE outperforms
existing baseline methods by a large margin. We further evaluate OpenFE in two
Kaggle competitions with thousands of data science teams participating. In the
two competitions, features generated by OpenFE with a simple baseline model can
beat 99.3% and 99.6% of the data science teams, respectively. In addition to the
empirical results, we provide a theoretical perspective to show that feature
generation can be beneficial in a simple yet representative setting. The code
is available at https://github.com/ZhangTP1996/OpenFE.
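To make the abstract's two components concrete, the following is a minimal, illustrative sketch (not the authors' implementation) of the feature-boosting idea: fit a base model once on the original features, then score each candidate feature by how much additional validation performance it contributes when boosting continues from the base model's predictions, rather than retraining from scratch for every candidate. The LightGBM setup, the AUC metric, and the candidate format are assumptions made for illustration; in OpenFE such incremental scores would feed the coarse-to-fine pruning of the candidate pool.

```python
# Illustrative sketch only (not the OpenFE implementation): score candidate
# features by the validation gain they add when boosting continues from a
# base model trained on the original features.
import lightgbm as lgb
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

def feature_boosting_scores(X, y, candidates, n_rounds=20):
    """X: DataFrame of original features; y: binary target;
    candidates: dict {name: pandas Series aligned with X.index}."""
    X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.2, random_state=0)
    params = {"objective": "binary", "verbosity": -1}

    # Base model on the original features only; keep its raw margins.
    base = lgb.train(params, lgb.Dataset(X_tr, label=y_tr), num_boost_round=200)
    margin_tr = base.predict(X_tr, raw_score=True)
    margin_va = base.predict(X_va, raw_score=True)
    base_auc = roc_auc_score(y_va, margin_va)

    scores = {}
    for name, feat in candidates.items():
        f_tr = feat.loc[X_tr.index].to_frame(name)
        f_va = feat.loc[X_va.index].to_frame(name)
        # "Boost" from the base model: init_score warm-starts training, so the
        # small booster only has to model what the candidate feature adds.
        extra = lgb.train(params,
                          lgb.Dataset(f_tr, label=y_tr, init_score=margin_tr),
                          num_boost_round=n_rounds)
        delta = extra.predict(f_va, raw_score=True)
        # AUC is rank-based, so combined raw margins can be scored directly.
        scores[name] = roc_auc_score(y_va, margin_va + delta) - base_auc
    return scores
```

Candidates whose score is near zero or negative add nothing beyond the base model and are natural targets for pruning; a coarse first pass could compute such scores on a small data subsample before the finer evaluation above.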
Related papers
- IIFE: Interaction Information Based Automated Feature Engineering [11.866061471514582]
We introduce a new AutoFE algorithm, IIFE, based on determining which feature pairs synergize well.
We show how interaction information can be used to improve existing AutoFE algorithms.
arXiv Detail & Related papers (2024-09-07T00:34:26Z)
- FedSDG-FS: Efficient and Secure Feature Selection for Vertical Federated Learning [21.79965380400454]
Vertical Federated Learning (VFL) enables multiple data owners, each holding a different subset of features about largely overlapping sets of data samples, to jointly train a useful global model.
Feature selection (FS) is important to VFL. It remains an open research problem, as existing FS works designed for VFL assume prior knowledge either of the number of noisy features or of the post-training threshold of useful features.
We propose the Federated Dual-Gate based Feature Selection (FedSDG-FS) approach. It consists of a Gaussian dual-gate to efficiently approximate the probability of a feature being selected, with privacy
arXiv Detail & Related papers (2023-02-21T03:09:45Z)
- Toward Efficient Automated Feature Engineering [27.47868891738917]
Automated Feature Engineering (AFE) refers to automatically generating and selecting optimal feature sets for downstream tasks.
Current AFE methods mainly focus on improving the effectiveness of the produced features, but ignore the low-efficiency issue for large-scale deployment.
We construct the AFE pipeline in a reinforcement learning setting, where each feature is assigned an agent to perform feature transformation.
We conduct comprehensive experiments on 36 datasets covering both classification and regression tasks.
arXiv Detail & Related papers (2022-12-26T13:18:51Z)
- GraphLearner: Graph Node Clustering with Fully Learnable Augmentation [76.63963385662426]
Contrastive deep graph clustering (CDGC) leverages the power of contrastive learning to group nodes into different clusters.
We propose a Graph Node Clustering with Fully Learnable Augmentation, termed GraphLearner.
It introduces learnable augmentors to generate high-quality and task-specific augmented samples for CDGC.
arXiv Detail & Related papers (2022-12-07T10:19:39Z)
- Explored An Effective Methodology for Fine-Grained Snake Recognition [8.908667065576632]
We design a strong multimodal backbone that utilizes various meta-information to assist fine-grained identification.
To take full advantage of unlabeled datasets, we jointly train with self-supervised and supervised learning.
Our method achieves macro F1 scores of 92.7% and 89.4% on the private and public datasets, respectively, placing 1st among participants on the private leaderboard.
arXiv Detail & Related papers (2022-07-24T02:19:15Z)
- GANDALF: Gated Adaptive Network for Deep Automated Learning of Features [0.0]
We propose the Gated Adaptive Network for Deep Automated Learning of Features (GANDALF).
GANDALF relies on a new tabular processing unit with a gating mechanism and in-built feature selection, called the Gated Feature Learning Unit (GFLU).
We demonstrate that GANDALF outperforms or stays at par with SOTA approaches such as XGBoost, SAINT, and FT-Transformers.
arXiv Detail & Related papers (2022-07-18T12:12:24Z)
- Compactness Score: A Fast Filter Method for Unsupervised Feature Selection [66.84571085643928]
We propose a fast unsupervised feature selection method, named Compactness Score (CSUFS), to select desired features.
The proposed algorithm proves to be more accurate and efficient than existing algorithms.
arXiv Detail & Related papers (2022-01-31T13:01:37Z)
- Guiding Generative Language Models for Data Augmentation in Few-Shot Text Classification [59.698811329287174]
We leverage GPT-2 for generating artificial training instances in order to improve classification performance.
Our results show that fine-tuning GPT-2 on a handful of labeled instances leads to consistent classification improvements.
arXiv Detail & Related papers (2021-11-17T12:10:03Z)
- Deep Reinforcement Learning of Graph Matching [63.469961545293756]
Graph matching (GM) under node and pairwise constraints has been a building block in areas from optimization to computer vision.
We present a reinforcement learning solver for GM, namely RGM, that seeks the node correspondence between pairs of graphs.
Our method differs from previous deep graph matching models, which focus on front-end feature extraction and affinity function learning.
arXiv Detail & Related papers (2020-12-16T13:48:48Z)
- Omni-supervised Facial Expression Recognition via Distilled Data [120.11782405714234]
We propose omni-supervised learning to exploit reliable samples in a large amount of unlabeled data for network training.
We experimentally verify that the new dataset can significantly improve the ability of the learned FER model.
We further apply a dataset distillation strategy to compress the created dataset into several informative class-wise images.
arXiv Detail & Related papers (2020-05-18T09:36:51Z)
- Mining Implicit Entity Preference from User-Item Interaction Data for Knowledge Graph Completion via Adversarial Learning [82.46332224556257]
We propose a novel adversarial learning approach by leveraging user interaction data for the Knowledge Graph Completion task.
Our generator is isolated from user interaction data, and serves to improve the performance of the discriminator.
To discover the implicit entity preferences of users, we design an elaborate collaborative learning algorithm based on graph neural networks.
arXiv Detail & Related papers (2020-03-28T05:47:33Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.