Towards Personalized Preprocessing Pipeline Search
- URL: http://arxiv.org/abs/2302.14329v1
- Date: Tue, 28 Feb 2023 05:45:05 GMT
- Title: Towards Personalized Preprocessing Pipeline Search
- Authors: Diego Martinez, Daochen Zha, Qiaoyu Tan, Xia Hu
- Abstract summary: ClusterP3S is a novel framework for Personalized Preprocessing Pipeline Search via Clustering.
We propose a hierarchical search strategy to jointly learn the clusters and search for the optimal pipelines.
Experiments on benchmark classification datasets demonstrate the effectiveness of enabling feature-wise preprocessing pipeline search.
- Score: 52.59156206880384
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Feature preprocessing, which transforms raw input features into numerical
representations, is a crucial step in automated machine learning (AutoML)
systems. However, existing systems often have a very small search space for
feature preprocessing with the same preprocessing pipeline applied to all the
numerical features. This may result in sub-optimal performance since different
datasets often have various feature characteristics, and features within a
dataset may also have their own preprocessing preferences. To bridge this gap,
we explore personalized preprocessing pipeline search, where the search
algorithm is allowed to adopt a different preprocessing pipeline for each
feature. This is a challenging task because the search space grows
exponentially with more features. To tackle this challenge, we propose
ClusterP3S, a novel framework for Personalized Preprocessing Pipeline Search
via Clustering. The key idea is to learn feature clusters such that the search
space can be significantly reduced by using the same preprocessing pipeline for
the features within a cluster. To this end, we propose a hierarchical search
strategy to jointly learn the clusters and search for the optimal pipelines,
where the upper-level search optimizes the feature clustering to enable better
pipelines built upon the clusters, and the lower-level search optimizes the
pipeline given a specific cluster assignment. We instantiate this idea with a
deep clustering network that is trained with reinforcement learning at the
upper level, and random search at the lower level. Experiments on benchmark
classification datasets demonstrate the effectiveness of enabling feature-wise
preprocessing pipeline search.
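
The lower-level search described in the abstract can be illustrated with a minimal sketch, under several assumptions. The cluster assignment is taken as a fixed input here, whereas the paper learns it with a deep clustering network trained by reinforcement learning, which this sketch does not implement. The four preprocessing primitives, the function names, and the `scale_agreement` score are hypothetical placeholders chosen for illustration. In an AutoML setting the score would presumably come from a downstream model's validation performance.

```python
import math
import random
import statistics

# Hypothetical preprocessing primitives; the paper's actual search space is not given here.
def identity(col):
    return list(col)

def standardize(col):
    mean = statistics.fmean(col)
    std = statistics.pstdev(col) or 1.0  # avoid division by zero on constant columns
    return [(x - mean) / std for x in col]

def min_max(col):
    low, high = min(col), max(col)
    span = (high - low) or 1.0
    return [(x - low) / span for x in col]

def signed_log(col):
    return [math.copysign(math.log1p(abs(x)), x) for x in col]

PRIMITIVES = {
    "identity": identity,
    "standardize": standardize,
    "min_max": min_max,
    "signed_log": signed_log,
}

def search_space_size(n_pipelines, n_groups):
    """Number of ways to assign one of n_pipelines to each of n_groups."""
    return n_pipelines ** n_groups

def apply_assignment(columns, clusters, assignment):
    """Transform each feature column with the pipeline chosen for its cluster."""
    return [PRIMITIVES[assignment[c]](col) for col, c in zip(columns, clusters)]

def scale_agreement(transformed):
    """Placeholder score (an assumption): higher when columns have similar spread."""
    spreads = [statistics.pstdev(col) for col in transformed]
    return -statistics.pstdev(spreads)

def random_search(columns, clusters, score_fn, n_trials=20, seed=0):
    """Randomly sample one pipeline per cluster and keep the best-scoring assignment."""
    rng = random.Random(seed)
    names = sorted(PRIMITIVES)
    n_clusters = max(clusters) + 1
    best_assignment, best_score = None, float("-inf")
    for _ in range(n_trials):
        assignment = [rng.choice(names) for _ in range(n_clusters)]
        score = score_fn(apply_assignment(columns, clusters, assignment))
        if score > best_score:
            best_assignment, best_score = assignment, score
    return best_assignment, best_score

# Toy data: four feature columns, grouped into two clusters (fixed by hand here).
columns = [
    [1.0, 2.0, 3.0, 4.0],
    [100.0, 250.0, 300.0, 900.0],
    [0.1, 0.2, 0.15, 0.3],
    [5000.0, 7000.0, 6500.0, 9000.0],
]
clusters = [0, 1, 0, 1]

print(search_space_size(len(PRIMITIVES), len(columns)))  # per-feature search: 4 ** 4 = 256
print(search_space_size(len(PRIMITIVES), 2))             # per-cluster search: 4 ** 2 = 16
best_assignment, best_score = random_search(columns, clusters, scale_agreement)
print(best_assignment, best_score)
```

With one pipeline per feature the number of candidate assignments grows as the number of pipelines raised to the number of features. Sharing a pipeline within each cluster replaces the exponent with the number of clusters, which is the reduction the abstract refers to.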
Related papers
- Automating Data Science Pipelines with Tensor Completion [4.956678070210018]
We model data science pipelines as instances of tensor completion.
The goal is to identify all missing entries of the tensor, corresponding to all combinations of variable values.
We extensively evaluate existing and proposed methods on a number of datasets.
arXiv Detail & Related papers (2024-10-08T22:34:08Z)
- A Refreshed Similarity-based Upsampler for Direct High-Ratio Feature Upsampling [54.05517338122698]
We propose an explicitly controllable query-key feature alignment from both semantic-aware and detail-aware perspectives.
We also develop a fine-grained neighbor selection strategy on HR features, which is simple yet effective for alleviating mosaic artifacts.
Our proposed ReSFU framework consistently achieves satisfactory performance on different segmentation applications.
arXiv Detail & Related papers (2024-07-02T14:12:21Z)
- Feature Selection as Deep Sequential Generative Learning [50.00973409680637]
We develop a deep variational transformer model trained with a joint objective that combines sequential reconstruction, variational, and performance evaluator losses.
Our model can distill feature selection knowledge and learn a continuous embedding space to map feature selection decision sequences into embedding vectors associated with utility scores.
arXiv Detail & Related papers (2024-03-06T16:31:56Z)
- Deep Pipeline Embeddings for AutoML [11.168121941015015]
AutoML is a promising direction for democratizing AI by automatically deploying Machine Learning systems with minimal human expertise.
Existing Pipeline Optimization techniques fail to explore deep interactions between pipeline stages/components.
This paper proposes a novel neural architecture that captures the deep interaction between the components of a Machine Learning pipeline.
arXiv Detail & Related papers (2023-05-23T12:40:38Z)
- Pruning-as-Search: Efficient Neural Architecture Search via Channel Pruning and Structural Reparameterization [50.50023451369742]
Pruning-as-Search (PaS) is an end-to-end channel pruning method that searches for the desired sub-network automatically and efficiently.
Our proposed architecture outperforms prior art by around 1.0% top-1 accuracy on the ImageNet-1000 classification task.
arXiv Detail & Related papers (2022-06-02T17:58:54Z)
- CFNet: Learning Correlation Functions for One-Stage Panoptic Segmentation [46.252118473248316]
We propose to first predict semantic-level and instance-level correlations among different locations that are utilized to enhance the backbone features.
We then feed the improved discriminative features into the corresponding segmentation heads, respectively.
We achieve state-of-the-art performance on MS with 45.1% PQ and ADE20k with 32.6% PQ.
arXiv Detail & Related papers (2022-01-13T05:31:14Z)
- Incremental Search Space Construction for Machine Learning Pipeline Synthesis [4.060731229044571]
Automated machine learning (AutoML) aims to construct machine learning (ML) pipelines automatically.
We propose a data-centric approach based on meta-features for pipeline construction.
We prove the effectiveness and competitiveness of our approach on 28 data sets used in well-established AutoML benchmarks.
arXiv Detail & Related papers (2021-01-26T17:17:49Z)
- Deep Shells: Unsupervised Shape Correspondence with Optimal Transport [52.646396621449]
We propose a novel unsupervised learning approach to 3D shape correspondence.
We show that the proposed method significantly improves over the state-of-the-art on multiple datasets.
arXiv Detail & Related papers (2020-10-28T22:24:07Z)
- PyODDS: An End-to-end Outlier Detection System with Automated Machine Learning [55.32009000204512]
We present PyODDS, an automated end-to-end Python system for Outlier Detection with Database Support.
Specifically, we define the search space in the outlier detection pipeline, and produce a search strategy within the given search space.
It also provides unified interfaces and visualizations for users with or without a data science or machine learning background.
arXiv Detail & Related papers (2020-03-12T03:30:30Z)