ABM: an automatic supervised feature engineering method for loss based
models based on group and fused lasso
- URL: http://arxiv.org/abs/2009.10498v1
- Date: Tue, 22 Sep 2020 12:42:22 GMT
- Title: ABM: an automatic supervised feature engineering method for loss based
models based on group and fused lasso
- Authors: Weijian Luo and Yongxian Long
- Abstract summary: A vital problem in solving classification or regression problems is applying feature engineering and variable selection to data before it is fed into models.
This paper proposes an end-to-end supervised cutting-point selection method based on group and fused lasso, together with an automatic variable selection effect.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A vital problem in solving classification or regression problems is applying
feature engineering and variable selection to data before it is fed into models. One
of the most popular feature engineering methods is to discretize a continuous
variable with some cutting points, which is referred to as binning. Good
cutting points are important for improving a model's ability, because good
binning can ignore noisy variance within a continuous variable's range while keeping
useful leveled information with well-ordered encodings. However, to the best of our
knowledge, the majority of cutting-point selection is done via researchers' domain
knowledge or naive methods such as equal-width or equal-frequency
cutting. In this paper we propose an end-to-end supervised cutting-point
selection method based on group and fused lasso, together with an automatic
variable selection effect. We name our method \textbf{ABM} (automatic binning
machine). We first cut each variable's range into fine grid bins and train the
model with our group and group fused lasso regularization on successive
bins. The method integrates feature engineering, variable selection, and
model training simultaneously. Another inspiring aspect is that the method is
flexible enough to be applied to a wide range of loss-function-based models,
including deep neural networks. We have also implemented the method in R and
open-sourced the code for other researchers. A Python version will reach the
community soon.
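
To make the binning-plus-fused-penalty idea concrete, here is a minimal sketch in Python (not the authors' released R package): it cuts one continuous variable into fine equal-width bins, one-hot encodes them, and fits a squared-loss model whose successive bin coefficients carry a fused-lasso penalty plus an L1 term, so adjacent bins with similar effects merge and the surviving jumps act as learned cut points. The function names, hyperparameters, and the plain subgradient solver are illustrative assumptions; a full ABM-style model would additionally place a group penalty over each variable's block of bins so that entire variables can be dropped.

```python
# Illustrative sketch only: fine-grid binning + fused-lasso regularization on
# successive bin coefficients. This is NOT the authors' ABM implementation;
# names, penalties, and the solver are simplified assumptions.
import numpy as np

def bin_one_hot(x, n_bins=50):
    """Cut a continuous variable into equal-width fine-grid bins and one-hot encode it."""
    edges = np.linspace(x.min(), x.max(), n_bins + 1)
    idx = np.clip(np.digitize(x, edges[1:-1]), 0, n_bins - 1)
    Z = np.zeros((len(x), n_bins))
    Z[np.arange(len(x)), idx] = 1.0
    return Z

def fit_fused_bins(Z, y, lam_fuse=0.01, lam_l1=0.001, lr=0.5, n_iter=5000):
    """Least squares on the bin indicators with a fused-lasso penalty on successive
    bin coefficients plus an L1 penalty, solved by plain subgradient descent."""
    n, p = Z.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        grad = Z.T @ (Z @ beta - y) / n          # squared-loss gradient
        d = np.sign(np.diff(beta))               # subgradient of sum_j |beta_{j+1} - beta_j|
        fuse = np.zeros(p)
        fuse[:-1] -= d
        fuse[1:] += d
        beta -= lr * (grad + lam_fuse * fuse + lam_l1 * np.sign(beta))
    return beta

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.uniform(0.0, 10.0, size=2000)
    # Piecewise-constant ground truth, so a small number of cut points suffices.
    y = np.where(x < 3, 0.0, np.where(x < 7, 1.0, 2.5)) + rng.normal(0.0, 0.2, size=2000)
    beta = fit_fused_bins(bin_one_hot(x, n_bins=50), y)
    # Adjacent bins whose coefficients fuse to (nearly) the same value merge;
    # the remaining large jumps indicate the learned cut points.
    print("approximate cut-point bin indices:", np.where(np.abs(np.diff(beta)) > 0.2)[0])
```

Running the script prints the bin indices at which the fitted coefficients jump, which correspond approximately to the breakpoints at x = 3 and x = 7 in the simulated data; bins between those jumps have been fused into common levels, which is the binning effect described in the abstract.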
Related papers
- Learning the Regularization Strength for Deep Fine-Tuning via a Data-Emphasized Variational Objective [4.453137996095194]
Grid search is computationally expensive, requires carving out a validation set, and requires practitioners to specify candidate values.
Our proposed technique overcomes all three disadvantages of grid search.
We demonstrate effectiveness on image classification tasks on several datasets, yielding heldout accuracy comparable to existing approaches.
arXiv Detail & Related papers (2024-10-25T16:32:11Z) - Harmony in Diversity: Merging Neural Networks with Canonical Correlation Analysis [17.989809995141044]
We propose CCA Merge, which is based on Canonical Correlation Analysis.
We show that CCA Merge works significantly better than past methods when more than two models are merged.
arXiv Detail & Related papers (2024-07-07T14:21:04Z) - Feature Selection as Deep Sequential Generative Learning [50.00973409680637]
We develop a deep variational transformer model over a joint objective combining sequential reconstruction, variational, and performance evaluator losses.
Our model can distill feature selection knowledge and learn a continuous embedding space to map feature selection decision sequences into embedding vectors associated with utility scores.
arXiv Detail & Related papers (2024-03-06T16:31:56Z) - Merging by Matching Models in Task Parameter Subspaces [87.8712523378141]
Model merging aims to cheaply combine individual task-specific models into a single multitask model.
We formalize how this approach to model merging can be seen as solving a linear system of equations.
We show that using the conjugate gradient method can outperform closed-form solutions.
arXiv Detail & Related papers (2023-12-07T14:59:15Z) - Just One Byte (per gradient): A Note on Low-Bandwidth Decentralized
Language Model Finetuning Using Shared Randomness [86.61582747039053]
Language model training in distributed settings is limited by the communication cost of exchanging gradients.
We extend recent work using shared randomness to perform distributed fine-tuning with low bandwidth.
arXiv Detail & Related papers (2023-06-16T17:59:51Z) - Learning To Cut By Looking Ahead: Cutting Plane Selection via Imitation
Learning [80.45697245527019]
We show that a greedy selection rule explicitly looking ahead to select cuts that yield the best bound improvement delivers strong decisions for cut selection.
We propose a new neural architecture (NeuralCut) for imitation learning on the lookahead expert.
arXiv Detail & Related papers (2022-06-27T16:07:27Z) - A Framework and Benchmark for Deep Batch Active Learning for Regression [2.093287944284448]
We study active learning methods that adaptively select batches of unlabeled data for labeling.
We present a framework for constructing such methods out of (network-dependent) base kernels, kernel transformations, and selection methods.
Our proposed method outperforms the state-of-the-art on our benchmark, scales to large data sets, and works out-of-the-box without adjusting the network architecture or training code.
arXiv Detail & Related papers (2022-03-17T16:11:36Z) - A concise method for feature selection via normalized frequencies [0.0]
In this paper, a concise method is proposed for universal feature selection.
The proposed method uses a fusion of the filter method and the wrapper method, rather than a combination of them.
The evaluation results show that the proposed method outperformed several state-of-the-art related works in terms of accuracy, precision, recall, F-score and AUC.
arXiv Detail & Related papers (2021-06-10T15:29:54Z) - Embedded methods for feature selection in neural networks [0.0]
The black-box nature of models like neural networks negatively affects their interpretability, generalizability, and training time.
I propose two integrated approaches for feature selection that can be incorporated directly into the parameter learning.
I benchmarked both the methods against Permutation Feature Importance (PFI) - a general-purpose feature ranking method and a random baseline.
arXiv Detail & Related papers (2020-10-12T16:33:46Z) - Stepwise Model Selection for Sequence Prediction via Deep Kernel
Learning [100.83444258562263]
We propose a novel Bayesian optimization (BO) algorithm to tackle the challenge of model selection in this setting.
In order to solve the resulting multiple black-box function optimization problem jointly and efficiently, we exploit potential correlations among black-box functions.
We are the first to formulate the problem of stepwise model selection (SMS) for sequence prediction, and to design and demonstrate an efficient joint-learning algorithm for this purpose.
arXiv Detail & Related papers (2020-01-12T09:42:19Z) - Model Fusion via Optimal Transport [64.13185244219353]
We present a layer-wise model fusion algorithm for neural networks.
We show that this can successfully yield "one-shot" knowledge transfer between neural networks trained on heterogeneous non-i.i.d. data.
arXiv Detail & Related papers (2019-10-12T22:07:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.