AEFE: Automatic Embedded Feature Engineering for Categorical Features
- URL: http://arxiv.org/abs/2110.09770v1
- Date: Tue, 19 Oct 2021 07:22:59 GMT
- Title: AEFE: Automatic Embedded Feature Engineering for Categorical Features
- Authors: Zhenyuan Zhong, Jie Yang, Yacong Ma, Shoubin Dong, Jinlong Hu
- Abstract summary: We propose an automatic feature engineering framework for representing categorical features, which consists of various components including custom paradigm feature construction and multiple feature selection.
Experiments conducted on some typical e-commerce datasets indicate that our method outperforms the classical machine learning models and state-of-the-art deep learning models.
- Score: 4.310748698480341
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The challenge of solving data mining problems in e-commerce applications such
as recommendation systems (RS) and click-through rate (CTR) prediction is how to
make inferences by constructing combinatorial features from a large number of
categorical features while preserving the interpretability of the method. In
this paper, we propose Automatic Embedded Feature Engineering (AEFE), an
automatic feature engineering framework for representing categorical features,
which consists of various components including custom paradigm feature
construction and multiple feature selection. By selecting the potential field
pairs intelligently and generating a series of interpretable combinatorial
features, our framework can provide a set of unseen generated features for
enhancing model performance and then assist data analysts in discovering the
feature importance for particular data mining tasks. Furthermore, AEFE is
implemented in a distributed manner using task parallelism, data sampling, and a
search schema based on Matrix Factorization field combination, to optimize the
performance and enhance the efficiency and scalability of the framework.
Experiments conducted on some typical e-commerce datasets indicate that our
method outperforms the classical machine learning models and state-of-the-art
deep learning models.
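As an illustration of the general idea in the abstract (constructing combinatorial features from pairs of categorical fields and then selecting the promising pairs), here is a minimal sketch. It is not the authors' AEFE implementation: the function names, the toy data, and the use of empirical mutual information as the selection score are all assumptions made for this example.

```python
from collections import Counter
from itertools import combinations, product
from math import log


def cross_feature(rows, field_a, field_b):
    """Return one combined categorical value per row, e.g. 'f|phone'."""
    return [f"{row[field_a]}|{row[field_b]}" for row in rows]


def mutual_information(values, labels):
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(values)
    joint = Counter(zip(values, labels))
    value_counts = Counter(values)
    label_counts = Counter(labels)
    mi = 0.0
    for (v, y), count in joint.items():
        # p(v, y) * log(p(v, y) / (p(v) * p(y))), written with raw counts
        mi += (count / n) * log(count * n / (value_counts[v] * label_counts[y]))
    return mi


def rank_field_pairs(rows, labels, fields):
    """Score every pair of fields by the MI of its crossed feature with the label."""
    scored = []
    for a, b in combinations(fields, 2):
        score = mutual_information(cross_feature(rows, a, b), labels)
        scored.append(((a, b), score))
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical toy data: the label depends only on the gender/category
# interaction (an XOR pattern), so no single field is informative alone.
fields = ["gender", "category", "city"]
rows = [
    {"gender": g, "category": c, "city": s}
    for g, c, s in product(["m", "f"], ["phone", "book"], ["x", "y"])
]
labels = [int((r["gender"] == "f") != (r["category"] == "phone")) for r in rows]

ranked = rank_field_pairs(rows, labels, fields)
for pair, score in ranked:
    print(pair, round(score, 3))
```

On this toy data the gender/category pair ranks first with a score of log 2 (about 0.693 nats), while the other two pairs score 0. On real data with many distinct category values, raw mutual information tends to favor high-cardinality crosses, so a practical selection step would need held-out evaluation or a cardinality correction.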
Related papers
- Retrieval-Augmented Instruction Tuning for Automated Process Engineering Calculations : A Tool-Chaining Problem-Solving Framework with Attributable Reflection [0.0]
We introduce a novel autonomous agent framework leveraging Retrieval-Augmented Instruction-Tuning (RAIT) to enhance open, customizable small code language models (SLMs).
By combining instruction-tuned code SLMs with Retrieval-Augmented Code Generation (RACG) using external tools, the agent generates, debugs, and optimizes code from natural language specifications.
Our approach addresses the current lack of a foundational AI model for specialized process engineering tasks and offers benefits of explainability, knowledge editing, and cost-effectiveness.
arXiv Detail & Related papers (2024-08-28T15:33:47Z) - GenBench: A Benchmarking Suite for Systematic Evaluation of Genomic Foundation Models [56.63218531256961]
We introduce GenBench, a benchmarking suite specifically tailored for evaluating the efficacy of Genomic Foundation Models.
GenBench offers a modular and expandable framework that encapsulates a variety of state-of-the-art methodologies.
We provide a nuanced analysis of the interplay between model architecture and dataset characteristics on task-specific performance.
arXiv Detail & Related papers (2024-06-01T08:01:05Z) - Learning to Extract Structured Entities Using Language Models [52.281701191329]
Recent advances in machine learning have significantly impacted the field of information extraction.
We reformulate the task to be entity-centric, enabling the use of diverse metrics.
We contribute to the field by introducing Structured Entity Extraction and proposing the Approximate Entity Set OverlaP metric.
arXiv Detail & Related papers (2024-02-06T22:15:09Z) - LESS: Selecting Influential Data for Targeted Instruction Tuning [64.78894228923619]
We propose LESS, an efficient algorithm to estimate data influences and perform Low-rank gradiEnt Similarity Search for instruction data selection.
We show that training on a LESS-selected 5% of the data can often outperform training on the full dataset across diverse downstream tasks.
Our method goes beyond surface form cues to identify data that exemplifies the necessary reasoning skills for the intended downstream application.
arXiv Detail & Related papers (2024-02-06T19:18:04Z) - DoE2Vec: Deep-learning Based Features for Exploratory Landscape Analysis [0.0]
We propose DoE2Vec, a variational autoencoder (VAE)-based methodology to learn optimization landscape characteristics.
Unlike the classical exploratory landscape analysis (ELA) method, our approach does not require any feature engineering.
For validation, we inspect the quality of latent reconstructions and analyze the latent representations using different experiments.
arXiv Detail & Related papers (2023-03-31T09:38:44Z) - Feature construction using explanations of individual predictions [0.0]
We propose a novel approach for reducing the search space based on aggregation of instance-based explanations of predictive models.
We empirically show that reducing the search to these groups significantly reduces the time of feature construction.
We show significant improvements in classification accuracy for several classifiers and demonstrate the feasibility of the proposed feature construction even for large datasets.
arXiv Detail & Related papers (2023-01-23T18:59:01Z) - HyperImpute: Generalized Iterative Imputation with Automatic Model Selection [77.86861638371926]
We propose a generalized iterative imputation framework for adaptively and automatically configuring column-wise models.
We provide a concrete implementation with out-of-the-box learners, simulators, and interfaces.
arXiv Detail & Related papers (2022-06-15T19:10:35Z) - Efficient Data-specific Model Search for Collaborative Filtering [56.60519991956558]
Collaborative filtering (CF) is a fundamental approach for recommender systems.
In this paper, motivated by the recent advances in automated machine learning (AutoML), we propose to design a data-specific CF model.
Key here is a new framework that unifies state-of-the-art (SOTA) CF methods and splits them into disjoint stages of input encoding, embedding function, interaction and prediction function.
arXiv Detail & Related papers (2021-06-14T14:30:32Z) - AutoDis: Automatic Discretization for Embedding Numerical Features in CTR Prediction [45.69943728028556]
Learning sophisticated feature interactions is crucial for Click-Through Rate (CTR) prediction in recommender systems.
Various deep CTR models follow an Embedding & Feature Interaction paradigm.
We propose AutoDis, a framework that discretizes features in numerical fields automatically and is optimized with CTR models in an end-to-end manner.
arXiv Detail & Related papers (2020-12-16T14:31:31Z) - Towards Automated Neural Interaction Discovery for Click-Through Rate Prediction [64.03526633651218]
Click-Through Rate (CTR) prediction is one of the most important machine learning tasks in recommender systems.
We propose an automated interaction architecture discovering framework for CTR prediction named AutoCTR.
arXiv Detail & Related papers (2020-06-29T04:33:01Z) - StackGenVis: Alignment of Data, Algorithms, and Models for Stacking Ensemble Learning Using Performance Metrics [4.237343083490243]
In machine learning (ML), ensemble methods such as bagging, boosting, and stacking are widely-established approaches.
StackGenVis is a visual analytics system for stacked generalization.
arXiv Detail & Related papers (2020-05-04T15:43:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.