Related papers: AEFE: Automatic Embedded Feature Engineering for Categorical Features

AEFE: Automatic Embedded Feature Engineering for Categorical Features

URL: http://arxiv.org/abs/2110.09770v1
Date: Tue, 19 Oct 2021 07:22:59 GMT
Title: AEFE: Automatic Embedded Feature Engineering for Categorical Features
Authors: Zhenyuan Zhong, Jie Yang, Yacong Ma, Shoubin Dong, Jinlong Hu
Abstract summary: We propose an automatic feature engineering framework for representing categorical features, which consists of various components including custom paradigm feature construction and multiple feature selection. Experiments conducted on some typical e-commerce datasets indicate that our method outperforms the classical machine learning models and state-of-the-art deep learning models.
Score: 4.310748698480341
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The challenge of solving data mining problems in e-commerce applications such as recommendation system (RS) and click-through rate (CTR) prediction is how to make inferences by constructing combinatorial features from a large number of categorical features while preserving the interpretability of the method. In this paper, we propose Automatic Embedded Feature Engineering(AEFE), an automatic feature engineering framework for representing categorical features, which consists of various components including custom paradigm feature construction and multiple feature selection. By selecting the potential field pairs intelligently and generating a series of interpretable combinatorial features, our framework can provide a set of unseen generated features for enhancing model performance and then assist data analysts in discovering the feature importance for particular data mining tasks. Furthermore, AEFE is distributed implemented by task-parallelism, data sampling, and searching schema based on Matrix Factorization field combination, to optimize the performance and enhance the efficiency and scalability of the framework. Experiments conducted on some typical e-commerce datasets indicate that our method outperforms the classical machine learning models and state-of-the-art deep learning models.

Related papers

Feedback-Driven Tool-Use Improvements in Large Language Models via Automated Build Environments [70.42705564227548]
We propose an automated environment construction pipeline for large language models (LLMs)<n>This enables the creation of high-quality training environments that provide detailed and measurable feedback without relying on external tools.<n>We also introduce a verifiable reward mechanism that evaluates both the precision of tool use and the completeness of task execution.
arXiv Detail & Related papers (2025-08-12T09:45:19Z)
"FRAME: Forward Recursive Adaptive Model Extraction-A Technique for Advance Feature Selection" [0.0]
This study introduces a novel hybrid approach, the Forward Recursive Adaptive Model Extraction Technique (FRAME) FRAME combines Forward Selection and Recursive Feature Elimination to enhance feature selection across diverse datasets. The results demonstrate that FRAME consistently delivers superior predictive performance based on downstream machine learning evaluation metrics.
arXiv Detail & Related papers (2025-01-21T08:34:10Z)
Retrieval-Augmented Instruction Tuning for Automated Process Engineering Calculations : A Tool-Chaining Problem-Solving Framework with Attributable Reflection [0.0]
We introduce a novel autonomous agent framework leveraging Retrieval-Augmented Instruction-Tuning (RAIT) to enhance open, customizable small code language models (SLMs) By combining instruction tuned code SLMs with Retrieval-Augmented Code Generation (RACG) using external tools, the agent generates, debugs, and optimize code from natural language specifications. Our approach addresses the limitations of the current lack of a foundational AI model for specialized process engineering tasks and offers benefits of explainability, knowledge editing, and cost-effectiveness.
arXiv Detail & Related papers (2024-08-28T15:33:47Z)
GenBench: A Benchmarking Suite for Systematic Evaluation of Genomic Foundation Models [56.63218531256961]
We introduce GenBench, a benchmarking suite specifically tailored for evaluating the efficacy of Genomic Foundation Models. GenBench offers a modular and expandable framework that encapsulates a variety of state-of-the-art methodologies. We provide a nuanced analysis of the interplay between model architecture and dataset characteristics on task-specific performance.
arXiv Detail & Related papers (2024-06-01T08:01:05Z)
CELA: Cost-Efficient Language Model Alignment for CTR Prediction [70.65910069412944]
Click-Through Rate (CTR) prediction holds a paramount position in recommender systems.<n>Recent efforts have sought to mitigate these challenges by integrating Pre-trained Language Models (PLMs)<n>We propose textbfCost-textbfEfficient textbfLanguage Model textbfAlignment (textbfCELA) for CTR prediction.
arXiv Detail & Related papers (2024-05-17T07:43:25Z)
Learning to Extract Structured Entities Using Language Models [52.281701191329]
Recent advances in machine learning have significantly impacted the field of information extraction. We reformulate the task to be entity-centric, enabling the use of diverse metrics. We contribute to the field by introducing Structured Entity Extraction and proposing the Approximate Entity Set OverlaP metric.
arXiv Detail & Related papers (2024-02-06T22:15:09Z)
LESS: Selecting Influential Data for Targeted Instruction Tuning [64.78894228923619]
We propose LESS, an efficient algorithm to estimate data influences and perform Low-rank gradiEnt Similarity Search for instruction data selection. We show that training on a LESS-selected 5% of the data can often outperform training on the full dataset across diverse downstream tasks. Our method goes beyond surface form cues to identify data that the necessary reasoning skills for the intended downstream application.
arXiv Detail & Related papers (2024-02-06T19:18:04Z)
DoE2Vec: Deep-learning Based Features for Exploratory Landscape Analysis [0.0]
We propose DoE2Vec, a variational autoencoder (VAE)-based methodology to learn optimization landscape characteristics. Unlike the classical exploratory landscape analysis (ELA) method, our approach does not require any feature engineering. For validation, we inspect the quality of latent reconstructions and analyze the latent representations using different experiments.
arXiv Detail & Related papers (2023-03-31T09:38:44Z)
Feature construction using explanations of individual predictions [0.0]
We propose a novel approach for reducing the search space based on aggregation of instance-based explanations of predictive models. We empirically show that reducing the search to these groups significantly reduces the time of feature construction. We show significant improvements in classification accuracy for several classifiers and demonstrate the feasibility of the proposed feature construction even for large datasets.
arXiv Detail & Related papers (2023-01-23T18:59:01Z)
HyperImpute: Generalized Iterative Imputation with Automatic Model Selection [77.86861638371926]
We propose a generalized iterative imputation framework for adaptively and automatically configuring column-wise models. We provide a concrete implementation with out-of-the-box learners, simulators, and interfaces.
arXiv Detail & Related papers (2022-06-15T19:10:35Z)
Efficient Data-specific Model Search for Collaborative Filtering [56.60519991956558]
Collaborative filtering (CF) is a fundamental approach for recommender systems. In this paper, motivated by the recent advances in automated machine learning (AutoML), we propose to design a data-specific CF model. Key here is a new framework that unifies state-of-the-art (SOTA) CF methods and splits them into disjoint stages of input encoding, embedding function, interaction and prediction function.
arXiv Detail & Related papers (2021-06-14T14:30:32Z)
AutoDis: Automatic Discretization for Embedding Numerical Features in CTR Prediction [45.69943728028556]
Learning sophisticated feature interactions is crucial for Click-Through Rate (CTR) prediction in recommender systems. Various deep CTR models follow an Embedding & Feature Interaction paradigm. We propose AutoDis, a framework that discretizes features in numerical fields automatically and is optimized with CTR models in an end-to-end manner.
arXiv Detail & Related papers (2020-12-16T14:31:31Z)
Towards Automated Neural Interaction Discovery for Click-Through Rate Prediction [64.03526633651218]
Click-Through Rate (CTR) prediction is one of the most important machine learning tasks in recommender systems. We propose an automated interaction architecture discovering framework for CTR prediction named AutoCTR.
arXiv Detail & Related papers (2020-06-29T04:33:01Z)
StackGenVis: Alignment of Data, Algorithms, and Models for Stacking Ensemble Learning Using Performance Metrics [4.237343083490243]
In machine learning (ML), ensemble methods such as bagging, boosting, and stacking are widely-established approaches. StackGenVis is a visual analytics system for stacked generalization.
arXiv Detail & Related papers (2020-05-04T15:43:55Z)

This list is automatically generated from the titles and abstracts of the papers in this site.