ERASE: Benchmarking Feature Selection Methods for Deep Recommender Systems
- URL: http://arxiv.org/abs/2403.12660v3
- Date: Wed, 19 Jun 2024 12:48:25 GMT
- Title: ERASE: Benchmarking Feature Selection Methods for Deep Recommender Systems
- Authors: Pengyue Jia, Yejing Wang, Zhaocheng Du, Xiangyu Zhao, Yichao Wang, Bo Chen, Wanyu Wang, Huifeng Guo, Ruiming Tang
- Abstract summary: This paper presents ERASE, a comprehensive bEnchmaRk for feAture SElection for Deep Recommender Systems (DRS).
ERASE comprises a thorough evaluation of eleven feature selection methods, covering both traditional and deep learning approaches.
Our code is available online for ease of reproduction.
- Score: 40.838320650137625
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep Recommender Systems (DRS) are increasingly dependent on a large number of feature fields for more precise recommendations. Effective feature selection methods are consequently becoming critical for further enhancing the accuracy and optimizing storage efficiencies to align with the deployment demands. This research area, particularly in the context of DRS, is nascent and faces three core challenges. Firstly, variant experimental setups across research papers often yield unfair comparisons, obscuring practical insights. Secondly, the existing literature's lack of detailed analysis on selection attributes, based on large-scale datasets and a thorough comparison among selection techniques and DRS backbones, restricts the generalizability of findings and impedes deployment on DRS. Lastly, research often focuses on comparing the peak performance achievable by feature selection methods, an approach that is typically computationally infeasible for identifying the optimal hyperparameters and overlooks evaluating the robustness and stability of these methods. To bridge these gaps, this paper presents ERASE, a comprehensive bEnchmaRk for feAture SElection for DRS. ERASE comprises a thorough evaluation of eleven feature selection methods, covering both traditional and deep learning approaches, across four public datasets, private industrial datasets, and a real-world commercial platform, achieving significant enhancement. Our code is available online for ease of reproduction.
Related papers
- "FRAME: Forward Recursive Adaptive Model Extraction -- A Technique for Advance Feature Selection" [0.0]
This study introduces a novel hybrid approach, the Forward Recursive Adaptive Model Extraction Technique (FRAME).
FRAME combines Forward Selection and Recursive Feature Elimination to enhance feature selection across diverse datasets.
The results demonstrate that FRAME consistently delivers superior predictive performance based on downstream machine learning evaluation metrics.
arXiv Detail & Related papers (2025-01-21T08:34:10Z)
- A Systematic Examination of Preference Learning through the Lens of Instruction-Following [83.71180850955679]
We use a novel synthetic data generation pipeline to generate 48,000 unique instruction-following prompts.
With our synthetic prompts, we use two preference dataset curation methods: rejection sampling (RS) and Monte Carlo Tree Search (MCTS).
Experiments reveal that shared prefixes in preference pairs, as generated by MCTS, provide marginal but consistent improvements.
High-contrast preference pairs generally outperform low-contrast pairs; however, combining both often yields the best performance.
arXiv Detail & Related papers (2024-12-18T15:38:39Z)
- SKADA-Bench: Benchmarking Unsupervised Domain Adaptation Methods with Realistic Validation On Diverse Modalities [55.87169702896249]
Unsupervised Domain Adaptation (DA) consists of adapting a model trained on a labeled source domain to perform well on an unlabeled target domain with some data distribution shift.
We present a complete and fair evaluation of existing shallow algorithms, including reweighting, mapping, and subspace alignment.
Our benchmark highlights the importance of realistic validation and provides practical guidance for real-life applications.
arXiv Detail & Related papers (2024-07-16T12:52:29Z)
- A Thorough Performance Benchmarking on Lightweight Embedding-based Recommender Systems [67.52782366565658]
State-of-the-art recommender systems (RSs) depend on categorical features, which are encoded by embedding vectors, resulting in excessively large embedding tables.
Despite the prosperity of lightweight embedding-based RSs, a wide diversity is seen in evaluation protocols.
This study investigates various LERS' performance, efficiency, and cross-task transferability via a thorough benchmarking process.
arXiv Detail & Related papers (2024-06-25T07:45:00Z)
- Adaptive Preference Scaling for Reinforcement Learning with Human Feedback [103.36048042664768]
Reinforcement learning from human feedback (RLHF) is a prevalent approach to align AI systems with human values.
We propose a novel adaptive preference loss, underpinned by distributionally robust optimization (DRO).
Our method is versatile and can be readily adapted to various preference optimization frameworks.
arXiv Detail & Related papers (2024-06-04T20:33:22Z)
- Compactness Score: A Fast Filter Method for Unsupervised Feature Selection [66.84571085643928]
We propose a fast unsupervised feature selection method, named Compactness Score (CSUFS), to select desired features.
Our proposed algorithm seems to be more accurate and efficient compared with existing algorithms.
arXiv Detail & Related papers (2022-01-31T13:01:37Z)
- Towards the D-Optimal Online Experiment Design for Recommender Selection [18.204325860752768]
Finding the optimal online experiment is nontrivial since both the users and displayed recommendations carry contextual features that are informative to the reward.
We leverage the D-optimal design from the classical statistics literature to achieve the maximum information gain during exploration.
We then use our deployment example on Walmart.com to fully illustrate the practical insights and effectiveness of the proposed methods.
arXiv Detail & Related papers (2021-10-23T04:30:27Z)
- Automated Human Activity Recognition by Colliding Bodies Optimization-based Optimal Feature Selection with Recurrent Neural Network [0.0]
Human Activity Recognition (HAR) is considered to be an efficient model in pervasive computation from sensor readings.
This paper attempts to implement the HAR system using deep learning with the data collected from smart sensors that are publicly available in the UC Irvine Machine Learning Repository (UCI).
arXiv Detail & Related papers (2020-10-07T10:58:46Z)
- Robust Active Preference Elicitation [10.961537256186498]
We study the problem of eliciting the preferences of a decision-maker through a moderate number of pairwise comparison queries.
We are motivated by applications in high stakes domains, such as when choosing a policy for allocating scarce resources.
arXiv Detail & Related papers (2020-03-04T05:24:08Z)
- Outlier Detection Ensemble with Embedded Feature Selection [42.8338013000469]
We propose an outlier detection ensemble framework with embedded feature selection (ODEFS).
For each random sub-sampling based learning component, ODEFS unifies feature selection and outlier detection into a pairwise ranking formulation.
We adopt the thresholded self-paced learning to simultaneously optimize feature selection and example selection.
arXiv Detail & Related papers (2020-01-15T13:14:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.