Efficient Reinforced Feature Selection via Early Stopping Traverse Strategy
- URL: http://arxiv.org/abs/2109.14180v1
- Date: Wed, 29 Sep 2021 03:51:13 GMT
- Title: Efficient Reinforced Feature Selection via Early Stopping Traverse Strategy
- Authors: Kunpeng Liu, Pengfei Wang, Dongjie Wang, Wan Du, Dapeng Oliver Wu,
Yanjie Fu
- Abstract summary: We propose a single-agent Monte Carlo based reinforced feature selection (MCRFS) method.
We also propose two efficiency improvement strategies, i.e., early stopping (ES) strategy and reward-level interactive (RI) strategy.
- Score: 36.890295071860166
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose a single-agent Monte Carlo based reinforced feature
selection (MCRFS) method, as well as two efficiency improvement strategies,
i.e., early stopping (ES) strategy and reward-level interactive (RI) strategy.
Feature selection is one of the most important techniques in data
preprocessing, aiming to find the optimal feature subset for a given downstream
machine learning task. Extensive research has been devoted to improving its
effectiveness and efficiency. Recently, multi-agent reinforced feature
selection (MARFS) has achieved great success in improving the performance of
feature selection. However, MARFS suffers from a heavy computational cost,
which greatly limits its application in real-world scenarios. In this paper, we
propose an efficient reinforced feature selection method that uses a single
agent to traverse the whole feature set and decide whether to select each
feature one by one. Specifically, we first develop a behavior policy and use it
to traverse the feature set and generate training data. We then evaluate the
target policy on the training data and improve it via the Bellman equation. In
addition, we compute the importance sampling weights incrementally and propose
an early stopping strategy that improves training efficiency by removing skewed
data: the behavior policy stops traversing with a probability inversely
proportional to the importance sampling weight. We further propose a
reward-level interactive strategy that improves training efficiency via
reward-level external advice. Finally, we conduct extensive experiments on
real-world data to demonstrate the superiority of the proposed method.
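To make the traverse-and-stop mechanism above concrete, here is a minimal Python sketch of a single traverse by the behavior policy, with an incrementally updated importance sampling weight and a stopping probability inversely proportional to that weight. The function name, the Bernoulli form of the policies, and the proportionality constant are illustrative assumptions, not the authors' implementation.

```python
import random


def traverse_with_early_stopping(features, behavior_prob, target_prob, scale=0.1, seed=None):
    """Sketch of one traverse of the feature set by a behavior policy.

    `behavior_prob(i)` and `target_prob(i)` are assumed per-feature selection
    probabilities under the behavior and target policies; both are
    hypothetical stand-ins for the learned policies in the paper.
    Returns the (feature index, action) pairs visited and the incremental
    importance sampling weight accumulated before the traverse stopped.
    """
    rng = random.Random(seed)
    trajectory = []
    is_weight = 1.0  # importance sampling weight, updated incrementally

    for i in range(len(features)):
        # The behavior policy decides whether to select feature i (action 1) or not (action 0).
        p_b = behavior_prob(i)
        action = 1 if rng.random() < p_b else 0
        trajectory.append((i, action))

        # Incremental importance sampling update: multiply by the ratio of
        # target-policy to behavior-policy probability of the taken action.
        p_t = target_prob(i)
        ratio = (p_t if action else 1.0 - p_t) / max(p_b if action else 1.0 - p_b, 1e-8)
        is_weight *= ratio

        # Early stopping: stop the traverse with a probability inversely
        # proportional to the importance sampling weight, so trajectories
        # whose weight has drifted far from the target policy (skewed data)
        # are cut short. `scale` is an assumed proportionality constant.
        stop_prob = min(1.0, scale / max(is_weight, 1e-8))
        if rng.random() < stop_prob:
            break

    return trajectory, is_weight
```

For instance, `traverse_with_early_stopping(list(range(20)), lambda i: 0.5, lambda i: 0.7)` traverses up to 20 features with constant Bernoulli policies. In the paper the policies are learned, and the reward-level interactive strategy additionally shapes the reward with external advice, which this sketch omits.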
Related papers
- From Novice to Expert: LLM Agent Policy Optimization via Step-wise Reinforcement Learning [62.54484062185869]
We introduce StepAgent, which utilizes step-wise reward to optimize the agent's reinforcement learning process.
We propose implicit-reward and inverse reinforcement learning techniques to facilitate agent reflection and policy adjustment.
arXiv Detail & Related papers (2024-11-06T10:35:11Z)
- SMART: Self-learning Meta-strategy Agent for Reasoning Tasks [44.45037694899524]
We introduce SMART (Self-learning Meta-strategy Agent for Reasoning Tasks), a novel framework that enables LMs to learn and select the most effective strategies for various reasoning tasks.
We model the strategy selection process as a Markov Decision Process and leverage reinforcement learning-driven continuous self-improvement.
Our experiments demonstrate that SMART significantly enhances the ability of models to choose optimal strategies without external guidance.
arXiv Detail & Related papers (2024-10-21T15:55:04Z)
- Monte Carlo Tree Search Boosts Reasoning via Iterative Preference Learning [55.96599486604344]
We introduce an approach aimed at enhancing the reasoning capabilities of Large Language Models (LLMs) through an iterative preference learning process.
We use Monte Carlo Tree Search (MCTS) to iteratively collect preference data, utilizing its look-ahead ability to break down instance-level rewards into more granular step-level signals.
The proposed algorithm employs Direct Preference Optimization (DPO) to update the LLM policy using this newly generated step-level preference data.
arXiv Detail & Related papers (2024-05-01T11:10:24Z)
- Toward Efficient Automated Feature Engineering [27.47868891738917]
Automated Feature Engineering (AFE) refers to automatically generating and selecting optimal feature sets for downstream tasks.
Current AFE methods mainly focus on improving the effectiveness of the produced features but ignore the low efficiency that hinders large-scale deployment.
We construct the AFE pipeline in a reinforcement learning setting, where each feature is assigned an agent to perform feature transformation.
We conduct comprehensive experiments on 36 datasets in terms of both classification and regression tasks.
arXiv Detail & Related papers (2022-12-26T13:18:51Z)
- Meta-Wrapper: Differentiable Wrapping Operator for User Interest Selection in CTR Prediction [97.99938802797377]
Click-through rate (CTR) prediction, whose goal is to predict the probability that a user will click on an item, has become increasingly significant in recommender systems.
Recent deep learning models that can automatically extract a user's interests from their behaviors have achieved great success.
We propose a novel approach under the framework of the wrapper method, which is named Meta-Wrapper.
arXiv Detail & Related papers (2022-06-28T03:28:15Z)
- Deep Reinforcement Learning for Exact Combinatorial Optimization: Learning to Branch [13.024115985194932]
We propose a new approach for solving the data labeling and inference issues in optimization based on the use of the reinforcement learning (RL) paradigm.
We use imitation learning to bootstrap an RL agent and then use Proximal Policy Optimization (PPO) to further explore globally optimal actions.
arXiv Detail & Related papers (2022-06-14T16:35:58Z)
- Sequential Information Design: Markov Persuasion Process and Its Efficient Reinforcement Learning [156.5667417159582]
This paper proposes a novel model of sequential information design, namely Markov persuasion processes (MPPs).
Planning in MPPs faces the unique challenge of finding a signaling policy that is simultaneously persuasive to the myopic receivers and induces the optimal long-term cumulative utilities for the sender.
We design a provably efficient no-regret learning algorithm, the Optimism-Pessimism Principle for Persuasion Process (OP4), which features a novel combination of both optimism and pessimism principles.
arXiv Detail & Related papers (2022-02-22T05:41:43Z)
- APS: Active Pretraining with Successor Features [96.24533716878055]
We show that by reinterpreting and combining successor features with nonparametric entropy maximization, the intractable mutual information can be efficiently optimized.
The proposed method, Active Pretraining with Successor Features (APS), explores the environment via nonparametric entropy maximization, and the explored data can be efficiently leveraged to learn behaviors.
arXiv Detail & Related papers (2021-08-31T16:30:35Z)
- Active Reinforcement Learning over MDPs [29.59790449462737]
This paper proposes a framework of Active Reinforcement Learning (ARL) over MDPs to improve generalization efficiency under limited resources via instance selection.
Unlike existing approaches, we attempt to actively select and use training data rather than train on all the given data, thereby consuming fewer resources.
arXiv Detail & Related papers (2021-08-05T00:18:11Z)
- MapGo: Model-Assisted Policy Optimization for Goal-Oriented Tasks [37.529217646431825]
In goal-oriented reinforcement learning, relabeling the raw goals in past experience to provide agents with hindsight ability is a major solution to the reward sparsity problem.
We develop FGI (Foresight Goal Inference), a new relabeling strategy that relabels the goals by looking into the future with a learned dynamics model.
To improve sample efficiency, we propose to use the dynamics model to generate simulated trajectories for policy training.
arXiv Detail & Related papers (2021-05-13T15:07:23Z)