Efficient Reinforced Feature Selection via Early Stopping Traverse Strategy
- URL: http://arxiv.org/abs/2109.14180v1
- Date: Wed, 29 Sep 2021 03:51:13 GMT
- Title: Efficient Reinforced Feature Selection via Early Stopping Traverse Strategy
- Authors: Kunpeng Liu, Pengfei Wang, Dongjie Wang, Wan Du, Dapeng Oliver Wu,
Yanjie Fu
- Abstract summary: We propose a single-agent Monte Carlo based reinforced feature selection (MCRFS) method.
We also propose two efficiency improvement strategies, i.e., early stopping (ES) strategy and reward-level interactive (RI) strategy.
- Score: 36.890295071860166
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose a single-agent Monte Carlo based reinforced feature
selection (MCRFS) method, as well as two efficiency improvement strategies,
i.e., early stopping (ES) strategy and reward-level interactive (RI) strategy.
Feature selection is one of the most important techniques in data
preprocessing, aiming to find the optimal feature subset for a given downstream
machine learning task. Extensive research has been devoted to improving its
effectiveness and efficiency. Recently, multi-agent reinforced feature
selection (MARFS) has achieved great success in improving the performance of
feature selection. However, MARFS suffers from a heavy computational cost,
which greatly limits its application in real-world scenarios. In this paper, we
propose an efficient reinforced feature selection method that uses a single
agent to traverse the whole feature set and decide whether to select each
feature one by one. Specifically, we first develop a behavior policy and use it
to traverse the feature set and generate training data. We then evaluate the
target policy on the training data and improve it via the Bellman equation. In
addition, we compute the importance sampling weights incrementally and propose
an early stopping strategy that improves training efficiency by removing skewed
data: the behavior policy stops traversing with a probability inversely
proportional to the importance sampling weight. We further propose a
reward-level interactive strategy that improves training efficiency via
reward-level external advice. Finally, we conduct extensive experiments on
real-world data to demonstrate the superiority of the proposed method.
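To make the traverse-and-stop mechanism above concrete, here is a minimal Python sketch of a single traverse by the behavior policy, with an incrementally updated importance sampling weight and a stopping probability inversely proportional to that weight. The function name, the Bernoulli form of the policies, and the proportionality constant are illustrative assumptions, not the authors' implementation.

```python
import random


def traverse_with_early_stopping(features, behavior_prob, target_prob, scale=0.1, seed=None):
    """Sketch of one traverse of the feature set by a behavior policy.

    `behavior_prob(i)` and `target_prob(i)` are assumed per-feature selection
    probabilities under the behavior and target policies; both are
    hypothetical stand-ins for the learned policies in the paper.
    Returns the (feature index, action) pairs visited and the incremental
    importance sampling weight accumulated before the traverse stopped.
    """
    rng = random.Random(seed)
    trajectory = []
    is_weight = 1.0  # importance sampling weight, updated incrementally

    for i in range(len(features)):
        # The behavior policy decides whether to select feature i (action 1) or not (action 0).
        p_b = behavior_prob(i)
        action = 1 if rng.random() < p_b else 0
        trajectory.append((i, action))

        # Incremental importance sampling update: multiply by the ratio of
        # target-policy to behavior-policy probability of the taken action.
        p_t = target_prob(i)
        ratio = (p_t if action else 1.0 - p_t) / max(p_b if action else 1.0 - p_b, 1e-8)
        is_weight *= ratio

        # Early stopping: stop the traverse with a probability inversely
        # proportional to the importance sampling weight, so trajectories
        # whose weight has drifted far from the target policy (skewed data)
        # are cut short. `scale` is an assumed proportionality constant.
        stop_prob = min(1.0, scale / max(is_weight, 1e-8))
        if rng.random() < stop_prob:
            break

    return trajectory, is_weight
```

For instance, `traverse_with_early_stopping(list(range(20)), lambda i: 0.5, lambda i: 0.7)` traverses up to 20 features with constant Bernoulli policies. In the paper the policies are learned, and the reward-level interactive strategy additionally shapes the reward with external advice, which this sketch omits.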
Related papers
- From Novice to Expert: LLM Agent Policy Optimization via Step-wise Reinforcement Learning [62.54484062185869]
We introduce StepAgent, which utilizes step-wise reward to optimize the agent's reinforcement learning process.
We propose implicit-reward and inverse reinforcement learning techniques to facilitate agent reflection and policy adjustment.
arXiv Detail & Related papers (2024-11-06T10:35:11Z)
- SMART: Self-learning Meta-strategy Agent for Reasoning Tasks [44.45037694899524]
We introduce SMART (Self-learning Meta-strategy Agent for Reasoning Tasks), a novel framework that enables LMs to learn and select the most effective strategies for various reasoning tasks.
We model the strategy selection process as a Markov Decision Process and leverage reinforcement learning-driven continuous self-improvement.
Our experiments demonstrate that SMART significantly enhances the ability of models to choose optimal strategies without external guidance.
arXiv Detail & Related papers (2024-10-21T15:55:04Z)
- Monte Carlo Tree Search Boosts Reasoning via Iterative Preference Learning [55.96599486604344]
We introduce an approach aimed at enhancing the reasoning capabilities of Large Language Models (LLMs) through an iterative preference learning process.
We use Monte Carlo Tree Search (MCTS) to iteratively collect preference data, utilizing its look-ahead ability to break down instance-level rewards into more granular step-level signals.
The proposed algorithm employs Direct Preference Optimization (DPO) to update the LLM policy using this newly generated step-level preference data.
arXiv Detail & Related papers (2024-05-01T11:10:24Z)
- Toward Efficient Automated Feature Engineering [27.47868891738917]
Automated Feature Engineering (AFE) refers to automatically generating and selecting optimal feature sets for downstream tasks.
Current AFE methods mainly focus on improving the effectiveness of the produced features but ignore the low efficiency that hinders large-scale deployment.
We construct the AFE pipeline in a reinforcement learning setting, where each feature is assigned an agent to perform feature transformation.
We conduct comprehensive experiments on 36 datasets in terms of both classification and regression tasks.
arXiv Detail & Related papers (2022-12-26T13:18:51Z)
- Meta-Wrapper: Differentiable Wrapping Operator for User Interest Selection in CTR Prediction [97.99938802797377]
Click-through rate (CTR) prediction, whose goal is to predict the probability that a user will click on an item, has become increasingly significant in recommender systems.
Recent deep learning models that can automatically extract a user's interests from their behaviors have achieved great success.
We propose a novel approach under the framework of the wrapper method, which is named Meta-Wrapper.
arXiv Detail & Related papers (2022-06-28T03:28:15Z)
- Deep Reinforcement Learning for Exact Combinatorial Optimization: Learning to Branch [13.024115985194932]
We propose a new approach for solving the data labeling and inference issues in optimization based on the use of the reinforcement learning (RL) paradigm.
We use imitation learning to bootstrap an RL agent and then use Proximal Policy Optimization (PPO) to further explore globally optimal actions.
arXiv Detail & Related papers (2022-06-14T16:35:58Z)
- Sequential Information Design: Markov Persuasion Process and Its Efficient Reinforcement Learning [156.5667417159582]
This paper proposes a novel model of sequential information design, namely Markov persuasion processes (MPPs).
Planning in MPPs faces the unique challenge of finding a signaling policy that is simultaneously persuasive to the myopic receivers and induces the optimal long-term cumulative utilities for the sender.
We design a provably efficient no-regret learning algorithm, the Optimism-Pessimism Principle for Persuasion Process (OP4), which features a novel combination of both optimism and pessimism principles.
arXiv Detail & Related papers (2022-02-22T05:41:43Z)
- APS: Active Pretraining with Successor Features [96.24533716878055]
We show that by reinterpreting and combining successor features with nonparametric entropy maximization, the intractable mutual information can be efficiently optimized.
The proposed method, Active Pretraining with Successor Features (APS), explores the environment via nonparametric entropy maximization, and the explored data can be efficiently leveraged to learn behaviors.
arXiv Detail & Related papers (2021-08-31T16:30:35Z)
- Active Reinforcement Learning over MDPs [29.59790449462737]
This paper proposes a framework of Active Reinforcement Learning (ARL) over MDPs to improve generalization efficiency under limited resources via instance selection.
Unlike existing approaches, we attempt to actively select and use training data rather than train on all the given data, thereby consuming fewer resources.
arXiv Detail & Related papers (2021-08-05T00:18:11Z)
- MapGo: Model-Assisted Policy Optimization for Goal-Oriented Tasks [37.529217646431825]
In goal-oriented reinforcement learning, relabeling the raw goals in past experience to provide agents with hindsight ability is a major solution to the reward sparsity problem.
We develop FGI (Foresight Goal Inference), a new relabeling strategy that relabels the goals by looking into the future with a learned dynamics model.
To improve sample efficiency, we propose to use the dynamics model to generate simulated trajectories for policy training.
arXiv Detail & Related papers (2021-05-13T15:07:23Z)