Costly Features Classification using Monte Carlo Tree Search
- URL: http://arxiv.org/abs/2102.07073v1
- Date: Sun, 14 Feb 2021 05:18:33 GMT
- Title: Costly Features Classification using Monte Carlo Tree Search
- Authors: Ziheng Chen, Jin Huang, Hongshik Ahn, Xin Ning
- Abstract summary: We consider the problem of costly feature classification, where we sequentially select a subset of features to balance classification error against feature cost.
In this paper, we first cast the task as an MDP and use the Advantage Actor-Critic algorithm to solve it.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We consider the problem of costly feature classification, where we
sequentially select a subset of features to trade classification error off
against feature cost. In this paper, we first cast the task as a Markov
decision process (MDP) and use the Advantage Actor-Critic (A2C) algorithm to
solve it. To further improve the agent's performance and make the policy
explainable, we employ Monte Carlo Tree Search to update the policy
iteratively. During this procedure, we also consider the model's performance
on unbalanced datasets and its sensitivity to missing values. We evaluate our
model on multiple datasets and find that it outperforms other methods.
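As a rough illustration of the formulation above, here is a minimal Python sketch (not the authors' implementation) of sequential feature acquisition cast as an MDP and trained with an Advantage Actor-Critic update. PyTorch is assumed; the environment, per-feature cost, network sizes, and the ±1 classification reward are all illustrative choices.

```python
# Minimal sketch: costly feature classification as an MDP with an A2C agent.
# State = (observed values, acquisition mask); actions = buy a feature or classify.
import torch
import torch.nn as nn
from torch.distributions import Categorical

N_FEATURES, N_CLASSES, COST = 8, 2, 0.05  # illustrative sizes and cost

class ActorCritic(nn.Module):
    def __init__(self):
        super().__init__()
        # State = (observed feature values, binary mask of acquired features).
        self.body = nn.Sequential(nn.Linear(2 * N_FEATURES, 64), nn.ReLU())
        # Actions: acquire feature i (first N_FEATURES) or stop and predict a class.
        self.pi = nn.Linear(64, N_FEATURES + N_CLASSES)
        self.v = nn.Linear(64, 1)

    def forward(self, s):
        h = self.body(s)
        return self.pi(h), self.v(h)

def run_episode(model, x, y):
    """Acquire features one at a time, then classify; log data for A2C."""
    mask = torch.zeros(N_FEATURES)
    logps, values, rewards = [], [], []
    while True:
        s = torch.cat([x * mask, mask])
        logits, v = model(s)
        # Forbid re-acquiring a feature we already paid for.
        acq = torch.where(mask.bool(), torch.full_like(mask, -1e9),
                          logits[:N_FEATURES])
        dist = Categorical(logits=torch.cat([acq, logits[N_FEATURES:]]))
        a = dist.sample()
        logps.append(dist.log_prob(a))
        values.append(v.squeeze())
        if a < N_FEATURES:                    # pay the cost, observe feature a
            mask[a] = 1.0
            rewards.append(-COST)
        else:                                 # terminal action: classify
            rewards.append(1.0 if int(a) - N_FEATURES == y else -1.0)
            return logps, values, rewards

def a2c_update(model, opt, logps, values, rewards, gamma=0.99):
    R, returns = 0.0, []
    for r in reversed(rewards):               # discounted returns, backwards
        R = r + gamma * R
        returns.insert(0, R)
    returns = torch.tensor(returns)
    values = torch.stack(values)
    adv = returns - values.detach()           # advantage estimates
    policy_loss = -(torch.stack(logps) * adv).sum()
    value_loss = ((returns - values) ** 2).sum()
    opt.zero_grad()
    (policy_loss + 0.5 * value_loss).backward()
    opt.step()

model = ActorCritic()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x, y = torch.rand(N_FEATURES), 1              # one toy labeled sample
a2c_update(model, opt, *run_episode(model, x, y))
```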
Related papers
- Truncating Trajectories in Monte Carlo Policy Evaluation: an Adaptive Approach (2024-10-17)
Policy evaluation via Monte Carlo (MC) simulation is at the core of many MC Reinforcement Learning (RL) algorithms.
We propose as a quality index a surrogate of the mean squared error of a return estimator that uses trajectories of different lengths.
We present an adaptive algorithm called Robust and Iterative Data collection strategy Optimization (RIDO).
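To make the trade-off this entry studies concrete, the following toy snippet (illustrative only, not the RIDO algorithm) contrasts truncated Monte Carlo return estimators under a fixed interaction budget: longer trajectories cut truncation bias, while shorter ones buy more samples and hence less variance.

```python
# Illustrative only (not RIDO): under a fixed budget B = m * T, shorter
# trajectories mean more of them but a larger truncation bias; the paper's
# surrogate MSE formalizes this trade-off.
import numpy as np

rng = np.random.default_rng(0)
GAMMA, BUDGET = 0.95, 10_000

def truncated_return(T):
    """Discounted return of one trajectory truncated at horizon T,
    in a toy chain MRP with i.i.d. N(1, 1) rewards."""
    rewards = rng.normal(1.0, 1.0, size=T)
    return np.sum(rewards * GAMMA ** np.arange(T))

true_v = 1.0 / (1.0 - GAMMA)   # expected untruncated return

for T in (10, 50, 200):
    m = BUDGET // T             # number of trajectories we can afford
    est = np.mean([truncated_return(T) for _ in range(m)])
    print(f"T={T:4d} m={m:4d} estimate={est:7.3f} true={true_v:.3f}")
```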
- Monte Carlo Tree Search Boosts Reasoning via Iterative Preference Learning (2024-05-01)
We introduce an approach aimed at enhancing the reasoning capabilities of Large Language Models (LLMs) through an iterative preference learning process.
We use Monte Carlo Tree Search (MCTS) to iteratively collect preference data, utilizing its look-ahead ability to break down instance-level rewards into more granular step-level signals.
The proposed algorithm employs Direct Preference Optimization (DPO) to update the LLM policy using this newly generated step-level preference data.
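This entry names Direct Preference Optimization explicitly; below is a minimal sketch of the standard DPO loss applied to (step-level) preference pairs. PyTorch is assumed, and the log-probability inputs and the β value are illustrative.

```python
# Minimal sketch of the DPO loss on step-level preference pairs. Inputs are
# summed log-probs of chosen/rejected steps under the current policy and a
# frozen reference policy; all numbers here are illustrative placeholders.
import torch
import torch.nn.functional as F

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected,
             beta=0.1):
    # Implicit reward of each response: beta * (log pi_theta - log pi_ref).
    chosen_margin = logp_chosen - ref_logp_chosen
    rejected_margin = logp_rejected - ref_logp_rejected
    # Maximize the probability that the chosen step outranks the rejected one.
    return -F.logsigmoid(beta * (chosen_margin - rejected_margin)).mean()

# Toy batch of 4 preference pairs (log-probabilities).
loss = dpo_loss(torch.tensor([-5.0, -4.2, -6.1, -3.9]),
                torch.tensor([-6.0, -4.8, -5.9, -5.0]),
                torch.tensor([-5.5, -4.5, -6.0, -4.1]),
                torch.tensor([-5.8, -4.6, -6.2, -4.7]))
print(loss)
```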
- Feature Acquisition using Monte Carlo Tree Search (2022-12-21)
Feature acquisition algorithms address the problem of acquiring informative features while balancing the costs of acquisition to improve the learning performance of ML models.
Previous approaches have focused on calculating the expected utility values of features to determine the acquisition sequences.
In comparison to previous approaches, we focus on 1) formulating the feature acquisition problem as an MDP and applying Monte Carlo Tree Search, 2) calculating the intermediary rewards for each acquisition step based on model improvements and acquisition costs, and 3) simultaneously optimizing model improvement and acquisition costs with multi-objective Monte Carlo Tree Search.
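A small sketch of the per-step reward described in point 2), assuming scikit-learn; the improvement measure (validation accuracy gain), the synthetic data, and the cost scale lambda are illustrative choices, not the paper's exact definitions.

```python
# Illustrative per-step reward: model improvement from one acquired feature,
# minus that feature's (scaled) acquisition cost. Not the paper's exact setup.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 5))
y = (X[:, 0] + 0.5 * X[:, 2] > 0).astype(int)   # only features 0 and 2 matter
X_tr, X_va, y_tr, y_va = train_test_split(X, y, random_state=0)
costs = np.full(5, 0.05)                         # assumed per-feature costs

def accuracy(features):
    if not features:
        return 0.5                               # no-feature baseline: chance
    clf = LogisticRegression().fit(X_tr[:, features], y_tr)
    return clf.score(X_va[:, features], y_va)

def step_reward(acquired, new_feature, lam=1.0):
    """Validation-accuracy gain from adding one feature, minus its cost."""
    gain = accuracy(acquired + [new_feature]) - accuracy(acquired)
    return gain - lam * costs[new_feature]

print(step_reward([], 0))    # informative feature: positive reward
print(step_reward([0], 3))   # uninformative feature: roughly -cost
```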
- Multi-Task Off-Policy Learning from Bandit Feedback (2022-12-09)
We propose a hierarchical off-policy optimization algorithm (HierOPO), which estimates the parameters of the hierarchical model and then acts pessimistically with respect to them.
We prove per-task bounds on the suboptimality of the learned policies, which show a clear improvement over not using the hierarchical model.
Our theoretical and empirical results show a clear advantage of using the hierarchy over solving each task independently.
- Variance-Aware Off-Policy Evaluation with Linear Function Approximation (2021-06-22)
We study the off-policy evaluation problem in reinforcement learning with linear function approximation.
We propose an algorithm, VA-OPE, which uses the estimated variance of the value function to reweight the Bellman residual in Fitted Q-Iteration.
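A schematic of the reweighting idea (not the paper's exact VA-OPE algorithm): a single Fitted Q-Iteration style regression in which each Bellman target is down-weighted by its estimated variance, here with NumPy and synthetic placeholders for the features, targets, and variance estimates.

```python
# Schematic only: one weighted least-squares regression step, where noisy
# Bellman targets contribute less. Phi, targets, and var are placeholders.
import numpy as np

rng = np.random.default_rng(1)
n, d = 200, 4
Phi = rng.normal(size=(n, d))                  # feature map phi(s, a)
w_true = np.array([1.0, -0.5, 0.2, 0.0])
var = rng.uniform(0.1, 2.0, size=n)            # estimated target variances
targets = Phi @ w_true + rng.normal(0, np.sqrt(var))   # r + gamma * V_hat(s')

# Weighted least squares: solve (Phi^T W Phi) w = Phi^T W y with W = diag(1/var).
W = 1.0 / var
w = np.linalg.solve(Phi.T @ (W[:, None] * Phi), Phi.T @ (W * targets))
print(w)    # close to w_true; high-variance targets are down-weighted
```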
- Logistic Q-Learning (2020-10-21)
We propose a new reinforcement learning algorithm derived from a regularized linear-programming formulation of optimal control in MDPs.
The main feature of our algorithm is a convex loss function for policy evaluation that serves as a theoretically sound alternative to the widely used squared Bellman error.
- Solve Traveling Salesman Problem by Monte Carlo Tree Search and Deep Neural Network (2020-05-14)
We present a self-learning approach that combines deep reinforcement learning and Monte Carlo tree search to solve the traveling salesman problem.
Experimental results show that the proposed method performs favorably against other methods in small-to-medium problem settings.
It shows performance comparable to the state of the art in large problem settings.
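For a flavor of the search component, here is an illustrative UCT-style selection of the next city for a small TSP instance; the paper additionally guides the search with a learned deep network, for which a plain random rollout stands in below.

```python
# Illustrative UCT-style next-city selection for a toy TSP (not the paper's
# system): each candidate is scored by its average rollout tour cost minus an
# exploration bonus; a random rollout stands in for the learned network.
import math, random

random.seed(0)
N = 8
pts = [(random.random(), random.random()) for _ in range(N)]
dist = lambda a, b: math.dist(pts[a], pts[b])

def rollout_cost(tour, remaining):
    """Complete the partial tour randomly and return its total length."""
    rest = list(remaining)
    random.shuffle(rest)
    full = tour + rest
    return sum(dist(full[i], full[i + 1]) for i in range(N - 1)) + dist(full[-1], full[0])

def uct_pick(tour, remaining, sims=200, c=0.5):
    stats = {city: [0, 0.0] for city in remaining}   # visits, total cost
    for t in range(1, sims + 1):
        # Select: low average cost and high exploration bonus win (minimization).
        city = min(remaining, key=lambda a: (
            -1e9 if stats[a][0] == 0 else
            stats[a][1] / stats[a][0] - c * math.sqrt(math.log(t) / stats[a][0])))
        cost = rollout_cost(tour + [city], remaining - {city})
        stats[city][0] += 1
        stats[city][1] += cost
    return min(remaining, key=lambda a: stats[a][1] / max(stats[a][0], 1))

tour, remaining = [0], set(range(1, N))
while remaining:
    nxt = uct_pick(tour, remaining)
    tour.append(nxt)
    remaining.discard(nxt)
print(tour)
```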
- Efficient Rollout Strategies for Bayesian Optimization (2020-02-24)
Most acquisition functions are myopic, meaning that they only consider the impact of the next function evaluation.
We show that a combination of quasi-Monte Carlo, common random numbers, and control variates significantly reduces the computational burden of rollout.
We then formulate a policy-search based approach that removes the need to optimize the rollout acquisition function.
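A toy demonstration of one of the named variance-reduction tools, common random numbers: evaluating two candidates under the same noise draws shrinks the variance of their estimated difference. The stochastic objective below is a stand-in, not a Bayesian optimization rollout.

```python
# Common random numbers (CRN), illustrated: when comparing two candidates,
# sharing the noise realization across both evaluations reduces the variance
# of the estimated difference. The noisy objective is a toy stand-in.
import numpy as np

rng = np.random.default_rng(0)

def noisy_value(x, z):
    """Toy stochastic value at x under noise realization z."""
    return -(x - 0.3) ** 2 + 0.5 * z

x_a, x_b, n = 0.2, 0.4, 1000

# Independent noise: variances of both evaluations add up in the difference.
diff_indep = noisy_value(x_a, rng.normal(size=n)) - noisy_value(x_b, rng.normal(size=n))

# Common random numbers: the shared additive noise cancels in the difference.
z = rng.normal(size=n)
diff_crn = noisy_value(x_a, z) - noisy_value(x_b, z)

print(diff_indep.std(), diff_crn.std())   # CRN std is dramatically smaller
```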
- Monotonic Cardinality Estimation of Similarity Selection: A Deep Learning Approach (2020-02-15)
We investigate the possibilities of utilizing deep learning for cardinality estimation of similarity selection.
We propose a novel and generic method that can be applied to any data type and distance function.
- Optimal Clustering from Noisy Binary Feedback (2019-10-14)
We study the problem of clustering a set of items from binary user feedback.
We devise an algorithm with a minimal cluster recovery error rate.
For adaptive selection, we develop an algorithm inspired by the derivation of the information-theoretical error lower bounds.
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.