Epistemic Monte Carlo Tree Search
- URL: http://arxiv.org/abs/2210.13455v4
- Date: Fri, 04 Oct 2024 13:59:30 GMT
- Title: Epistemic Monte Carlo Tree Search
- Authors: Yaniv Oren, Villiam Vadocz, Matthijs T. J. Spaan, Wendelin Böhmer
- Abstract summary: We introduce Epistemic MCTS (EMCTS) to account for the uncertainty in search and harness the search for deep exploration.
In the challenging sparse-reward task of writing code in the Assembly language SUBLEQ, AZ paired with our method achieves significantly higher sample efficiency than baseline AZ.
- Score: 5.624791703748109
- License:
- Abstract: The AlphaZero/MuZero (A/MZ) family of algorithms has achieved remarkable success across various challenging domains by integrating Monte Carlo Tree Search (MCTS) with learned models. Learned models introduce epistemic uncertainty, which is caused by learning from limited data and is useful for exploration in sparse-reward environments. However, MCTS does not account for the propagation of this uncertainty. To address this, we introduce Epistemic MCTS (EMCTS): a theoretically motivated approach to account for the epistemic uncertainty in search and harness the search for deep exploration. In the challenging sparse-reward task of writing code in the Assembly language SUBLEQ, AZ paired with our method achieves significantly higher sample efficiency than baseline AZ. Search with EMCTS solves variations of the commonly used hard-exploration benchmark Deep Sea - which baseline A/MZ are practically unable to solve - much faster than an otherwise equivalent method that does not use search for uncertainty estimation, demonstrating significant benefits from search for epistemic uncertainty estimation.
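As a rough, illustrative sketch of the general idea (not the EMCTS formulation from the paper), the snippet below augments a UCT-style selection rule with an epistemic-uncertainty bonus and propagates a variance estimate alongside the value during backup. The node fields, the bonus weight `beta`, and the toy two-action model are assumptions for illustration only.

```python
import math
import random

class Node:
    """Search-tree node tracking both value and epistemic-variance statistics."""
    def __init__(self):
        self.visits = 0
        self.value_sum = 0.0      # running sum of backed-up values
        self.epi_sum = 0.0        # running sum of backed-up epistemic variances
        self.children = {}        # action -> Node

    def q(self):
        return self.value_sum / self.visits if self.visits else 0.0

    def epistemic_std(self):
        return math.sqrt(self.epi_sum / self.visits) if self.visits else 0.0

def select_child(node, c_uct=1.4, beta=1.0):
    """UCT selection with an added epistemic-uncertainty bonus (illustrative)."""
    def score(child):
        if child.visits == 0:
            return float("inf")
        explore = c_uct * math.sqrt(math.log(node.visits + 1) / child.visits)
        return child.q() + explore + beta * child.epistemic_std()
    return max(node.children.items(), key=lambda kv: score(kv[1]))

def backup(path, value, epi_var):
    """Propagate the leaf's value and epistemic variance along the visited path."""
    for node in reversed(path):
        node.visits += 1
        node.value_sum += value
        node.epi_sum += epi_var

# Toy usage: two actions whose (value, epistemic variance) come from a fake model.
root = Node()
root.children = {a: Node() for a in ("a", "b")}
fake_model = {"a": (0.2, 0.50), "b": (0.3, 0.01)}   # high uncertainty on "a"
for _ in range(200):
    action, child = select_child(root)
    value, epi_var = fake_model[action]
    backup([root, child], value + random.gauss(0, 0.05), epi_var)
print({a: c.visits for a, c in root.children.items()})  # "a" draws extra visits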
Related papers
- Kernel Language Entropy: Fine-grained Uncertainty Quantification for LLMs from Semantic Similarities [79.9629927171974]
Uncertainty in Large Language Models (LLMs) is crucial for applications where safety and reliability are important.
We propose Kernel Language Entropy (KLE), a novel method for uncertainty estimation in white- and black-box LLMs.
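A minimal sketch of the kind of estimator this summary describes, assuming answers sampled from an LLM and some pairwise semantic-similarity function (the paper builds its kernels from NLI-based semantic graphs; the exact-match similarity below is only a placeholder): uncertainty is taken as the von Neumann entropy of a unit-trace semantic kernel over the samples.

```python
import numpy as np

def von_neumann_entropy(K):
    """Entropy of a PSD kernel normalized to unit trace: -sum_i lam_i log lam_i."""
    K = K / np.trace(K)
    lam = np.linalg.eigvalsh(K)
    lam = lam[lam > 1e-12]
    return float(-(lam * np.log(lam)).sum())

def kernel_language_entropy(answers, similarity):
    """Uncertainty over N sampled answers given a semantic similarity function.
    `similarity(a, b)` in [0, 1] is an assumption (e.g. an NLI-based score)."""
    K = np.array([[similarity(a, b) for b in answers] for a in answers])
    K = 0.5 * (K + K.T)  # enforce symmetry
    return von_neumann_entropy(K)

# Toy similarity: identical strings are similar, everything else barely so.
sim = lambda a, b: 1.0 if a == b else 0.1
print(kernel_language_entropy(["Paris"] * 5, sim))                               # low
print(kernel_language_entropy(["Paris", "Lyon", "Rome", "Oslo", "Bern"], sim))   # high
```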
arXiv Detail & Related papers (2024-05-30T12:42:05Z)
- Amplifying Exploration in Monte-Carlo Tree Search by Focusing on the Unknown [19.664506834858244]
Monte-Carlo tree search (MCTS) strategically allocates computational resources to focus on promising segments of the search tree.
Our proposed methodology, denoted as AmEx-MCTS, solves this problem by introducing a novel MCTS formulation.
Our empirical evaluation demonstrates the superior performance of AmEx-MCTS, surpassing classical MCTS and related approaches by a substantial margin.
arXiv Detail & Related papers (2024-02-13T15:05:54Z)
- Evaluating tree-based imputation methods as an alternative to MICE PMM for drawing inference in empirical studies [0.5892638927736115]
Dealing with missing data is an important problem in statistical analysis that is often addressed with imputation procedures.
The prevailing method of Multiple Imputation by Chained Equations with Predictive Mean Matching (PMM) is considered standard in the social science literature.
In particular, tree-based imputation methods have emerged as very competitive approaches.
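A hedged sketch of what a tree-based chained-equations imputer looks like in practice, using scikit-learn's IterativeImputer with a random-forest learner. This is a missForest-style stand-in rather than MICE PMM itself, and the simulated data and RMSE comparison are illustrative assumptions.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Simulated data with a nonlinear relationship and ~20% values missing at random.
n = 500
x1 = rng.normal(size=n)
x2 = x1 ** 2 + rng.normal(scale=0.3, size=n)
X_full = np.column_stack([x1, x2])
X = X_full.copy()
X[rng.random(X.shape) < 0.2] = np.nan

# Chained-equations imputation with a tree-based learner.
tree_imputer = IterativeImputer(
    estimator=RandomForestRegressor(n_estimators=50, random_state=0),
    max_iter=10, random_state=0)
X_tree = tree_imputer.fit_transform(X)

# Default IterativeImputer (linear BayesianRidge) for comparison.
X_lin = IterativeImputer(max_iter=10, random_state=0).fit_transform(X)

mask = np.isnan(X)
for name, Xi in [("tree", X_tree), ("linear", X_lin)]:
    rmse = np.sqrt(np.mean((Xi[mask] - X_full[mask]) ** 2))
    print(f"{name} imputation RMSE on masked entries: {rmse:.3f}")
```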
arXiv Detail & Related papers (2024-01-17T21:28:00Z)
- Exploration via Epistemic Value Estimation [22.54793586116019]
We propose a recipe that is compatible with sequential decision making and with neural network function approximators.
It equips agents with a tractable posterior over all their parameters from which epistemic value uncertainty can be computed efficiently.
Experiments confirm that the EVE recipe facilitates efficient exploration in hard exploration tasks.
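EVE itself derives a tractable posterior over network parameters; the sketch below uses a generic bootstrap ensemble as a stand-in to show how epistemic value uncertainty can be computed and why it grows far from the training data. The linear value estimator and the query states are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_linear_value(X, y, reg=1e-3):
    """Ridge-regression value estimator (placeholder for a neural network)."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + reg * np.eye(d), X.T @ y)

def epistemic_value(X_train, y_train, x_query, n_members=10):
    """Value estimate and epistemic std from disagreement of bootstrapped members."""
    preds = []
    n = len(X_train)
    for _ in range(n_members):
        idx = rng.integers(0, n, size=n)           # bootstrap resample
        w = fit_linear_value(X_train[idx], y_train[idx])
        preds.append(x_query @ w)
    preds = np.array(preds)
    return preds.mean(axis=0), preds.std(axis=0)

# States near the training data get low uncertainty; far-away states get high.
X = rng.normal(size=(200, 2))
y = X @ np.array([1.0, -0.5]) + rng.normal(scale=0.1, size=200)
queries = np.array([[0.5, -0.5], [8.0, 8.0]])
mean, std = epistemic_value(X, y, queries)
print("values:", np.round(mean, 2), "epistemic std:", np.round(std, 3))
```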
arXiv Detail & Related papers (2023-03-07T16:25:52Z)
- Uncertainty Estimation by Fisher Information-based Evidential Deep Learning [61.94125052118442]
Uncertainty estimation is a key factor that makes deep learning reliable in practical applications.
We propose a novel method, Fisher Information-based Evidential Deep Learning ($\mathcal{I}$-EDL).
In particular, we introduce the Fisher Information Matrix (FIM) to measure the informativeness of the evidence carried by each sample, and use it to dynamically reweight the objective loss terms so that the network focuses more on the representation learning of uncertain classes.
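A rough sketch, under stated assumptions, of using the Dirichlet Fisher Information Matrix as an informativeness signal: the FIM of a Dirichlet with concentration alpha is diag(psi'(alpha_k)) - psi'(alpha_0) 11^T, and its log-determinant is used here to reweight a standard evidential MSE loss so that low-evidence samples count more. The exact weighting in $\mathcal{I}$-EDL differs; this only illustrates the reweighting pattern.

```python
import numpy as np
from scipy.special import polygamma

trigamma = lambda x: polygamma(1, x)

def dirichlet_fisher_logdet(alpha):
    """log|I(alpha)| with I = diag(trigamma(alpha_k)) - trigamma(alpha_0) 11^T."""
    alpha0 = alpha.sum()
    k = len(alpha)
    I = np.diag(trigamma(alpha)) - trigamma(alpha0) * np.ones((k, k))
    _, logdet = np.linalg.slogdet(I)
    return logdet

def weighted_evidential_loss(evidence, y_onehot):
    """Evidential MSE loss reweighted by Fisher-information content (illustrative)."""
    alpha = evidence + 1.0                      # Dirichlet concentration
    alpha0 = alpha.sum()
    p = alpha / alpha0                          # expected class probabilities
    mse = ((y_onehot - p) ** 2 + p * (1 - p) / (alpha0 + 1)).sum()
    # More Fisher information => less accumulated evidence => larger weight.
    weight = np.exp(dirichlet_fisher_logdet(alpha))
    return weight * mse

confident = np.array([50.0, 1.0, 1.0])   # strong evidence for class 0
vague = np.array([1.0, 1.0, 1.0])        # little evidence for any class
y = np.array([1.0, 0.0, 0.0])
print(weighted_evidential_loss(confident, y), weighted_evidential_loss(vague, y))
```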
arXiv Detail & Related papers (2023-03-03T16:12:59Z)
- Uncertainty in Extreme Multi-label Classification [81.14232824864787]
eXtreme Multi-label Classification (XMC) is an essential task in the era of big data for web-scale machine learning applications.
In this paper, we aim to investigate general uncertainty quantification approaches for tree-based XMC models with a probabilistic ensemble-based framework.
In particular, we analyze label-level and instance-level uncertainty in XMC, and propose a general approximation framework based on beam search to efficiently estimate the uncertainty with a theoretical guarantee under long-tail XMC predictions.
arXiv Detail & Related papers (2022-10-18T20:54:33Z)
- Learning Hidden Markov Models When the Locations of Missing Observations are Unknown [54.40592050737724]
We consider the general problem of learning an HMM from data with unknown missing observation locations.
We provide reconstruction algorithms that do not require any assumptions about the structure of the underlying chain.
We show that under proper specifications one can reconstruct the process dynamics as well as if the positions of the missing observations were known.
arXiv Detail & Related papers (2022-03-12T22:40:43Z)
- DeepPAMM: Deep Piecewise Exponential Additive Mixed Models for Complex Hazard Structures in Survival Analysis [0.7349727826230864]
Survival analysis (SA) is an active field of research that is concerned with time-to-event outcomes.
Despite its importance, SA remains challenging due to small-scale data sets and complex outcome distributions.
We propose DeepPAMM, a versatile deep learning framework that is well-founded from a statistical point of view, yet with enough flexibility for modeling complex hazard structures.
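DeepPAMM replaces the additive predictor with a neural network; the sketch below shows only the classical piecewise-exponential setup it builds on, with simulated right-censored data as an assumption: follow-up time is split into intervals, each row carries an offset of log(time at risk), and a Poisson GLM recovers a piecewise-constant baseline hazard plus a log hazard ratio.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)

# Simulated right-censored survival data with one covariate.
n = 300
x = rng.normal(size=n)
time = rng.exponential(scale=np.exp(-0.5 * x))        # true hazard increases with x
cens = rng.exponential(scale=2.0, size=n)
t_obs = np.minimum(time, cens)
event = (time <= cens).astype(int)

# Piecewise-exponential data augmentation: one row per (subject, interval at risk).
cuts = np.quantile(t_obs, [0.25, 0.5, 0.75, 1.0])
rows = []
for i in range(n):
    start = 0.0
    for j, end in enumerate(cuts):
        if t_obs[i] <= start:
            break
        exit_t = min(t_obs[i], end)
        rows.append({
            "interval": j,
            "x": x[i],
            "offset": np.log(exit_t - start),          # log time at risk in interval
            "y": int(bool(event[i]) and t_obs[i] <= end),  # event in this interval?
        })
        start = end
df = pd.DataFrame(rows)

# Piecewise-constant baseline hazard via interval dummies + Poisson likelihood.
X = pd.get_dummies(df["interval"], prefix="int", dtype=float)
X["x"] = df["x"]
fit = sm.GLM(df["y"], X, family=sm.families.Poisson(), offset=df["offset"]).fit()
print(fit.params)   # interval terms: log baseline hazard; "x": log hazard ratio
```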
arXiv Detail & Related papers (2022-02-12T11:38:57Z)
- Sample-Efficient Reinforcement Learning of Undercomplete POMDPs [91.40308354344505]
This work shows that these hardness barriers do not preclude efficient reinforcement learning for rich and interesting subclasses of Partially Observable Markov Decision Processes (POMDPs).
We present a sample-efficient algorithm, OOM-UCB, for episodic finite undercomplete POMDPs, where the number of observations is larger than the number of latent states and where exploration is essential for learning, thus distinguishing our results from prior works.
arXiv Detail & Related papers (2020-06-22T17:58:54Z)
- Unlucky Explorer: A Complete non-Overlapping Map Exploration [0.949996206597248]
We introduce the Maze Dash puzzle as an exploration problem where the agent must find a Hamiltonian Path visiting all the cells.
An optimization has been applied to the proposed Monte-Carlo Tree Search (MCTS) algorithm to obtain a promising result.
Our comparison indicates that the MCTS-based approach is a promising method that can cope with small- and medium-sized test cases.
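As a crude, hedged stand-in for the paper's optimized MCTS (the grid size, start cell, and rollout budget below are assumptions), random self-avoiding rollouts can already find Hamiltonian paths on small grids, which is the search problem the summary describes.

```python
import random

def random_rollout_hamiltonian(width, height, start=(0, 0), tries=20000, seed=0):
    """Random-rollout search for a Hamiltonian path on a grid: repeatedly roll out
    walks that never revisit a cell and return the first walk covering every cell."""
    rng = random.Random(seed)
    cells = width * height
    for _ in range(tries):
        path, visited = [start], {start}
        while True:
            x, y = path[-1]
            moves = [(x + dx, y + dy) for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
                     if 0 <= x + dx < width and 0 <= y + dy < height
                     and (x + dx, y + dy) not in visited]
            if not moves:
                break
            nxt = rng.choice(moves)
            path.append(nxt)
            visited.add(nxt)
        if len(path) == cells:
            return path
    return None

path = random_rollout_hamiltonian(4, 4)
print("found" if path else "not found", "-", path)
```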
arXiv Detail & Related papers (2020-05-28T17:19:24Z)
- Meta Cyclical Annealing Schedule: A Simple Approach to Avoiding Meta-Amortization Error [50.83356836818667]
We develop a novel meta-regularization objective using a cyclical annealing schedule and a maximum mean discrepancy (MMD) criterion.
The experimental results show that our approach substantially outperforms standard meta-learning algorithms.
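A small sketch of the two ingredients named in the summary, under assumed shapes and hyperparameters: a cyclical annealing schedule for a regularization weight and a (biased) RBF-kernel MMD estimate between encoder samples and a prior. How the paper combines them inside its meta-objective is not reproduced here.

```python
import numpy as np

def cyclical_beta(step, cycle_len=1000, ramp_frac=0.5, beta_max=1.0):
    """Cyclical annealing: beta ramps from 0 to beta_max over the first `ramp_frac`
    of each cycle, then stays at beta_max for the rest of the cycle."""
    pos = (step % cycle_len) / cycle_len
    return beta_max * min(1.0, pos / ramp_frac)

def mmd_rbf(x, y, bandwidth=1.0):
    """(Biased) squared MMD estimate between samples x and y with an RBF kernel."""
    def k(a, b):
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2.0 * bandwidth ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2.0 * k(x, y).mean()

# Usage: anneal the weight on an MMD regularizer over training steps.
rng = np.random.default_rng(0)
posterior = rng.normal(0.5, 1.0, size=(128, 2))   # stand-in for encoder samples
prior = rng.normal(0.0, 1.0, size=(128, 2))
for step in (0, 250, 500, 900):
    beta = cyclical_beta(step)
    print(f"step {step}: beta={beta:.2f}, beta*MMD={beta * mmd_rbf(posterior, prior):.4f}")
```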
arXiv Detail & Related papers (2020-03-04T04:43:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.