An Evolutionary Framework for Connect-4 as Test-Bed for Comparison of Advanced Minimax, Q-Learning and MCTS
- URL: http://arxiv.org/abs/2405.16595v1
- Date: Sun, 26 May 2024 15:11:45 GMT
- Title: An Evolutionary Framework for Connect-4 as Test-Bed for Comparison of Advanced Minimax, Q-Learning and MCTS
- Authors: Henry Taylor, Leonardo Stella
- Abstract summary: This paper develops a novel evolutionary framework to evaluate three classes of algorithms: RL, Minimax and Monte Carlo tree search (MCTS).
We show that MCTS achieves the best results in terms of win percentage, whereas Minimax and Q-Learning are ranked in second and third place, respectively.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A major challenge in decision-making domains with large state spaces is to effectively select actions which maximize utility. In recent years, approaches such as reinforcement learning (RL) and search algorithms have been successful in tackling this issue, despite their differences. RL defines a learning framework in which an agent explores and interacts with the environment. Search algorithms provide a formalism to search for a solution. However, it is often difficult to evaluate the performance of such approaches in a practical way. Motivated by this problem, we focus on one game domain, i.e., Connect-4, and develop a novel evolutionary framework to evaluate three classes of algorithms: RL, Minimax and Monte Carlo tree search (MCTS). The contribution of this paper is threefold: i) we implement advanced versions of these algorithms and provide a systematic comparison with their standard counterparts, ii) we develop a novel evaluation framework, which we call the Evolutionary Tournament, and iii) we conduct an extensive evaluation of the relative performance of each algorithm. We evaluate different metrics and show that MCTS achieves the best results in terms of win percentage, whereas Minimax and Q-Learning rank second and third, respectively, although the latter is shown to be the fastest to make a decision.
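To make the evaluation setup concrete, below is a minimal, self-contained sketch of a round-robin tournament on Connect-4 that ranks agents by win percentage. The board representation, the random placeholder agents, and the pairing scheme are illustrative assumptions, not the paper's actual Evolutionary Tournament implementation.

```python
# Minimal Connect-4 round-robin tournament sketch (illustrative assumptions,
# not the paper's Evolutionary Tournament code).
import random
from itertools import permutations

ROWS, COLS = 6, 7

def legal_moves(board):
    """Columns that still have room."""
    return [c for c in range(COLS) if board[0][c] == 0]

def drop(board, col, player):
    """Return a new board with `player`'s piece dropped in `col`."""
    new = [row[:] for row in board]
    for r in range(ROWS - 1, -1, -1):
        if new[r][col] == 0:
            new[r][col] = player
            return new
    raise ValueError("column full")

def winner(board):
    """Return 1 or 2 if a side has four in a row, else 0."""
    for r in range(ROWS):
        for c in range(COLS):
            p = board[r][c]
            if p == 0:
                continue
            for dr, dc in ((0, 1), (1, 0), (1, 1), (1, -1)):
                cells = [(r + i * dr, c + i * dc) for i in range(4)]
                if all(0 <= rr < ROWS and 0 <= cc < COLS and board[rr][cc] == p
                       for rr, cc in cells):
                    return p
    return 0

def random_agent(board, player):
    """Placeholder agent; any callable (board, player) -> column can compete."""
    return random.choice(legal_moves(board))

def play_game(agent1, agent2):
    """Play one game; return 1, 2, or 0 for a draw."""
    board = [[0] * COLS for _ in range(ROWS)]
    agents = {1: agent1, 2: agent2}
    player = 1
    while legal_moves(board):
        board = drop(board, agents[player](board, player), player)
        w = winner(board)
        if w:
            return w
        player = 3 - player
    return 0

def tournament(agents, games_per_pair=10):
    """Round-robin over ordered pairs; rank agents by win percentage."""
    wins = {name: 0 for name in agents}
    played = {name: 0 for name in agents}
    for (n1, a1), (n2, a2) in permutations(agents.items(), 2):
        for _ in range(games_per_pair):
            w = play_game(a1, a2)
            played[n1] += 1
            played[n2] += 1
            if w == 1:
                wins[n1] += 1
            elif w == 2:
                wins[n2] += 1
    return sorted(((wins[n] / played[n], n) for n in agents), reverse=True)

if __name__ == "__main__":
    ranking = tournament({"random-A": random_agent, "random-B": random_agent})
    for pct, name in ranking:
        print(f"{name}: {pct:.0%} wins")
```

Stronger entrants, e.g. Minimax, Q-Learning, or MCTS agents, would plug in wherever `random_agent` does, since any callable taking a board and a player index can compete.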
Related papers
- RAGLAB: A Modular and Research-Oriented Unified Framework for Retrieval-Augmented Generation [54.707460684650584]
Large Language Models (LLMs) demonstrate human-level capabilities in dialogue, reasoning, and knowledge retention, yet their built-in knowledge can be stale or incomplete.
Current research addresses this bottleneck by equipping LLMs with external knowledge, a technique known as Retrieval-Augmented Generation (RAG).
RAGLAB is a modular and research-oriented open-source library that reproduces 6 existing algorithms and provides a comprehensive ecosystem for investigating RAG algorithms.
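For readers unfamiliar with the pattern, a toy retrieve-then-generate loop is sketched below; the word-overlap retriever, the corpus, and the `generate` stub are illustrative stand-ins, not RAGLAB's API.

```python
# Toy retrieve-then-generate loop illustrating the RAG pattern
# (stand-in retriever and generator, not RAGLAB's API).
def retrieve(query, corpus, k=2):
    """Rank documents by word overlap with the query (toy retriever)."""
    q = set(query.lower().split())
    scored = sorted(corpus, key=lambda d: len(q & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

def generate(prompt):
    """Placeholder for an LLM call; a real system would query a model here."""
    return f"[LLM answer conditioned on prompt of {len(prompt)} chars]"

corpus = [
    "Connect-4 is a two-player connection game on a 6x7 grid.",
    "Monte Carlo tree search builds a search tree from random rollouts.",
    "Q-Learning learns action values from interaction with an environment.",
]
query = "How does Monte Carlo tree search work?"
context = "\n".join(retrieve(query, corpus))
print(generate(f"Context:\n{context}\n\nQuestion: {query}"))
```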
arXiv Detail & Related papers (2024-08-21T07:20:48Z)
- MARL-LNS: Cooperative Multi-agent Reinforcement Learning via Large Neighborhoods Search [27.807695570974644]
We propose a general training framework, MARL-LNS, which addresses these issues by training on alternating subsets of agents (sketched below).
We show that our algorithms can automatically reduce at least 10% of training time while reaching the same final skill level as the original algorithm.
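A schematic of the subset-alternation idea, under the assumption that each iteration trains a randomly chosen neighborhood of agents while the rest stay frozen; the names and the training stub are hypothetical, not the MARL-LNS code.

```python
# Schematic "alternating subsets of agents" training loop
# (hypothetical names and stub, not the MARL-LNS implementation).
import random

def train_subset(agents, subset):
    """Placeholder: update only the selected agents' policies."""
    for name in subset:
        agents[name]["updates"] += 1

agents = {f"agent_{i}": {"updates": 0} for i in range(8)}
neighborhood_size = 3

for iteration in range(10):
    subset = random.sample(sorted(agents), neighborhood_size)  # pick a neighborhood
    train_subset(agents, subset)  # the other agents stay frozen this iteration

print({name: a["updates"] for name, a in agents.items()})
```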
arXiv Detail & Related papers (2024-04-03T22:51:54Z)
- LightZero: A Unified Benchmark for Monte Carlo Tree Search in General Sequential Decision Scenarios [32.83545787965431]
Building agents based on tree-search planning capabilities with learned models has achieved remarkable success in classic decision-making problems, such as Go and Atari.
It has been deemed challenging or even infeasible to extend Monte Carlo Tree Search (MCTS) based algorithms to diverse real-world applications.
In this work, we introduce LightZero, the first unified benchmark for deploying MCTS/MuZero in general sequential decision scenarios.
arXiv Detail & Related papers (2023-10-12T14:18:09Z)
- A Critical Re-evaluation of Benchmark Datasets for (Deep) Learning-Based Matching Algorithms [11.264467955516706]
We propose four approaches to assessing the difficulty and appropriateness of 13 established datasets.
We show that most of the popular datasets pose rather easy classification tasks.
We propose a new methodology for yielding benchmark datasets.
arXiv Detail & Related papers (2023-07-03T07:54:54Z)
- Improving and Benchmarking Offline Reinforcement Learning Algorithms [87.67996706673674]
This work aims to bridge the gaps caused by low-level choices and datasets.
We empirically investigate 20 implementation choices using three representative algorithms.
We find that two variants, CRR+ and CQL+, achieve a new state of the art on D4RL.
arXiv Detail & Related papers (2023-06-01T17:58:46Z)
- A Gold Standard Dataset for the Reviewer Assignment Problem [117.59690218507565]
"Similarity score" is a numerical estimate of the expertise of a reviewer in reviewing a paper.
Our dataset consists of 477 self-reported expertise scores provided by 58 researchers.
For the task of ordering two papers in terms of their relevance for a reviewer, the error rates range from 12%-30% in easy cases to 36%-43% in hard cases.
arXiv Detail & Related papers (2023-03-23T16:15:03Z)
- NEVIS'22: A Stream of 100 Tasks Sampled from 30 Years of Computer Vision Research [96.53307645791179]
We introduce the Never-Ending VIsual-classification Stream (NEVIS'22), a benchmark consisting of a stream of over 100 visual classification tasks.
Despite being limited to classification, the resulting stream has a rich diversity of tasks from OCR, to texture analysis, scene recognition, and so forth.
Overall, NEVIS'22 poses an unprecedented challenge for current sequential learning approaches due to the scale and diversity of tasks.
arXiv Detail & Related papers (2022-11-15T18:57:46Z)
- On the Convergence of Distributed Stochastic Bilevel Optimization Algorithms over a Network [55.56019538079826]
Bilevel optimization has been applied to a wide variety of machine learning models.
Most existing algorithms are restricted to the single-machine setting and are therefore incapable of handling distributed data.
We develop novel decentralized bilevel optimization algorithms based on a gradient-tracking communication mechanism and two different gradient estimators (a single-level sketch of gradient tracking follows).
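To illustrate the communication mechanism named above, here is a single-level decentralized gradient-tracking sketch on quadratics; the ring mixing matrix, step size, and objectives are assumptions, and the paper's bilevel algorithms layer an inner problem on top of this scheme.

```python
# Single-level decentralized gradient tracking on quadratics
# (illustrative; the paper's setting is bilevel and stochastic).
import numpy as np

n = 4                                   # number of nodes in the network
rng = np.random.default_rng(0)
targets = rng.normal(size=n)            # node i minimizes f_i(x) = 0.5*(x - t_i)^2
grad = lambda x, i: x - targets[i]      # gradient of f_i

W = np.zeros((n, n))                    # doubly stochastic ring mixing matrix
for i in range(n):
    W[i, i] = 0.5
    W[i, (i + 1) % n] = 0.25
    W[i, (i - 1) % n] = 0.25

x = np.zeros(n)                         # local iterates
y = np.array([grad(x[i], i) for i in range(n)])  # gradient trackers
alpha = 0.1

for _ in range(200):
    x_new = W @ x - alpha * y           # mix with neighbors, step along tracker
    # tracker update: mix, then correct with the local gradient difference
    y = W @ y + np.array([grad(x_new[i], i) - grad(x[i], i) for i in range(n)])
    x = x_new

print(x, "consensus near", targets.mean())  # iterates approach the global minimizer
```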
arXiv Detail & Related papers (2022-06-30T05:29:52Z)
- Reinforcement Learning for Branch-and-Bound Optimisation using Retrospective Trajectories [72.15369769265398]
Machine learning has emerged as a promising paradigm for branching.
We propose retro branching, a simple yet effective approach to RL for branching.
We outperform the current state-of-the-art RL branching algorithm by 3-5x and come within 20% of the best IL method's performance on MILPs with 500 constraints and 1000 variables.
arXiv Detail & Related papers (2022-05-28T06:08:07Z) - Online Baum-Welch algorithm for Hierarchical Imitation Learning [7.271970309320002]
We propose an online algorithm to perform hierarchical imitation learning in the options framework.
We show that this approach works well in both discrete and continuous environments.
arXiv Detail & Related papers (2020-05-22T18:02:36Z)
- Single-Agent Optimization Through Policy Iteration Using Monte-Carlo Tree Search [8.22379888383833]
The combination of Monte-Carlo Tree Search (MCTS) and deep reinforcement learning is the state of the art in two-player perfect-information games.
We describe a search algorithm that uses a variant of MCTS which we enhanced by 1) a novel action value normalization mechanism for games with potentially unbounded rewards, 2) defining a virtual loss function that enables effective search parallelization, and 3) a policy network, trained by generations of self-play, to guide the search.
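A sketch of the first two mechanisms, under common assumptions from the MCTS literature: min-max normalization maps unbounded values into [0, 1], and a virtual loss on the selection path discourages parallel workers from descending the same branch. The node fields and constants are illustrative, not the paper's exact algorithm.

```python
# Min-max value normalization and virtual loss in PUCT-style selection
# (illustrative node fields and constants, not the paper's exact algorithm).
from dataclasses import dataclass, field

@dataclass
class Node:
    prior: float
    visits: int = 0
    value_sum: float = 0.0
    virtual_loss: int = 0
    children: dict = field(default_factory=dict)

def normalized_q(node, q_min, q_max):
    """Map the node's mean value into [0, 1] using the tree's observed range."""
    if node.visits == 0 or q_max <= q_min:
        return 0.0
    q = node.value_sum / node.visits
    return (q - q_min) / (q_max - q_min)

def select_child(parent, q_min, q_max, c_puct=1.5):
    """PUCT selection; virtual losses steer parallel workers apart."""
    total = sum(ch.visits + ch.virtual_loss for ch in parent.children.values())
    def score(ch):
        u = c_puct * ch.prior * (total ** 0.5) / (1 + ch.visits + ch.virtual_loss)
        return normalized_q(ch, q_min, q_max) + u
    action, child = max(parent.children.items(), key=lambda kv: score(kv[1]))
    child.virtual_loss += 1   # undone when the rollout result is backed up
    return action, child

root = Node(prior=1.0)
root.children = {0: Node(prior=0.6), 1: Node(prior=0.4)}
print(select_child(root, q_min=0.0, q_max=1.0))
```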
arXiv Detail & Related papers (2020-05-22T18:02:36Z)