Can Large Language Models Play Games? A Case Study of A Self-Play
Approach
- URL: http://arxiv.org/abs/2403.05632v1
- Date: Fri, 8 Mar 2024 19:16:29 GMT
- Title: Can Large Language Models Play Games? A Case Study of A Self-Play
Approach
- Authors: Hongyi Guo, Zhihan Liu, Yufeng Zhang, Zhaoran Wang
- Abstract summary: Large Language Models (LLMs) harness extensive data from the Internet, storing a broad spectrum of prior knowledge.
Monte-Carlo Tree Search (MCTS) is a search algorithm that provides reliable decision-making solutions.
This work introduces an innovative approach that bolsters LLMs with MCTS self-play to efficiently resolve turn-based zero-sum games.
- Score: 61.15761840203145
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs) harness extensive data from the Internet,
storing a broad spectrum of prior knowledge. While LLMs have proven beneficial
as decision-making aids, their reliability is hampered by limitations in
reasoning, hallucination phenomenon, and so on. On the other hand, Monte-Carlo
Tree Search (MCTS) is a heuristic search algorithm that provides reliable
decision-making solutions, achieved through recursive rollouts and self-play.
However, the effectiveness of MCTS relies heavily on heuristic pruning and
external value functions, particularly in complex decision scenarios. This work
introduces an innovative approach that bolsters LLMs with MCTS self-play to
efficiently resolve deterministic turn-based zero-sum games (DTZG), such as
chess and go, without the need for additional training. Specifically, we
utilize LLMs as both action pruners and proxies for value functions without the
need for additional training. We theoretically prove that the suboptimality of
the estimated value in our proposed method scales with $\tilde{\mathcal
O}\Bigl(\frac{|\tilde {\mathcal A}|}{\sqrt{N}} + \epsilon_\mathrm{pruner} +
\epsilon_\mathrm{critic}\Bigr)$, where \(N\) is the number of simulations,
$|\tilde {\mathcal A}|$ is the cardinality of the pruned action space by LLM,
and $\epsilon_\mathrm{pruner}$ and $\epsilon_\mathrm{critic}$ quantify the
errors incurred by adopting LLMs as action space pruner and value function
proxy, respectively. Our experiments in chess and go demonstrate the capability
of our method to address challenges beyond the scope of MCTS and improve the
performance of the directly application of LLMs.
Related papers
- Towards Efficient Automatic Self-Pruning of Large Language Models [55.90119819642064]
Post-training structured pruning is a promising solution that prunes Large Language Models without the need for retraining.
We argue that the key to mitigating this issue lies in accurately determining the pruning rate for each layer.
We introduce $textbfSelf-Pruner$ an end-to-end automatic self-pruning framework for LLMs, which efficiently search layer-wise pruning rates.
arXiv Detail & Related papers (2025-02-20T09:59:50Z) - PickLLM: Context-Aware RL-Assisted Large Language Model Routing [0.5325390073522079]
PickLLM is a lightweight framework that relies on Reinforcement Learning (RL) to route on-the-fly queries to available models.
We demonstrate the speed of convergence for different learning rates and improvement in hard metrics such as cost per querying session and overall response latency.
arXiv Detail & Related papers (2024-12-12T06:27:12Z) - zsLLMCode: An Effective Approach for Functional Code Embedding via LLM with Zero-Shot Learning [6.976968804436321]
Large language models (LLMs) have the capability of zero-shot learning, which does not require training or fine-tuning.
We propose zsLLMCode, a novel approach that generates functional code embeddings using LLMs.
arXiv Detail & Related papers (2024-09-23T01:03:15Z) - Decoding with Limited Teacher Supervision Requires Understanding When to Trust the Teacher [11.136112399898481]
How can small-scale large language models (LLMs) efficiently utilize the supervision of LLMs to improve their generative quality?
We develop an algorithm to effectively aggregate the small-scale LLM and LLM predictions on initial tokens.
We demonstrate that our method provides a consistent improvement over conventional decoding strategies.
arXiv Detail & Related papers (2024-06-26T01:16:12Z) - ReLU$^2$ Wins: Discovering Efficient Activation Functions for Sparse
LLMs [91.31204876440765]
We introduce a general method that defines neuron activation through neuron output magnitudes and a tailored magnitude threshold.
To find the most efficient activation function for sparse computation, we propose a systematic framework.
We conduct thorough experiments on LLMs utilizing different activation functions, including ReLU, SwiGLU, ReGLU, and ReLU$2$.
arXiv Detail & Related papers (2024-02-06T08:45:51Z) - SatLM: Satisfiability-Aided Language Models Using Declarative Prompting [68.40726892904286]
We propose a new satisfiability-aided language modeling (SatLM) approach for improving the reasoning capabilities of large language models (LLMs)
We use an LLM to generate a declarative task specification rather than an imperative program and leverage an off-the-shelf automated theorem prover to derive the final answer.
We evaluate SATLM on 8 different datasets and show that it consistently outperforms program-aided LMs in the imperative paradigm.
arXiv Detail & Related papers (2023-05-16T17:55:51Z) - Horizon-Free and Variance-Dependent Reinforcement Learning for Latent
Markov Decision Processes [62.90204655228324]
We study regret minimization for reinforcement learning (RL) in Latent Markov Decision Processes (LMDPs) with context in hindsight.
We design a novel model-based algorithmic framework which can be instantiated with both a model-optimistic and a value-optimistic solver.
arXiv Detail & Related papers (2022-10-20T21:32:01Z) - Randomized Exploration for Reinforcement Learning with General Value
Function Approximation [122.70803181751135]
We propose a model-free reinforcement learning algorithm inspired by the popular randomized least squares value iteration (RLSVI) algorithm.
Our algorithm drives exploration by simply perturbing the training data with judiciously chosen i.i.d. scalar noises.
We complement the theory with an empirical evaluation across known difficult exploration tasks.
arXiv Detail & Related papers (2021-06-15T02:23:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.