Evolutionary Optimization of Deep Learning Agents for Sparrow Mahjong
- URL: http://arxiv.org/abs/2508.07522v1
- Date: Mon, 11 Aug 2025 00:53:52 GMT
- Title: Evolutionary Optimization of Deep Learning Agents for Sparrow Mahjong
- Authors: Jim O'Connor, Derin Gezgin, Gary B. Parker
- Abstract summary: We present Evo-Sparrow, a deep learning-based agent for AI decision-making in Sparrow Mahjong. Our model evaluates board states and optimizes decision policies in a non-deterministic, partially observable game environment.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present Evo-Sparrow, a deep learning-based agent for AI decision-making in Sparrow Mahjong, trained by optimizing Long Short-Term Memory (LSTM) networks using Covariance Matrix Adaptation Evolution Strategy (CMA-ES). Our model evaluates board states and optimizes decision policies in a non-deterministic, partially observable game environment. Empirical analysis conducted over a significant number of simulations demonstrates that our model outperforms both random and rule-based agents, and achieves performance comparable to a Proximal Policy Optimization (PPO) baseline, indicating strong strategic play and robust policy quality. By combining deep learning with evolutionary optimization, our approach provides a computationally effective alternative to traditional reinforcement learning and gradient-based optimization methods. This research contributes to the broader field of AI game playing, demonstrating the viability of hybrid learning strategies for complex stochastic games. These findings also offer potential applications in adaptive decision-making and strategic AI development beyond Sparrow Mahjong.
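The abstract's core idea, treating a network's weights as a flat parameter vector and improving them by sampling and selection rather than by gradients, can be illustrated with a minimal sketch. The code below is not the authors' implementation and is not CMA-ES: it is a much simpler (mu, lambda) evolution strategy with an isotropic Gaussian and a fixed step-size decay, with no covariance adaptation. The names `toy_fitness` and `evolve`, the target vector, and all hyperparameters are assumptions chosen for illustration; in the paper's setting the fitness would presumably be derived from game outcomes of an LSTM policy instead.

```python
import random

# Hypothetical stand-in for "how well does this parameter vector play?".
# Higher is better; the optimum is at TARGET. A real setup would instead
# load the parameters into a policy network and score simulated games.
TARGET = [0.5, -1.0, 2.0]

def toy_fitness(params):
    return -sum((p - t) ** 2 for p, t in zip(params, TARGET))

def evolve(fitness_fn, dim, generations=200, pop_size=20, elite=5,
           sigma=0.5, decay=0.98, seed=0):
    """Simplified (mu, lambda) evolution strategy (NOT full CMA-ES).

    Each generation samples pop_size candidates around the current mean,
    keeps the elite best ones, and moves the mean to their average.
    The step size sigma shrinks by a fixed factor every generation.
    """
    rng = random.Random(seed)
    mean = [0.0] * dim
    for _ in range(generations):
        # Sample candidates from an isotropic Gaussian around the mean.
        population = [
            [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
            for _ in range(pop_size)
        ]
        # Rank by fitness (best first) and keep the elite as parents.
        population.sort(key=fitness_fn, reverse=True)
        parents = population[:elite]
        # Recombine: the new mean is the average of the parents.
        mean = [sum(p[i] for p in parents) / elite for i in range(dim)]
        sigma *= decay
    return mean

if __name__ == "__main__":
    best = evolve(toy_fitness, dim=len(TARGET))
    print("best parameters:", best)
    print("fitness:", toy_fitness(best))
```

Because only fitness rankings are used, nothing here requires the objective to be differentiable, which is the property that makes evolutionary methods applicable to win/loss style game feedback. CMA-ES additionally adapts a full covariance matrix and the step size from the search history, which this sketch omits.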
Related papers
- Discovering Multiagent Learning Algorithms with Large Language Models [8.649235365712004]
We propose the use of AlphaEvolve, an evolutionary coding agent powered by large language models, to automatically discover new multiagent learning algorithms. We demonstrate the generality of this framework by evolving novel variants for two distinct paradigms of game-theoretic learning.
arXiv Detail & Related papers (2026-02-18T22:41:00Z) - Detect and Act: Automated Dynamic Optimizer through Meta-Black-Box Optimization [19.31451943915537]
We propose a reinforcement learning-assisted approach to enable automated variation detection and self-adaption in evolutionary algorithms. Our approach could generalize toward unseen DOPs with automated environment variation detection and self-adaption.
arXiv Detail & Related papers (2026-01-30T04:28:27Z) - Scaling and Transferability of Annealing Strategies in Large Language Model Training [59.443651879173025]
We refine a predictive framework for optimizing annealing strategies under the Warmup-Steady-Decay (WSD) scheduler. Our improved framework incorporates training steps, maximum learning rate, and annealing behavior, enabling more efficient optimization of learning rate schedules. We validate our findings on extensive experiments using both Dense and Mixture-of-Experts (MoE) models.
arXiv Detail & Related papers (2025-12-05T16:38:33Z) - Reinforced Strategy Optimization for Conversational Recommender Systems via Network-of-Experts [63.412646471177645]
We propose a novel Reinforced Strategy Optimization (RSO) method for Conversational Recommender Systems (CRSs). RSO decomposes the process of generating strategy-driven response decisions into macro-level strategy planning and micro-level strategy adaptation. Experiments show that RSO significantly improves interaction performance compared to state-of-the-art baselines.
arXiv Detail & Related papers (2025-09-30T11:12:01Z) - Preference Optimization for Combinatorial Optimization Problems [54.87466279363487]
Reinforcement Learning (RL) has emerged as a powerful tool for neural optimization, enabling models to learn to solve complex problems without requiring expert knowledge. Despite significant progress, existing RL approaches face challenges such as diminishing reward signals and inefficient exploration in vast action spaces. We propose Preference Optimization, a novel method that transforms quantitative reward signals into qualitative preference signals via statistical comparison modeling.
arXiv Detail & Related papers (2025-05-13T16:47:00Z) - PAIR: A Novel Large Language Model-Guided Selection Strategy for Evolutionary Algorithms [2.3244035825657963]
This paper introduces Preference-Aligned Individual Reciprocity (PAIR). PAIR emulates human-like mate selection, thereby introducing intelligence to the pairing process in Evolutionary Algorithms (EAs).
arXiv Detail & Related papers (2025-03-05T07:45:56Z) - Preference-based opponent shaping in differentiable games [3.373994463906893]
We propose a novel Preference-based Opponent Shaping (PBOS) method to enhance the strategy learning process by shaping agents' preferences towards cooperation. We verify the performance of the PBOS algorithm in a variety of differentiable games.
arXiv Detail & Related papers (2024-12-04T06:49:21Z) - From Novice to Expert: LLM Agent Policy Optimization via Step-wise Reinforcement Learning [62.54484062185869]
We introduce StepAgent, which utilizes step-wise reward to optimize the agent's reinforcement learning process. We propose implicit-reward and inverse reinforcement learning techniques to facilitate agent reflection and policy adjustment.
arXiv Detail & Related papers (2024-11-06T10:35:11Z) - RLEMMO: Evolutionary Multimodal Optimization Assisted By Deep Reinforcement Learning [8.389454219309837]
Solving multimodal optimization problems (MMOP) requires finding all optimal solutions, which is challenging with limited function evaluations.
We propose RLEMMO, a Meta-Black-Box Optimization framework, which maintains a population of solutions and incorporates a reinforcement learning agent.
With a novel reward mechanism that encourages both quality and diversity, RLEMMO can be effectively trained using a policy gradient algorithm.
arXiv Detail & Related papers (2024-04-12T05:02:49Z) - Acceleration in Policy Optimization [50.323182853069184]
We work towards a unifying paradigm for accelerating policy optimization methods in reinforcement learning (RL) by integrating foresight in the policy improvement step via optimistic and adaptive updates.
We define optimism as predictive modelling of the future behavior of a policy, and adaptivity as taking immediate and anticipatory corrective actions to mitigate errors from overshooting predictions or delayed responses to change.
We design an optimistic policy gradient algorithm, adaptive via meta-gradient learning, and empirically highlight several design choices pertaining to acceleration, in an illustrative task.
arXiv Detail & Related papers (2023-06-18T15:50:57Z) - A reinforcement learning strategy for p-adaptation in high order solvers [0.0]
Reinforcement learning (RL) has emerged as a promising approach to automating decision processes.
This paper explores the application of RL techniques to optimise the order in the computational mesh when using high-order solvers.
arXiv Detail & Related papers (2023-06-14T07:01:31Z) - Deep Reinforcement Learning for Exact Combinatorial Optimization: Learning to Branch [13.024115985194932]
We propose a new approach for solving the data labeling and inference issues in optimization based on the use of the reinforcement learning (RL) paradigm.
We use imitation learning to bootstrap an RL agent and then use Proximal Policy Optimization (PPO) to further explore global optimal actions.
arXiv Detail & Related papers (2022-06-14T16:35:58Z) - Model-based Multi-agent Policy Optimization with Adaptive Opponent-wise Rollouts [52.844741540236285]
This paper investigates model-based methods in multi-agent reinforcement learning (MARL).
We propose a novel decentralized model-based MARL method, named Adaptive Opponent-wise Rollout Policy Optimization (AORPO).
arXiv Detail & Related papers (2021-05-07T16:20:22Z)