Improving Bidding and Playing Strategies in the Trick-Taking game Wizard
using Deep Q-Networks
- URL: http://arxiv.org/abs/2205.13834v1
- Date: Fri, 27 May 2022 08:59:42 GMT
- Authors: Jonas Schumacher, Marco Pleines
- Abstract summary: The trick-taking game Wizard with a separate bidding and playing phase is modeled by two interleaved partially observable Markov decision processes (POMDP).
Deep Q-Networks (DQN) are used to empower self-improving agents, which are capable of tackling the challenges of a highly non-stationary environment.
The trained DQN agents achieve accuracies between 66% and 87% in self-play, leaving behind both a random baseline and a rule-based heuristic.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this work, the trick-taking game Wizard, with its separate bidding
and playing phases, is modeled as two interleaved partially observable Markov
decision processes (POMDPs). Deep Q-Networks (DQN) are used to empower
self-improving agents capable of tackling the challenges of a highly
non-stationary environment. To compare algorithms with each other, the
accuracy between bid and trick count is monitored; it correlates strongly
with the actual rewards and provides well-defined upper and lower performance
bounds. The trained DQN agents achieve accuracies between 66% and 87% in
self-play, outperforming both a random baseline and a rule-based heuristic.
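The accuracy metric described here can be sketched in a few lines; the function name and data layout below are illustrative assumptions, not the authors' code.

```python
def bidding_accuracy(bids, tricks):
    """Fraction of rounds in which a player's trick count exactly
    matches their bid (illustrative helper, not the authors' code)."""
    if len(bids) != len(tricks):
        raise ValueError("expected one bid and one trick count per round")
    hits = sum(1 for bid, won in zip(bids, tricks) if bid == won)
    return hits / len(bids)

# A player who hits their bid in 3 of 4 rounds scores 0.75.
print(bidding_accuracy([2, 0, 1, 3], [2, 0, 1, 2]))  # → 0.75
```

Because a bid can only be hit or missed in each round, the metric is bounded by 0 (never hit) and 1 (always hit), which is the well-defined performance range mentioned above.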
The analysis also reveals a strong information asymmetry across player
positions during bidding. To compensate for the missing Markov property of
imperfect-information games, a long short-term memory (LSTM) network is
implemented to integrate historic information into the decision-making process.
Additionally, a forward-directed tree search is conducted by sampling a state
of the environment, thereby turning the game into a perfect-information
setting. Surprisingly, neither approach surpasses the performance of the
basic DQN agent.
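The sampling step behind the forward-directed tree search amounts to a determinization: the unseen cards are dealt randomly to the opponents so that the resulting state is fully observable and searchable. The sketch below is a minimal illustration under that reading; all names and the card encoding are assumptions, not the paper's implementation.

```python
import random

def determinize(my_hand, played, deck, hand_sizes, rng=random):
    """Sample one consistent assignment of the unseen cards to the
    opponents, turning the imperfect-information game into a
    perfect-information one on which a tree search can run
    (illustrative sketch, not the authors' code)."""
    seen = set(my_hand) | set(played)
    unseen = [card for card in deck if card not in seen]
    rng.shuffle(unseen)
    hands, i = {}, 0
    for player, n in hand_sizes.items():
        hands[player] = unseen[i:i + n]
        i += n
    return hands

# Deal 15 of the 17 unseen cards of a toy 20-card deck to three
# opponents (five each); the rest stay hidden in the deck.
hands = determinize([0, 1], [2], list(range(20)), {"p2": 5, "p3": 5, "p4": 5})
```

Averaging search results over many such samples approximates play under the true hidden state; the abstract's finding is that, for Wizard, this did not beat the plain DQN agent.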
Related papers
- PRInTS: Reward Modeling for Long-Horizon Information Seeking [74.14496236655911]
We introduce PRInTS, a generative PRM trained with dual capabilities.
We show that PRInTS enhances information-seeking abilities of open-source models as well as specialized agents.
arXiv Detail & Related papers (2025-11-24T17:09:43Z)
- Expandable Decision-Making States for Multi-Agent Deep Reinforcement Learning in Soccer Tactical Analysis [6.8055385768376615]
Invasion team sports such as soccer produce a high-dimensional, strongly coupled state space as many players interact on a shared field.
Traditional rule-based analyses are intuitive, while modern predictive machine learning models often perform pattern-matching without explicit agent representations.
Here, we propose Expandable Decision-Making States (EDMS), a semantically enriched state representation that augments raw positions and velocities with relational variables.
arXiv Detail & Related papers (2025-10-01T04:01:51Z)
- Online Competitive Information Gathering for Partially Observable Trajectory Games [24.25139588281181]
Game-theoretic agents must make plans that optimally gather information about their opponents.
We formulate a finite history/horizon refinement of POSGs which admits competitive information gathering behavior in trajectory space.
We present an online method for computing rational trajectory plans in these games which leverages particle-based estimations of the state space and performs gradient play.
arXiv Detail & Related papers (2025-06-02T17:45:58Z)
- Automatic Reward Shaping from Confounded Offline Data [69.11672390876763]
Building on the well-celebrated Deep Q-Network (DQN), we propose a novel deep reinforcement learning algorithm robust to confounding biases in observed data.
We apply our method to twelve confounded Atari games, and find that it consistently dominates the standard DQN in all games where the observed input to the behavioral and target policies mismatch and unobserved confounders exist.
arXiv Detail & Related papers (2025-05-16T17:40:01Z)
- SPC: Evolving Self-Play Critic via Adversarial Games for LLM Reasoning [99.645427839457]
Self-Play Critic (SPC) is a novel approach where a critic model evolves its ability to assess reasoning steps through adversarial self-play games.
SPC involves fine-tuning two copies of a base model to play two roles, namely a "sneaky generator" and a "critic"
arXiv Detail & Related papers (2025-04-27T08:45:06Z)
- Explainable and Interpretable Forecasts on Non-Smooth Multivariate Time Series for Responsible Gameplay [20.363472927691255]
Actionable Forecasting Network (AFN) addresses the inter-dependent challenges associated with three exclusive objectives.
AFN achieves 25% improvement on the MSE of the forecasts on player data in comparison to the SOM-VAE based SOTA networks.
arXiv Detail & Related papers (2025-04-03T11:49:24Z)
- Dual Ensembled Multiagent Q-Learning with Hypernet Regularizer [62.01554688056335]
Overestimation in the multiagent setting has received comparatively little attention.
We propose a novel hypernet regularizer on hypernetwork weights and biases to constrain the optimization of the online global Q-network and prevent overestimation accumulation.
arXiv Detail & Related papers (2025-02-04T05:14:58Z)
- Bidding Games on Markov Decision Processes with Quantitative Reachability Objectives [3.4486432774139355]
We study a new family of graph games which combine environmental uncertainties and auction-based interactions among the agents.
We devise value-iteration algorithms that approximate thresholds and optimal policies for general MDPs.
We show that finding thresholds is at least as hard as solving simple-stochastic games.
arXiv Detail & Related papers (2024-12-27T12:10:00Z)
- Efficient Distribution Matching of Representations via Noise-Injected Deep InfoMax [73.03684002513218]
We enhance Deep InfoMax (DIM) to enable automatic matching of learned representations to a selected prior distribution.
We show that such modification allows for learning uniformly and normally distributed representations.
The results indicate a moderate trade-off between the performance on the downstream tasks and quality of DM.
arXiv Detail & Related papers (2024-10-09T15:40:04Z)
- Competing for pixels: a self-play algorithm for weakly-supervised segmentation [7.416217935677032]
We propose a novel WSS method that gamifies image segmentation of a region.
Agents compete to select ROI-containing patches until exhaustion of all such patches.
This competitive setup ensures minimisation of over- or under-segmentation.
arXiv Detail & Related papers (2024-05-26T17:00:17Z)
- HSVI-based Online Minimax Strategies for Partially Observable Stochastic Games with Neural Perception Mechanisms [31.51588071503617]
We consider a variant of continuous-state partially-observable games with neural perception mechanisms and an asymmetric information structure.
One agent has partial information, while the other agent is assumed to have full knowledge of the state.
We present an efficient online method to compute an $\varepsilon$-minimax strategy profile for each agent.
arXiv Detail & Related papers (2024-04-16T15:58:20Z)
- An Index Policy Based on Sarsa and Q-learning for Heterogeneous Smart Target Tracking [13.814608044569967]
We propose a new policy, namely ISQ, to maximize the long-term tracking rewards.
Numerical results demonstrate that the proposed ISQ policy outperforms conventional Q-learning-based methods.
arXiv Detail & Related papers (2024-02-19T10:13:25Z)
- Auto-Encoding Bayesian Inverse Games [36.06617326128679]
We consider the inverse game problem, in which some properties of the game are unknown a priori.
Existing maximum likelihood estimation approaches to solve inverse games provide only point estimates of unknown parameters.
We take a Bayesian perspective and construct posterior distributions of game parameters.
This structured VAE can be trained from an unlabeled dataset of observed interactions.
arXiv Detail & Related papers (2024-02-14T02:17:37Z)
- DQ-LoRe: Dual Queries with Low Rank Approximation Re-ranking for In-Context Learning [66.85379279041128]
In this study, we introduce a framework that leverages Dual Queries and Low-rank approximation Re-ranking to automatically select exemplars for in-context learning.
DQ-LoRe significantly outperforms prior state-of-the-art methods in the automatic selection of exemplars for GPT-4, enhancing performance from 92.5% to 94.2%.
arXiv Detail & Related papers (2023-10-04T16:44:37Z)
- An Empirical Study on the Generalization Power of Neural Representations Learned via Visual Guessing Games [79.23847247132345]
This work investigates how well an artificial agent can benefit from playing guessing games when later asked to perform on novel NLP downstream tasks such as Visual Question Answering (VQA)
We propose two ways to exploit playing guessing games: 1) a supervised learning scenario in which the agent learns to mimic successful guessing games and 2) a novel way for an agent to play by itself, called Self-play via Iterated Experience Learning (SPIEL)
arXiv Detail & Related papers (2021-01-31T10:30:48Z)
- Information Freshness-Aware Task Offloading in Air-Ground Integrated Edge Computing Systems [49.80033982995667]
This paper studies the problem of information freshness-aware task offloading in an air-ground integrated multi-access edge computing system.
A third-party real-time application service provider provides computing services to the subscribed mobile users (MUs) with the limited communication and computation resources from the InP.
We derive a novel deep reinforcement learning (RL) scheme that adopts two separate double deep Q-networks for each MU to approximate the Q-factor and the post-decision Q-factor.
arXiv Detail & Related papers (2020-07-15T21:32:43Z)
- Harvesting and Refining Question-Answer Pairs for Unsupervised QA [95.9105154311491]
We introduce two approaches to improve unsupervised Question Answering (QA)
First, we harvest lexically and syntactically divergent questions from Wikipedia to automatically construct a corpus of question-answer pairs (named as RefQA)
Second, we take advantage of the QA model to extract more appropriate answers, which iteratively refines data over RefQA.
arXiv Detail & Related papers (2020-05-06T15:56:06Z)
- FairMOT: On the Fairness of Detection and Re-Identification in Multiple Object Tracking [92.48078680697311]
Multi-object tracking (MOT) is an important problem in computer vision.
We present a simple yet effective approach termed as FairMOT based on the anchor-free object detection architecture CenterNet.
The approach achieves high accuracy for both detection and tracking.
arXiv Detail & Related papers (2020-04-04T08:18:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.