Exploring Parity Challenges in Reinforcement Learning through Curriculum
Learning with Noisy Labels
- URL: http://arxiv.org/abs/2312.05379v2
- Date: Sun, 14 Jan 2024 10:23:09 GMT
- Title: Exploring Parity Challenges in Reinforcement Learning through Curriculum
Learning with Noisy Labels
- Authors: Bei Zhou, Soren Riis
- Abstract summary: We propose a simulated learning process structured within a curriculum learning framework and augmented with noisy labels.
This approach enables a thorough analysis of how neural networks (NNs) adapt and evolve from elementary to increasingly complex game positions.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper delves into applying reinforcement learning (RL) in strategy
games, particularly those characterized by parity challenges, as seen in
specific positions of Go and Chess and a broader range of impartial games. We
propose a simulated learning process, structured within a curriculum learning
framework and augmented with noisy labels, to mirror the intricacies of
self-play learning scenarios. This approach enables a thorough analysis of how
neural networks (NNs) adapt and evolve from elementary to increasingly complex game
positions. Our empirical research indicates that even minimal label noise can
significantly impede NNs' ability to discern effective strategies, a difficulty
that intensifies with the growing complexity of the game positions. These
findings underscore the urgent need for advanced methodologies in RL training,
specifically tailored to counter the obstacles imposed by noisy evaluations.
The development of such methodologies is crucial not only for enhancing NN
proficiency in strategy games with significant parity elements but also for
broadening the resilience and efficiency of RL systems across diverse and
complex environments.
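
As a concrete illustration of such a simulated learning process, the sketch below (ours, not the paper's code) trains a small MLP on a toy parity task in curriculum stages of increasing string length, flipping a fraction of labels to emulate noisy evaluations; the architecture, NOISE_RATE, and schedule are all illustrative assumptions.

    # A minimal sketch, not the paper's setup: a small MLP learns parity in
    # curriculum stages, with a fraction of labels flipped to emulate the
    # noisy evaluations of self-play. All hyperparameters are illustrative.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    MAX_BITS = 16      # hardest curriculum stage
    NOISE_RATE = 0.05  # fraction of labels flipped

    def make_parity_batch(n_bits, batch=256, noise=NOISE_RATE):
        """Random bit strings (zero-padded to MAX_BITS) labelled by parity,
        with a `noise` fraction of labels flipped."""
        x = torch.randint(0, 2, (batch, n_bits)).float()
        y = x.sum(dim=1).remainder(2.0)          # true parity in {0, 1}
        flip = torch.rand(batch) < noise
        y = torch.where(flip, 1.0 - y, y)        # inject label noise
        return F.pad(x, (0, MAX_BITS - n_bits)), y

    model = nn.Sequential(nn.Linear(MAX_BITS, 64), nn.ReLU(),
                          nn.Linear(64, 64), nn.ReLU(),
                          nn.Linear(64, 1))
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.BCEWithLogitsLoss()

    # Curriculum: elementary positions (short strings) first, harder ones later.
    for n_bits in range(2, MAX_BITS + 1, 2):
        for _ in range(500):
            x, y = make_parity_batch(n_bits)
            loss = loss_fn(model(x).squeeze(1), y)
            opt.zero_grad(); loss.backward(); opt.step()
        with torch.no_grad():                     # evaluate on clean labels
            x, y = make_parity_batch(n_bits, batch=4096, noise=0.0)
            acc = ((model(x).squeeze(1) > 0) == y.bool()).float().mean().item()
        print(f"{n_bits:2d} bits: clean accuracy {acc:.3f}")

The two knobs relevant to the paper's question are NOISE_RATE and the curriculum schedule; raising the former while lengthening the strings approximates the regime the abstract describes.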
Related papers
- Decomposing the Entropy-Performance Exchange: The Missing Keys to Unlocking Effective Reinforcement Learning [106.68304931854038]
Reinforcement learning with verifiable rewards (RLVR) has been widely used for enhancing the reasoning abilities of large language models (LLMs). We conduct a systematic empirical analysis of the entropy-performance exchange mechanism of RLVR across different levels of granularity. Our analysis reveals that, in the rising stage, entropy reduction in negative samples facilitates the learning of effective reasoning patterns. In the plateau stage, learning efficiency strongly correlates with high-entropy tokens present in low-perplexity samples and those located at the end of sequences.
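
The entropy in question is the per-token predictive entropy of the policy; a minimal way to compute it from model logits (illustrative code, not the paper's analysis pipeline):

    # Illustrative only: per-token predictive entropy from language-model
    # logits, the quantity behind "high-entropy tokens". The random logits
    # below stand in for real model outputs.
    import torch

    def token_entropies(logits: torch.Tensor) -> torch.Tensor:
        """H_t = -sum_v p_t(v) log p_t(v) for each sequence position t."""
        logp = torch.log_softmax(logits, dim=-1)
        return -(logp.exp() * logp).sum(dim=-1)

    logits = torch.randn(12, 32000)     # (seq_len, vocab_size) stand-in
    h = token_entropies(logits)
    print(h.mean().item(), h[-4:])      # sequence average and end-of-sequence tokens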
arXiv Detail & Related papers (2025-08-04T10:08:10Z)
- LogicPuzzleRL: Cultivating Robust Mathematical Reasoning in LLMs via Reinforcement Learning [29.047063129464494]
Large language models (LLMs) excel at many supervised tasks but often struggle with structured reasoning in unfamiliar settings. This discrepancy suggests that standard fine-tuning pipelines may instill narrow, domain-specific skills rather than foster general-purpose thinking strategies. We propose a "play to learn" framework that fine-tunes LLMs through reinforcement learning on a suite of seven custom logic puzzles.
arXiv Detail & Related papers (2025-06-05T09:40:47Z)
- Revisiting Robust RAG: Do We Still Need Complex Robust Training in the Era of Powerful LLMs? [69.38149239733994]
We investigate whether complex robust training strategies remain necessary as model capacity grows.
We find that as models become more powerful, the performance gains brought by complex robust training methods drop off dramatically.
Our findings suggest that RAG systems can benefit from simpler architectures and training strategies as models become more powerful.
arXiv Detail & Related papers (2025-02-17T03:34:31Z)
- StructRAG: Boosting Knowledge Intensive Reasoning of LLMs via Inference-time Hybrid Information Structurization [94.31508613367296]
Retrieval-augmented generation (RAG) is a key means of effectively enhancing large language models (LLMs).
We propose StructRAG, which can identify the optimal structure type for the task at hand, reconstruct original documents into this structured format, and infer answers based on the resulting structure.
Experiments show that StructRAG achieves state-of-the-art performance, particularly excelling in challenging scenarios.
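
Schematically, the pipeline is route, restructure, answer; the toy sketch below uses hand-written stand-ins (route, structurize, and answer are hypothetical names) where StructRAG uses LLM-based modules:

    # Toy stand-ins for StructRAG's router, structurizer, and utilizer.
    # Illustrative only; the real method makes each step an LLM call.
    def route(question: str) -> str:
        """Toy router: choose a structure type from surface cues."""
        if "compare" in question or "versus" in question:
            return "table"
        if "when" in question or "order" in question:
            return "timeline"
        return "graph"

    def structurize(docs: list[str], kind: str) -> dict:
        """Toy structurizer: rebuild raw passages in the chosen format."""
        return {"kind": kind, "units": [d.strip() for d in docs]}

    def answer(question: str, structure: dict) -> str:
        """Toy utilizer: a real system reasons over the structure."""
        return f"[{structure['kind']}] answer to: {question}"

    docs = ["Model A scored 71.2 on the benchmark.", "Model B scored 68.9."]
    q = "compare model A versus model B"
    print(answer(q, structurize(docs, route(q))))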
arXiv Detail & Related papers (2024-10-11T13:52:44Z)
- Learning Strategy Representation for Imitation Learning in Multi-Agent Games [15.209555810145549]
We introduce the Strategy Representation for Learning (STRIL) framework, which effectively learns strategy representations in multi-agent games.
STRIL is a plug-in method that can be integrated into existing IL algorithms.
We demonstrate the effectiveness of STRIL across competitive multi-agent scenarios, including Two-player Pong, Limit Texas Hold'em, and Connect Four.
arXiv Detail & Related papers (2024-09-28T14:30:17Z)
- Zero-Sum Positional Differential Games as a Framework for Robust Reinforcement Learning: Deep Q-Learning Approach [2.3020018305241337]
This paper is the first to propose considering robust reinforcement learning (RRL) problems within positional differential game theory.
Namely, we prove that under Isaacs's condition, the same Q-function can be utilized as an approximate solution of both minimax and maximin Bellman equations.
We present the Isaacs Deep Q-Network algorithms and demonstrate their superiority compared to other baseline RRL and Multi-Agent RL algorithms in various environments.
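
In standard zero-sum notation (ours, not necessarily the authors'), the two Bellman equations that share an approximate solution under Isaacs's condition are:

    % Schematic statement in our notation; u is the maximizing control,
    % v the minimizing control, and the expectation is over the next state s'.
    \begin{aligned}
    Q(s,u,v) &= r(s,u,v) + \gamma\, \mathbb{E}_{s'}\!\left[\max_{u'}\min_{v'} Q(s',u',v')\right] && \text{(maximin)}\\
    Q(s,u,v) &= r(s,u,v) + \gamma\, \mathbb{E}_{s'}\!\left[\min_{v'}\max_{u'} Q(s',u',v')\right] && \text{(minimax)}
    \end{aligned}

Isaacs's condition makes the inner max-min and min-max coincide pointwise, which is why a single Q-function, and hence a single deep Q-network, can approximate both.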
arXiv Detail & Related papers (2024-05-03T12:21:43Z)
- Scalable Learning of Intrusion Responses through Recursive Decomposition [0.0]
We study automated intrusion response for an IT infrastructure, formulating the interaction between an attacker and a defender as a partially observed game.
To solve the game we follow an approach where attack and defense strategies co-evolve through reinforcement learning and self-play toward an equilibrium.
We introduce an algorithm called Decompositional Fictitious Self-Play (DFSP), which learns equilibria through approximation.
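
DFSP builds on fictitious self-play, in which each side best-responds to the opponent's time-averaged strategy; the toy matrix-game loop below illustrates that skeleton (a placeholder game, not the intrusion-response setting, and tabular best responses where DFSP uses RL):

    # Toy fictitious play on a zero-sum matrix game: each player best-responds
    # to the opponent's empirical average strategy. A stand-in for the idea
    # behind DFSP, whose best responses are learned over a decomposed game.
    import numpy as np

    A = np.array([[ 0.0,  1.0, -1.0],   # row player's payoffs;
                  [-1.0,  0.0,  1.0],   # rock-paper-scissors as a placeholder
                  [ 1.0, -1.0,  0.0]])

    row_counts = np.ones(3)             # empirical action counts
    col_counts = np.ones(3)

    for t in range(10000):
        row_avg = row_counts / row_counts.sum()
        col_avg = col_counts / col_counts.sum()
        row_counts[np.argmax(A @ col_avg)] += 1  # best response, maximizes payoff
        col_counts[np.argmin(row_avg @ A)] += 1  # best response, minimizes payoff

    print("approx. equilibrium:", row_counts / row_counts.sum(),
          col_counts / col_counts.sum())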
arXiv Detail & Related papers (2023-09-06T18:12:07Z)
- Structure in Deep Reinforcement Learning: A Survey and Open Problems [22.77618616444693]
Reinforcement Learning (RL), bolstered by the expressive capabilities of Deep Neural Networks (DNNs) for function approximation, has demonstrated considerable success in numerous applications.
However, its practicality in addressing various real-world scenarios, characterized by diverse and unpredictable dynamics, remains limited.
This limitation stems from poor data efficiency, limited generalization capabilities, a lack of safety guarantees, and the absence of interpretability.
arXiv Detail & Related papers (2023-06-28T08:48:40Z)
- Reinforcement Learning with Stepwise Fairness Constraints [50.538878453547966]
We introduce the study of reinforcement learning with stepwise fairness constraints.
We provide learning algorithms with strong theoretical guarantees on policy optimality and fairness violation.
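
One generic way to formalize a stepwise constraint (our notation; the paper's exact formulation may differ) is to cap a per-step fairness-violation measure alongside the usual return objective:

    % Schematic constrained-RL formulation in our notation, where \Delta
    % measures the per-step disparity the fairness criterion cares about.
    \max_{\pi}\ \mathbb{E}_{\pi}\!\left[\sum_{t \ge 0} \gamma^{t}\, r(s_t, a_t)\right]
    \quad \text{s.t.} \quad \Delta(\pi, s_t) \le \delta \ \ \text{for all } t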
arXiv Detail & Related papers (2022-11-08T04:06:23Z)
- A Game-Theoretic Perspective of Generalization in Reinforcement Learning [9.402272029807316]
Generalization in reinforcement learning (RL) is important for the real-world deployment of RL algorithms.
We propose a game-theoretic framework for generalization in reinforcement learning, named GiRL.
arXiv Detail & Related papers (2022-08-07T06:17:15Z)
- Impartial Games: A Challenge for Reinforcement Learning [0.0]
We show that AlphaZero-style reinforcement learning algorithms encounter significant and fundamental challenges when applied to impartial games. Our findings reveal that while AlphaZero-style agents can achieve champion-level play, their learning progression severely degrades as board size increases. These results align with broader concerns regarding AlphaZero-style algorithms' vulnerability to adversarial attacks.
arXiv Detail & Related papers (2022-05-25T14:02:02Z)
- Autonomous Reinforcement Learning: Formalism and Benchmarking [106.25788536376007]
Real-world embodied learning, such as that performed by humans and animals, is situated in a continual, non-episodic world.
Common benchmark tasks in RL are episodic, with the environment resetting between trials to provide the agent with multiple attempts.
This discrepancy presents a major challenge when attempting to take RL algorithms developed for episodic simulated environments and run them on real-world platforms.
arXiv Detail & Related papers (2021-12-17T16:28:06Z)
- Where Did You Learn That From? Surprising Effectiveness of Membership Inference Attacks Against Temporally Correlated Data in Deep Reinforcement Learning [114.9857000195174]
A major challenge to widespread industrial adoption of deep reinforcement learning is the potential vulnerability to privacy breaches.
We propose an adversarial attack framework tailored for testing the vulnerability of deep reinforcement learning algorithms to membership inference attacks.
arXiv Detail & Related papers (2021-09-08T23:44:57Z)
- Towards Balanced Learning for Instance Recognition [149.76724446376977]
We propose Libra R-CNN, a framework towards balanced learning for instance recognition.
It integrates IoU-balanced sampling, a balanced feature pyramid, and objective re-weighting to reduce imbalance at the sample, feature, and objective levels, respectively.
arXiv Detail & Related papers (2021-08-23T13:40:45Z)
- Disturbing Reinforcement Learning Agents with Corrupted Rewards [62.997667081978825]
We analyze how different attack strategies based on reward perturbations affect reinforcement learning algorithms.
We show that smoothly crafted adversarial rewards can mislead the learner, and that with low exploration probability values the learned policy is more robust to corrupted rewards.
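
The threat model can be emulated with a thin wrapper around the reward signal; a minimal sketch (our construction, not the paper's attack code):

    # Minimal sketch of reward corruption against a tabular Q-learner: the
    # update only ever sees rewards passed through `corrupt`. Illustrative
    # stand-in, not one of the paper's attack strategies.
    import random

    def corrupt(reward, p=0.1, scale=-1.0):
        """With probability p, replace the true reward with an adversarial one."""
        return scale * reward if random.random() < p else reward

    alpha, gamma = 0.1, 0.99
    Q = {}                               # (state, action) -> value estimate

    def q_update(s, a, r, s_next, actions):
        r = corrupt(r)                   # perturbed signal reaches the learner
        best_next = max(Q.get((s_next, a2), 0.0) for a2 in actions)
        old = Q.get((s, a), 0.0)
        Q[(s, a)] = old + alpha * (r + gamma * best_next - old)

    q_update("s0", "left", 1.0, "s1", ["left", "right"])  # example call

The summary's second finding corresponds to keeping the exploration probability low in whatever epsilon-greedy loop drives q_update.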
arXiv Detail & Related papers (2021-02-12T15:53:48Z)