Adaptable Hindsight Experience Replay for Search-Based Learning
- URL: http://arxiv.org/abs/2511.03405v1
- Date: Wed, 05 Nov 2025 12:13:23 GMT
- Title: Adaptable Hindsight Experience Replay for Search-Based Learning
- Authors: Alexandros Vazaios, Jannis Brugger, Cedric Derstroff, Kristian Kersting, Mira Mezini,
- Abstract summary: Hindsight Experience Replay (HER) addresses sparse rewards by relabeling unsuccessful trajectories from the search tree as supervised learning signals. We introduce Adaptable HER, a flexible framework that integrates HER with AlphaZero. Our experiments, including equation discovery, show that the possibility of modifying HER is beneficial and surpasses the performance of pure supervised or reinforcement learning.
- Score: 67.04721081824316
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: AlphaZero-like Monte Carlo Tree Search systems, originally introduced for two-player games, dynamically balance exploration and exploitation using neural network guidance. This combination also makes them suitable for classical search problems. However, the original method of training the network with simulation results is limited in sparse reward settings, especially in the early stages, where the network cannot yet give guidance. Hindsight Experience Replay (HER) addresses this issue by relabeling unsuccessful trajectories from the search tree as supervised learning signals. We introduce Adaptable HER, a flexible framework that integrates HER with AlphaZero, allowing easy adjustments to HER properties such as relabeled goals, policy targets, and trajectory selection. Our experiments, including equation discovery, show that the possibility of modifying HER is beneficial and surpasses the performance of pure supervised or reinforcement learning.
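The relabeling mechanism described in the abstract can be sketched in a few lines: a trajectory that failed to reach the original goal is reinterpreted as a demonstration of reaching the state it actually ended in, and the hindsight-goal selection is left pluggable so different HER variants can be swapped in. The names below (Transition, relabel_trajectory, the "final"/"future" selectors) are illustrative assumptions, not the paper's actual API.
```python
# Minimal sketch of hindsight relabeling for AlphaZero-style search data.
# All names and strategies here are illustrative assumptions, not the paper's API.
import random
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class Transition:
    state: str       # node in the search tree (e.g. a partial equation)
    action: int      # edge taken from this node
    next_state: str

def relabel_trajectory(
    trajectory: List[Transition],
    goal_selector: Callable[[List[Transition], int], str],
) -> List[Tuple[str, str, int, float]]:
    """Turn an unsuccessful trajectory into (state, goal, action, value) targets."""
    examples = []
    for t, tr in enumerate(trajectory):
        goal = goal_selector(trajectory, t)                 # hindsight goal
        examples.append((tr.state, goal, tr.action, 1.0))   # action now "reaches" the goal
    return examples

# Two interchangeable goal-selection strategies (the "adaptable" part):
final_goal = lambda traj, t: traj[-1].next_state                  # classic HER "final"
future_goal = lambda traj, t: random.choice(traj[t:]).next_state  # HER "future"

if __name__ == "__main__":
    traj = [Transition("x", 0, "x+c"), Transition("x+c", 2, "sin(x+c)")]
    for ex in relabel_trajectory(traj, final_goal):
        print(ex)  # supervised targets for the policy/value network
```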
Related papers
- Learning to Condition: A Neural Heuristic for Scalable MPE Inference [7.287294240824019]
We introduce learning to condition (L2C), a scalable, data-driven framework for accelerating Most Probable Explanation (MPE) inference. L2C trains a neural network to score variable-value assignments based on their utility for conditioning, given observed evidence. We develop a scalable data generation pipeline that extracts training signals from the search traces of existing MPE solvers.
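A minimal sketch of the scoring step described above, assuming a small utility network over hand-built assignment features; the feature construction and network shape are illustrative, not the L2C architecture.
```python
# Hedged sketch of "learning to condition": a small network scores candidate
# (variable, value) assignments given evidence features, and the top-scoring
# assignment is chosen for conditioning. Features and shapes are assumptions.
import torch
import torch.nn as nn

class AssignmentScorer(nn.Module):
    def __init__(self, feat_dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        return self.net(feats).squeeze(-1)  # one utility score per candidate

def pick_conditioning_assignment(scorer, candidate_feats, candidates):
    """candidates: list of (variable, value); candidate_feats: [N, feat_dim]."""
    with torch.no_grad():
        scores = scorer(candidate_feats)
    return candidates[int(scores.argmax())]

if __name__ == "__main__":
    scorer = AssignmentScorer(feat_dim=8)
    feats = torch.randn(5, 8)                      # features for 5 candidate assignments
    cands = [("X%d" % i, 1) for i in range(5)]
    print(pick_conditioning_assignment(scorer, feats, cands))
```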
arXiv Detail & Related papers (2025-09-22T18:24:31Z)
- LTRR: Learning To Rank Retrievers for LLMs [53.285436927963865]
We show that routing-based RAG systems can outperform the best single-retriever-based systems. Performance gains are especially pronounced in models trained with the Answer Correctness (AC) metric. As part of the SIGIR 2025 LiveRAG challenge, our submitted system demonstrated the practical viability of our approach.
arXiv Detail & Related papers (2025-06-16T17:53:18Z)
- Learning to Reason without External Rewards [100.27210579418562]
Training large language models (LLMs) for complex reasoning via Reinforcement Learning with Verifiable Rewards (RLVR) is effective but limited by reliance on costly, domain-specific supervision. We explore Reinforcement Learning from Internal Feedback (RLIF), a framework that enables LLMs to learn from intrinsic signals without external rewards or labeled data. We propose Intuitor, an RLIF method that uses a model's own confidence, termed self-certainty, as its sole reward signal.
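A hedged sketch of a confidence-style intrinsic reward: one reasonable proxy for self-certainty is the KL divergence between the model's next-token distribution and a uniform distribution, averaged over generated tokens; treating this as Intuitor's exact definition is an assumption.
```python
# Hedged sketch of a confidence-as-reward signal. Whether this KL-to-uniform
# form matches the paper's exact definition of self-certainty is an assumption.
import torch
import torch.nn.functional as F

def self_certainty(logits: torch.Tensor) -> torch.Tensor:
    """logits: [seq_len, vocab]; returns a scalar intrinsic reward."""
    log_p = F.log_softmax(logits, dim=-1)
    vocab = logits.shape[-1]
    # KL(p || uniform) per token = sum_i p_i * (log p_i + log vocab), then averaged.
    kl_per_token = (log_p.exp() * (log_p + torch.log(torch.tensor(float(vocab))))).sum(-1)
    return kl_per_token.mean()

if __name__ == "__main__":
    confident = torch.zeros(4, 100); confident[:, 0] = 10.0   # peaked distributions
    unsure = torch.zeros(4, 100)                               # uniform distributions
    print(self_certainty(confident).item(), self_certainty(unsure).item())
```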
arXiv Detail & Related papers (2025-05-26T07:01:06Z)
- Learning Prompt with Distribution-Based Feature Replay for Few-Shot Class-Incremental Learning [56.29097276129473]
We propose a simple yet effective framework, named Learning Prompt with Distribution-based Feature Replay (LP-DiF). To prevent the learnable prompt from forgetting old knowledge in the new session, we propose a pseudo-feature replay approach. When progressing to a new session, pseudo-features are sampled from old-class distributions combined with training images of the current session to optimize the prompt.
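A minimal sketch of the pseudo-feature replay step, assuming old classes are summarized by per-class feature means and standard deviations; the diagonal-Gaussian form and all names are illustrative, not LP-DiF's implementation.
```python
# Hedged sketch of distribution-based feature replay: old classes are kept only
# as per-class feature statistics, and pseudo-features sampled from those
# Gaussians are mixed with new-session features. The diagonal-Gaussian form
# and names here are assumptions for illustration.
import torch

def sample_pseudo_features(class_stats, n_per_class):
    """class_stats: {label: (mean [D], std [D])} estimated in earlier sessions."""
    feats, labels = [], []
    for label, (mean, std) in class_stats.items():
        feats.append(mean + std * torch.randn(n_per_class, mean.shape[0]))
        labels.extend([label] * n_per_class)
    return torch.cat(feats), torch.tensor(labels)

if __name__ == "__main__":
    stats = {0: (torch.zeros(16), torch.ones(16)), 1: (torch.ones(16), 0.5 * torch.ones(16))}
    old_feats, old_labels = sample_pseudo_features(stats, n_per_class=8)
    new_feats, new_labels = torch.randn(8, 16), torch.full((8,), 2)  # current session
    batch_x = torch.cat([old_feats, new_feats])   # joint batch used to tune the prompt
    batch_y = torch.cat([old_labels, new_labels])
    print(batch_x.shape, batch_y.shape)
```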
arXiv Detail & Related papers (2024-01-03T07:59:17Z)
- Learning Search-Space Specific Heuristics Using Neural Networks [13.226916009242347]
Our system learns distance-to-goal estimators from scratch, given a single PDDL training instance.
We show that this relatively simple system can perform surprisingly well, sometimes competitive with well-known domain-independent classical heuristics.
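A minimal sketch of how a learned distance-to-goal estimator slots into search, with a toy state space and a hand-written heuristic standing in for the trained network; both are assumptions for illustration.
```python
# Hedged sketch: a learned distance-to-goal estimator plugged into greedy
# best-first search. The toy state space and the stand-in heuristic are
# assumptions; the paper learns its estimator from a PDDL training instance.
import heapq

def greedy_best_first(start, goal, successors, h):
    """h(state, goal) -> estimated distance; smaller is expanded first."""
    frontier = [(h(start, goal), start, [start])]
    seen = {start}
    while frontier:
        _, state, path = heapq.heappop(frontier)
        if state == goal:
            return path
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                heapq.heappush(frontier, (h(nxt, goal), nxt, path + [nxt]))
    return None

if __name__ == "__main__":
    successors = lambda s: [s - 1, s + 1, s * 2]   # toy integer puzzle
    learned_h = lambda s, g: abs(g - s)            # stand-in for a trained network
    print(greedy_best_first(1, 10, successors, learned_h))
```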
arXiv Detail & Related papers (2023-06-06T21:22:32Z)
- OER: Offline Experience Replay for Continual Offline Reinforcement Learning [25.985985377992034]
It is desirable for an agent to continually learn new skills from a sequence of pre-collected offline datasets.
In this paper, we formulate a new setting, continual offline reinforcement learning (CORL), where an agent learns a sequence of offline reinforcement learning tasks.
We propose a new model-based experience selection scheme to build the replay buffer, where a transition model is learned to approximate the state distribution.
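A hedged sketch of the selection idea, with a simple Gaussian density standing in for the learned transition model and a top-k rule for what enters the replay buffer; both are assumptions for illustration, not the OER scheme itself.
```python
# Hedged sketch of model-based experience selection for a replay buffer:
# transitions whose states the (stand-in) density model finds most likely are
# kept for replay. The Gaussian stand-in and top-k rule are assumptions.
import numpy as np

def select_for_replay(states, transitions, k):
    """Keep the k transitions whose states best match the modeled distribution."""
    mean, cov = states.mean(0), np.cov(states.T) + 1e-3 * np.eye(states.shape[1])
    inv = np.linalg.inv(cov)
    # Unnormalized Gaussian log-density as a stand-in for the learned model's score.
    scores = [-0.5 * (s - mean) @ inv @ (s - mean) for s in states]
    keep = np.argsort(scores)[-k:]
    return [transitions[i] for i in keep]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    states = rng.normal(size=(100, 4))
    transitions = [(tuple(s), 0, 0.0) for s in states]   # (state, action, reward) stubs
    print(len(select_for_replay(states, transitions, k=10)))
```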
arXiv Detail & Related papers (2023-05-23T08:16:44Z)
- Hebbian Continual Representation Learning [9.54473759331265]
Continual Learning aims to bring machine learning into a more realistic scenario.
We investigate whether biologically inspired Hebbian learning is useful for tackling continual learning challenges.
arXiv Detail & Related papers (2022-06-28T09:21:03Z)
- Hindsight Task Relabelling: Experience Replay for Sparse Reward Meta-RL [91.26538493552817]
We present a formulation of hindsight relabeling for meta-RL, which relabels experience during meta-training to enable learning to learn entirely using sparse reward.
We demonstrate the effectiveness of our approach on a suite of challenging sparse reward goal-reaching environments.
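A minimal sketch of reward relabeling under sparse rewards: an episode's goal is replaced by a state it actually reached so the sparse reward fires and the episode becomes usable for meta-training; the distance threshold and data layout are assumptions.
```python
# Hedged sketch of hindsight relabeling with sparse rewards. Names, the
# distance threshold, and the episode layout are assumptions for illustration.
import numpy as np

def relabel_episode(states, actions, new_goal, eps=1e-3):
    rewards = [float(np.linalg.norm(s_next - new_goal) < eps)
               for s_next in states[1:]]
    return list(zip(states[:-1], actions, rewards, states[1:]))

if __name__ == "__main__":
    states = [np.zeros(2), np.array([0.5, 0.0]), np.array([1.0, 0.0])]
    actions = [0, 1]
    relabeled = relabel_episode(states, actions, new_goal=states[-1])
    print([r for (_, _, r, _) in relabeled])  # sparse reward now fires on the last step
```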
arXiv Detail & Related papers (2021-12-02T00:51:17Z)
- MURAL: Meta-Learning Uncertainty-Aware Rewards for Outcome-Driven Reinforcement Learning [65.52675802289775]
We show that an uncertainty-aware classifier can solve challenging reinforcement learning problems.
We propose a novel method for computing the normalized maximum likelihood (NML) distribution.
We show that the resulting algorithm has a number of intriguing connections to both count-based exploration methods and prior algorithms for learning reward functions.
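A hedged sketch of a conditional-NML style uncertainty estimate in the spirit of the summary: the classifier is refit once per candidate label with the query point appended, and the resulting per-label likelihoods are renormalized; using scikit-learn logistic regression here is an assumption for illustration, not the paper's method.
```python
# Hedged sketch of a conditional normalized maximum likelihood (CNML) estimate.
# The logistic-regression classifier is an illustrative assumption.
import numpy as np
from sklearn.linear_model import LogisticRegression

def cnml_probabilities(X, y, x_query, labels=(0, 1)):
    probs = []
    for label in labels:
        X_aug = np.vstack([X, x_query])
        y_aug = np.append(y, label)
        clf = LogisticRegression().fit(X_aug, y_aug)
        # Probability the refit model assigns to the label it was just given.
        probs.append(clf.predict_proba(x_query.reshape(1, -1))[0][list(clf.classes_).index(label)])
    probs = np.array(probs)
    return probs / probs.sum()   # normalized maximum likelihood over labels

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(-2, 1, (20, 2)), rng.normal(2, 1, (20, 2))])
    y = np.array([0] * 20 + [1] * 20)
    print(cnml_probabilities(X, y, x_query=np.zeros(2)))  # near 0.5/0.5: high uncertainty
```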
arXiv Detail & Related papers (2021-07-15T08:19:57Z)
- PyTorch-Hebbian: facilitating local learning in a deep learning framework [67.67299394613426]
Hebbian local learning has shown potential as an alternative training mechanism to backpropagation.
We propose a framework for thorough and systematic evaluation of local learning rules in existing deep learning pipelines.
The framework is used to expand the Krotov-Hopfield learning rule to standard convolutional neural networks without sacrificing accuracy.
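A hedged sketch of a local learning step of the kind such a framework evaluates: a plain Hebbian update with Oja-style decay on one linear layer, with no backpropagated gradient. This generic rule stands in for the Krotov-Hopfield rule the paper actually uses.
```python
# Hedged sketch of a local (non-backprop) learning update: Hebbian step with
# Oja-style decay. A generic rule standing in for the Krotov-Hopfield rule.
import torch

def hebbian_step(weight: torch.Tensor, x: torch.Tensor, lr: float = 0.01) -> torch.Tensor:
    """weight: [out, in]; x: [batch, in]. Returns the updated weight."""
    y = x @ weight.t()                                   # local pre/post activities
    hebb = y.t() @ x / x.shape[0]                        # outer-product Hebbian term
    decay = (y * y).mean(0).unsqueeze(1) * weight        # Oja-style normalization term
    return weight + lr * (hebb - decay)

if __name__ == "__main__":
    torch.manual_seed(0)
    w = torch.randn(8, 16) * 0.1
    for _ in range(100):
        w = hebbian_step(w, torch.randn(32, 16))
    print(w.norm().item())   # weights stay bounded thanks to the decay term
```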
arXiv Detail & Related papers (2021-01-31T10:53:08Z)
- Learning Intrinsic Symbolic Rewards in Reinforcement Learning [7.101885582663675]
We present a method that discovers dense rewards in the form of low-dimensional symbolic trees.
We show that the discovered dense rewards are an effective signal for an RL policy to solve the benchmark tasks.
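A minimal sketch of a dense reward expressed as a small symbolic tree over state features; the tree below is handcrafted for illustration, whereas the paper discovers such trees automatically.
```python
# Hedged sketch of a symbolic dense reward: a tiny hand-written expression tree
# over state features stands in for a discovered one.
import math

def symbolic_reward(state):
    """Example tree: tanh( -|x - target| + 0.5 * velocity_toward_target )."""
    x, v, target = state["x"], state["v"], state["target"]
    return math.tanh(-abs(x - target) + 0.5 * v * (1 if target > x else -1))

if __name__ == "__main__":
    far = {"x": 0.0, "v": 0.0, "target": 5.0}
    close = {"x": 4.9, "v": 0.5, "target": 5.0}
    print(symbolic_reward(far), symbolic_reward(close))  # denser signal than a sparse goal reward
```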
arXiv Detail & Related papers (2020-10-08T00:02:46Z)