Related papers: Wavelet Predictive Representations for Non-Stationary Reinforcement Learning

Wavelet Predictive Representations for Non-Stationary Reinforcement Learning

URL: http://arxiv.org/abs/2510.04507v1
Date: Mon, 06 Oct 2025 05:49:18 GMT
Title: Wavelet Predictive Representations for Non-Stationary Reinforcement Learning
Authors: Min Wang, Xin Li, Ye He, Yao-Hui Li, Hasnaa Bennis, Riashat Islam, Mingzhong Wang,
Abstract summary: WISDOM captures multi-scale features in evolving MDP sequences by transforming task representation sequences into the wavelet domain.<n>WISDOM significantly outperforms existing baselines in both sample efficiency and performance.
Score: 16.99397898280072
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The real world is inherently non-stationary, with ever-changing factors, such as weather conditions and traffic flows, making it challenging for agents to adapt to varying environmental dynamics. Non-Stationary Reinforcement Learning (NSRL) addresses this challenge by training agents to adapt rapidly to sequences of distinct Markov Decision Processes (MDPs). However, existing NSRL approaches often focus on tasks with regularly evolving patterns, leading to limited adaptability in highly dynamic settings. Inspired by the success of Wavelet analysis in time series modeling, specifically its ability to capture signal trends at multiple scales, we propose WISDOM to leverage wavelet-domain predictive task representations to enhance NSRL. WISDOM captures these multi-scale features in evolving MDP sequences by transforming task representation sequences into the wavelet domain, where wavelet coefficients represent both global trends and fine-grained variations of non-stationary changes. In addition to the auto-regressive modeling commonly employed in time series forecasting, we devise a wavelet temporal difference (TD) update operator to enhance tracking and prediction of MDP evolution. We theoretically prove the convergence of this operator and demonstrate policy improvement with wavelet task representations. Experiments on diverse benchmarks show that WISDOM significantly outperforms existing baselines in both sample efficiency and asymptotic performance, demonstrating its remarkable adaptability in complex environments characterized by non-stationary and stochastically evolving tasks.

Related papers

Model-Based Diffusion Sampling for Predictive Control in Offline Decision Making [48.998030470623384]
offline decision-making requires reliable behaviors from fixed datasets without further interaction.<n>We propose a compositional model-based diffusion framework consisting of: (i) a planner that generates diverse, task-aligned trajectories; (ii) a dynamics model that enforces consistency with the underlying system dynamics; and (iii) a ranker module that selects behaviors aligned with the task objectives.
arXiv Detail & Related papers (2025-12-09T06:26:02Z)
Inference-time Stochastic Refinement of GRU-Normalizing Flow for Real-time Video Motion Transfer [0.7161783472741748]
Real-time video motion transfer applications such as immersive gaming require accurate yet diverse future predictions to support realistic synthesis and robust downstream decision making under uncertainty.<n>To improve the diversity of such sequential forecasts we propose a novel inference-time refinement technique that combines Gated Recurrent Unit-Normalizing Flows (GRU-NF) with sampling methods.<n>We show that our inference framework, Gated Recurrent Unit- Normalizing Flows (GRU-SNF) outperforms GRU-NF in generating diverse outputs without sacrificing accuracy, even under longer prediction horizons.
arXiv Detail & Related papers (2025-12-03T21:47:30Z)
Steering Vision-Language-Action Models as Anti-Exploration: A Test-Time Scaling Approach [78.4812458793128]
We propose textbfTACO, a test-time-scaling framework that applies a lightweight pseudo-count estimator as a high-fidelity verifier of action chunks.<n>Our method resembles the classical anti-exploration principle in offline reinforcement learning (RL), and being gradient-free, it incurs significant computational benefits.
arXiv Detail & Related papers (2025-12-02T14:42:54Z)
Leveraging Duration Pseudo-Embeddings in Multilevel LSTM and GCN Hypermodels for Outcome-Oriented PPM [4.120576565537633]
Existing deep learning models for Predictive Process Monitoring (PPM) struggle with temporal irregularities.<n>We propose a dual input neural network strategy that separates event and sequence attributes, using a duration-aware pseudo-embedding matrix.<n>Our results demonstrate the benefits of explicit temporal encoding and provide a flexible design for robust, real-world PPM applications.
arXiv Detail & Related papers (2025-11-24T07:06:08Z)
Simulating Distribution Dynamics: Liquid Temporal Feature Evolution for Single-Domain Generalized Object Detection [58.25418970608328]
Single-Domain Generalized Object Detection (Single-DGOD) aims to transfer a detector trained on one source domain to multiple unknown domains.<n>Existing methods for Single-DGOD typically rely on discrete data augmentation or static perturbation methods to expand data diversity.<n>We propose a new method, which simulates the progressive evolution of features from the source domain to simulated latent distributions.
arXiv Detail & Related papers (2025-11-13T03:10:39Z)
Improving Deepfake Detection with Reinforcement Learning-Based Adaptive Data Augmentation [60.04281435591454]
CRDA (Curriculum Reinforcement-Learning Data Augmentation) is a novel framework guiding detectors to progressively master multi-domain forgery features.<n>Central to our approach is integrating reinforcement learning and causal inference.<n>Our method significantly improves detector generalizability, outperforming SOTA methods across multiple cross-domain datasets.
arXiv Detail & Related papers (2025-11-10T12:45:52Z)
On-the-Fly Data Augmentation via Gradient-Guided and Sample-Aware Influence Estimation [21.267525672022046]
We introduce SADA, a Sample-Aware Dynamic Augmentation that performs on-the-fly adjustment of augmentation strengths based on each sample's evolving influence on model optimization.<n>Our method is lightweight, which does not require auxiliary models or policy tuning. It can be seamlessly integrated into existing training pipelines as a plug-and-play module.
arXiv Detail & Related papers (2025-10-01T02:26:52Z)
Zero-Shot Adaptation of Behavioral Foundation Models to Unseen Dynamics [42.446740732573296]
Behavioral Foundation Models (BFMs) proved successful in producing policies for arbitrary tasks in a zero-shot manner.<n>Here, we show that one of the methods from the BFM family, Forward-Backward (FB) representation, cannot distinguish between distinct dynamics.<n>We propose a FB model with a transformer-based belief estimator, which greatly facilitates zero-shot adaptation.
arXiv Detail & Related papers (2025-05-19T14:12:19Z)
Learning Robust Spectral Dynamics for Temporal Domain Generalization [35.98513351187109]
Temporal Domain Generalization seeks to enable model generalization across evolving domains.<n>We introduce FreKoo, which tackles these challenges via a novel frequency-domain analysis of parameter trajectories.<n>FreKoo excels in real-world streaming scenarios with complex drifts and uncertainties.
arXiv Detail & Related papers (2025-05-19T00:38:18Z)
MMA-DFER: MultiModal Adaptation of unimodal models for Dynamic Facial Expression Recognition in-the-wild [81.32127423981426]
Multimodal emotion recognition based on audio and video data is important for real-world applications. Recent methods have focused on exploiting advances of self-supervised learning (SSL) for pre-training of strong multimodal encoders. We propose a different perspective on the problem and investigate the advancement of multimodal DFER performance by adapting SSL-pre-trained disjoint unimodal encoders.
arXiv Detail & Related papers (2024-04-13T13:39:26Z)
A Poisson-Gamma Dynamic Factor Model with Time-Varying Transition Dynamics [51.147876395589925]
A non-stationary PGDS is proposed to allow the underlying transition matrices to evolve over time. A fully-conjugate and efficient Gibbs sampler is developed to perform posterior simulation. Experiments show that, in comparison with related models, the proposed non-stationary PGDS achieves improved predictive performance.
arXiv Detail & Related papers (2024-02-26T04:39:01Z)
Entropy-Regularized Token-Level Policy Optimization for Language Agent Reinforcement [67.1393112206885]
Large Language Models (LLMs) have shown promise as intelligent agents in interactive decision-making tasks. We introduce Entropy-Regularized Token-level Policy Optimization (ETPO), an entropy-augmented RL method tailored for optimizing LLMs at the token level. We assess the effectiveness of ETPO within a simulated environment that models data science code generation as a series of multi-step interactive tasks.
arXiv Detail & Related papers (2024-02-09T07:45:26Z)
Out-of-Distribution Generalized Dynamic Graph Neural Network with Disentangled Intervention and Invariance Promotion [61.751257172868186]
Dynamic graph neural networks (DyGNNs) have demonstrated powerful predictive abilities by exploiting graph and temporal dynamics. Existing DyGNNs fail to handle distribution shifts, which naturally exist in dynamic graphs.
arXiv Detail & Related papers (2023-11-24T02:42:42Z)
Stochastically forced ensemble dynamic mode decomposition for forecasting and analysis of near-periodic systems [65.44033635330604]
We introduce a novel load forecasting method in which observed dynamics are modeled as a forced linear system. We show that its use of intrinsic linear dynamics offers a number of desirable properties in terms of interpretability and parsimony. Results are presented for a test case using load data from an electrical grid.
arXiv Detail & Related papers (2020-10-08T20:25:52Z)

This list is automatically generated from the titles and abstracts of the papers in this site.