Efficient Anti-exploration via VQVAE and Fuzzy Clustering in Offline Reinforcement Learning
- URL: http://arxiv.org/abs/2602.07889v1
- Date: Sun, 08 Feb 2026 09:42:06 GMT
- Title: Efficient Anti-exploration via VQVAE and Fuzzy Clustering in Offline Reinforcement Learning
- Authors: Long Chen, Yinkui Liu, Shen Li, Bo Tang, Xuemin Hu,
- Abstract summary: Pseudo-count is an effective anti-exploration method in offline reinforcement learning (RL) by counting state-action pairs.<n>Existing anti-exploration methods count continuous state-action pairs by discretizing these data, but often suffer from issues of dimension disaster and information loss.<n>In this paper, a novel anti-exploration method based on Vector Quantized Variational Autoencoder (VQVAE) and fuzzy clustering is proposed.
- Score: 14.04169447103753
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pseudo-count is an effective anti-exploration method in offline reinforcement learning (RL) by counting state-action pairs and imposing a large penalty on rare or unseen state-action pair data. Existing anti-exploration methods count continuous state-action pairs by discretizing these data, but often suffer from the issues of dimension disaster and information loss in the discretization process, leading to efficiency and performance reduction, and even failure of policy learning. In this paper, a novel anti-exploration method based on Vector Quantized Variational Autoencoder (VQVAE) and fuzzy clustering in offline RL is proposed. We first propose an efficient pseudo-count method based on the multi-codebook VQVAE to discretize state-action pairs, and design an offline RL anti-exploitation method based on the proposed pseudo-count method to handle the dimension disaster issue and improve the learning efficiency. In addition, a codebook update mechanism based on fuzzy C-means (FCM) clustering is developed to improve the use rate of vectors in codebooks, addressing the information loss issue in the discretization process. The proposed method is evaluated on the benchmark of Datasets for Deep Data-Driven Reinforcement Learning (D4RL), and experimental results show that the proposed method performs better and requires less computing cost in multiple complex tasks compared to state-of-the-art (SOTA) methods.
Related papers
- Accelerate Speculative Decoding with Sparse Computation in Verification [49.74839681322316]
Speculative decoding accelerates autoregressive language model inference by verifying multiple draft tokens in parallel.<n>Existing sparsification methods are designed primarily for standard token-by-token autoregressive decoding.<n>We propose a sparse verification framework that jointly sparsifies attention, FFN, and MoE components during the verification stage to reduce the dominant computation cost.
arXiv Detail & Related papers (2025-12-26T07:53:41Z) - Steering Vision-Language-Action Models as Anti-Exploration: A Test-Time Scaling Approach [78.4812458793128]
We propose textbfTACO, a test-time-scaling framework that applies a lightweight pseudo-count estimator as a high-fidelity verifier of action chunks.<n>Our method resembles the classical anti-exploration principle in offline reinforcement learning (RL), and being gradient-free, it incurs significant computational benefits.
arXiv Detail & Related papers (2025-12-02T14:42:54Z) - Rec-AD: An Efficient Computation Framework for FDIA Detection Based on Tensor Train Decomposition and Deep Learning Recommendation Model [9.222461989780735]
Deep learning models have been widely adopted for False Data Injection Attack (FDIA) detection in smart grids.<n>This paper proposes Rec-AD, a computationally efficient framework that integrates Train decomposition with the Deep Learning Recommendation Model (DLRM)<n>Fully compatible with PyTorch, Rec-AD can be integrated into existing FDIA detection systems without code modifications.
arXiv Detail & Related papers (2025-07-19T15:38:56Z) - Behavioral Anomaly Detection in Distributed Systems via Federated Contrastive Learning [0.8906214436849201]
The goal is to overcome the limitations of traditional centralized approaches in terms of data privacy, node heterogeneity, and anomaly pattern recognition.<n>The proposed method combines the distributed collaborative modeling capabilities of federated learning with the feature discrimination enhancement of contrastive learning.<n>It builds embedding representations on local nodes and constructs positive and negative sample pairs to guide the model in learning a more discriminative feature space.
arXiv Detail & Related papers (2025-06-24T02:04:44Z) - Estimating Control Barriers from Offline Data [14.241303913878887]
We propose a novel framework for learning neural CBFs through a fixed, sparsely-labeled dataset collected prior to training.<n>With limited amount of offline data, it achieves state-of-the-art performance for dynamic obstacle avoidance.
arXiv Detail & Related papers (2025-02-21T04:55:20Z) - Anti-Collapse Loss for Deep Metric Learning Based on Coding Rate Metric [99.19559537966538]
DML aims to learn a discriminative high-dimensional embedding space for downstream tasks like classification, clustering, and retrieval.
To maintain the structure of embedding space and avoid feature collapse, we propose a novel loss function called Anti-Collapse Loss.
Comprehensive experiments on benchmark datasets demonstrate that our proposed method outperforms existing state-of-the-art methods.
arXiv Detail & Related papers (2024-07-03T13:44:20Z) - Efficient Deep Reinforcement Learning Requires Regulating Overfitting [91.88004732618381]
We show that high temporal-difference (TD) error on the validation set of transitions is the main culprit that severely affects the performance of deep RL algorithms.
We show that a simple online model selection method that targets the validation TD error is effective across state-based DMC and Gym tasks.
arXiv Detail & Related papers (2023-04-20T17:11:05Z) - Boosting Offline Reinforcement Learning via Data Rebalancing [104.3767045977716]
offline reinforcement learning (RL) is challenged by the distributional shift between learning policies and datasets.
We propose a simple yet effective method to boost offline RL algorithms based on the observation that resampling a dataset keeps the distribution support unchanged.
We dub our method ReD (Return-based Data Rebalance), which can be implemented with less than 10 lines of code change and adds negligible running time.
arXiv Detail & Related papers (2022-10-17T16:34:01Z) - Self-Supervised Training with Autoencoders for Visual Anomaly Detection [61.62861063776813]
We focus on a specific use case in anomaly detection where the distribution of normal samples is supported by a lower-dimensional manifold.
We adapt a self-supervised learning regime that exploits discriminative information during training but focuses on the submanifold of normal examples.
We achieve a new state-of-the-art result on the MVTec AD dataset -- a challenging benchmark for visual anomaly detection in the manufacturing domain.
arXiv Detail & Related papers (2022-06-23T14:16:30Z) - Unsupervised feature selection via self-paced learning and low-redundant
regularization [6.083524716031565]
An unsupervised feature selection is proposed by integrating the framework of self-paced learning and subspace learning.
The convergence of the method is proved theoretically and experimentally.
The experimental results show that the proposed method can improve the performance of clustering methods and outperform other compared algorithms.
arXiv Detail & Related papers (2021-12-14T08:28:19Z) - FDDH: Fast Discriminative Discrete Hashing for Large-Scale Cross-Modal
Retrieval [41.125141897096874]
Cross-modal hashing is favored for its effectiveness and efficiency.
Most existing methods do not sufficiently exploit the discriminative power of semantic information when learning the hash codes.
We propose Fast Discriminative Discrete Hashing (FDDH) approach for large-scale cross-modal retrieval.
arXiv Detail & Related papers (2021-05-15T03:53:48Z) - DEALIO: Data-Efficient Adversarial Learning for Imitation from
Observation [57.358212277226315]
In imitation learning from observation IfO, a learning agent seeks to imitate a demonstrating agent using only observations of the demonstrated behavior without access to the control signals generated by the demonstrator.
Recent methods based on adversarial imitation learning have led to state-of-the-art performance on IfO problems, but they typically suffer from high sample complexity due to a reliance on data-inefficient, model-free reinforcement learning algorithms.
This issue makes them impractical to deploy in real-world settings, where gathering samples can incur high costs in terms of time, energy, and risk.
We propose a more data-efficient IfO algorithm
arXiv Detail & Related papers (2021-03-31T23:46:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.