Understanding Behavioral Metric Learning: A Large-Scale Study on Distracting Reinforcement Learning Environments
- URL: http://arxiv.org/abs/2506.00563v1
- Date: Sat, 31 May 2025 13:43:41 GMT
- Title: Understanding Behavioral Metric Learning: A Large-Scale Study on Distracting Reinforcement Learning Environments
- Authors: Ziyan Luo, Tianwei Ni, Pierre-Luc Bacon, Doina Precup, Xujie Si
- Abstract summary: A key approach to state abstraction is approximating behavioral metrics in the observation space and embedding these learned distances in the representation space. We evaluate five recent approaches, unified conceptually as isometric embeddings with varying design choices, and benchmark them with baselines across 20 state-based and 14 pixel-based tasks, spanning 370 task configurations with diverse noise settings.
- Score: 45.49492366356368
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A key approach to state abstraction is approximating behavioral metrics (notably, bisimulation metrics) in the observation space and embedding these learned distances in the representation space. While promising for robustness to task-irrelevant noise, as shown in prior work, accurately estimating these metrics remains challenging, requiring various design choices that create gaps between theory and practice. Prior evaluations focus mainly on final returns, leaving the quality of learned metrics and the source of performance gains unclear. To systematically assess how metric learning works in deep reinforcement learning (RL), we evaluate five recent approaches, unified conceptually as isometric embeddings with varying design choices. We benchmark them with baselines across 20 state-based and 14 pixel-based tasks, spanning 370 task configurations with diverse noise settings. Beyond final returns, we introduce the evaluation of a denoising factor to quantify the encoder's ability to filter distractions. To further isolate the effect of metric learning, we propose and evaluate an isolated metric estimation setting, in which the encoder is influenced solely by the metric loss. Finally, we release an open-source, modular codebase to improve reproducibility and support future research on metric learning in deep RL.
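The isometric-embedding objective described in the abstract can be sketched minimally: train an encoder so that distances between embeddings match target behavioral distances. The snippet below is a toy illustration with hypothetical 2-D embeddings and hand-picked target distances, not the paper's actual setup or loss.

```python
import numpy as np

def metric_embedding_loss(phi_x, phi_y, target_dist):
    """Mean squared error between embedding distances and target
    behavioral distances -- the basic isometric-embedding objective."""
    embed_dist = np.linalg.norm(phi_x - phi_y, axis=-1)
    return float(np.mean((embed_dist - target_dist) ** 2))

# Toy batch: 2-D embeddings of paired observations and hypothetical
# behavioral-metric values d(x, y) they should reproduce.
phi_x = np.array([[0.0, 0.0], [1.0, 0.0]])
phi_y = np.array([[3.0, 4.0], [1.0, 0.0]])
target = np.array([5.0, 0.0])

loss = metric_embedding_loss(phi_x, phi_y, target)  # 0.0: distances already match
```

In practice the targets would come from an estimated behavioral metric (e.g., a bisimulation-style recursion over rewards and transitions), and the loss would be backpropagated through the encoder; the evaluated methods differ mainly in how those targets and distances are parameterized.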
Related papers
- OpenUnlearning: Accelerating LLM Unlearning via Unified Benchmarking of Methods and Metrics [101.78963920333342]
We introduce OpenUnlearning, a standardized framework for benchmarking large language model (LLM) unlearning methods and metrics. OpenUnlearning integrates 9 unlearning algorithms and 16 diverse evaluations across 3 leading benchmarks. We also benchmark diverse unlearning methods and provide a comparative analysis against an extensive evaluation suite.
arXiv Detail & Related papers (2025-06-14T20:16:37Z)
- Intrinsic Evaluation of Unlearning Using Parametric Knowledge Traces [34.00971641141313]
"Unlearning" certain concepts in large language models (LLMs) has attracted immense attention recently.
Current protocols to evaluate unlearning methods rely on behavioral tests, without monitoring the presence of associated knowledge.
We argue that unlearning should also be evaluated internally, by considering changes in the parametric knowledge traces of the unlearned concepts.
arXiv Detail & Related papers (2024-06-17T15:00:35Z)
- Uncertainty Estimation by Fisher Information-based Evidential Deep Learning [61.94125052118442]
Uncertainty estimation is a key factor that makes deep learning reliable in practical applications.
We propose a novel method, Fisher Information-based Evidential Deep Learning ($\mathcal{I}$-EDL).
In particular, we introduce the Fisher Information Matrix (FIM) to measure the informativeness of evidence carried by each sample, which lets us dynamically reweight the objective's loss terms so the network focuses more on representation learning for uncertain classes.
arXiv Detail & Related papers (2023-03-03T16:12:59Z)
- Metric-oriented Speech Enhancement using Diffusion Probabilistic Model [23.84172431047342]
Deep neural network-based speech enhancement focuses on learning a noisy-to-clean transformation supervised by paired training data.
The task-specific evaluation metric (e.g., PESQ) is usually non-differentiable and cannot be directly incorporated into the training criteria.
We propose a metric-oriented speech enhancement method (MOSE) which integrates a metric-oriented training strategy into its reverse process.
arXiv Detail & Related papers (2023-02-23T13:12:35Z)
- Ontology-aware Learning and Evaluation for Audio Tagging [56.59107110017436]
The mean average precision (mAP) metric treats different kinds of sound as independent classes without considering their relations.
Ontology-aware mean average precision (OmAP) addresses the weaknesses of mAP by utilizing the AudioSet ontology information during the evaluation.
We conduct human evaluations and demonstrate that OmAP is more consistent with human perception than mAP.
arXiv Detail & Related papers (2022-11-22T11:35:14Z)
- Measuring Overfitting in Convolutional Neural Networks using Adversarial Perturbations and Label Noise [3.395452700023097]
Overfitted neural networks tend to memorize noise in the training data rather than generalize to unseen data.
We introduce several anti-overfitting measures in architectures based on VGG and ResNet.
We assess the applicability of the proposed metrics by measuring the overfitting degree of several CNN architectures outside of our model pool.
arXiv Detail & Related papers (2022-09-27T13:40:53Z)
- Self-Supervised Metric Learning in Multi-View Data: A Downstream Task Perspective [2.01243755755303]
We study how self-supervised metric learning can benefit downstream tasks in the context of multi-view data.
We show that the target distance of metric learning satisfies several desired properties for the downstream tasks.
Our analysis characterizes the improvement by self-supervised metric learning on four commonly used downstream tasks.
arXiv Detail & Related papers (2021-06-14T02:34:33Z)
- ReMP: Rectified Metric Propagation for Few-Shot Learning [67.96021109377809]
A rectified metric space is learned to maintain the metric consistency from training to testing.
Numerous analyses indicate that a simple modification of the objective can yield substantial performance gains.
The proposed ReMP is effective and efficient, and outperforms the state of the art on various standard few-shot learning datasets.
arXiv Detail & Related papers (2020-12-02T00:07:53Z)
- Fast Uncertainty Quantification for Deep Object Pose Estimation [91.09217713805337]
Deep learning-based object pose estimators are often unreliable and overconfident.
In this work, we propose a simple, efficient, and plug-and-play UQ method for 6-DoF object pose estimation.
arXiv Detail & Related papers (2020-11-16T06:51:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.