Evaluating Feature Dependent Noise in Preference-based Reinforcement Learning
- URL: http://arxiv.org/abs/2601.01904v1
- Date: Mon, 05 Jan 2026 08:49:30 GMT
- Title: Evaluating Feature Dependent Noise in Preference-based Reinforcement Learning
- Authors: Yuxuan Li, Harshith Reddy Kethireddy, Srijita Das
- Abstract summary: Learning from Preferences in Reinforcement Learning (PbRL) has gained attention recently, as it is a natural fit for complicated tasks where the reward function is not easily available. Much prior literature aimed to detect noise, but covered only limited types of noise, most of it uniformly distributed with no connection to observations. We formalize the notion of targeted feature-dependent noise and propose several variants: trajectory feature noise, trajectory similarity noise, uncertainty-aware noise, and Language Model noise.
- Score: 10.882669528784263
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Learning from Preferences in Reinforcement Learning (PbRL) has gained attention recently, as it is a natural fit for complicated tasks where the reward function is not easily available. However, preferences often come with uncertainty and noise when they do not come from perfect teachers. Much prior literature aimed to detect noise, but covered only limited types of noise, most of it uniformly distributed with no connection to observations. In this work, we formalize the notion of targeted feature-dependent noise and propose several variants: trajectory feature noise, trajectory similarity noise, uncertainty-aware noise, and Language Model noise. We evaluate feature-dependent noise, where noise is correlated with certain features, in complex continuous control tasks from DMControl and Meta-world. Our experiments show that in some feature-dependent noise settings, the learning performance of the state-of-the-art noise-robust PbRL method deteriorates significantly, while a PbRL method with no explicit denoising can surprisingly outperform noise-robust PbRL in the majority of settings. We also find that Language Model noise exhibits characteristics similar to feature-dependent noise, thereby simulating realistic human labelers; we call for further study of learning robustly under feature-dependent noise.
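The abstract describes the noise variants only at a high level. As a hedged illustration, the sketch below shows what one such variant ("trajectory similarity noise", where labels on similar trajectory pairs are flipped more often) could look like; the function names, the cosine-similarity flip model, and the `base`/`scale` parameters are our own assumptions, not the authors' implementation.

```python
import numpy as np

def flip_probability(feat_a, feat_b, base=0.1, scale=0.4):
    """Feature-dependent flip probability: the more similar the two
    trajectory feature vectors, the more likely the preference label
    is flipped (a sketch of 'trajectory similarity noise')."""
    sim = np.dot(feat_a, feat_b) / (
        np.linalg.norm(feat_a) * np.linalg.norm(feat_b) + 1e-8
    )
    sim = (sim + 1.0) / 2.0  # map cosine similarity [-1, 1] -> [0, 1]
    return min(1.0, base + scale * sim)

def noisy_preference(true_label, feat_a, feat_b, rng):
    """Return the (possibly flipped) binary preference label."""
    if rng.random() < flip_probability(feat_a, feat_b):
        return 1 - true_label
    return true_label

rng = np.random.default_rng(0)
a = np.array([1.0, 0.0])
b = np.array([1.0, 0.1])   # very similar to a
c = np.array([-1.0, 0.0])  # dissimilar to a
# Similar pairs get a higher flip probability than dissimilar ones:
assert flip_probability(a, b) > flip_probability(a, c)
```

Unlike uniform label noise, the corruption here is correlated with an observable feature of the compared trajectories, which is the property the paper argues breaks existing noise-robust PbRL methods.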
Related papers
- Mixture of Noise for Pre-Trained Model-Based Class-Incremental Learning [59.635264288605946]
Class-Incremental Learning (CIL) aims to continuously learn new categories while retaining the knowledge of old ones. Existing approaches that apply lightweight fine-tuning to backbones still induce drift. We propose Mixture of Noise (Min) to mitigate the degradation of backbone generalization when adapting to new tasks.
arXiv Detail & Related papers (2025-09-20T16:07:20Z) - Enhance Vision-Language Alignment with Noise [59.2608298578913]
We investigate whether the frozen model can be fine-tuned by customized noise. We propose Positive-incentive Noise (PiNI), which can fine-tune CLIP by injecting noise into both the visual and text encoders.
arXiv Detail & Related papers (2024-12-14T12:58:15Z) - NoiseBench: Benchmarking the Impact of Real Label Noise on Named Entity Recognition [3.726602636064681]
We present an analysis that shows that real noise is significantly more challenging than simulated noise.
We show that current state-of-the-art models for noise-robust learning fall far short of their theoretically achievable upper bound.
arXiv Detail & Related papers (2024-05-13T10:20:31Z) - Understanding the Effect of Noise in LLM Training Data with Algorithmic Chains of Thought [0.0]
We study how noise in the chain of thought impacts task performance in a highly controlled setting.
We define two types of noise: static noise, a local form of noise applied after the CoT trace is computed, and dynamic noise, a global form of noise that propagates errors in the trace as it is computed.
We find fine-tuned models are extremely robust to high levels of static noise but struggle significantly more with lower levels of dynamic noise.
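The static/dynamic distinction can be made concrete with a toy additive chain of thought (entirely our own construction, not the paper's actual task): static noise perturbs steps of a finished trace independently, while dynamic noise perturbs the running computation, so an early error compounds through every later step.

```python
import random

def run_chain(xs):
    """Toy chain of thought: step i outputs the running sum of xs[:i+1]."""
    trace, total = [], 0
    for x in xs:
        total += x
        trace.append(total)
    return trace

def static_noise(trace, p, rng):
    """Static noise: perturb individual steps of a finished trace.
    Each error stays local to its step."""
    return [t + 1 if rng.random() < p else t for t in trace]

def dynamic_noise(xs, p, rng):
    """Dynamic noise: perturb while computing, so an early error
    propagates through every subsequent step."""
    trace, total = [], 0
    for x in xs:
        total += x
        if rng.random() < p:
            total += 1
        trace.append(total)
    return trace

rng = random.Random(0)
# With p=1.0 every step is perturbed, making the contrast deterministic:
assert static_noise([1, 2, 3], 1.0, rng) == [2, 3, 4]   # each step off by 1
assert dynamic_noise([1, 1, 1], 1.0, rng) == [2, 4, 6]  # errors accumulate
```

With per-step noise of +1, static corruption leaves every step off by exactly 1, while dynamic corruption leaves step i off by i+1, mirroring the paper's finding that the two regimes stress models very differently.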
arXiv Detail & Related papers (2024-02-06T13:59:56Z) - Deep Variation Prior: Joint Image Denoising and Noise Variance Estimation without Clean Data [2.3061446605472558]
This paper investigates the tasks of image denoising and noise variance estimation in a single, joint learning framework.
We build upon DVP, an unsupervised deep learning framework, that simultaneously learns a denoiser and estimates noise variances.
Our method does not require any clean training images or an external step of noise estimation, and instead, approximates the minimum mean squared error denoisers using only a set of noisy images.
arXiv Detail & Related papers (2022-09-19T17:29:32Z) - Identifying Hard Noise in Long-Tailed Sample Distribution [71.8462682319137]
We introduce Noisy Long-Tailed classification (NLT). Most de-noising methods fail to identify the hard noises. We design an iterative noisy learning framework called Hard-to-Easy (H2E).
arXiv Detail & Related papers (2022-07-27T09:03:03Z) - The Optimal Noise in Noise-Contrastive Learning Is Not What You Think [80.07065346699005]
We show that deviating from this assumption can actually lead to better statistical estimators.
In particular, the optimal noise distribution is different from the data's and even from a different family.
arXiv Detail & Related papers (2022-03-02T13:59:20Z) - C2N: Practical Generative Noise Modeling for Real-World Denoising [53.96391787869974]
We introduce a Clean-to-Noisy image generation framework, namely C2N, to imitate complex real-world noise without using paired examples.
We construct the noise generator in C2N in accordance with each component of real-world noise characteristics, so that it can express a wide range of noise accurately.
arXiv Detail & Related papers (2022-02-19T05:53:46Z) - Learning based signal detection for MIMO systems with unknown noise statistics [84.02122699723536]
This paper aims to devise a generalized maximum likelihood (ML) estimator to robustly detect signals with unknown noise statistics.
In practice, there is little or even no statistical knowledge on the system noise, which in many cases is non-Gaussian, impulsive and not analyzable.
Our framework is driven by an unsupervised learning approach, where only the noise samples are required.
arXiv Detail & Related papers (2021-01-21T04:48:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.