Composite Reward Design in PPO-Driven Adaptive Filtering
- URL: http://arxiv.org/abs/2506.06323v1
- Date: Thu, 29 May 2025 23:11:48 GMT
- Title: Composite Reward Design in PPO-Driven Adaptive Filtering
- Authors: Abdullah Burkan Bereketoglu
- Abstract summary: This letter proposes an adaptive filtering framework using Proximal Policy Optimization (PPO), guided by a composite reward that balances SNR improvement, MSE reduction, and residual smoothness. Experiments on synthetic signals with various noise types show that our PPO agent generalizes beyond its training distribution, achieving real-time performance and outperforming classical filters.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Model-free and reinforcement learning-based adaptive filtering methods are gaining traction for denoising in dynamic, non-stationary environments such as wireless signal channels. Traditional filters like LMS, RLS, Wiener, and Kalman are limited by assumptions of stationarity, or by requiring careful fine-tuning, exact noise statistics, or fixed models. This letter proposes an adaptive filtering framework using Proximal Policy Optimization (PPO), guided by a composite reward that balances SNR improvement, MSE reduction, and residual smoothness. Experiments on synthetic signals with various noise types show that our PPO agent generalizes beyond its training distribution, achieving real-time performance and outperforming classical filters. This work demonstrates the viability of policy-gradient reinforcement learning for robust, low-latency adaptive signal filtering.
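The abstract does not spell out the exact reward formula, so the following is only a minimal Python sketch, assuming the composite reward is a weighted combination of SNR improvement, negative MSE, and a residual-smoothness penalty. The function name, the weights w_snr/w_mse/w_smooth, and the first-difference smoothness term are illustrative assumptions, not the paper's definitions.

```python
import numpy as np

def composite_reward(clean, denoised, noisy,
                     w_snr=1.0, w_mse=1.0, w_smooth=0.1):
    """Hypothetical composite reward: weighted SNR gain minus MSE and a
    residual-smoothness penalty (weights are illustrative, not the paper's)."""
    def snr_db(reference, estimate):
        # SNR of an estimate relative to the clean reference, in dB
        error = reference - estimate
        return 10.0 * np.log10(np.sum(reference ** 2) / (np.sum(error ** 2) + 1e-12))

    # SNR improvement of the filtered output over the raw noisy input
    snr_gain = snr_db(clean, denoised) - snr_db(clean, noisy)

    # Remaining mean squared error against the clean target
    mse = np.mean((clean - denoised) ** 2)

    # Residual smoothness: penalize rapid sample-to-sample changes in the residual
    residual = denoised - clean
    smoothness_penalty = np.mean(np.diff(residual) ** 2)

    return w_snr * snr_gain - w_mse * mse - w_smooth * smoothness_penalty
```

In a PPO setup, a scalar of this form would be returned at each environment step, with the agent's action adjusting the filter applied to the current window of the noisy signal.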
Related papers
- Latent FxLMS: Accelerating Active Noise Control with Neural Adaptive Filters [1.1545092788508224]
Filtered-X LMS (FxLMS) is commonly used for active noise control (ANC). We train an auto-encoder on the filter coefficients of the steady-state adaptive filter for each primary source location sampled from a given spatial region. We evaluate how various neural network constraints and normalization techniques impact the convergence speed and steady-state mean squared error.
arXiv Detail & Related papers (2025-07-05T01:25:42Z) - On Symmetric Losses for Robust Policy Optimization with Noisy Preferences [55.8615920580824]
This work focuses on reward modeling, a core component in reinforcement learning from human feedback. We propose a principled framework for robust policy optimization under noisy preferences. We prove that symmetric losses enable successful policy optimization even under noisy labels.
arXiv Detail & Related papers (2025-05-30T15:30:43Z) - A Unified Bayesian Perspective for Conventional and Robust Adaptive Filters [15.640261000544077]
We present a new perspective on the origin and interpretation of adaptive filters. In a unified framework, we derive many adaptive filters that depend on the probabilistic model of the observational noise. Numerical examples illustrate the properties and provide better insight into the performance of the derived adaptive filters.
arXiv Detail & Related papers (2025-02-25T16:20:10Z) - Provably Mitigating Overoptimization in RLHF: Your SFT Loss is Implicitly an Adversarial Regularizer [52.09480867526656]
We identify the source of misalignment as a form of distributional shift and uncertainty in learning human preferences. To mitigate overoptimization, we first propose a theoretical algorithm that chooses the best policy for an adversarially chosen reward model. Using the equivalence between reward models and the corresponding optimal policy, the algorithm features a simple objective that combines a preference optimization loss and a supervised learning loss.
arXiv Detail & Related papers (2024-05-26T05:38:50Z) - ROPO: Robust Preference Optimization for Large Language Models [59.10763211091664]
We propose an iterative alignment approach that integrates noise-tolerance and filtering of noisy samples without the aid of external models.
Experiments on three widely-used datasets with Mistral-7B and Llama-2-7B demonstrate that ROPO significantly outperforms existing preference alignment methods.
arXiv Detail & Related papers (2024-04-05T13:58:51Z) - Closed-form Filtering for Non-linear Systems [83.91296397912218]
We propose a new class of filters based on Gaussian PSD Models, which offer several advantages in terms of density approximation and computational efficiency.
We show that filtering can be efficiently performed in closed form when transitions and observations are Gaussian PSD Models.
Our proposed estimator enjoys strong theoretical guarantees, with estimation error that depends on the quality of the approximation and is adaptive to the regularity of the transition probabilities.
arXiv Detail & Related papers (2024-02-15T08:51:49Z) - Poisson Conjugate Prior for PHD Filtering based Track-Before-Detect Strategies in Radar Systems [9.04251355210029]
We propose a principled closed-form solution of the TBD-PHD filter for low signal-to-noise ratio (SNR) scenarios.
Also, sequential Monte Carlo implementations of dynamic and amplitude echo models are proposed for the radar system.
arXiv Detail & Related papers (2023-02-22T13:03:31Z) - Parallel APSM for Fast and Adaptive Digital SIC in Full-Duplex Transceivers with Nonlinearity [19.534700035048637]
A kernel-based adaptive filter is applied for digital-domain self-interference cancellation (SIC) in a transceiver operating in full-duplex (FD) mode.
The results demonstrate that the kernel-based algorithm achieves a favorable level of digital SIC while enabling a parallel-computation-based implementation within a rich, nonlinear function space.
arXiv Detail & Related papers (2022-07-12T11:17:22Z) - Neural Network-augmented Kalman Filtering for Robust Online Speech Dereverberation in Noisy Reverberant Environments [13.49645012479288]
A neural network-augmented algorithm for noise-robust online dereverberation is proposed.
The presented framework allows for robust dereverberation on a single-channel noisy reverberant dataset.
arXiv Detail & Related papers (2022-04-06T11:38:04Z) - Filter-enhanced MLP is All You Need for Sequential Recommendation [89.0974365344997]
In online platforms, logged user behavior data inevitably contains noise.
We borrow the idea of filtering algorithms from signal processing, which attenuate noise in the frequency domain.
We propose FMLP-Rec, an all-MLP model with learnable filters for the sequential recommendation task (a minimal sketch of this frequency-domain filtering idea appears after this list).
arXiv Detail & Related papers (2022-02-28T05:49:35Z) - Adaptive Low-Pass Filtering using Sliding Window Gaussian Processes [71.23286211775084]
We propose an adaptive low-pass filter based on Gaussian process regression.
We show that the estimation error of the proposed method is uniformly bounded.
arXiv Detail & Related papers (2021-11-05T17:06:59Z)
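As a companion to the FMLP-Rec entry above, here is a minimal PyTorch sketch of a learnable frequency-domain filter layer. The class name, the rFFT-based complex parameterization, and the initialization scale are assumptions made for illustration; this is not the FMLP-Rec implementation.

```python
import torch
import torch.nn as nn

class LearnableFrequencyFilter(nn.Module):
    """Toy learnable frequency-domain filter over a sequence of hidden states
    (an assumption-level sketch of the filter-enhanced MLP idea)."""
    def __init__(self, seq_len: int, hidden_dim: int):
        super().__init__()
        # One complex weight per rFFT bin and hidden dimension
        n_bins = seq_len // 2 + 1
        self.weight = nn.Parameter(
            0.02 * torch.randn(n_bins, hidden_dim, dtype=torch.cfloat))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, hidden_dim), real-valued
        spectrum = torch.fft.rfft(x, dim=1)                    # to frequency domain
        filtered = spectrum * self.weight                      # learnable per-bin filtering
        return torch.fft.irfft(filtered, n=x.size(1), dim=1)   # back to the time domain
```

Stacking such a layer with standard feed-forward blocks yields an all-MLP alternative to self-attention for attenuating noise in sequential inputs, which is the idea the abstract above describes.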