Composite Reward Design in PPO-Driven Adaptive Filtering
- URL: http://arxiv.org/abs/2506.06323v1
- Date: Thu, 29 May 2025 23:11:48 GMT
- Title: Composite Reward Design in PPO-Driven Adaptive Filtering
- Authors: Abdullah Burkan Bereketoglu
- Abstract summary: This letter proposes an adaptive filtering framework using Proximal Policy Optimization (PPO), guided by a composite reward that balances SNR improvement, MSE reduction, and residual smoothness. Experiments on synthetic signals with various noise types show that our PPO agent generalizes beyond its training distribution, achieving real-time performance and outperforming classical filters.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Model-free and reinforcement learning-based adaptive filtering methods are gaining traction for denoising in dynamic, non-stationary environments such as wireless signal channels. Traditional filters like LMS, RLS, Wiener, and Kalman are limited by assumptions of stationarity, or they require complex fine-tuning, exact noise statistics, or fixed models. This letter proposes an adaptive filtering framework using Proximal Policy Optimization (PPO), guided by a composite reward that balances SNR improvement, MSE reduction, and residual smoothness. Experiments on synthetic signals with various noise types show that our PPO agent generalizes beyond its training distribution, achieving real-time performance and outperforming classical filters. This work demonstrates the viability of policy-gradient reinforcement learning for robust, low-latency adaptive signal filtering.
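The abstract names three reward ingredients (SNR improvement, MSE reduction, residual smoothness) but does not give the formula. The following is a minimal sketch of how such a composite reward could be assembled, not the paper's definition: the function names, the weights `w_snr`, `w_mse`, `w_smooth`, and the choice to measure smoothness as the mean squared first difference of the residual are all assumptions made for illustration.

```python
import numpy as np

def snr_db(clean, estimate):
    """SNR of `estimate` against the reference `clean`, in dB."""
    signal_power = np.mean(clean ** 2)
    noise_power = np.mean((clean - estimate) ** 2)
    # Small constant avoids division by zero for a perfect estimate.
    return 10.0 * np.log10(signal_power / (noise_power + 1e-12))

def composite_reward(clean, noisy, filtered,
                     w_snr=1.0, w_mse=1.0, w_smooth=0.1):
    """Hypothetical weighted reward; weights are illustrative assumptions."""
    # SNR improvement of the filter output over the raw noisy input.
    snr_gain = snr_db(clean, filtered) - snr_db(clean, noisy)
    # MSE reduction relative to doing nothing (passing the input through).
    mse_reduction = (np.mean((clean - noisy) ** 2)
                     - np.mean((clean - filtered) ** 2))
    # Residual roughness: mean squared first difference of the error signal.
    residual = filtered - clean
    roughness = np.mean(np.diff(residual) ** 2)
    return w_snr * snr_gain + w_mse * mse_reduction - w_smooth * roughness

# Usage: a sine wave in white Gaussian noise, smoothed by a moving average.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 500)
clean = np.sin(2 * np.pi * 5 * t)
noisy = clean + 0.3 * rng.standard_normal(t.size)
filtered = np.convolve(noisy, np.ones(5) / 5, mode="same")
print(composite_reward(clean, noisy, filtered))
```

In this sketch a pass-through filter earns no SNR or MSE credit and is penalized only for residual roughness, so any filter that actually denoises scores higher. In a PPO setup, a scalar like this would be returned by the environment at each step or episode.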
Related papers
- Robust Unscented Kalman Filtering via Recurrent Meta-Adaptation of Sigma-Point Weights [0.0]
This work introduces the Meta-Adaptive UKF (MA-UKF), a framework that reformulates sigma-point weight as a hyperparameter optimization problem. Unlike standard adaptive filters that rely on instantaneous corrections, our approach employs a Recurrent Context to compress the history of measurement innovations into a compact latent embedding. Numerical benchmarks on maneuvering targets demonstrate that the MA-UKF significantly outperforms standard baselines.
arXiv Detail & Related papers (2026-03-04T18:27:59Z) - G$^2$RPO: Granular GRPO for Precise Reward in Flow Models [74.21206048155669]
We propose a novel Granular-GRPO (G$^2$RPO) framework that achieves precise and comprehensive reward assessments of sampling directions. We introduce a Multi-Granularity Advantage Integration module that aggregates advantages computed at multiple diffusion scales. Our G$^2$RPO significantly outperforms existing flow-based GRPO baselines.
arXiv Detail & Related papers (2025-10-02T12:57:12Z) - Fluid Antenna System-assisted Physical Layer Secret Key Generation [64.92952968689636]
This paper investigates physical-layer key generation (PLKG) in multi-antenna base station systems by leveraging a fluid antenna system (FAS) to dynamically radio environments. We propose an assisted PLKG model that integrates transmit beamforming and port selection under independent and spatially correlated environments. It is shown that the sliding window-based port selection method introduced in this paper achieves higher KGR with fewer chains through dynamic port selection.
arXiv Detail & Related papers (2025-09-19T03:01:29Z) - Machine Intelligence on the Edge: Interpretable Cardiac Pattern Localisation Using Reinforcement Learning [2.309018557701645]
We propose the Sequential Matched Filter (SMF), a paradigm that replaces the conventional single matched filter with a sequence of filters designed by a Reinforcement Learning agent. By formulating filter design as a sequential decision-making process, SMF adaptively designs signal-specific filter sequences that remain fully interpretable.
arXiv Detail & Related papers (2025-08-29T14:15:35Z) - Latent FxLMS: Accelerating Active Noise Control with Neural Adaptive Filters [1.1545092788508224]
Filtered-X LMS (FxLMS) is commonly used for active noise control (ANC). We train an auto-encoder on the filter coefficients of the steady-state adaptive filter for each primary source location sampled from a given spatial region. We evaluate how various neural network constraints and normalization techniques impact the convergence speed and steady-state mean squared error.
arXiv Detail & Related papers (2025-07-05T01:25:42Z) - On Symmetric Losses for Robust Policy Optimization with Noisy Preferences [55.8615920580824]
This work focuses on reward modeling, a core component in reinforcement learning from human feedback. We propose a principled framework for robust policy optimization under noisy preferences. We prove that symmetric losses enable successful policy optimization even under noisy labels.
arXiv Detail & Related papers (2025-05-30T15:30:43Z) - A Unified Bayesian Perspective for Conventional and Robust Adaptive Filters [15.640261000544077]
We present a new perspective on the origin and interpretation of adaptive filters. We can present, in a unified framework, derivations of many adaptive filters which depend on the probabilistic model of the observational noise. Numerical examples are shown to illustrate the properties and provide a better insight into the performance of the derived adaptive filters.
arXiv Detail & Related papers (2025-02-25T16:20:10Z) - Provably Mitigating Overoptimization in RLHF: Your SFT Loss is Implicitly an Adversarial Regularizer [52.09480867526656]
We identify the source of misalignment as a form of distributional shift and uncertainty in learning human preferences. To mitigate overoptimization, we first propose a theoretical algorithm that chooses the best policy for an adversarially chosen reward model. Using the equivalence between reward models and the corresponding optimal policy, the algorithm features a simple objective that combines a preference optimization loss and a supervised learning loss.
arXiv Detail & Related papers (2024-05-26T05:38:50Z) - ROPO: Robust Preference Optimization for Large Language Models [59.10763211091664]
We propose an iterative alignment approach that integrates noise-tolerance and filtering of noisy samples without the aid of external models.
Experiments on three widely-used datasets with Mistral-7B and Llama-2-7B demonstrate that ROPO significantly outperforms existing preference alignment methods.
arXiv Detail & Related papers (2024-04-05T13:58:51Z) - Closed-form Filtering for Non-linear Systems [83.91296397912218]
We propose a new class of filters based on Gaussian PSD Models, which offer several advantages in terms of density approximation and computational efficiency.
We show that filtering can be efficiently performed in closed form when transitions and observations are Gaussian PSD Models.
Our proposed estimator enjoys strong theoretical guarantees, with estimation error that depends on the quality of the approximation and is adaptive to the regularity of the transition probabilities.
arXiv Detail & Related papers (2024-02-15T08:51:49Z) - Poisson Conjugate Prior for PHD Filtering based Track-Before-Detect
Strategies in Radar Systems [9.04251355210029]
We propose a principled closed-form solution of TBD-PHD filter for low signal-to-noise ratio (SNR) scenarios.
Also, sequential Monte Carlo implementations of dynamic and amplitude echo models are proposed for the radar system.
arXiv Detail & Related papers (2023-02-22T13:03:31Z) - Parallel APSM for Fast and Adaptive Digital SIC in Full-Duplex
Transceivers with Nonlinearity [19.534700035048637]
kernel-based adaptive filter is applied for the digital digital domain self-interference cancellation (SIC) in transceiver in full (FD) mode.
They demonstrate that the kernel-based algorithm achieves a favorable level of digital SIC while enabling parallel computation-based implementation within a rich and nonlinear function space.
arXiv Detail & Related papers (2022-07-12T11:17:22Z) - Neural Network-augmented Kalman Filtering for Robust Online Speech
Dereverberation in Noisy Reverberant Environments [13.49645012479288]
A neural network-augmented algorithm for noise-robust online dereverberation is proposed.
The presented framework allows for robust dereverberation on a single-channel noisy reverberant dataset.
arXiv Detail & Related papers (2022-04-06T11:38:04Z) - Filter-enhanced MLP is All You Need for Sequential Recommendation [89.0974365344997]
On online platforms, logged user behavior data inevitably contains noise.
We borrow the idea of filtering algorithms from signal processing, which attenuate noise in the frequency domain.
We propose FMLP-Rec, an all-MLP model with learnable filters for the sequential recommendation task.
arXiv Detail & Related papers (2022-02-28T05:49:35Z) - Adaptive Low-Pass Filtering using Sliding Window Gaussian Processes [71.23286211775084]
We propose an adaptive low-pass filter based on Gaussian process regression.
We show that the estimation error of the proposed method is uniformly bounded.
arXiv Detail & Related papers (2021-11-05T17:06:59Z)
This list is automatically generated from the titles and abstracts of the papers on this site.