A Multi-Component Reward Function with Policy Gradient for Automated Feature Selection with Dynamic Regularization and Bias Mitigation
- URL: http://arxiv.org/abs/2510.09705v1
- Date: Thu, 09 Oct 2025 22:45:38 GMT
- Title: A Multi-Component Reward Function with Policy Gradient for Automated Feature Selection with Dynamic Regularization and Bias Mitigation
- Authors: Sudip Khadka, L. S. Paudel
- Abstract summary: Static feature exclusion strategies fail to prevent bias when hidden dependencies influence the model predictions. We develop a reinforcement learning framework that integrates bias mitigation and automated feature selection within a single learning process. We aim to provide a flexible and generalizable way to select features in environments where predictors are correlated and biases can inadvertently re-emerge.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Static feature exclusion strategies often fail to prevent bias when hidden dependencies influence the model predictions. To address this issue, we explore a reinforcement learning (RL) framework that integrates bias mitigation and automated feature selection within a single learning process. Unlike traditional heuristic-driven filter or wrapper approaches, our RL agent adaptively selects features using a reward signal that explicitly integrates predictive performance with fairness considerations. This dynamic formulation allows the model to balance generalization, accuracy, and equity throughout the training process, rather than relying exclusively on pre-processing adjustments or post hoc correction mechanisms. In this paper, we describe the construction of a multi-component reward function, the specification of the agent's action space over feature subsets, and the integration of this system with ensemble learning. We aim to provide a flexible and generalizable way to select features in environments where predictors are correlated and biases can inadvertently re-emerge.
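The abstract does not give the reward formula or the policy parameterization, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a hypothetical three-part reward (accuracy, minus a demographic-parity gap, minus a sparsity penalty), an independent Bernoulli policy over feature inclusion, and a plain REINFORCE update with a moving-average baseline. The names `reward` and `train_selector`, the weights, and the least-squares classifier standing in for the ensemble are all illustrative assumptions.

```python
import numpy as np


def reward(mask, X, y, s, w_acc=1.0, w_fair=1.0, w_sparse=0.05):
    """Hypothetical multi-component reward for a binary feature mask.

    X: (n, d) features, y: (n,) binary labels, s: (n,) binary sensitive attribute.
    The weights are illustrative, not taken from the paper.
    """
    if mask.sum() == 0:
        return -1.0  # assumption: an empty feature subset gets a fixed penalty
    Xm = X[:, mask.astype(bool)]
    # Least-squares linear classifier, a cheap stand-in for the ensemble model.
    A = np.c_[Xm, np.ones(len(Xm))]
    coef, *_ = np.linalg.lstsq(A, 2.0 * y - 1.0, rcond=None)
    pred = (A @ coef > 0).astype(int)
    acc = (pred == y).mean()  # in-sample accuracy, for brevity
    # Fairness term: gap in positive prediction rates between the two groups.
    gap = abs(pred[s == 1].mean() - pred[s == 0].mean())
    # Regularization term: fraction of features selected.
    return w_acc * acc - w_fair * gap - w_sparse * mask.mean()


def train_selector(X, y, s, steps=200, lr=0.5, seed=0):
    """REINFORCE over independent Bernoulli inclusion probabilities."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(X.shape[1])  # one logit per feature
    baseline = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-theta))
        mask = (rng.random(p.shape) < p).astype(float)  # sample a feature subset
        r = reward(mask, X, y, s)
        # Gradient of the Bernoulli log-probability w.r.t. the logits is (mask - p).
        theta += lr * (r - baseline) * (mask - p)
        baseline = 0.9 * baseline + 0.1 * r  # moving-average baseline
    return 1.0 / (1.0 + np.exp(-theta))
```

Calling `train_selector(X, y, s)` returns one inclusion probability per feature. In this sketch a feature that acts as a proxy for `s` lowers the reward through the fairness term, which is the mechanism the abstract describes for discouraging bias from re-emerging through correlated predictors.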
Related papers
- Ensemble-size-dependence of deep-learning post-processing methods that minimize an (un)fair score: motivating examples and a proof-of-concept solution [0.0]
We introduce trajectory transformers as a proof-of-concept that ensemble-size independence can be achieved. This approach is an adaptation of the Post-processing Ensembles with Transformers (PoET) framework.
arXiv Detail & Related papers (2026-02-17T18:59:55Z)
- Online Matching via Reinforcement Learning: An Expert Policy Orchestration Strategy [5.913458789333235]
We propose a reinforcement learning (RL) approach that learns to orchestrate a set of such expert policies. We establish both expectation and high-probability regret guarantees and derive a novel finite-time bias bound for temporal-difference learning. Our results highlight how structured, adaptive learning can improve the modeling and management of complex resource allocation and decision-making processes.
arXiv Detail & Related papers (2025-10-07T23:26:16Z)
- Detecting and Pruning Prominent but Detrimental Neurons in Large Language Models [68.57424628540907]
Large language models (LLMs) often develop learned mechanisms specialized to specific datasets. We introduce a fine-tuning approach designed to enhance generalization by identifying and pruning neurons associated with dataset-specific mechanisms. Our method employs Integrated Gradients to quantify each neuron's influence on high-confidence predictions, pinpointing those that disproportionately contribute to dataset-specific performance.
arXiv Detail & Related papers (2025-07-12T08:10:10Z)
- Recursive Reward Aggregation [60.51668865089082]
We propose an alternative approach for flexible behavior alignment that eliminates the need to modify the reward function. By introducing an algebraic perspective on Markov decision processes (MDPs), we show that the Bellman equations naturally emerge from the generation and aggregation of rewards. Our approach applies to both deterministic and stochastic settings and seamlessly integrates with value-based and actor-critic algorithms.
arXiv Detail & Related papers (2025-07-11T12:37:20Z)
- Q-function Decomposition with Intervention Semantics with Factored Action Spaces [51.01244229483353]
We consider Q-functions defined over a lower dimensional projected subspace of the original action space, and study the condition for the unbiasedness of decomposed Q-functions. This leads to a general scheme which we call action decomposed reinforcement learning that uses the projected Q-functions to approximate the Q-function in standard model-free reinforcement learning algorithms.
arXiv Detail & Related papers (2025-04-30T05:26:51Z)
- Invariant Federated Learning for Edge Intelligence: Mitigating Heterogeneity and Asynchrony via Exit Strategy and Invariant Penalty [10.54196990763149]
This paper provides an invariant federated learning system for resource-constrained edge intelligence. It can mitigate the impact of heterogeneity and asynchrony via exit strategy and invariant penalty. It shows our system can enhance In-Distribution performance and outperform the state-of-the-art algorithm in Out-Of-Distribution generalization.
arXiv Detail & Related papers (2025-03-08T10:47:27Z)
- A Hybrid Framework for Statistical Feature Selection and Image-Based Noise-Defect Detection [55.2480439325792]
This paper presents a hybrid framework that integrates both statistical feature selection and classification techniques to improve defect detection accuracy. We present around 55 distinguished features that are extracted from industrial images, which are then analyzed using statistical methods. By integrating these methods with flexible machine learning applications, the proposed framework improves detection accuracy and reduces false positives and misclassifications.
arXiv Detail & Related papers (2024-12-11T22:12:21Z)
- SDPERL: A Framework for Software Defect Prediction Using Ensemble Feature Extraction and Reinforcement Learning [0.0]
This paper proposes an innovative framework for software defect prediction. It combines ensemble feature extraction with reinforcement learning (RL)-based feature selection. We claim that this work is among the first in recent efforts to address this challenge at the file-level granularity.
arXiv Detail & Related papers (2024-12-10T21:16:05Z)
- Generalization Bounds of Surrogate Policies for Combinatorial Optimization Problems [53.03951222945921]
We analyze smoothed (perturbed) policies, adding controlled random perturbations to the direction used by the linear oracle. Our main contribution is a generalization bound that decomposes the excess risk into perturbation bias, statistical estimation error, and optimization error. We illustrate the scope of the results on applications such as vehicle scheduling, highlighting how smoothing enables both tractable training and controlled generalization.
arXiv Detail & Related papers (2024-07-24T12:00:30Z)
- Pessimism meets VCG: Learning Dynamic Mechanism Design via Offline Reinforcement Learning [114.36124979578896]
We design a dynamic mechanism using offline reinforcement learning algorithms.
Our algorithm is based on the pessimism principle and only requires a mild assumption on the coverage of the offline data set.
arXiv Detail & Related papers (2022-05-05T05:44:26Z)
- Automatic Debiased Machine Learning for Dynamic Treatment Effects and General Nested Functionals [23.31865419578237]
We extend the idea of automated debiased machine learning to the dynamic treatment regime and more generally to nested functionals.
We show that the multiply robust formula for the dynamic treatment regime with discrete treatments can be re-stated in terms of a Riesz representer characterization of nested mean regressions.
arXiv Detail & Related papers (2022-03-25T19:54:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.