Augmented Reinforcement Learning Framework For Enhancing Decision-Making In Machine Learning Models Using External Agents
- URL: http://arxiv.org/abs/2508.01612v1
- Date: Sun, 03 Aug 2025 06:17:44 GMT
- Title: Augmented Reinforcement Learning Framework For Enhancing Decision-Making In Machine Learning Models Using External Agents
- Authors: Sandesh Kumar Singh
- Abstract summary: This work proposes Augmented Reinforcement Learning (ARL), a novel framework for improving decision-making capabilities. The external agent can be anyone, such as a human or an automated script, that helps correct the model's decision path. The framework incorporates two external agents that aid in course correction and guarantee data quality at every point of the training cycle.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This work proposes Augmented Reinforcement Learning (ARL), a novel framework for improving the decision-making capabilities of machine learning models. Agents are introduced as external overseers that check the model's decisions. An external agent can be anyone, such as a human or an automated script, that helps correct the decision path. The framework targets the "Garbage-In, Garbage-Out" problem, in which poor data inputs lead to incorrect actions in reinforcement learning. ARL incorporates two external agents that aid in course correction and guarantee data quality at every point of the training cycle. External Agent 1 is a real-time evaluator that provides feedback on the decisions taken by the model and identifies suboptimal actions, which form the Rejected Data Pipeline. External Agent 2 selectively curates that feedback for relevance and accuracy in business scenarios, creating an approved dataset for future training cycles. The framework is validated on a real-world scenario, "Document Identification and Information Extraction". This problem originates mainly in banking systems but generalizes to other domains; documents must be classified and their information extracted correctly. Experimental results show that including human feedback significantly improves the robustness and accuracy of the model's decisions. The augmented approach, combining machine efficiency with human insight, attains a higher standard of learning, particularly in complex or ambiguous environments. These findings show that human-in-the-loop reinforcement learning frameworks such as ARL offer a scalable approach to improving model performance in data-driven applications.
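The paper does not include code; the following is a minimal sketch of the two-agent training cycle as the abstract describes it. All names (ARLLoop, agent1, agent2, run_cycle) are hypothetical, and the routing of feedback between the two agents is our reading of the abstract, not a published implementation.

```python
# Minimal sketch of the ARL training cycle described in the abstract.
# All class and function names are hypothetical; the paper publishes no code.
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

@dataclass
class Decision:
    document: dict  # e.g. a scanned banking document
    label: str      # predicted document class
    fields: dict    # extracted information

@dataclass
class ARLLoop:
    model: Callable[[dict], Decision]
    # External Agent 1: real-time evaluator (human or automated script).
    # Returns corrective feedback for a suboptimal action, else None.
    agent1: Callable[[Decision], Optional[dict]]
    # External Agent 2: curates feedback for business relevance and accuracy.
    agent2: Callable[[Decision, dict], bool]
    approved: List[Tuple[Decision, dict]] = field(default_factory=list)
    rejected: List[Decision] = field(default_factory=list)

    def run_cycle(self, documents: List[dict]) -> List[Tuple[Decision, dict]]:
        for doc in documents:
            decision = self.model(doc)
            feedback = self.agent1(decision)
            if feedback is None:
                continue  # decision judged acceptable in real time
            self.rejected.append(decision)  # Rejected Data Pipeline
            if self.agent2(decision, feedback):
                # Only relevant, accurate feedback enters the approved
                # dataset used to retrain the model in the next cycle.
                self.approved.append((decision, feedback))
        return self.approved
```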
Related papers
- KARE-RAG: Knowledge-Aware Refinement and Enhancement for RAG [63.82127103851471]
Retrieval-Augmented Generation (RAG) enables large language models to access broader knowledge sources. We demonstrate that enhancing generative models' capacity to process noisy content is equally critical for robust performance. We present KARE-RAG, which improves knowledge utilization through three key innovations.
arXiv Detail & Related papers (2025-06-03T06:31:17Z)
- Thinking Longer, Not Larger: Enhancing Software Engineering Agents via Scaling Test-Time Compute [61.00662702026523]
We propose a unified Test-Time Compute scaling framework that leverages increased inference-time computation instead of larger models. Our framework incorporates two complementary strategies: internal TTC and external TTC. We demonstrate that our 32B model achieves a 46% issue resolution rate, surpassing significantly larger models such as DeepSeek R1 671B and OpenAI o1.
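The summary names the two strategies but not their mechanics. A common form of external test-time compute is best-of-N sampling under a verifier, sketched below; generate() and verify() are hypothetical stand-ins, not this paper's API.

```python
# Hedged sketch of one common "external test-time compute" strategy:
# sample N candidate solutions and keep the one a verifier scores highest.
# The paper's actual internal/external TTC mechanisms may differ.
from typing import Callable, List

def best_of_n(prompt: str,
              generate: Callable[[str], str],
              verify: Callable[[str, str], float],
              n: int = 8) -> str:
    """Spend more inference-time compute by sampling n candidates
    and returning the one the verifier rates best."""
    candidates: List[str] = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda c: verify(prompt, c))
```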
arXiv Detail & Related papers (2025-03-31T07:31:32Z)
- Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training [18.896813839389893]
We propose an iterative self-training framework, Agent-R, that enables language agents to reflect on the fly. Unlike traditional methods that reward or penalize actions based on correctness, Agent-R leverages MCTS to construct training data that recovers correct trajectories from erroneous ones. Our findings demonstrate that Agent-R continuously improves the model's ability to recover from errors and enables timely error correction.
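As summarized, the central data-construction step splices a correct continuation onto an erroneous prefix so the agent learns to recover mid-trajectory. A schematic sketch, with mcts_best_continuation() as a hypothetical stand-in for the paper's MCTS search:

```python
# Schematic sketch of Agent-R-style revision-trajectory construction:
# keep the erroneous prefix, then splice in a correct continuation found
# by search, so the agent learns to detect and fix its own mistakes.
from typing import Callable, List

Step = dict  # one (state, action) record

def build_revision_trajectory(
        bad_traj: List[Step],
        error_index: int,
        mcts_best_continuation: Callable[[List[Step]], List[Step]],
        reflection_step: Step) -> List[Step]:
    prefix = bad_traj[:error_index]            # steps up to the error
    recovery = mcts_best_continuation(prefix)  # correct path found by search
    # An explicit reflection marker teaches the model to switch from the
    # erroneous path to the recovered one on the fly.
    return prefix + [reflection_step] + recovery
```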
arXiv Detail & Related papers (2025-01-20T11:46:04Z)
- Towards Cost Sensitive Decision Making [14.279123976398926]
In this work, we consider RL models that may actively acquire features from the environment to improve the decision quality and certainty.
We propose the Active-Acquisition POMDP and identify two types of the acquisition process for different application domains.
In order to assist the agent in the actively-acquired partially-observed environment and alleviate the exploration-exploitation dilemma, we develop a model-based approach.
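A minimal sketch of the acquisition loop this implies, assuming a per-feature cost and an estimate of each feature's decision-quality gain; all names are hypothetical, not the paper's API.

```python
# Illustrative cost-sensitive feature acquisition in an actively-acquired,
# partially-observed setting: buy a feature only while its expected
# decision-quality gain exceeds its acquisition cost.
from typing import Callable, Dict, Set

def acquire_then_act(features: Dict[str, float],
                     cost: Dict[str, float],
                     value_gain: Callable[[Set[str], str], float],
                     act: Callable[[Dict[str, float]], float]) -> float:
    observed: Set[str] = set()
    total_cost = 0.0
    while True:
        candidates = set(features) - observed
        if not candidates:
            break
        best = max(candidates, key=lambda f: value_gain(observed, f) - cost[f])
        if value_gain(observed, best) <= cost[best]:
            break  # no feature is worth its price: stop exploring
        observed.add(best)
        total_cost += cost[best]
    visible = {f: features[f] for f in observed}
    return act(visible) - total_cost  # reward net of acquisition costs
```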
arXiv Detail & Related papers (2024-10-04T19:48:23Z)
- External Model Motivated Agents: Reinforcement Learning for Enhanced Environment Sampling [3.536024441537599]
Unlike reinforcement learning (RL) agents, humans remain capable multitaskers in changing environments.
We propose an agent influence framework for RL agents to improve the adaptation efficiency of external models in changing environments.
Our results show that our method outperforms the baselines in terms of external model adaptation on metrics that measure both efficiency and performance.
arXiv Detail & Related papers (2024-06-28T23:31:22Z)
- Secrets of RLHF in Large Language Models Part II: Reward Modeling [134.97964938009588]
We introduce a series of novel methods to mitigate the influence of incorrect and ambiguous preferences in the dataset.
We also introduce contrastive learning to enhance the ability of reward models to distinguish between chosen and rejected responses.
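The contrastive objective for reward modeling is typically the pairwise Bradley-Terry loss, which pushes the score of the chosen response above the rejected one. A minimal PyTorch sketch; the paper's exact variants for handling noisy preferences are not reproduced here.

```python
# Standard pairwise (Bradley-Terry) reward-modeling loss: train the model
# to score chosen responses above rejected ones.
import torch
import torch.nn.functional as F

def pairwise_rm_loss(r_chosen: torch.Tensor,
                     r_rejected: torch.Tensor) -> torch.Tensor:
    # -log sigmoid(r_chosen - r_rejected), averaged over the batch
    return -F.logsigmoid(r_chosen - r_rejected).mean()

# usage: scores from the reward-model head for a batch of preference pairs
loss = pairwise_rm_loss(torch.randn(4), torch.randn(4))
```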
arXiv Detail & Related papers (2024-01-11T17:56:59Z)
- Striving for data-model efficiency: Identifying data externalities on group performance [75.17591306911015]
Building trustworthy, effective, and responsible machine learning systems hinges on understanding how differences in training data and modeling decisions interact to impact predictive performance.
We focus on a particular type of data-model inefficiency, in which adding training data from some sources can actually lower performance evaluated on key sub-groups of the population.
Our results indicate that data-efficiency is a key component of both accurate and trustworthy machine learning.
arXiv Detail & Related papers (2022-11-11T16:48:27Z)
- Denoised MDPs: Learning World Models Better Than the World Itself [94.74665254213588]
This work categorizes information out in the wild into four types based on controllability and relation with reward, and formulates useful information as that which is both controllable and reward-relevant.
Experiments on variants of DeepMind Control Suite and RoboDesk demonstrate superior performance of our denoised world model over using raw observations alone.
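The four-way categorization can be written down directly as a lookup keyed by (controllable, reward-relevant); the cell labels and examples below are illustrative, not the paper's notation.

```python
# The summary's four information types, keyed by (controllable,
# reward-relevant). Only the (True, True) cell is the "useful signal" a
# denoised world model should keep; example descriptions are ours.
INFO_TYPES = {
    (True,  True):  "useful signal: controllable and reward-relevant (keep)",
    (True,  False): "controllable but reward-irrelevant",
    (False, True):  "uncontrollable but reward-relevant",
    (False, False): "pure noise, e.g. background distractors",
}

def categorize(controllable: bool, reward_relevant: bool) -> str:
    return INFO_TYPES[(controllable, reward_relevant)]
```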
arXiv Detail & Related papers (2022-06-30T17:59:49Z)
- Differential Assessment of Black-Box AI Agents [29.98710357871698]
We propose a novel approach to differentially assess black-box AI agents that have drifted from their previously known models.
We leverage sparse observations of the drifted agent's current behavior and knowledge of its initial model to generate an active querying policy.
Empirical evaluation shows that our approach is much more efficient than re-learning the agent model from scratch.
arXiv Detail & Related papers (2022-03-24T17:48:58Z)
- Retrieval-Augmented Reinforcement Learning [63.32076191982944]
We train a network to map a dataset of past experiences to optimal behavior.
The retrieval process is trained to retrieve information from the dataset that may be useful in the current context.
We show that retrieval-augmented R2D2 learns significantly faster than the baseline R2D2 agent and achieves higher scores.
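A simplified, non-learned version of the retrieval step: embed the current context and fetch the nearest stored experiences. The embedding details are assumptions; the paper trains the retrieval process end-to-end rather than using fixed nearest-neighbor search.

```python
# Simplified retrieval over a dataset of past experiences: rank stored
# experience embeddings by cosine similarity to the current context and
# return the top-k indices as auxiliary input for the agent.
import numpy as np

def retrieve_topk(context_emb: np.ndarray,
                  experience_embs: np.ndarray,  # (N, d) stored experiences
                  k: int = 5) -> np.ndarray:
    a = context_emb / np.linalg.norm(context_emb)
    b = experience_embs / np.linalg.norm(experience_embs, axis=1,
                                         keepdims=True)
    sims = b @ a                  # cosine similarity to each past experience
    return np.argsort(-sims)[:k]  # indices of the k most relevant experiences
```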
arXiv Detail & Related papers (2022-02-17T02:44:05Z)