Continual Learning, Not Training: Online Adaptation For Agents
- URL: http://arxiv.org/abs/2511.01093v1
- Date: Sun, 02 Nov 2025 21:48:31 GMT
- Title: Continual Learning, Not Training: Online Adaptation For Agents
- Authors: Aman Jaglan, Jarrod Barnes
- Abstract summary: We introduce our Adaptive Teaching and Learning System (ATLAS), a dual-agent architecture that decouples reasoning (Teacher) from execution (Student). ATLAS achieves gradient-free continual learning, shifting the locus of adaptation from model parameters to system-level orchestration.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Continual Learning (CL) methods have traditionally focused on mitigating catastrophic forgetting through gradient-based retraining, an approach ill-suited for deployed agents that must adapt in real time. We introduce our Adaptive Teaching and Learning System (ATLAS), a dual-agent architecture that decouples reasoning (Teacher) from execution (Student) and incorporates a persistent learning memory that stores distilled guidance from experience. This informs the orchestration layer, enabling the system to dynamically adjust its operational strategies, such as supervision level or initial plan selection, at inference time. In doing so, ATLAS achieves gradient-free continual learning, shifting the locus of adaptation from model parameters to system-level orchestration. We formulate this as a system-centric paradigm for continual learning, where the objective is adaptive efficiency: maximizing task success while minimizing computational cost through inference-time orchestration rather than parameter updates. Evaluated on Microsoft's ExCyTIn-Bench, an open-source benchmark simulating complex cyberthreat investigation, ATLAS achieves 54.1% success with GPT-5-mini as its Student, outperforming the larger GPT-5 (High) by 13% while reducing cost by 86%. Cross-incident validation demonstrates generalization: frozen pamphlets from Incident #5 improve accuracy from 28% to 41% with zero retraining, while shifting output composition from verbose exploration to structured reasoning. Together, these findings establish gradient-free continual learning as a viable path toward adaptive, deployable AI systems and provide causally annotated traces valuable for training explicit world models.
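The paper does not ship code, but the abstract fully determines the control flow: a Teacher plans and critiques, a Student executes, and all adaptation happens as writes to a persistent memory rather than as gradient steps. The sketch below is a minimal illustration of that loop; `PamphletMemory`, the `student`/`teacher` interfaces, and the supervision heuristic are all hypothetical names, not the authors' API.

```python
# Minimal sketch of ATLAS-style gradient-free adaptation. All interfaces here
# (student.execute, teacher.plan/review, the supervision heuristic) are
# assumptions for illustration, not the paper's published API.
from dataclasses import dataclass, field

@dataclass
class PamphletMemory:
    """Persistent store of distilled guidance; no model weights are ever updated."""
    guidance: dict = field(default_factory=dict)

    def recall(self, task_type: str) -> str:
        return self.guidance.get(task_type, "")

    def distill(self, task_type: str, lesson: str) -> None:
        # Append rather than overwrite so guidance accumulates across incidents.
        self.guidance[task_type] = (self.recall(task_type) + "\n" + lesson).strip()

def orchestrate(task, student, teacher, memory: PamphletMemory):
    """Pick a supervision level at inference time from what memory already knows."""
    pamphlet = memory.recall(task.type)
    if pamphlet:                                   # prior guidance exists: run lightly supervised
        result = student.execute(task, plan=pamphlet)
    else:                                          # cold start: Teacher plans up front
        result = student.execute(task, plan=teacher.plan(task))
    if not result.success:
        critique = teacher.review(task, result)    # Teacher reasons over the failure
        memory.distill(task.type, critique)        # "learning" = a memory write, not a gradient step
        result = student.execute(task, plan=critique)
    return result
```

On this reading, the cross-incident result (frozen pamphlets lifting accuracy from 28% to 41%) corresponds to reusing a populated `PamphletMemory` with `distill` disabled.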
Related papers
- ProRAG: Process-Supervised Reinforcement Learning for Retrieval-Augmented Generation [54.071574153853994]
ProRAG is a process-supervised reinforcement learning framework designed to integrate learned step-level supervision into the online optimization loop. The framework consists of four stages: (1) Supervised Policy Warmup to initialize the model with a structured reasoning format; (2) construction of an MCTS-based Process Reward Model (PRM) to quantify intermediate reasoning quality; (3) PRM-Guided Reasoning Refinement to align the policy with fine-grained process preferences; and (4) Process-Supervised Reinforcement Learning with a dual-granularity advantage mechanism.
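As a reading aid, the four stages compose into a linear training schedule. The skeleton below fixes only that ordering; each stage callable is a placeholder supplied by the caller, since the abstract gives no implementation detail.

```python
# Stage ordering of ProRAG as described in the abstract. The four callables
# are placeholders; the abstract specifies the pipeline, not the internals.
def train_prorag(policy, warmup, build_prm, refine, rl_finetune):
    policy = warmup(policy)            # (1) supervised policy warmup: structured reasoning format
    prm = build_prm(policy)            # (2) MCTS-based process reward model for step quality
    policy = refine(policy, prm)       # (3) PRM-guided reasoning refinement on process preferences
    return rl_finetune(policy, prm)    # (4) process-supervised RL, dual-granularity advantage
```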
arXiv Detail & Related papers (2026-01-29T16:04:59Z)
- Sequencing to Mitigate Catastrophic Forgetting in Continual Learning [1.1724961392643483]
Catastrophic forgetting (CF) is a major challenge to the progress of Continual Learning approaches. We consider the role of task sequencing in mitigating CF and propose a method for determining the optimal task order. Results demonstrate that intelligent task sequencing can substantially reduce CF.
arXiv Detail & Related papers (2025-12-18T18:40:58Z)
- SHA256 at SemEval-2025 Task 4: Selective Amnesia -- Constrained Unlearning for Large Language Models via Knowledge Isolation [12.838593066237452]
Large language models (LLMs) frequently memorize sensitive information during training, posing risks when deploying publicly accessible models. This paper presents our solution to SemEval-2025 Task 4 on targeted unlearning, which combines causal mediation analysis with layer-specific optimization.
arXiv Detail & Related papers (2025-04-17T15:05:40Z)
- SAEs $\textit{Can}$ Improve Unlearning: Dynamic Sparse Autoencoder Guardrails for Precision Unlearning in LLMs [24.48560556882878]
We introduce $\textbf{Dynamic SAE Guardrails}$ (DSG), a novel method for precision unlearning. Our experiments show DSG substantially outperforms leading unlearning methods.
arXiv Detail & Related papers (2025-04-11T01:24:03Z)
- Fast Adaptation with Behavioral Foundation Models [82.34700481726951]
Unsupervised zero-shot reinforcement learning has emerged as a powerful paradigm for pretraining behavioral foundation models. Despite promising results, zero-shot policies are often suboptimal due to errors induced by the unsupervised training process. We propose fast adaptation strategies that search in the low-dimensional task-embedding space of the pre-trained BFM to rapidly improve the performance of its zero-shot policies.
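The mechanism is concrete enough to sketch: treat the task embedding as the only free variable and search it directly, keeping the BFM frozen. The random-search loop below is an assumed instantiation; `bfm.policy(z)` and the `evaluate` rollout function are hypothetical interfaces, and the paper's own search strategy may differ.

```python
# Hedged sketch: local random search over the task-embedding space of a frozen
# behavioral foundation model (BFM). bfm.policy(z) and evaluate() are assumed
# interfaces for illustration only.
import numpy as np

def adapt_in_embedding_space(bfm, env, evaluate, z_init, iters=10, n_cand=32, sigma=0.1):
    rng = np.random.default_rng(0)
    best_z, best_ret = z_init, evaluate(bfm.policy(z_init), env)
    for _ in range(iters):
        # Perturb the incumbent embedding and keep the best-returning candidate.
        candidates = best_z + sigma * rng.standard_normal((n_cand, best_z.shape[0]))
        for z in candidates:
            ret = evaluate(bfm.policy(z), env)
            if ret > best_ret:
                best_z, best_ret = z, ret
    return best_z, best_ret
```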
arXiv Detail & Related papers (2025-04-10T16:14:17Z)
- Sculpting Subspaces: Constrained Full Fine-Tuning in LLMs for Continual Learning [19.27175827358111]
Continual learning in large language models (LLMs) is prone to catastrophic forgetting, where adapting to new tasks significantly degrades performance on previously learned ones. We propose a novel continual full fine-tuning approach leveraging adaptive singular value decomposition (SVD). We evaluate our approach extensively on standard continual learning benchmarks using both encoder-decoder (T5-Large) and decoder-only (LLaMA-2 7B) models.
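A common way to realize such an SVD constraint, and a plausible reading of the abstract, is to project new-task gradients away from the dominant singular subspace of prior-task activations. The sketch below shows that projection; it is an assumed simplification, not the paper's adaptive-rank procedure.

```python
# Sketch of SVD-constrained fine-tuning: protect the top singular subspace of
# prior-task activations by removing that component from new-task gradients.
# An assumed simplification of the paper's adaptive-SVD mechanism.
import torch

def protected_basis(prior_acts: torch.Tensor, rank: int) -> torch.Tensor:
    """Top-`rank` right singular vectors of prior-task activations, shape (d, rank)."""
    _, _, vh = torch.linalg.svd(prior_acts, full_matrices=False)
    return vh[:rank].T

def constrain_grad(grad: torch.Tensor, basis: torch.Tensor) -> torch.Tensor:
    """Remove the gradient component that lies in the protected subspace."""
    return grad - grad @ basis @ basis.T
```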
arXiv Detail & Related papers (2025-04-09T17:59:42Z)
- Enhancing Robustness of Vision-Language Models through Orthogonality Learning and Self-Regularization [77.62516752323207]
We introduce an orthogonal fine-tuning method that efficiently fine-tunes pretrained weights while enabling enhanced robustness and generalization.
A self-regularization strategy is further exploited to maintain the stability of the VLM's zero-shot generalization; the method is dubbed OrthSR.
For the first time, we revisit CLIP and CoOp with our method to effectively improve performance in few-shot image classification scenarios.
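The abstract names the orthogonality constraint but not its parametrization; a standard way to keep a fine-tuning update orthogonal is the Cayley transform, sketched below. This mirrors the general orthogonal fine-tuning idea and is not necessarily OrthSR's exact construction.

```python
# Sketch of orthogonal fine-tuning via the Cayley transform: learn a
# skew-symmetric A and rotate pretrained weights with R = (I + A)^(-1)(I - A),
# which is orthogonal and so preserves the pretrained geometry. Assumed
# parametrization; OrthSR's exact construction may differ.
import torch

class OrthogonalAdapter(torch.nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.raw = torch.nn.Parameter(torch.zeros(dim, dim))  # identity rotation at init

    def forward(self, w: torch.Tensor) -> torch.Tensor:
        a = self.raw - self.raw.T                        # skew-symmetric by construction
        eye = torch.eye(w.shape[0], device=w.device)
        r = torch.linalg.solve(eye + a, eye - a)         # Cayley: orthogonal for skew A
        return r @ w                                     # rotate, rather than overwrite, the weights
```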
arXiv Detail & Related papers (2024-07-11T10:35:53Z)
- Adaptive Retention & Correction: Test-Time Training for Continual Learning [114.5656325514408]
A common problem in continual learning is the classification layer's bias towards the most recent task. We name our approach Adaptive Retention & Correction (ARC). ARC achieves an average performance increase of 2.7% and 2.6% on the CIFAR-100 and ImageNet-R datasets.
arXiv Detail & Related papers (2024-05-23T08:43:09Z)
- Augmenting Unsupervised Reinforcement Learning with Self-Reference [63.68018737038331]
Humans possess the ability to draw on past experiences explicitly when learning new tasks.
We propose the Self-Reference (SR) approach, an add-on module explicitly designed to leverage historical information.
Our approach achieves state-of-the-art results in terms of Interquartile Mean (IQM) performance and Optimality Gap reduction on the Unsupervised Reinforcement Learning Benchmark.
arXiv Detail & Related papers (2023-11-16T09:07:34Z)
- Kaizen: Practical Self-supervised Continual Learning with Continual Fine-tuning [21.36130180647864]
Retraining a model from scratch to adapt to newly generated data is time-consuming and inefficient.
We introduce a training architecture that is able to mitigate catastrophic forgetting.
Kaizen significantly outperforms previous SSL models on competitive vision benchmarks.
arXiv Detail & Related papers (2023-03-30T09:08:57Z)
- Ada-Segment: Automated Multi-loss Adaptation for Panoptic Segmentation [95.31590177308482]
We propose an automated multi-loss adaptation (named Ada-Segment) to flexibly adjust multiple training losses over the course of training.
With an end-to-end architecture, Ada-Segment generalizes to different datasets without the need to re-tune hyperparameters.
Ada-Segment brings 2.7% panoptic quality (PQ) improvement on COCO val split from the vanilla baseline, achieving the state-of-the-art 48.5% PQ on COCO test-dev split and 32.9% PQ on ADE20K dataset.
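Ada-Segment's adaptation is learned end-to-end; as a rough stand-in for the idea of re-balancing loss terms over the course of training, the heuristic below scales each loss by the inverse of its running magnitude. It is an assumption for illustration, not the authors' algorithm.

```python
# Heuristic stand-in for automated multi-loss adaptation: keep an EMA of each
# loss term's magnitude and weight terms inversely, so none dominates training.
# Ada-Segment itself learns this adaptation end-to-end; this is an assumption.
import torch

def combine_losses(losses: dict, ema: dict, beta: float = 0.99) -> torch.Tensor:
    total = 0.0
    for name, value in losses.items():
        ema[name] = beta * ema.get(name, float(value)) + (1 - beta) * float(value)
        total = total + value / (ema[name] + 1e-8)   # balanced contribution per term
    return total
```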
arXiv Detail & Related papers (2020-12-07T11:43:10Z)
- Semi-supervised ASR by End-to-end Self-training [18.725686837244265]
We propose a self-training method with an end-to-end system for semi-supervised ASR.
We iteratively generate pseudo-labels on a mini-batch of unsupervised utterances with the current model, and use the pseudo-labels to augment the supervised data for immediate model update.
Our method gives 14.4% relative WER improvement over a carefully-trained base system with data augmentation, reducing the performance gap between the base system and the oracle system by 50%.
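The loop in the abstract is explicit enough to write down directly: pseudo-label an unsupervised mini-batch with the current model and fold it into the very next update. The `model.transcribe` and `train_step` interfaces below are placeholders, not the paper's code.

```python
# Sketch of the described self-training loop for semi-supervised ASR; batches
# are assumed to be lists of utterances and transcripts.
def self_train(model, labeled_batches, unlabeled_batches, train_step):
    for (audio, text), raw_audio in zip(labeled_batches, unlabeled_batches):
        pseudo_text = model.transcribe(raw_audio)        # pseudo-labels from the *current* model
        # Immediate update on supervised plus pseudo-labeled data, per the abstract.
        train_step(model, audio + raw_audio, text + pseudo_text)
    return model
```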
arXiv Detail & Related papers (2020-01-24T18:22:57Z)