Towards Continuous Intelligence Growth: Self-Training, Continual Learning, and Dual-Scale Memory in SuperIntelliAgent
- URL: http://arxiv.org/abs/2511.23436v1
- Date: Fri, 28 Nov 2025 18:32:49 GMT
- Authors: Jianzhe Lin, Zeyu Pan, Yun Zhu, Ruiqi Song, Jining Yang,
- Abstract summary: SuperIntelliAgent is an agentic learning framework that couples a trainable small diffusion model (the learner) with a frozen large language model (the verifier). Unlike conventional supervised fine-tuning, SuperIntelliAgent learns autonomously without annotation. We posit that pairing a trainable learner with a reasoning-capable verifier forms a minimal reliable unit of growing intelligence.
- Score: 10.571643330948858
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce SuperIntelliAgent, an agentic learning framework that couples a trainable small diffusion model (the learner) with a frozen large language model (the verifier) to enable continual intelligence growth through self-supervised interaction. Unlike conventional supervised fine-tuning, SuperIntelliAgent learns autonomously without annotation: the learner generates candidate outputs, the verifier evaluates them through step-by-step reasoning, and their interaction produces chosen/rejected pairs for Direct Preference Optimization (DPO). This converts each input into a pseudo-training signal for continual improvement. The framework integrates dual-scale memory: short-term in-context memory that preserves reasoning traces across refinement cycles, and long-term memory that consolidates acquired knowledge through lightweight on-the-fly fine-tuning. A replay buffer retains samples that show verifiable progress and replays them as auxiliary supervision, reinforcing recent learning while forming adaptive curricula. SuperIntelliAgent is infrastructure-agnostic and can be plugged into existing agentic frameworks while turning ordinary inference loops into a lifelong optimization process. We posit that pairing a trainable learner with a reasoning-capable verifier forms a minimal reliable unit of growing intelligence, as paired feedback and partial-history replay yield richer learning curricula and stronger preference alignment. With a small number of automatically generated DPO pairs, the learner improves across all benchmarks, indicating that this mechanism provides a promising direction for continual intelligence accumulation and real-world deployment.
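The abstract describes the self-training loop only at a high level: the learner samples candidates, the verifier scores them, the outcomes become chosen/rejected pairs for DPO, and a replay buffer keeps samples that show progress. The Python sketch below illustrates one possible reading of that loop. Everything concrete in it is an assumption rather than the paper's method: a scalar verifier score in [0, 1], the `margin` threshold for forming a pair, "verifiable progress" read as a score increase between refinement cycles, and all class and function names (`Candidate`, `make_dpo_pair`, `dpo_loss`, `ProgressReplayBuffer`). The loss is the standard DPO objective over sequence log-probabilities; a diffusion learner like the one described here would need a diffusion-specific variant (for example Diffusion-DPO, which substitutes denoising errors for log-probabilities).

```python
import math
import random
from collections import deque
from dataclasses import dataclass


@dataclass
class Candidate:
    output: str   # a generated sample (e.g. an image path); hypothetical
    score: float  # verifier score in [0, 1]; assumed scoring scheme


def make_dpo_pair(prompt, candidates, margin=0.2):
    """Turn verifier-scored candidates into a (prompt, chosen, rejected) triple.

    Returns None when there are fewer than two candidates or when the best
    and worst scores differ by less than `margin` (assumed rule)."""
    if len(candidates) < 2:
        return None
    ranked = sorted(candidates, key=lambda c: c.score, reverse=True)
    best, worst = ranked[0], ranked[-1]
    if best.score - worst.score < margin:
        return None
    return (prompt, best.output, worst.output)


def dpo_loss(pol_chosen, pol_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard DPO loss for one pair, given sequence log-probabilities
    under the trainable policy and the frozen reference model."""
    x = beta * ((pol_chosen - ref_chosen) - (pol_rejected - ref_rejected))
    # -log(sigmoid(x)) == softplus(-x), computed in a numerically stable way
    z = -x
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


class ProgressReplayBuffer:
    """Keeps pairs whose verifier score rose between refinement cycles
    (assumed meaning of "verifiable progress"); oldest entries are evicted."""

    def __init__(self, capacity=1000):
        self.items = deque(maxlen=capacity)

    def maybe_add(self, pair, score_before, score_after):
        if score_after > score_before:
            self.items.append(pair)
            return True
        return False

    def sample(self, k, rng=random):
        # Draw up to k stored pairs to mix into the next DPO batch
        return rng.sample(list(self.items), min(k, len(self.items)))
```

For example, three candidates scored 0.9, 0.4, and 0.7 yield the pair (prompt, 0.9-candidate, 0.4-candidate) under the default margin. Pairing only the best and worst candidates, and skipping pairs the verifier cannot clearly separate, is one simple way to keep noisy verifier judgments out of the preference data.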
Related papers
- Internalizing Multi-Agent Reasoning for Accurate and Efficient LLM-based Recommendation [22.9032468841993]
Large Language Models (LLMs) are reshaping recommender systems by leveraging extensive world knowledge and semantic reasoning to interpret user intent. We propose a trajectory-driven internalization framework to develop a Single-agent Trajectory-Aligned Recommender (STAR).
(2026-02-10T14:36:59Z)
- Endogenous Reprompting: Self-Evolving Cognitive Alignment for Unified Multimodal Models [23.128973540926552]
Endogenous Reprompting transforms the model's understanding into an explicit generative reasoning step. We show that SEER consistently outperforms state-of-the-art baselines in evaluation accuracy, reprompting efficiency, and generation quality.
(2026-01-28T06:54:36Z)
- MemRL: Self-Evolving Agents via Runtime Reinforcement Learning on Episodic Memory [46.632646462295234]
We propose MemRL, a framework that enables agents to self-evolve via non-parametric reinforcement learning on episodic memory. MemRL employs a Two-Phase Retrieval mechanism that filters candidates by semantic relevance and then selects them based on learned Q-values. Our analysis experiments confirm that MemRL effectively reconciles the stability-plasticity dilemma, enabling continuous runtime improvement without weight updates.
(2026-01-06T17:14:50Z)
- Towards Efficient Agents: A Co-Design of Inference Architecture and System [66.59916327634639]
This paper presents AgentInfer, a unified framework for end-to-end agent acceleration. We decompose the problem into four synergistic components: AgentCollab, AgentSched, AgentSAM, and AgentCompress. Experiments on the BrowseComp-zh and DeepDiver benchmarks demonstrate that through the synergistic collaboration of these methods, AgentInfer reduces ineffective token consumption by over 50%.
(2025-12-20T12:06:13Z)
- WebSeer: Training Deeper Search Agents through Reinforcement Learning with Self-Reflection [51.10348385624784]
We present WebSeer, a more intelligent search agent trained via reinforcement learning enhanced with a self-reflection mechanism. Our approach substantially extends tool-use chains and improves answer accuracy.
(2025-10-21T16:52:00Z)
- ReasoningBank: Scaling Agent Self-Evolving with Reasoning Memory [57.517214479414726]
ReasoningBank is a memory framework that distills generalizable reasoning strategies from an agent's self-judged successful and failed experiences. At test time, an agent retrieves relevant memories from ReasoningBank to inform its interaction and then integrates new learnings back, enabling it to become more capable over time. We introduce memory-aware test-time scaling (MaTTS), which accelerates and diversifies this learning process by scaling up the agent's interaction experience.
(2025-09-29T17:51:03Z)
- Memory Management and Contextual Consistency for Long-Running Low-Code Agents [0.0]
This paper proposes a novel hybrid memory system designed specifically for low-code/no-code (LCNC) agents. Inspired by cognitive science, our architecture combines episodic and semantic memory components with a proactive "Intelligent Decay" mechanism. A key innovation is a user-centric visualization interface, aligned with the LCNC paradigm, which allows non-technical users to manage the agent's memory directly.
(2025-09-27T08:01:26Z)
- STARec: An Efficient Agent Framework for Recommender Systems via Autonomous Deliberate Reasoning [54.28691219536054]
We introduce STARec, a slow-thinking augmented agent framework that endows recommender systems with autonomous deliberative reasoning capabilities. We develop anchored reinforcement training, a two-stage paradigm combining structured knowledge distillation from advanced reasoning models with preference-aligned reward shaping. Experiments on MovieLens 1M and Amazon CDs benchmarks demonstrate that STARec achieves substantial performance gains compared with state-of-the-art baselines.
(2025-08-26T08:47:58Z)
- SWE-Bench-CL: Continual Learning for Coding Agents [0.0]
SWE-Bench-CL is a novel continual learning benchmark built on the human-verified SWE-Bench Verified dataset. By organizing GitHub issues into chronologically ordered sequences that reflect natural repository evolution, SWE-Bench-CL enables direct evaluation of an agent's ability to accumulate experience.
(2025-06-13T07:11:14Z)
- ReVeal: Self-Evolving Code Agents via Reliable Self-Verification [11.875519107421312]
We introduce ReVeal, a reinforcement learning framework that evolves code generation through self-verification and tool-based evaluation. At inference, this strengthened self-verification enables the model to use self-constructed tests and tool feedback to continuously evolve code for 20+ turns on LiveCodeBench despite training on only three. These findings highlight the promise of ReVeal as a scalable paradigm for RL training and test-time scaling, paving the way for more robust and autonomous AI agents.
(2025-06-13T03:41:04Z)
- Agent4Edu: Generating Learner Response Data by Generative Agents for Intelligent Education Systems [27.161576657380646]
Agent4Edu is a novel personalized learning simulator leveraging recent advancements in human intelligence through large language models (LLMs). The learner profiles are initialized using real-world response data, capturing practice styles and cognitive factors. Each agent can interact with personalized learning algorithms, such as computerized adaptive testing.
(2025-01-17T18:05:04Z)
- Boosting Continual Learning of Vision-Language Models via Mixture-of-Experts Adapters [65.15700861265432]
We present a parameter-efficient continual learning framework to alleviate long-term forgetting in incremental learning with vision-language models.
Our approach involves the dynamic expansion of a pre-trained CLIP model, through the integration of Mixture-of-Experts (MoE) adapters.
To preserve the zero-shot recognition capability of vision-language models, we introduce a Distribution Discriminative Auto-Selector.
(2024-03-18T08:00:23Z)
- Empowering Private Tutoring by Chaining Large Language Models [87.76985829144834]
This work explores the development of a full-fledged intelligent tutoring system powered by state-of-the-art large language models (LLMs).
The system is structured into three interconnected core processes: interaction, reflection, and reaction.
Each process is implemented by chaining LLM-powered tools along with dynamically updated memory modules.
(2023-09-15T02:42:03Z)
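The MemRL entry in this list summarizes its Two-Phase Retrieval in a single sentence: candidates are first filtered by semantic relevance and then selected by learned Q-values. A minimal Python sketch of that two-stage ranking follows. The use of cosine similarity over embeddings, the hypothetical `k1`/`k2`/`min_sim` parameters, and the running-average Q update in `update_q` are illustrative assumptions, not details taken from the MemRL paper.

```python
import math
from dataclasses import dataclass


@dataclass
class Memory:
    text: str
    embedding: list  # vector from any text encoder (assumed)
    q: float = 0.0   # learned utility estimate (assumed to start at 0)


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def two_phase_retrieve(query_emb, memories, k1=5, k2=2, min_sim=0.0):
    # Phase 1: keep the k1 most similar memories whose similarity >= min_sim
    scored = sorted(((cosine(query_emb, m.embedding), m) for m in memories),
                    key=lambda t: t[0], reverse=True)
    relevant = [m for s, m in scored if s >= min_sim][:k1]
    # Phase 2: among the relevant ones, prefer memories with higher Q-values
    return sorted(relevant, key=lambda m: m.q, reverse=True)[:k2]


def update_q(memory, reward, lr=0.1):
    # Move Q toward the observed reward (assumed rule); no model weights change
    memory.q += lr * (reward - memory.q)
```

Because Phase 2 only reorders what Phase 1 admits, a memory with a high Q-value but low relevance to the current query is never returned; the learned utilities act as a tie-breaker among already relevant memories.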