Related papers: Towards Continuous Intelligence Growth: Self-Training, Continual Learning, and Dual-Scale Memory in SuperIntelliAgent

Towards Continuous Intelligence Growth: Self-Training, Continual Learning, and Dual-Scale Memory in SuperIntelliAgent

URL: http://arxiv.org/abs/2511.23436v1
Date: Fri, 28 Nov 2025 18:32:49 GMT
Title: Towards Continuous Intelligence Growth: Self-Training, Continual Learning, and Dual-Scale Memory in SuperIntelliAgent
Authors: Jianzhe Lin, Zeyu Pan, Yun Zhu, Ruiqi Song, Jining Yang,
Abstract summary: SuperIntelliAgent is an agentic learning framework that couples a trainable small diffusion model (the learner) with a frozen large language model (the verifier)<n>Unlike conventional supervised fine-tuning, SuperIntelliAgent learns autonomously without annotation.<n>We posit that pairing a trainable learner with a reasoning-capable verifier forms a minimal reliable unit of growing intelligence.
Score: 10.571643330948858
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We introduce SuperIntelliAgent, an agentic learning framework that couples a trainable small diffusion model (the learner) with a frozen large language model (the verifier) to enable continual intelligence growth through self-supervised interaction. Unlike conventional supervised fine-tuning, SuperIntelliAgent learns autonomously without annotation: the learner generates candidate outputs, the verifier evaluates them through step-by-step reasoning, and their interaction produces chosen/rejected pairs for Direct Preference Optimization (DPO). This converts each input into a pseudo-training signal for continual improvement. The framework integrates dual-scale memory: short-term in-context memory that preserves reasoning traces across refinement cycles, and long-term memory that consolidates acquired knowledge through lightweight on-the-fly fine-tuning. A replay buffer retains samples that show verifiable progress and replays them as auxiliary supervision, reinforcing recent learning while forming adaptive curricula. SuperIntelliAgent is infrastructure-agnostic and can be plugged into existing agentic frameworks while turning ordinary inference loops into a lifelong optimization process. We posit that pairing a trainable learner with a reasoning-capable verifier forms a minimal reliable unit of growing intelligence, as paired feedback and partial-history replay yield richer learning curricula and stronger preference alignment. With a small number of automatically generated DPO pairs, the learner improves across all benchmarks, indicating that this mechanism provides a promising direction for continual intelligence accumulation and real-world deployment.

Related papers

Internalizing Multi-Agent Reasoning for Accurate and Efficient LLM-based Recommendation [22.9032468841993]
Large Language Models (LLMs) are reshaping recommender systems by leveraging extensive world knowledge and semantic reasoning to interpret user intent.<n>We propose a trajectory-driven internalization framework to develop a Single-agent Trajectory-Aligned Recommender (STAR)
arXiv Detail & Related papers (2026-02-10T14:36:59Z)
Endogenous Reprompting: Self-Evolving Cognitive Alignment for Unified Multimodal Models [23.128973540926552]
Endogenous Reprompting transforms the model's understanding into an explicit generative reasoning step.<n>We show that SEER consistently outperforms state-of-the-art baselines in evaluation accuracy, reprompting efficiency, and generation quality.
arXiv Detail & Related papers (2026-01-28T06:54:36Z)
MemRL: Self-Evolving Agents via Runtime Reinforcement Learning on Episodic Memory [46.632646462295234]
We propose MemRL, a framework that enables agents to self-evolve via non-parametric reinforcement learning on episodic memory.<n>MemRL employs a Two-Phase Retrieval mechanism that filters candidates by semantic relevance and then selects them based on learned Q-values.<n>Our analysis experiments confirm that MemRL effectively reconciles the stability-plasticity dilemma, enabling continuous runtime improvement without weight updates.
arXiv Detail & Related papers (2026-01-06T17:14:50Z)
Towards Efficient Agents: A Co-Design of Inference Architecture and System [66.59916327634639]
This paper presents AgentInfer, a unified framework for end-to-end agent acceleration.<n>We decompose the problem into four synergistic components: AgentCollab, AgentSched, AgentSAM, and AgentCompress.<n>Experiments on the BrowseComp-zh and DeepDiver benchmarks demonstrate that through the synergistic collaboration of these methods, AgentInfer reduces ineffective token consumption by over 50%.
arXiv Detail & Related papers (2025-12-20T12:06:13Z)
WebSeer: Training Deeper Search Agents through Reinforcement Learning with Self-Reflection [51.10348385624784]
We present WebSeer, a more intelligent search agent trained via reinforcement learning enhanced with a self-reflection mechanism.<n>Our approach substantially extends tool-use chains and improves answer accuracy.
arXiv Detail & Related papers (2025-10-21T16:52:00Z)
ReasoningBank: Scaling Agent Self-Evolving with Reasoning Memory [57.517214479414726]
ReasoningBank is a memory framework that distills generalizable reasoning strategies from an agent's self-judged successful and failed experiences.<n>At test time, an agent retrieves relevant memories from ReasoningBank to inform its interaction and then integrates new learnings back, enabling it to become more capable over time.<n>We introduce memory-aware test-time scaling (MaTTS), which accelerates and diversifies this learning process by scaling up the agent's interaction experience.
arXiv Detail & Related papers (2025-09-29T17:51:03Z)
Memory Management and Contextual Consistency for Long-Running Low-Code Agents [0.0]
This paper proposes a novel hybrid memory system designed specifically for LCNC agents.<n>Inspired by cognitive science, our architecture combines episodic and semantic memory components with a proactive "Intelligent Decay" mechanism.<n>Key innovation is a user-centric visualization interface, aligned with the LCNC paradigm, which allows non-technical users to manage the agent's memory directly.
arXiv Detail & Related papers (2025-09-27T08:01:26Z)
STARec: An Efficient Agent Framework for Recommender Systems via Autonomous Deliberate Reasoning [54.28691219536054]
We introduce STARec, a slow-thinking augmented agent framework that endows recommender systems with autonomous deliberative reasoning capabilities.<n>We develop anchored reinforcement training - a two-stage paradigm combining structured knowledge distillation from advanced reasoning models with preference-aligned reward shaping.<n>Experiments on MovieLens 1M and Amazon CDs benchmarks demonstrate that STARec achieves substantial performance gains compared with state-of-the-art baselines.
arXiv Detail & Related papers (2025-08-26T08:47:58Z)
SWE-Bench-CL: Continual Learning for Coding Agents [0.0]
SWE-Bench-CL is a novel continual learning benchmark built on the human-verified SWE-Bench Verified dataset.<n>By organizing GitHub issues into chronologically ordered sequences that reflect natural repository evolution, SWE-Bench-CL enables direct evaluation of an agent's ability to accumulate experience.
arXiv Detail & Related papers (2025-06-13T07:11:14Z)
ReVeal: Self-Evolving Code Agents via Reliable Self-Verification [11.875519107421312]
We introduce ReVeal, a reinforcement learning framework that evolves code generation through self-verification and tool-based evaluation.<n>At inference, this strengthened self-verification enables the model to use self-constructed tests and tool feedback to continuously evolve code for 20+ turns on LiveCodeBench despite training on only three.<n>These findings highlight the promise of ReVeal as a scalable paradigm for RL training and test-time scaling, paving the way for more robust and autonomous AI agents.
arXiv Detail & Related papers (2025-06-13T03:41:04Z)
Agent4Edu: Generating Learner Response Data by Generative Agents for Intelligent Education Systems [27.161576657380646]
Agent4Edu is a novel personalized learning simulator leveraging recent advancements in human intelligence through large language models (LLMs)<n>The learner profiles are using real-world response data, capturing practice styles and cognitive factors.<n>Each agent can interact with personalized learning algorithms, such as computerized adaptive testing.
arXiv Detail & Related papers (2025-01-17T18:05:04Z)
Boosting Continual Learning of Vision-Language Models via Mixture-of-Experts Adapters [65.15700861265432]
We present a parameter-efficient continual learning framework to alleviate long-term forgetting in incremental learning with vision-language models. Our approach involves the dynamic expansion of a pre-trained CLIP model, through the integration of Mixture-of-Experts (MoE) adapters. To preserve the zero-shot recognition capability of vision-language models, we introduce a Distribution Discriminative Auto-Selector.
arXiv Detail & Related papers (2024-03-18T08:00:23Z)
Empowering Private Tutoring by Chaining Large Language Models [87.76985829144834]
This work explores the development of a full-fledged intelligent tutoring system powered by state-of-the-art large language models (LLMs) The system is into three inter-connected core processes-interaction, reflection, and reaction. Each process is implemented by chaining LLM-powered tools along with dynamically updated memory modules.
arXiv Detail & Related papers (2023-09-15T02:42:03Z)

This list is automatically generated from the titles and abstracts of the papers in this site.