Arcadia: Toward a Full-Lifecycle Framework for Embodied Lifelong Learning
- URL: http://arxiv.org/abs/2512.00076v1
- Date: Tue, 25 Nov 2025 07:26:00 GMT
- Title: Arcadia: Toward a Full-Lifecycle Framework for Embodied Lifelong Learning
- Authors: Minghe Gao, Juncheng Li, Yuze Lin, Xuqi Liu, Jiaming Ji, Xiaoran Pan, Zihan Xu, Xian Li, Mingjie Li, Wei Ji, Rong Wei, Rui Tang, Qizhou Wang, Kai Shen, Jun Xiao, Qi Wu, Siliang Tang, Yueting Zhuang
- Abstract summary: We introduce Arcadia, a closed-loop framework that operationalizes embodied lifelong learning by tightly coupling four stages. Arcadia delivers consistent gains on navigation and manipulation benchmarks and transfers robustly to physical robots.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We contend that embodied learning is fundamentally a lifecycle problem rather than a single-stage optimization. Systems that optimize only one link (data collection, simulation, learning, or deployment) rarely sustain improvement or generalize beyond narrow settings. We introduce Arcadia, a closed-loop framework that operationalizes embodied lifelong learning by tightly coupling four stages: (1) Self-evolving exploration and grounding for autonomous data acquisition in physical environments, (2) Generative scene reconstruction and augmentation for realistic and extensible scene creation, (3) a Shared embodied representation architecture that unifies navigation and manipulation within a single multimodal backbone, and (4) Sim-from-real evaluation and evolution that closes the feedback loop through simulation-based adaptation. This coupling is non-decomposable: removing any stage breaks the improvement loop and reverts to one-shot training. Arcadia delivers consistent gains on navigation and manipulation benchmarks and transfers robustly to physical robots, indicating that a tightly coupled lifecycle of continuous real-world data acquisition, generative simulation update, and shared-representation learning supports lifelong improvement and end-to-end generalization. We release standardized interfaces enabling reproducible evaluation and cross-model comparison in reusable environments, positioning Arcadia as a scalable foundation for general-purpose embodied agents.
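The four-stage coupling described in the abstract can be sketched as a minimal Python loop. All class and method names below (`Explorer`, `SceneGenerator`, `Simulator`, the `skill` counter) are illustrative assumptions, not the paper's released interfaces; the point is only the non-decomposable coupling, in which each stage consumes the previous stage's output and feeds the next.

```python
class Explorer:
    """Stage 1: self-evolving exploration and grounding (toy stub)."""
    def collect(self, policy):
        return {"trajectories": 10 + policy["skill"]}
    def adapt(self, report):
        pass  # steer future exploration toward observed failure modes

class SceneGenerator:
    """Stage 2: generative scene reconstruction and augmentation (toy stub)."""
    def build(self, real_data):
        return {"scenes": real_data["trajectories"] * 2}

class Simulator:
    """Stage 4: sim-from-real evaluation closing the feedback loop (toy stub)."""
    def evaluate(self, policy, scenes):
        return {"success_rate": min(1.0, policy["skill"] / 10)}

def lifecycle_iteration(policy, explorer, generator, simulator):
    real_data = explorer.collect(policy)         # 1) acquire real-world data
    scenes = generator.build(real_data)          # 2) generate/extend scenes
    policy["skill"] += 1                         # 3) update shared backbone (stub)
    report = simulator.evaluate(policy, scenes)  # 4) evaluate in simulation
    explorer.adapt(report)                       # close the loop
    return report

policy = {"skill": 0}
explorer, generator, simulator = Explorer(), SceneGenerator(), Simulator()
for _ in range(3):
    report = lifecycle_iteration(policy, explorer, generator, simulator)
print(report["success_rate"])  # improves across iterations
```

Removing any one call in `lifecycle_iteration` breaks the chain, which mirrors the paper's claim that the loop degenerates to one-shot training if any stage is ablated.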
Related papers
- ABot-M0: VLA Foundation Model for Robotic Manipulation with Action Manifold Learning [31.000965640377128]
ABot-M0 is a framework that builds a systematic data curation pipeline. It enables end-to-end transformation of heterogeneous raw data into unified, efficient representations. ABot-M0 supports modular perception via a dual-stream mechanism.
arXiv Detail & Related papers (2026-02-11T16:47:01Z) - Experience Scaling: Post-Deployment Evolution For Large Language Models [44.48142891798125]
We propose experience scaling, a framework for continuous post-deployment evolution for large language models (LLMs). We validate the framework in simulated real-world scenarios involving generalization to previously unseen but related tasks, repetitive queries, and over-saturated knowledge stores. Results demonstrate that structured post-deployment learning can extend LLM capabilities beyond the limits of static human-generated data.
arXiv Detail & Related papers (2025-09-23T08:04:58Z) - From Seeing to Experiencing: Scaling Navigation Foundation Models with Reinforcement Learning [59.88543114325153]
We introduce the Seeing-to-Experiencing (S2E) framework to scale the capability of navigation foundation models with reinforcement learning. S2E combines the strengths of pre-training on videos and post-training through RL. We establish a comprehensive end-to-end evaluation benchmark, NavBench-GS, built on photorealistic 3DGS reconstructions of real-world scenes.
arXiv Detail & Related papers (2025-07-29T17:26:10Z) - A Practitioner's Guide to Continual Multimodal Pretraining [83.63894495064855]
Multimodal foundation models serve numerous applications at the intersection of vision and language. To keep models updated, research into continual pretraining mainly explores scenarios with either infrequent, indiscriminate updates on large-scale new data, or frequent, sample-level updates. We introduce FoMo-in-Flux, a continual multimodal pretraining benchmark with realistic compute constraints and practical deployment requirements.
arXiv Detail & Related papers (2024-08-26T17:59:01Z) - Enhancing Generalizability of Representation Learning for Data-Efficient 3D Scene Understanding [50.448520056844885]
We propose a generative Bayesian network to produce diverse synthetic scenes with real-world patterns.
A series of experiments robustly display our method's consistent superiority over existing state-of-the-art pre-training approaches.
arXiv Detail & Related papers (2024-06-17T07:43:53Z) - Reusable Architecture Growth for Continual Stereo Matching [92.36221737921274]
We introduce a Reusable Architecture Growth (RAG) framework to learn new scenes continually in both supervised and self-supervised manners.
RAG can maintain high reusability during growth by reusing previous units while obtaining good performance.
We also present a Scene Router module to adaptively select the scene-specific architecture path at inference.
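The Scene Router idea above can be illustrated with a small sketch: at inference, each input is routed to the architecture path grown for its scene. The prototype-similarity routing rule and all names here are assumptions for illustration, not the paper's exact design.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

class SceneRouter:
    def __init__(self):
        self.paths = {}  # scene id -> (prototype embedding, forward fn)

    def register(self, scene_id, prototype, forward_fn):
        self.paths[scene_id] = (prototype, forward_fn)

    def route(self, embedding):
        # Pick the scene whose stored prototype is most similar to the input,
        # then return that scene's architecture path.
        best = max(self.paths, key=lambda s: cosine(self.paths[s][0], embedding))
        return best, self.paths[best][1]

router = SceneRouter()
router.register("indoor", [1.0, 0.0], lambda x: "indoor-path")
router.register("driving", [0.0, 1.0], lambda x: "driving-path")
scene, fn = router.route([0.9, 0.1])
print(scene)  # -> indoor
```

Because previously grown units are kept and only selected at inference, new scenes can be added without overwriting old paths, matching the reusability claim.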
arXiv Detail & Related papers (2024-03-30T13:24:58Z) - SimVPv2: Towards Simple yet Powerful Spatiotemporal Predictive Learning [61.419914155985886]
We propose SimVPv2, a streamlined model that eliminates the need for Unet architectures for spatial and temporal modeling. SimVPv2 not only simplifies the model architecture but also improves both performance and computational efficiency. On the standard Moving MNIST benchmark, SimVPv2 achieves superior performance compared to SimVP, with fewer FLOPs, about half the training time, and roughly 60% faster inference.
arXiv Detail & Related papers (2022-11-22T08:01:33Z) - Towards Scale Consistent Monocular Visual Odometry by Learning from the Virtual World [83.36195426897768]
We propose VRVO, a novel framework for retrieving the absolute scale from virtual data.
We first train a scale-aware disparity network using both monocular real images and stereo virtual data.
The resulting scale-consistent disparities are then integrated with a direct VO system.
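The scale-recovery step described above can be sketched as follows. The disparity network, trained with stereo virtual data, predicts metrically scaled disparities; standard stereo geometry converts them to metric depths, which then fix the arbitrary scale of the monocular VO translation. All numbers and the median-based alignment rule are illustrative assumptions, not VRVO's exact procedure.

```python
def disparity_to_depth(disparity, focal_px, baseline_m):
    # Standard stereo relation: depth = focal_length * baseline / disparity.
    return [focal_px * baseline_m / d for d in disparity]

def rescale_translation(t_unscaled, vo_depths, metric_depths):
    # Align the VO system's arbitrary scale to the metric depths
    # via the ratio of their medians, then rescale the translation.
    vo_med = sorted(vo_depths)[len(vo_depths) // 2]
    metric_med = sorted(metric_depths)[len(metric_depths) // 2]
    s = metric_med / vo_med
    return [s * x for x in t_unscaled]

disparities = [50.0, 40.0, 25.0]   # pixels, hypothetical network output
metric = disparity_to_depth(disparities, focal_px=720.0, baseline_m=0.54)
vo = [1.0, 1.25, 2.0]              # up-to-scale depths from direct VO
t = rescale_translation([0.1, 0.0, 1.0], vo, metric)
print(t)
```

The key property is that scale comes from the virtual-trained disparity network rather than from ground-truth poses, so no real-world metric supervision is needed.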
arXiv Detail & Related papers (2022-03-11T01:51:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.