A Compositional Paradigm for Foundation Models: Towards Smarter Robotic Agents
- URL: http://arxiv.org/abs/2510.18608v1
- Date: Tue, 21 Oct 2025 13:06:52 GMT
- Title: A Compositional Paradigm for Foundation Models: Towards Smarter Robotic Agents
- Authors: Luigi Quarantiello, Elia Piccoli, Jack Bell, Malio Li, Giacomo Carfì, Eric Nuertey Coleman, Gerlando Gramaglia, Lanpei Li, Mauro Madeddu, Irene Testa, Vincenzo Lomonaco,
- Abstract summary: Foundation Models are able to process huge quantities of data, and can extract and develop rich representations.<n>However, they still have issues in adapting to dynamic, real-world scenarios without retraining the entire model from scratch.<n>We propose the application of Continual Learning and Compositionality principles to foster the development of more flexible, efficient and smart AI solutions.
- Score: 3.5315128335063286
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The birth of Foundation Models brought unprecedented results in a wide range of tasks, from language to vision, to robotic control. These models are able to process huge quantities of data, and can extract and develop rich representations, which can be employed across different domains and modalities. However, they still have issues in adapting to dynamic, real-world scenarios without retraining the entire model from scratch. In this work, we propose the application of Continual Learning and Compositionality principles to foster the development of more flexible, efficient and smart AI solutions.
Related papers
- Continual Learning for Generative AI: From LLMs to MLLMs and Beyond [56.29231194002407]
We present a comprehensive survey of continual learning methods for mainstream generative AI models.<n>We categorize these approaches into three paradigms: architecture-based, regularization-based, and replay-based.<n>We analyze continual learning setups for different generative models, including training objectives, benchmarks, and core backbones.
arXiv Detail & Related papers (2025-06-16T02:27:25Z) - TrajLLM: A Modular LLM-Enhanced Agent-Based Framework for Realistic Human Trajectory Simulation [3.8106509573548286]
This work leverages Large Language Models (LLMs) to simulate human mobility, addressing challenges like high costs and privacy concerns in traditional models.<n>Our hierarchical framework integrates persona generation, activity selection, and destination prediction, using real-world demographic and psychological data.
arXiv Detail & Related papers (2025-02-26T00:13:26Z) - On the Modeling Capabilities of Large Language Models for Sequential Decision Making [52.128546842746246]
Large pretrained models are showing increasingly better performance in reasoning and planning tasks.
We evaluate their ability to produce decision-making policies, either directly, by generating actions, or indirectly.
In environments with unfamiliar dynamics, we explore how fine-tuning LLMs with synthetic data can significantly improve their reward modeling capabilities.
arXiv Detail & Related papers (2024-10-08T03:12:57Z) - SKT: Integrating State-Aware Keypoint Trajectories with Vision-Language Models for Robotic Garment Manipulation [82.61572106180705]
This paper presents a unified approach using vision-language models (VLMs) to improve keypoint prediction across various garment categories.
We created a large-scale synthetic dataset using advanced simulation techniques, allowing scalable training without extensive real-world data.
Experimental results indicate that the VLM-based method significantly enhances keypoint detection accuracy and task success rates.
arXiv Detail & Related papers (2024-09-26T17:26:16Z) - An Interactive Agent Foundation Model [49.77861810045509]
We propose an Interactive Agent Foundation Model that uses a novel multi-task agent training paradigm for training AI agents.
Our training paradigm unifies diverse pre-training strategies, including visual masked auto-encoders, language modeling, and next-action prediction.
We demonstrate the performance of our framework across three separate domains -- Robotics, Gaming AI, and Healthcare.
arXiv Detail & Related papers (2024-02-08T18:58:02Z) - Transferring Foundation Models for Generalizable Robotic Manipulation [82.12754319808197]
We propose a novel paradigm that effectively leverages language-reasoning segmentation mask generated by internet-scale foundation models.<n>Our approach can effectively and robustly perceive object pose and enable sample-efficient generalization learning.<n>Demos can be found in our submitted video, and more comprehensive ones can be found in link1 or link2.
arXiv Detail & Related papers (2023-06-09T07:22:12Z) - On Realization of Intelligent Decision-Making in the Real World: A
Foundation Decision Model Perspective [54.38373782121503]
A Foundation Decision Model (FDM) can be developed by formulating diverse decision-making tasks as sequence decoding tasks.
We present a case study demonstrating our FDM implementation, DigitalBrain (DB1) with 1.3 billion parameters, achieving human-level performance in 870 tasks.
arXiv Detail & Related papers (2022-12-24T06:16:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.