On the creation of narrow AI: hierarchy and nonlocality of neural network skills
- URL: http://arxiv.org/abs/2505.15811v1
- Date: Wed, 21 May 2025 17:59:21 GMT
- Title: On the creation of narrow AI: hierarchy and nonlocality of neural network skills
- Authors: Eric J. Michaud, Asher Parker-Sartori, Max Tegmark
- Abstract summary: We study the problem of creating strong, yet narrow, AI systems. We find that it is sometimes necessary to train networks on a wide distribution of data to learn certain narrow skills. We also find that methods based on pruning can still outperform distillation.
- Score: 8.96017219406018
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study the problem of creating strong, yet narrow, AI systems. While recent AI progress has been driven by the training of large general-purpose foundation models, the creation of smaller models specialized for narrow domains could be valuable for both efficiency and safety. In this work, we explore two challenges involved in creating such systems, having to do with basic properties of how neural networks learn and structure their representations. The first challenge regards when it is possible to train narrow models from scratch. Through experiments on a synthetic task, we find that it is sometimes necessary to train networks on a wide distribution of data to learn certain narrow skills within that distribution. This effect arises when skills depend on each other hierarchically, and training on a broad distribution introduces a curriculum which substantially accelerates learning. The second challenge regards how to transfer particular skills from large general models into small specialized models. We find that model skills are often not perfectly localized to a particular set of prunable components. However, we find that methods based on pruning can still outperform distillation. We investigate the use of a regularization objective to align desired skills with prunable components while unlearning unnecessary skills.
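To make the pruning-with-regularization idea concrete, below is a minimal, hypothetical PyTorch sketch of one way such an objective could look: a gated MLP where per-unit gates serve as the prunable components, an L1 penalty that concentrates the retained skill onto few gates, and a negated loss term on off-domain data that discourages the remaining units from encoding unwanted skills. The module, loss weights, and dataset split names are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch (not the paper's code): align a "retain" skill with prunable
# gated units while unlearning a "forget" skill, then prune by gate magnitude.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedMLP(nn.Module):
    """Two-layer MLP whose hidden units are scaled by learnable gates.
    Units whose gates shrink toward zero are the prunable components."""
    def __init__(self, d_in=32, d_hidden=256, d_out=10):
        super().__init__()
        self.fc1 = nn.Linear(d_in, d_hidden)
        self.fc2 = nn.Linear(d_hidden, d_out)
        self.gate = nn.Parameter(torch.ones(d_hidden))  # one gate per hidden unit

    def forward(self, x):
        h = F.relu(self.fc1(x)) * self.gate
        return self.fc2(h)

def specialization_loss(model, retain_batch, forget_batch,
                        lam_sparse=1e-3, lam_forget=0.1):
    """Retain-task cross-entropy + L1 sparsity on gates (alignment with
    prunable units) - cross-entropy on the forget task (unlearning term)."""
    xr, yr = retain_batch
    xf, yf = forget_batch
    retain_ce = F.cross_entropy(model(xr), yr)
    sparsity = model.gate.abs().sum()           # encourages few active units
    forget_ce = F.cross_entropy(model(xf), yf)  # maximized via the minus sign
    return retain_ce + lam_sparse * sparsity - lam_forget * forget_ce

def prune_by_gate(model, keep_fraction=0.25):
    """After training, zero the lowest-magnitude gates, silencing those units."""
    k = int(keep_fraction * model.gate.numel())
    keep = torch.topk(model.gate.abs(), k).indices
    mask = torch.zeros_like(model.gate)
    mask[keep] = 1.0
    with torch.no_grad():
        model.gate.mul_(mask)  # zeroed gates make the unit's output vanish
    return model
```

In this framing, `prune_by_gate` would be applied after fine-tuning with `specialization_loss`, leaving a smaller network that (under the sketch's assumptions) retains the narrow skill while the unwanted skill has been both unlearned and pruned away.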
Related papers
- Pretrained Bayesian Non-parametric Knowledge Prior in Robotic Long-Horizon Reinforcement Learning [10.598207472087578]
Reinforcement learning (RL) methods typically learn new tasks from scratch, often disregarding prior knowledge that could accelerate the learning process. This work introduces a method that models potential primitive skill motions as having non-parametric properties with an unknown number of underlying features. We utilize a non-parametric model, specifically Dirichlet Process Mixtures, enhanced with birth and merge, to pre-train a skill prior that effectively captures the diverse nature of skills.
arXiv Detail & Related papers (2025-03-27T20:43:36Z) - Fantastic Gains and Where to Find Them: On the Existence and Prospect of General Knowledge Transfer between Any Pretrained Model [74.62272538148245]
We show that for arbitrary pairings of pretrained models, one model extracts significant data context unavailable in the other.
We investigate if it is possible to transfer such "complementary" knowledge from one model to another without performance degradation.
arXiv Detail & Related papers (2023-10-26T17:59:46Z) - Hierarchical Empowerment: Towards Tractable Empowerment-Based Skill Learning [65.41865750258775]
General purpose agents will require large repertoires of skills.
We introduce a new framework, Hierarchical Empowerment, that makes computing empowerment more tractable.
In a popular ant navigation domain, our four-level agents are able to learn skills that cover a surface area over two orders of magnitude larger than prior work.
arXiv Detail & Related papers (2023-07-06T02:27:05Z) - Cooperative data-driven modeling [44.99833362998488]
Data-driven modeling in mechanics is evolving rapidly based on recent machine learning advances.
New data and models created by different groups become available, opening possibilities for cooperative modeling.
Artificial neural networks suffer from catastrophic forgetting, i.e. they forget how to perform an old task when trained on a new one.
This hinders cooperation because adapting an existing model for a new task affects the performance on a previous task trained by someone else.
arXiv Detail & Related papers (2022-11-23T14:27:25Z) - What Do Adversarially trained Neural Networks Focus: A Fourier
Domain-based Study [8.912245110734334]
This work studies what information the adversarially trained model focuses on.
We consider two common ways to improve model robustness, namely, by data augmentation and by using stronger network architectures.
arXiv Detail & Related papers (2022-03-16T16:37:17Z) - One Model, Multiple Tasks: Pathways for Natural Language Understanding [34.58880663537492]
This paper presents a Pathways approach to handle many tasks at once.
Unlike prevailing single-purpose models that overspecialize at individual tasks and learn from scratch when extended to new tasks, our approach is general-purpose, with the ability to stitch together existing skills to learn new tasks more effectively.
arXiv Detail & Related papers (2022-03-07T11:48:09Z) - Combining Modular Skills in Multitask Learning [149.8001096811708]
A modular design encourages neural models to disentangle and recombine different facets of knowledge to generalise more systematically to new tasks.
In this work, we assume each task is associated with a subset of latent discrete skills from a (potentially small) inventory.
We find that the modular design of a network significantly increases sample efficiency in reinforcement learning and few-shot generalisation in supervised learning.
arXiv Detail & Related papers (2022-02-28T16:07:19Z) - WenLan 2.0: Make AI Imagine via a Multimodal Foundation Model [74.4875156387271]
We develop a novel foundation model pre-trained on huge multimodal (visual and textual) data.
We show that state-of-the-art results can be obtained on a wide range of downstream tasks.
arXiv Detail & Related papers (2021-10-27T12:25:21Z) - Hierarchical Skills for Efficient Exploration [70.62309286348057]
In reinforcement learning, pre-trained low-level skills have the potential to greatly facilitate exploration.
Prior knowledge of the downstream task is required to strike the right balance between generality (fine-grained control) and specificity (faster learning) in skill design.
We propose a hierarchical skill learning framework that acquires skills of varying complexity in an unsupervised manner.
arXiv Detail & Related papers (2021-10-20T22:29:32Z) - Dynamic Inference with Neural Interpreters [72.90231306252007]
We present Neural Interpreters, an architecture that factorizes inference in a self-attention network as a system of modules.
Inputs to the model are routed through a sequence of functions in a way that is learned end-to-end.
We show that Neural Interpreters perform on par with the vision transformer using fewer parameters, while being transferable to a new task in a sample-efficient manner.
arXiv Detail & Related papers (2021-10-12T23:22:45Z) - Distributed Training of Deep Learning Models: A Taxonomic Perspective [11.924058430461216]
Distributed deep learning systems (DDLS) train deep neural network models by utilizing the distributed resources of a cluster.
We aim to shine some light on the fundamental principles that are at work when training deep neural networks in a cluster of independent machines.
arXiv Detail & Related papers (2020-07-08T08:56:58Z)