Neural Metamorphosis
- URL: http://arxiv.org/abs/2410.11878v1
- Date: Thu, 10 Oct 2024 14:49:58 GMT
- Title: Neural Metamorphosis
- Authors: Xingyi Yang, Xinchao Wang
- Abstract summary: This paper introduces a new learning paradigm termed Neural Metamorphosis (NeuMeta), which aims to build self-morphable neural networks.
NeuMeta directly learns the continuous weight manifold of neural networks.
It sustains full-size performance even at a 75% compression rate.
- Score: 72.88137795439407
- License:
- Abstract: This paper introduces a new learning paradigm termed Neural Metamorphosis (NeuMeta), which aims to build self-morphable neural networks. Rather than crafting separate models for different architectures or sizes, NeuMeta directly learns the continuous weight manifold of neural networks. Once trained, we can sample weights for networks of any size directly from the manifold, even for previously unseen configurations, without retraining. To achieve this ambitious goal, NeuMeta trains neural implicit functions as hypernetworks: they accept coordinates within the model space as input and generate the corresponding weight values on the manifold. In other words, the implicit function is learned such that the predicted weights perform well across various model sizes. In training these models, we observe that the final performance closely relates to the smoothness of the learned manifold. To enhance this smoothness, we employ two strategies. First, we permute weight matrices to achieve intra-model smoothness by solving the Shortest Hamiltonian Path problem. Second, we add noise to the input coordinates when training the implicit function, ensuring that models of various sizes produce consistent outputs. As such, NeuMeta shows promising results in synthesizing parameters for various network configurations. Our extensive tests in image classification, semantic segmentation, and image generation reveal that NeuMeta sustains full-size performance even at a 75% compression rate.
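The sketch below is a minimal illustration, not the authors' implementation, of the mechanism the abstract describes: a neural implicit function (hypernetwork) that maps a normalized model-space coordinate to a single weight value, together with the coordinate-noise perturbation applied during training. The coordinate layout (layer, output index, input index, width ratio), the MLP sizes, and the noise scale `sigma` are assumptions made for illustration.

```python
# Minimal sketch of an implicit-function hypernetwork over model space.
# All names, coordinate conventions, and hyperparameters are assumptions.
import torch
import torch.nn as nn


class WeightINR(nn.Module):
    """MLP that predicts one weight value from a 4-D model-space coordinate."""

    def __init__(self, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(4, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, coords: torch.Tensor) -> torch.Tensor:
        # coords: (N, 4) normalized (layer, out_idx, in_idx, width_ratio)
        return self.net(coords).squeeze(-1)


def sample_layer_weights(inr: WeightINR, layer: int, n_layers: int,
                         out_dim: int, in_dim: int,
                         width_ratio: float) -> torch.Tensor:
    """Query the learned manifold for an (out_dim x in_dim) weight matrix."""
    rows = torch.arange(out_dim).float() / max(out_dim - 1, 1)
    cols = torch.arange(in_dim).float() / max(in_dim - 1, 1)
    r, c = torch.meshgrid(rows, cols, indexing="ij")
    coords = torch.stack([
        torch.full_like(r, layer / max(n_layers - 1, 1)),
        r, c,
        torch.full_like(r, width_ratio),
    ], dim=-1).reshape(-1, 4)
    return inr(coords).reshape(out_dim, in_dim)


def perturb(coords: torch.Tensor, sigma: float = 0.01) -> torch.Tensor:
    """Training-time coordinate noise, encouraging a smooth weight manifold."""
    return coords + sigma * torch.randn_like(coords)
```

Under this sketch, weights for a compressed or full-size network are obtained by querying the same function on a coarser or finer coordinate grid, e.g. `sample_layer_weights(inr, layer=0, n_layers=4, out_dim=32, in_dim=64, width_ratio=0.25)` for a hypothetical quarter-width layer.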
Related papers
- The Persian Rug: solving toy models of superposition using large-scale symmetries [0.0]
We present a complete mechanistic description of the algorithm learned by a minimal non-linear sparse data autoencoder in the limit of large input dimension.
Our work contributes to neural network interpretability by introducing techniques for understanding the structure of autoencoders.
arXiv Detail & Related papers (2024-10-15T22:52:45Z)
- MixtureGrowth: Growing Neural Networks by Recombining Learned Parameters [19.358670728803336]
Most deep neural networks are trained under fixed network architectures and require retraining when the architecture changes.
To avoid this, one can grow from a small network by adding random weights over time to gradually achieve the target network size.
This naive approach falls short in practice as it brings too much noise to the growing process.
arXiv Detail & Related papers (2023-11-07T11:37:08Z)
- On the Role of Neural Collapse in Meta Learning Models for Few-shot Learning [0.9729803206187322]
This study is the first to explore and understand the properties of neural collapse in meta learning frameworks for few-shot learning.
We perform studies on the Omniglot dataset in the few-shot setting and study the neural collapse phenomenon.
arXiv Detail & Related papers (2023-09-30T18:02:51Z)
- PRANC: Pseudo RAndom Networks for Compacting deep models [22.793523211040682]
PRANC enables significant compaction of a deep model.
In this study, we employ PRANC to condense image classification models and compress images by compacting their associated implicit neural networks.
arXiv Detail & Related papers (2022-06-16T22:03:35Z)
- Adaptive Convolutional Dictionary Network for CT Metal Artifact Reduction [62.691996239590125]
We propose an adaptive convolutional dictionary network (ACDNet) for metal artifact reduction.
Our ACDNet can automatically learn the prior for artifact-free CT images via training data and adaptively adjust the representation kernels for each input CT image.
Our method inherits the clear interpretability of model-based methods and maintains the powerful representation ability of learning-based methods.
arXiv Detail & Related papers (2022-05-16T06:49:36Z)
- FOSTER: Feature Boosting and Compression for Class-Incremental Learning [52.603520403933985]
Deep neural networks suffer from catastrophic forgetting when learning new categories.
We propose a novel two-stage learning paradigm FOSTER, empowering the model to learn new categories adaptively.
arXiv Detail & Related papers (2022-04-10T11:38:33Z)
- NeuroMorph: Unsupervised Shape Interpolation and Correspondence in One Go [109.88509362837475]
We present NeuroMorph, a new neural network architecture that takes as input two 3D shapes.
NeuroMorph produces a smooth interpolation and point-to-point correspondences between them in a single pass.
It works well for a large variety of input shapes, including non-isometric pairs from different object categories.
arXiv Detail & Related papers (2021-06-17T12:25:44Z)
- From Boltzmann Machines to Neural Networks and Back Again [31.613544605376624]
We give new results for learning Restricted Boltzmann Machines, probably the most well-studied class of latent variable models.
Our results are based on new connections to learning two-layer neural networks under $\ell_\infty$-bounded input.
We then give an algorithm for learning a natural class of supervised RBMs with better runtime than what is possible for its related class of networks without distributional assumptions.
arXiv Detail & Related papers (2020-07-25T00:42:50Z)
- Neural Subdivision [58.97214948753937]
This paper introduces Neural Subdivision, a novel framework for data-driven coarse-to-fine geometry modeling.
We optimize for the same set of network weights across all local mesh patches, thus providing an architecture that is not constrained to a specific input mesh, fixed genus, or category.
We demonstrate that even when trained on a single high-resolution mesh our method generates reasonable subdivisions for novel shapes.
arXiv Detail & Related papers (2020-05-04T20:03:21Z)
- Model Fusion via Optimal Transport [64.13185244219353]
We present a layer-wise model fusion algorithm for neural networks.
We show that this can successfully yield "one-shot" knowledge transfer between neural networks trained on heterogeneous non-i.i.d. data.
arXiv Detail & Related papers (2019-10-12T22:07:15Z)