Related papers: Convolutional Networks as Extremely Small Foundation Models: Visual Prompting and Theoretical Perspective

Convolutional Networks as Extremely Small Foundation Models: Visual Prompting and Theoretical Perspective

URL: http://arxiv.org/abs/2409.10555v1
Date: Tue, 3 Sep 2024 12:34:23 GMT
Title: Convolutional Networks as Extremely Small Foundation Models: Visual Prompting and Theoretical Perspective
Authors: Jianqiao Wangni,
Abstract summary: In this paper, we design a prompting module which performs few-shot adaptation of generic deep networks to new tasks. Driven by learning theory, we derive prompting modules that are as simple as possible, as they generalize better under the same training error. In practice, SDForest has extremely low cost and achieves real-time even on CPU.
Score: 1.79487674052027
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Comparing to deep neural networks trained for specific tasks, those foundational deep networks trained on generic datasets such as ImageNet classification, benefits from larger-scale datasets, simpler network structure and easier training techniques. In this paper, we design a prompting module which performs few-shot adaptation of generic deep networks to new tasks. Driven by learning theory, we derive prompting modules that are as simple as possible, as they generalize better under the same training error. We use a case study on video object segmentation to experiment. We give a concrete prompting module, the Semi-parametric Deep Forest (SDForest) that combines several nonparametric methods such as correlation filter, random forest, image-guided filter, with a deep network trained for ImageNet classification task. From a learning-theoretical point of view, all these models are of significantly smaller VC dimension or complexity so tend to generalize better, as long as the empirical studies show that the training error of this simple ensemble can achieve comparable results from a end-to-end trained deep network. We also propose a novel methods of analyzing the generalization under the setting of video object segmentation to make the bound tighter. In practice, SDForest has extremely low computation cost and achieves real-time even on CPU. We test on video object segmentation tasks and achieve competitive performance at DAVIS2016 and DAVIS2017 with purely deep learning approaches, without any training or fine-tuning.

Related papers

ALSO: Automotive Lidar Self-supervision by Occupancy estimation [70.70557577874155]
We propose a new self-supervised method for pre-training the backbone of deep perception models operating on point clouds. The core idea is to train the model on a pretext task which is the reconstruction of the surface on which the 3D points are sampled. The intuition is that if the network is able to reconstruct the scene surface, given only sparse input points, then it probably also captures some fragments of semantic information.
arXiv Detail & Related papers (2022-12-12T13:10:19Z)
Adaptive Convolutional Dictionary Network for CT Metal Artifact Reduction [62.691996239590125]
We propose an adaptive convolutional dictionary network (ACDNet) for metal artifact reduction. Our ACDNet can automatically learn the prior for artifact-free CT images via training data and adaptively adjust the representation kernels for each input CT image. Our method inherits the clear interpretability of model-based methods and maintains the powerful representation ability of learning-based methods.
arXiv Detail & Related papers (2022-05-16T06:49:36Z)
Routing with Self-Attention for Multimodal Capsule Networks [108.85007719132618]
We present a new multimodal capsule network that allows us to leverage the strength of capsules in the context of a multimodal learning framework. To adapt the capsules to large-scale input data, we propose a novel routing by self-attention mechanism that selects relevant capsules. This allows not only for robust training with noisy video data, but also to scale up the size of the capsule network compared to traditional routing methods.
arXiv Detail & Related papers (2021-12-01T19:01:26Z)
Simultaneous Training of Partially Masked Neural Networks [67.19481956584465]
We show that it is possible to train neural networks in such a way that a predefined 'core' subnetwork can be split-off from the trained full network with remarkable good performance. We show that training a Transformer with a low-rank core gives a low-rank model with superior performance than when training the low-rank model alone.
arXiv Detail & Related papers (2021-06-16T15:57:51Z)
Semantic Segmentation With Multi Scale Spatial Attention For Self Driving Cars [2.7317088388886384]
We present a novel neural network using multi scale feature fusion at various scales for accurate and efficient semantic image segmentation. We used ResNet based feature extractor, dilated convolutional layers in downsampling part, atrous convolutional layers in the upsampling part and used concat operation to merge them. A new attention module is proposed to encode more contextual information and enhance the receptive field of the network.
arXiv Detail & Related papers (2020-06-30T20:19:09Z)
FNA++: Fast Network Adaptation via Parameter Remapping and Architecture Search [35.61441231491448]
We propose a Fast Network Adaptation (FNA++) method, which can adapt both the architecture and parameters of a seed network. In our experiments, we apply FNA++ on MobileNetV2 to obtain new networks for semantic segmentation, object detection, and human pose estimation. The total computation cost of FNA++ is significantly less than SOTA segmentation and detection NAS approaches.
arXiv Detail & Related papers (2020-06-21T10:03:34Z)
Adjoined Networks: A Training Paradigm with Applications to Network Compression [3.995047443480282]
We introduce Adjoined Networks, or AN, a learning paradigm that trains both the original base network and the smaller compressed network together. Using ResNet-50 as the base network, AN achieves 71.8% top-1 accuracy with only 1.8M parameters and 1.6 GFLOPs on the ImageNet data-set. We propose Differentiable Adjoined Networks (DAN), a training paradigm that augments AN by using neural architecture search to jointly learn both the width and the weights for each layer of the smaller network.
arXiv Detail & Related papers (2020-06-10T02:48:16Z)
Finding the Optimal Network Depth in Classification Tasks [10.248235276871258]
We develop a fast end-to-end method for training lightweight neural networks using multiple classifier heads. By allowing the model to determine the importance of each head, we are able to detect and remove unneeded components of the network.
arXiv Detail & Related papers (2020-04-17T11:08:45Z)
Learning Fast and Robust Target Models for Video Object Segmentation [83.3382606349118]
Video object segmentation (VOS) is a highly challenging problem since the initial mask, defining the target object, is only given at test-time. Most previous approaches fine-tune segmentation networks on the first frame, resulting in impractical frame-rates and risk of overfitting. We propose a novel VOS architecture consisting of two network components.
arXiv Detail & Related papers (2020-02-27T21:58:06Z)
Large-Scale Gradient-Free Deep Learning with Recursive Local Representation Alignment [84.57874289554839]
Training deep neural networks on large-scale datasets requires significant hardware resources. Backpropagation, the workhorse for training these networks, is an inherently sequential process that is difficult to parallelize. We propose a neuro-biologically-plausible alternative to backprop that can be used to train deep networks.
arXiv Detail & Related papers (2020-02-10T16:20:02Z)

This list is automatically generated from the titles and abstracts of the papers in this site.