Multiscale Deep Equilibrium Models
- URL: http://arxiv.org/abs/2006.08656v2
- Date: Tue, 24 Nov 2020 06:59:38 GMT
- Title: Multiscale Deep Equilibrium Models
- Authors: Shaojie Bai and Vladlen Koltun and J. Zico Kolter
- Abstract summary: We propose a new class of implicit networks, the multiscale deep equilibrium model (MDEQ)
An MDEQ directly solves for and backpropagates through the equilibrium points of multiple feature resolutions simultaneously.
We illustrate the effectiveness of this approach on two large-scale vision tasks: ImageNet classification and semantic segmentation on high-resolution images from the Cityscapes dataset.
- Score: 162.15362280927476
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a new class of implicit networks, the multiscale deep equilibrium
model (MDEQ), suited to large-scale and highly hierarchical pattern recognition
domains. An MDEQ directly solves for and backpropagates through the equilibrium
points of multiple feature resolutions simultaneously, using implicit
differentiation to avoid storing intermediate states (and thus requiring only
$O(1)$ memory consumption). These simultaneously-learned multi-resolution
features allow us to train a single model on a diverse set of tasks and loss
functions, such as using a single MDEQ to perform both image classification and
semantic segmentation. We illustrate the effectiveness of this approach on two
large-scale vision tasks: ImageNet classification and semantic segmentation on
high-resolution images from the Cityscapes dataset. In both settings, MDEQs are
able to match or exceed the performance of recent competitive computer vision
models: the first time such performance and scale have been achieved by an
implicit deep learning approach. The code and pre-trained models are at
https://github.com/locuslab/mdeq .
Related papers
- SIGMA:Sinkhorn-Guided Masked Video Modeling [69.31715194419091]
Sinkhorn-guided Masked Video Modelling ( SIGMA) is a novel video pretraining method.
We distribute features of space-time tubes evenly across a limited number of learnable clusters.
Experimental results on ten datasets validate the effectiveness of SIGMA in learning more performant, temporally-aware, and robust video representations.
arXiv Detail & Related papers (2024-07-22T08:04:09Z) - Dynamic Pre-training: Towards Efficient and Scalable All-in-One Image Restoration [100.54419875604721]
All-in-one image restoration tackles different types of degradations with a unified model instead of having task-specific, non-generic models for each degradation.
We propose DyNet, a dynamic family of networks designed in an encoder-decoder style for all-in-one image restoration tasks.
Our DyNet can seamlessly switch between its bulkier and lightweight variants, thereby offering flexibility for efficient model deployment.
arXiv Detail & Related papers (2024-04-02T17:58:49Z) - Self-Supervised Open-Ended Classification with Small Visual Language
Models [60.23212389067007]
We present Self-Context Adaptation (SeCAt), a self-supervised approach that unlocks few-shot abilities for open-ended classification with small visual language models.
By using models with approximately 1B parameters we outperform the few-shot abilities of much larger models, such as Frozen and FROMAGe.
arXiv Detail & Related papers (2023-09-30T21:41:21Z) - Mitigating Modality Collapse in Multimodal VAEs via Impartial
Optimization [7.4262579052708535]
We argue that this effect is a consequence of conflicting gradients during multimodal VAE training.
We show how to detect the sub-graphs in the computational graphs where gradients conflict.
We empirically show that our framework significantly improves the reconstruction performance, conditional generation, and coherence of the latent space across modalities.
arXiv Detail & Related papers (2022-06-09T13:29:25Z) - Decoupled Multi-task Learning with Cyclical Self-Regulation for Face
Parsing [71.19528222206088]
We propose a novel Decoupled Multi-task Learning with Cyclical Self-Regulation for face parsing.
Specifically, DML-CSR designs a multi-task model which comprises face parsing, binary edge, and category edge detection.
Our method achieves the new state-of-the-art performance on the Helen, CelebA-HQ, and LapaMask datasets.
arXiv Detail & Related papers (2022-03-28T02:12:30Z) - Multi-level Second-order Few-shot Learning [111.0648869396828]
We propose a Multi-level Second-order (MlSo) few-shot learning network for supervised or unsupervised few-shot image classification and few-shot action recognition.
We leverage so-called power-normalized second-order base learner streams combined with features that express multiple levels of visual abstraction.
We demonstrate respectable results on standard datasets such as Omniglot, mini-ImageNet, tiered-ImageNet, Open MIC, fine-grained datasets such as CUB Birds, Stanford Dogs and Cars, and action recognition datasets such as HMDB51, UCF101, and mini-MIT.
arXiv Detail & Related papers (2022-01-15T19:49:00Z) - Collaboration among Image and Object Level Features for Image
Colourisation [25.60139324272782]
Image colourisation is an ill-posed problem, with multiple correct solutions which depend on the context and object instances present in the input datum.
Previous approaches attacked the problem either by requiring intense user interactions or by exploiting the ability of convolutional neural networks (CNNs) in learning image level (context) features.
We propose a single network, named UCapsNet, that separate image-level features obtained through convolutions and object-level features captured by means of capsules.
Then, by skip connections over different layers, we enforce collaboration between such disentangling factors to produce high quality and plausible image colourisation.
arXiv Detail & Related papers (2021-01-19T11:48:12Z) - Deep Grouping Model for Unified Perceptual Parsing [36.73032339428497]
The perceptual-based grouping process produces a hierarchical and compositional image representation.
We propose a deep grouping model (DGM) that tightly marries the two types of representations and defines a bottom-up and a top-down process for feature exchanging.
The model achieves state-of-the-art results while having a small computational overhead compared to other contextual-based segmentation models.
arXiv Detail & Related papers (2020-03-25T21:16:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.