Multiscale Deep Equilibrium Models
- URL: http://arxiv.org/abs/2006.08656v2
- Date: Tue, 24 Nov 2020 06:59:38 GMT
- Title: Multiscale Deep Equilibrium Models
- Authors: Shaojie Bai and Vladlen Koltun and J. Zico Kolter
- Abstract summary: We propose a new class of implicit networks, the multiscale deep equilibrium model (MDEQ).
An MDEQ directly solves for and backpropagates through the equilibrium points of multiple feature resolutions simultaneously.
We illustrate the effectiveness of this approach on two large-scale vision tasks: ImageNet classification and semantic segmentation on high-resolution images from the Cityscapes dataset.
- Score: 162.15362280927476
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a new class of implicit networks, the multiscale deep equilibrium
model (MDEQ), suited to large-scale and highly hierarchical pattern recognition
domains. An MDEQ directly solves for and backpropagates through the equilibrium
points of multiple feature resolutions simultaneously, using implicit
differentiation to avoid storing intermediate states (and thus requiring only
$O(1)$ memory consumption). These simultaneously-learned multi-resolution
features allow us to train a single model on a diverse set of tasks and loss
functions, such as using a single MDEQ to perform both image classification and
semantic segmentation. We illustrate the effectiveness of this approach on two
large-scale vision tasks: ImageNet classification and semantic segmentation on
high-resolution images from the Cityscapes dataset. In both settings, MDEQs are
able to match or exceed the performance of recent competitive computer vision
models: the first time such performance and scale have been achieved by an
implicit deep learning approach. The code and pre-trained models are at
https://github.com/locuslab/mdeq .
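The core mechanism the abstract describes, solving for an equilibrium point and backpropagating through it with implicit differentiation instead of storing intermediate states, can be illustrated with a toy single-resolution sketch. This is not the paper's implementation: the actual MDEQ drives multiple feature resolutions to a joint equilibrium and uses a quasi-Newton solver rather than the naive fixed-point iteration below, and the layer `f` here (a tanh affine map with hypothetical weights `W`, `U`, `b`) is chosen only so the fixed point and its implicit gradient are easy to verify.

```python
import numpy as np

def fixed_point(f, z0, tol=1e-8, max_iter=500):
    """Solve z = f(z) by naive fixed-point iteration (the paper uses a
    quasi-Newton solver; plain iteration suffices for this contractive toy)."""
    z = z0
    for _ in range(max_iter):
        z_next = f(z)
        if np.linalg.norm(z_next - z) < tol:
            return z_next
        z = z_next
    return z

# Hypothetical single-scale equilibrium layer: z* = tanh(W z* + U x + b).
rng = np.random.default_rng(0)
d = 4
W = 0.3 * rng.standard_normal((d, d)) / np.sqrt(d)  # scaled small => contraction
U = rng.standard_normal((d, d))
b = rng.standard_normal(d)
x = rng.standard_normal(d)

f = lambda z: np.tanh(W @ z + U @ x + b)
z_star = fixed_point(f, np.zeros(d))  # "infinite-depth" forward pass, O(1) memory

# Backward pass via the implicit function theorem: differentiating
# z* = f(z*, x) gives dz*/dx = (I - J_z)^{-1} J_x, so the gradient needs
# only the equilibrium point, never the iteration history.
s = 1.0 - np.tanh(W @ z_star + U @ x + b) ** 2   # tanh'(pre-activation)
J_z = s[:, None] * W                             # df/dz at the equilibrium
J_x = s[:, None] * U                             # df/dx at the equilibrium
dz_dx = np.linalg.solve(np.eye(d) - J_z, J_x)
```

The linear solve replaces backpropagation through however many solver iterations the forward pass took, which is why memory cost is constant in "depth"; the same identity, applied jointly across resolutions, underlies the MDEQ backward pass.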
Related papers
- Modeling Multi-Task Model Merging as Adaptive Projective Gradient Descent [74.02034188307857]
Merging multiple expert models offers a promising approach for performing multi-task learning without accessing their original data.
We find existing methods inevitably discard task-specific information that, while causing conflicts, is crucial for performance.
Our approach consistently outperforms previous methods, achieving state-of-the-art results across diverse architectures and tasks in both vision and NLP domains.
arXiv Detail & Related papers (2025-01-02T12:45:21Z)
- [MASK] is All You Need [28.90875822599164]
We propose using discrete-state models to connect Masked Generative and Non-autoregressive Diffusion models.
By leveraging [MASK] in discrete-state models, we can bridge Masked Generative and Non-autoregressive Diffusion models.
arXiv Detail & Related papers (2024-12-09T18:59:56Z)
- SIGMA: Sinkhorn-Guided Masked Video Modeling [69.31715194419091]
Sinkhorn-Guided Masked Video Modeling (SIGMA) is a novel video pretraining method.
We distribute features of space-time tubes evenly across a limited number of learnable clusters.
Experimental results on ten datasets validate the effectiveness of SIGMA in learning more performant, temporally-aware, and robust video representations.
arXiv Detail & Related papers (2024-07-22T08:04:09Z)
- Dynamic Pre-training: Towards Efficient and Scalable All-in-One Image Restoration [100.54419875604721]
All-in-one image restoration tackles different types of degradations with a unified model instead of having task-specific, non-generic models for each degradation.
We propose DyNet, a dynamic family of networks designed in an encoder-decoder style for all-in-one image restoration tasks.
Our DyNet can seamlessly switch between its bulkier and lightweight variants, thereby offering flexibility for efficient model deployment.
arXiv Detail & Related papers (2024-04-02T17:58:49Z)
- Decoupled Multi-task Learning with Cyclical Self-Regulation for Face Parsing [71.19528222206088]
We propose a novel Decoupled Multi-task Learning with Cyclical Self-Regulation (DML-CSR) approach for face parsing.
Specifically, DML-CSR designs a multi-task model which comprises face parsing, binary edge, and category edge detection.
Our method achieves the new state-of-the-art performance on the Helen, CelebA-HQ, and LapaMask datasets.
arXiv Detail & Related papers (2022-03-28T02:12:30Z)
- Multi-level Second-order Few-shot Learning [111.0648869396828]
We propose a Multi-level Second-order (MlSo) few-shot learning network for supervised or unsupervised few-shot image classification and few-shot action recognition.
We leverage so-called power-normalized second-order base learner streams combined with features that express multiple levels of visual abstraction.
We demonstrate respectable results on standard datasets such as Omniglot, mini-ImageNet, tiered-ImageNet, Open MIC, fine-grained datasets such as CUB Birds, Stanford Dogs and Cars, and action recognition datasets such as HMDB51, UCF101, and mini-MIT.
arXiv Detail & Related papers (2022-01-15T19:49:00Z)
- Deep Grouping Model for Unified Perceptual Parsing [36.73032339428497]
The perceptual-based grouping process produces a hierarchical and compositional image representation.
We propose a deep grouping model (DGM) that tightly marries the two types of representations and defines a bottom-up and a top-down process for feature exchanging.
The model achieves state-of-the-art results while having a small computational overhead compared to other contextual-based segmentation models.
arXiv Detail & Related papers (2020-03-25T21:16:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.