Total Variation Optimization Layers for Computer Vision
- URL: http://arxiv.org/abs/2204.03643v1
- Date: Thu, 7 Apr 2022 17:59:27 GMT
- Title: Total Variation Optimization Layers for Computer Vision
- Authors: Raymond A. Yeh, Yuan-Ting Hu, Zhongzheng Ren, Alexander G. Schwing
- Abstract summary: We propose total variation (TV) minimization as a layer for computer vision.
Motivated by the success of total variation in image processing, we hypothesize that TV as a layer provides useful inductive bias for deep-nets.
We study this hypothesis on five computer vision tasks: image classification, weakly supervised object localization, edge-preserving smoothing, edge detection, and image denoising.
- Score: 130.10996341231743
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Optimization within a layer of a deep-net has emerged as a new direction for deep-net layer design. However, there are two main challenges when applying these layers to computer vision tasks: (a) which optimization problem within a layer is useful? (b) how can computation within a layer remain efficient? To study question (a), in this work, we propose total variation (TV) minimization as a layer for computer vision. Motivated by the success of total variation in image processing, we hypothesize that TV as a layer provides useful inductive bias for deep-nets too. We study this hypothesis on five computer vision tasks: image classification, weakly supervised object localization, edge-preserving smoothing, edge detection, and image denoising, improving over existing baselines. To achieve these results we had to address question (b): we developed a GPU-based projected-Newton method which is $37\times$ faster than existing solutions.
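To make the idea concrete, below is a minimal sketch of such a TV layer in PyTorch. It approximately solves min_x 0.5*||x - y||^2 + lam*TV(x) by unrolling gradient descent on a Charbonnier-smoothed anisotropic TV penalty, so autograd can differentiate through the solver. Note the paper's actual layer solves the problem exactly with a GPU-based projected-Newton method; the class name, hyperparameters, and smoothing below are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TVLayer(nn.Module):
    """Approximately solves  min_x 0.5*||x - y||^2 + lam * TV(x).

    Illustration only: gradient descent is unrolled on a Charbonnier-
    smoothed anisotropic TV penalty so autograd can differentiate
    through the solver. The paper instead uses an exact GPU-based
    projected-Newton solver; names and hyperparameters here are
    assumptions, not the authors' implementation.
    """

    def __init__(self, lam: float = 0.1, n_iter: int = 30,
                 step: float = 0.1, eps: float = 1e-2):
        super().__init__()
        # Parameterize lam in log space so it stays positive while learned.
        self.log_lam = nn.Parameter(torch.tensor(lam).log())
        self.n_iter, self.step, self.eps = n_iter, step, eps

    def _tv_grad(self, x: torch.Tensor) -> torch.Tensor:
        # x: (N, C, H, W); forward differences along width and height.
        dh = x[..., :, 1:] - x[..., :, :-1]
        dv = x[..., 1:, :] - x[..., :-1, :]
        # d / sqrt(d^2 + eps) is the derivative of sqrt(d^2 + eps).
        ph = dh / (dh.pow(2) + self.eps).sqrt()
        pv = dv / (dv.pow(2) + self.eps).sqrt()
        # Scatter the per-difference terms back onto pixels (a discrete
        # negative divergence), with zero flux across the image border.
        gh = F.pad(ph, (1, 0)) - F.pad(ph, (0, 1))
        gv = F.pad(pv, (0, 0, 1, 0)) - F.pad(pv, (0, 0, 0, 1))
        return gh + gv

    def forward(self, y: torch.Tensor) -> torch.Tensor:
        lam = self.log_lam.exp()
        x = y
        for _ in range(self.n_iter):
            # Gradient step on 0.5*||x - y||^2 + lam * TV_smooth(x);
            # the step size is conservative for this smoothing level.
            x = x - self.step * ((x - y) + lam * self._tv_grad(x))
        return x


# Usage: edge-preserving smoothing of a noisy image batch.
if __name__ == "__main__":
    y = torch.rand(2, 3, 64, 64)
    out = TVLayer()(y)
    print(out.shape)  # torch.Size([2, 3, 64, 64])
```

Each unrolled iteration is only a few tensor operations, but many iterations may be needed near convergence; that trade-off is exactly the efficiency question (b) that the paper's projected-Newton solver targets.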
Related papers
- Parameter-Inverted Image Pyramid Networks [49.35689698870247]
We propose a novel network architecture, known as Parameter-Inverted Image Pyramid Networks (PIIP).
Our core idea is to use models with different parameter sizes to process different resolution levels of the image pyramid.
PIIP achieves superior performance in tasks such as object detection, segmentation, and image classification.
arXiv Detail & Related papers (2024-06-06T17:59:10Z)
- Multi-scale Unified Network for Image Classification [33.560003528712414]
CNNs face notable challenges in performance and computational efficiency when dealing with real-world, multi-scale image inputs.
We propose the Multi-scale Unified Network (MUSN), consisting of multiple scales, a unified network, and a scale-invariance constraint.
MUSN yields accuracy gains of up to 44.53% and reduces FLOPs by 7.01-16.13% in multi-scale scenarios.
arXiv Detail & Related papers (2024-03-27T06:40:26Z)
- Tuning computer vision models with task rewards [88.45787930908102]
Misalignment between model predictions and intended usage can be detrimental for the deployment of computer vision models.
In natural language processing, this is often addressed using reinforcement learning techniques that align models with a task reward.
We adopt this approach and show its surprising effectiveness across multiple computer vision tasks, such as object detection, panoptic segmentation, colorization and image captioning.
arXiv Detail & Related papers (2023-02-16T11:49:48Z)
- Deep Generalized Unfolding Networks for Image Restoration [16.943609020362395]
We propose a Deep Generalized Unfolding Network (DGUNet) for image restoration.
We integrate a gradient estimation strategy into the gradient descent step of the Proximal Gradient Descent (PGD) algorithm.
Our method achieves state-of-the-art performance while offering interpretability and generalizability.
arXiv Detail & Related papers (2022-04-28T08:39:39Z)
- DeepRLS: A Recurrent Network Architecture with Least Squares Implicit Layers for Non-blind Image Deconvolution [15.986942312624]
We study the problem of non-blind image deconvolution.
We propose a novel recurrent network architecture that produces highly competitive, high-quality restoration results.
arXiv Detail & Related papers (2021-12-10T13:16:51Z)
- A Deeper Look into DeepCap [96.67706102518238]
We propose a novel deep learning approach for monocular dense human performance capture.
Our method is trained in a weakly supervised manner based on multi-view supervision.
Our approach outperforms the state of the art in terms of quality and robustness.
arXiv Detail & Related papers (2021-11-20T11:34:33Z)
- TSG: Target-Selective Gradient Backprop for Probing CNN Visual Saliency [72.9106103283475]
We study visual saliency, a.k.a. visual explanation, as a means of interpreting convolutional neural networks.
We propose a novel visual saliency framework, termed Target-Selective Gradient (TSG) backprop.
The proposed TSG consists of two components, namely, TSG-Conv and TSG-FC, which rectify the gradients for convolutional layers and fully-connected layers, respectively.
arXiv Detail & Related papers (2021-10-11T12:00:20Z)
- Conflicting Bundles: Adapting Architectures Towards the Improved Training of Deep Neural Networks [1.7188280334580195]
We introduce a novel theory and metric to identify layers that decrease the test accuracy of the trained models.
We identify those layers that worsen performance because they produce conflicting training bundles.
Based on these findings, we introduce a novel algorithm that automatically removes performance-decreasing layers.
arXiv Detail & Related papers (2020-11-05T16:41:04Z)
- Feature Space Saturation during Training [0.0]
We show that a layer's output can be restricted to the eigenspace of its variance matrix without performance loss.
We derive layer saturation, the ratio between the eigenspace dimension and the layer width (a minimal computation sketch appears after this list).
We demonstrate how to alter layer saturation in a neural network by changing network depth, filter sizes and input resolution.
arXiv Detail & Related papers (2020-06-15T18:28:21Z)
- Binary Neural Networks: A Survey [126.67799882857656]
The binary neural network serves as a promising technique for deploying deep models on resource-limited devices.
The binarization inevitably causes severe information loss and, even worse, its discontinuity makes the deep network difficult to optimize.
We present a survey of these algorithms, categorized into native solutions that directly conduct binarization and optimized ones that use techniques such as minimizing the quantization error, improving the network loss function, and reducing the gradient error (a minimal sketch of the quantization-error idea follows this list).
arXiv Detail & Related papers (2020-03-31T16:47:20Z)
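The "minimizing the quantization error" family that the survey above mentions can be illustrated with a classic per-tensor scaling trick: for B = sign(W), the scale alpha = mean(|W|) minimizes ||W - alpha*B||^2 in closed form (as popularized by XNOR-Net). A minimal sketch, assuming per-tensor (not per-channel) scaling and omitting activation binarization:

```python
import torch


def binarize_weights(w: torch.Tensor) -> torch.Tensor:
    """Return alpha * sign(w), a binary approximation of w.

    For fixed B = sign(w), alpha = mean(|w|) minimizes the quantization
    error ||w - alpha * B||^2 in closed form (the XNOR-Net-style scale).
    Per-tensor scaling is assumed; activation binarization is omitted.
    """
    alpha = w.abs().mean()      # optimal per-tensor scale
    b = torch.sign(w)
    b[b == 0] = 1.0             # sign(0) = 0; map it to +1
    return alpha * b


w = torch.randn(64, 3, 3, 3)    # e.g. one conv layer's weights
wb = binarize_weights(w)
print(f"quantization error: {(w - wb).pow(2).sum().item():.3f}")
```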
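For the layer-saturation metric in "Feature Space Saturation during Training" above, here is one plausible computation sketched out: estimate the covariance of a layer's outputs, count how many eigendirections are needed to retain a chosen share of the variance, and divide by the layer width. The 0.99 threshold, function name, and estimator details are illustrative assumptions, not the paper's exact procedure.

```python
import torch


def layer_saturation(feats: torch.Tensor, delta: float = 0.99) -> float:
    """Fraction of a layer's width needed to retain a delta share of
    the variance of its outputs.

    feats: (num_samples, width) activations collected for one layer.
    The 0.99 threshold and this estimator are illustrative choices.
    """
    feats = feats - feats.mean(dim=0, keepdim=True)   # center features
    cov = feats.T @ feats / (feats.shape[0] - 1)      # covariance matrix
    evals = torch.linalg.eigvalsh(cov).flip(0)        # descending order
    cum = evals.cumsum(0) / evals.sum()
    k = int((cum < delta).sum().item()) + 1           # eigenspace dimension
    return k / feats.shape[1]                         # saturation ratio


acts = torch.randn(512, 128) @ torch.randn(128, 128)  # toy activations
print(f"layer saturation: {layer_saturation(acts):.2f}")
```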
This list is automatically generated from the titles and abstracts of the papers on this site.