Funnel Activation for Visual Recognition
- URL: http://arxiv.org/abs/2007.11824v2
- Date: Fri, 24 Jul 2020 11:45:43 GMT
- Title: Funnel Activation for Visual Recognition
- Authors: Ningning Ma, Xiangyu Zhang, Jian Sun
- Abstract summary: We present a conceptually simple but effective funnel activation for image recognition tasks, called the Funnel activation (FReLU).
FReLU extends ReLU and PReLU to a 2D activation with the negligible overhead of a spatial condition.
We conduct experiments on ImageNet, COCO detection, and semantic segmentation, showing great improvements and robustness of FReLU in the visual recognition tasks.
- Score: 92.18474421444377
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a conceptually simple but effective funnel activation for image
recognition tasks, called Funnel activation (FReLU), that extends ReLU and
PReLU to a 2D activation by adding a negligible overhead of spatial condition.
The forms of ReLU and PReLU are y = max(x, 0) and y = max(x, px), respectively,
while FReLU is in the form of y = max(x,T(x)), where T(x) is the 2D spatial
condition. Moreover, the spatial condition achieves a pixel-wise modeling
capacity in a simple way, capturing complicated visual layouts with regular
convolutions. We conduct experiments on ImageNet, COCO detection, and semantic
segmentation tasks, showing great improvements and robustness of FReLU in the
visual recognition tasks. Code is available at
https://github.com/megvii-model/FunnelAct.
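The three activation forms above can be sketched side by side. This is a minimal NumPy illustration, not the repository's implementation: T(x) is written as a per-channel (depthwise) 3x3 convolution as the abstract's "2D spatial condition" suggests, but the batch normalization the full method applies after the convolution is omitted here.

```python
import numpy as np

def relu(x):
    # ReLU: y = max(x, 0)
    return np.maximum(x, 0.0)

def prelu(x, p=0.25):
    # PReLU: y = max(x, p*x), with p a learnable slope in practice
    return np.maximum(x, p * x)

def frelu(x, weight):
    """FReLU: y = max(x, T(x)).

    T(x) is sketched as a depthwise 3x3 convolution: one filter per
    channel, applied over that channel's spatial neighborhood.
    x: (C, H, W) feature map; weight: (C, 3, 3) per-channel filters.
    """
    C, H, W = x.shape
    padded = np.pad(x, ((0, 0), (1, 1), (1, 1)))  # zero-pad H and W by 1
    t = np.empty_like(x)
    for c in range(C):
        for i in range(H):
            for j in range(W):
                # Funnel condition: local 3x3 response at this pixel
                t[c, i, j] = np.sum(padded[c, i:i + 3, j:j + 3] * weight[c])
    return np.maximum(x, t)
```

Because the condition is computed per pixel, the max is taken against a spatially varying threshold rather than the constant 0 of ReLU, which is what gives FReLU its pixel-wise modeling capacity.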
Related papers
- Low-Resolution Self-Attention for Semantic Segmentation [96.81482872022237]
We introduce the Low-Resolution Self-Attention (LRSA) mechanism to capture global context at a significantly reduced computational cost.
Our approach involves computing self-attention in a fixed low-resolution space regardless of the input image's resolution.
We demonstrate the effectiveness of our LRSA approach by building the LRFormer, a vision transformer with an encoder-decoder structure.
arXiv Detail & Related papers (2023-10-08T06:10:09Z)
- Dynamic Implicit Image Function for Efficient Arbitrary-Scale Image Representation [24.429100808481394]
We propose Dynamic Implicit Image Function (DIIF), which is a fast and efficient method to represent images with arbitrary resolution.
We propose a coordinate grouping and slicing strategy, which enables the neural network to perform decoding from coordinate slices to pixel value slices.
With dynamic coordinate slicing, DIIF significantly reduces the computational cost when encountering arbitrary-scale SR.
arXiv Detail & Related papers (2023-06-21T15:04:34Z)
- Learning to Zoom and Unzoom [49.587516562644836]
We "learn to zoom" in on the input image, compute spatial features, and then "unzoom" to revert any deformations.
We demonstrate this versatility by evaluating on a variety of tasks and datasets.
arXiv Detail & Related papers (2023-03-27T17:03:30Z)
- Adaptive Local Implicit Image Function for Arbitrary-scale Super-resolution [61.95533972380704]
Local implicit image function (LIIF) represents images as a continuous function where pixel values are predicted by using the corresponding coordinates as inputs.
LIIF can be adopted for arbitrary-scale image super-resolution tasks, resulting in a single effective and efficient model for various up-scaling factors.
We propose a novel adaptive local implicit image function (A-LIIF) to alleviate this problem.
arXiv Detail & Related papers (2022-08-07T11:23:23Z)
- SimIPU: Simple 2D Image and 3D Point Cloud Unsupervised Pre-Training for Spatial-Aware Visual Representations [85.38562724999898]
We propose a 2D Image and 3D Point cloud Unsupervised pre-training strategy, called SimIPU.
Specifically, we develop a multi-modal contrastive learning framework that consists of an intra-modal spatial perception module and an inter-modal feature interaction module.
To the best of our knowledge, this is the first study to explore contrastive learning pre-training strategies for outdoor multi-modal datasets.
arXiv Detail & Related papers (2021-12-09T03:27:00Z)
- Local Texture Estimator for Implicit Representation Function [10.165529175855712]
Local Texture Estimator (LTE) is a dominant-frequency estimator for natural images.
LTE is capable of characterizing image textures in 2D Fourier space.
We show that an LTE-based neural function outperforms existing deep SR methods at arbitrary scales.
arXiv Detail & Related papers (2021-11-17T06:01:17Z)
- UltraSR: Spatial Encoding is a Missing Key for Implicit Image Function-based Arbitrary-Scale Super-Resolution [74.82282301089994]
In this work, we propose UltraSR, a simple yet effective new network design based on implicit image functions.
We show that spatial encoding is indeed a missing key towards the next-stage high-accuracy implicit image function.
Our UltraSR sets new state-of-the-art performance on the DIV2K benchmark under all super-resolution scales.
arXiv Detail & Related papers (2021-03-23T17:36:42Z)
- Learning Continuous Image Representation with Local Implicit Image Function [21.27344998709831]
We propose the LIIF representation, which takes an image coordinate and the 2D deep features around that coordinate as inputs and predicts the RGB value at the coordinate as output.
To generate the continuous representation for images, we train an encoder with LIIF representation via a self-supervised task with super-resolution.
The learned continuous representation can be presented at arbitrary resolution, even extrapolating to x30 higher resolution.
arXiv Detail & Related papers (2020-12-16T18:56:50Z)
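The LIIF idea above (query a continuous coordinate, look up nearby deep features, decode to RGB) can be sketched minimally in NumPy. Everything here is a toy stand-in: the feature grid and the 2-layer decoder use random weights, and the real method additionally uses a trained encoder, local feature ensembling, and a cell-size input, all omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy feature map: a 4x4 grid of 8-dim "deep features"
# (stand-in for the output of a trained encoder).
H, W, D = 4, 4, 8
feat = rng.standard_normal((H, W, D))

# Hypothetical 2-layer decoder MLP: (feature + relative coord) -> RGB.
W1 = rng.standard_normal((D + 2, 16)) * 0.1
W2 = rng.standard_normal((16, 3)) * 0.1

def liif_query(xy):
    """Predict an RGB value at a continuous coordinate xy in [0, 1]^2."""
    # Nearest feature cell, and the query's offset from that cell's center.
    gx = min(int(xy[0] * W), W - 1)
    gy = min(int(xy[1] * H), H - 1)
    center = np.array([(gx + 0.5) / W, (gy + 0.5) / H])
    rel = xy - center
    # Decoder input: local deep feature plus relative coordinate.
    z = np.concatenate([feat[gy, gx], rel])
    h = np.maximum(z @ W1, 0.0)  # ReLU hidden layer
    return h @ W2                # RGB prediction

# Because queries are continuous, any output resolution can be sampled,
# e.g. a 7x7 rendering of the same underlying representation:
img = np.array([[liif_query(np.array([(j + 0.5) / 7, (i + 0.5) / 7]))
                 for j in range(7)] for i in range(7)])
```

Decoupling the representation from any fixed pixel grid in this way is what lets a single model serve arbitrary up-scaling factors.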
This list is automatically generated from the titles and abstracts of the papers in this site.