Funnel Activation for Visual Recognition
- URL: http://arxiv.org/abs/2007.11824v2
- Date: Fri, 24 Jul 2020 11:45:43 GMT
- Title: Funnel Activation for Visual Recognition
- Authors: Ningning Ma, Xiangyu Zhang, Jian Sun
- Abstract summary: We present a conceptually simple but effective funnel activation for image recognition tasks, called the Funnel activation (FReLU).
FReLU extends ReLU and PReLU to a 2D activation with the negligible overhead of a spatial condition.
We conduct experiments on ImageNet, COCO detection, and semantic segmentation, showing great improvements and robustness of FReLU in the visual recognition tasks.
- Score: 92.18474421444377
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a conceptually simple but effective funnel activation for image
recognition tasks, called Funnel activation (FReLU), that extends ReLU and
PReLU to a 2D activation by adding a negligible overhead of spatial condition.
The forms of ReLU and PReLU are y = max(x, 0) and y = max(x, px), respectively,
while FReLU is in the form of y = max(x,T(x)), where T(x) is the 2D spatial
condition. Moreover, the spatial condition achieves a pixel-wise modeling
capacity in a simple way, capturing complicated visual layouts with regular
convolutions. We conduct experiments on ImageNet, COCO detection, and semantic
segmentation tasks, showing great improvements and robustness of FReLU in the
visual recognition tasks. Code is available at
https://github.com/megvii-model/FunnelAct.
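The three activation forms above can be sketched side by side. This is a minimal NumPy illustration, not the repository's implementation: T(x) is written as a per-channel (depthwise) 3x3 convolution as the abstract's "2D spatial condition" suggests, but the batch normalization the full method applies after the convolution is omitted here.

```python
import numpy as np

def relu(x):
    # ReLU: y = max(x, 0)
    return np.maximum(x, 0.0)

def prelu(x, p=0.25):
    # PReLU: y = max(x, p*x), with p a learnable slope in practice
    return np.maximum(x, p * x)

def frelu(x, weight):
    """FReLU: y = max(x, T(x)).

    T(x) is sketched as a depthwise 3x3 convolution: one filter per
    channel, applied over that channel's spatial neighborhood.
    x: (C, H, W) feature map; weight: (C, 3, 3) per-channel filters.
    """
    C, H, W = x.shape
    padded = np.pad(x, ((0, 0), (1, 1), (1, 1)))  # zero-pad H and W by 1
    t = np.empty_like(x)
    for c in range(C):
        for i in range(H):
            for j in range(W):
                # Funnel condition: local 3x3 response at this pixel
                t[c, i, j] = np.sum(padded[c, i:i + 3, j:j + 3] * weight[c])
    return np.maximum(x, t)
```

Because the condition is computed per pixel, the max is taken against a spatially varying threshold rather than the constant 0 of ReLU, which is what gives FReLU its pixel-wise modeling capacity.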
Related papers
- Low-Resolution Self-Attention for Semantic Segmentation [96.81482872022237]
We introduce the Low-Resolution Self-Attention (LRSA) mechanism to capture global context at a significantly reduced computational cost.
Our approach involves computing self-attention in a fixed low-resolution space regardless of the input image's resolution.
We demonstrate the effectiveness of our LRSA approach by building the LRFormer, a vision transformer with an encoder-decoder structure.
arXiv Detail & Related papers (2023-10-08T06:10:09Z)
- Dynamic Implicit Image Function for Efficient Arbitrary-Scale Image Representation [24.429100808481394]
We propose Dynamic Implicit Image Function (DIIF), which is a fast and efficient method to represent images with arbitrary resolution.
We propose a coordinate grouping and slicing strategy, which enables the neural network to perform decoding from coordinate slices to pixel value slices.
With dynamic coordinate slicing, DIIF significantly reduces the computational cost when encountering arbitrary-scale SR.
arXiv Detail & Related papers (2023-06-21T15:04:34Z)
- Learning to Zoom and Unzoom [49.587516562644836]
We "learn to zoom" in on the input image, compute spatial features, and then "unzoom" to revert any deformations.
We demonstrate this versatility by evaluating on a variety of tasks and datasets.
arXiv Detail & Related papers (2023-03-27T17:03:30Z)
- Adaptive Local Implicit Image Function for Arbitrary-scale Super-resolution [61.95533972380704]
Local implicit image function (LIIF) represents images as a continuous function where pixel values are predicted by using the corresponding coordinates as inputs.
LIIF can be adopted for arbitrary-scale image super-resolution tasks, resulting in a single effective and efficient model for various up-scaling factors.
We propose a novel adaptive local implicit image function (A-LIIF) to alleviate this problem.
arXiv Detail & Related papers (2022-08-07T11:23:23Z)
- SimIPU: Simple 2D Image and 3D Point Cloud Unsupervised Pre-Training for Spatial-Aware Visual Representations [85.38562724999898]
We propose a 2D Image and 3D Point cloud Unsupervised pre-training strategy, called SimIPU.
Specifically, we develop a multi-modal contrastive learning framework that consists of an intra-modal spatial perception module and an inter-modal feature interaction module.
To the best of our knowledge, this is the first study to explore contrastive learning pre-training strategies for outdoor multi-modal datasets.
arXiv Detail & Related papers (2021-12-09T03:27:00Z)
- Local Texture Estimator for Implicit Representation Function [10.165529175855712]
Local Texture Estimator (LTE) is a dominant-frequency estimator for natural images.
LTE is capable of characterizing image textures in 2D Fourier space.
We show that an LTE-based neural function outperforms existing deep SR methods at arbitrary scales.
arXiv Detail & Related papers (2021-11-17T06:01:17Z)
- UltraSR: Spatial Encoding is a Missing Key for Implicit Image Function-based Arbitrary-Scale Super-Resolution [74.82282301089994]
In this work, we propose UltraSR, a simple yet effective new network design based on implicit image functions.
We show that spatial encoding is indeed a missing key towards the next-stage high-accuracy implicit image function.
Our UltraSR sets new state-of-the-art performance on the DIV2K benchmark under all super-resolution scales.
arXiv Detail & Related papers (2021-03-23T17:36:42Z)
- Learning Continuous Image Representation with Local Implicit Image Function [21.27344998709831]
We propose the LIIF representation, which takes an image coordinate and the 2D deep features around that coordinate as inputs and predicts the RGB value at the coordinate as output.
To generate the continuous representation for images, we train an encoder with LIIF representation via a self-supervised task with super-resolution.
The learned continuous representation can be presented at arbitrary resolution, even extrapolating to x30 higher resolution.
arXiv Detail & Related papers (2020-12-16T18:56:50Z)
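The LIIF idea above (query a continuous coordinate, look up nearby deep features, decode to RGB) can be sketched minimally in NumPy. Everything here is a toy stand-in: the feature grid and the 2-layer decoder use random weights, and the real method additionally uses a trained encoder, local feature ensembling, and a cell-size input, all omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy feature map: a 4x4 grid of 8-dim "deep features"
# (stand-in for the output of a trained encoder).
H, W, D = 4, 4, 8
feat = rng.standard_normal((H, W, D))

# Hypothetical 2-layer decoder MLP: (feature + relative coord) -> RGB.
W1 = rng.standard_normal((D + 2, 16)) * 0.1
W2 = rng.standard_normal((16, 3)) * 0.1

def liif_query(xy):
    """Predict an RGB value at a continuous coordinate xy in [0, 1]^2."""
    # Nearest feature cell, and the query's offset from that cell's center.
    gx = min(int(xy[0] * W), W - 1)
    gy = min(int(xy[1] * H), H - 1)
    center = np.array([(gx + 0.5) / W, (gy + 0.5) / H])
    rel = xy - center
    # Decoder input: local deep feature plus relative coordinate.
    z = np.concatenate([feat[gy, gx], rel])
    h = np.maximum(z @ W1, 0.0)  # ReLU hidden layer
    return h @ W2                # RGB prediction

# Because queries are continuous, any output resolution can be sampled,
# e.g. a 7x7 rendering of the same underlying representation:
img = np.array([[liif_query(np.array([(j + 0.5) / 7, (i + 0.5) / 7]))
                 for j in range(7)] for i in range(7)])
```

Decoupling the representation from any fixed pixel grid in this way is what lets a single model serve arbitrary up-scaling factors.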
This list is automatically generated from the titles and abstracts of the papers in this site.