Related papers: Differentiable Patch Selection for Image Recognition

URL: http://arxiv.org/abs/2104.03059v1
Date: Wed, 7 Apr 2021 11:15:51 GMT
Title: Differentiable Patch Selection for Image Recognition
Authors: Jean-Baptiste Cordonnier, Aravindh Mahendran, Alexey Dosovitskiy, Dirk Weissenborn, Jakob Uszkoreit, Thomas Unterthiner
Abstract summary: We propose a differentiable Top-K operator to select the most relevant parts of the input to process high resolution images. We show results for traffic sign recognition, inter-patch relationship reasoning, and fine-grained recognition without using object/part bounding box annotations.
Score: 37.11810982945019
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Neural Networks require large amounts of memory and compute to process high resolution images, even when only a small part of the image is actually informative for the task at hand. We propose a method based on a differentiable Top-K operator to select the most relevant parts of the input to efficiently process high resolution images. Our method may be interfaced with any downstream neural network, is able to aggregate information from different patches in a flexible way, and allows the whole model to be trained end-to-end using backpropagation. We show results for traffic sign recognition, inter-patch relationship reasoning, and fine-grained recognition without using object/part bounding box annotations during training.
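Illustration: the abstract does not say how the Top-K operator is made differentiable. The sketch below shows one common relaxation, chosen here as an assumption and not taken from the text above: average the hard Top-K indicator over Gaussian-perturbed scores, so that each patch receives a soft inclusion weight that varies smoothly with its score. The function names, the noise scale `sigma`, the sample count, and the example scores are all hypothetical. The code computes only a Monte Carlo forward estimate, and a trainable version would also need a gradient estimator.

```python
import random


def top_k_indicator(scores, k):
    """Hard Top-K: 1.0 for the k highest-scoring patches, 0.0 elsewhere."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    chosen = set(order[:k])
    return [1.0 if i in chosen else 0.0 for i in range(len(scores))]


def perturbed_top_k(scores, k, sigma=0.5, num_samples=500, seed=0):
    """Soft Top-K: mean of the hard indicator over noisy copies of the scores.

    Assumption: Gaussian noise with standard deviation `sigma`; this is an
    illustrative relaxation, not necessarily the operator used in the paper.
    """
    rng = random.Random(seed)
    totals = [0.0] * len(scores)
    for _ in range(num_samples):
        noisy = [s + sigma * rng.gauss(0.0, 1.0) for s in scores]
        for i, value in enumerate(top_k_indicator(noisy, k)):
            totals[i] += value
    return [t / num_samples for t in totals]


# Hypothetical relevance scores for four image patches.
scores = [2.0, 0.1, 1.8, -0.5]
hard = top_k_indicator(scores, k=2)
soft = perturbed_top_k(scores, k=2)
print("hard selection:", hard)
print("soft weights:  ", soft)
```

Each noisy sample selects exactly k patches, so the soft weights always sum to k. As `sigma` shrinks toward zero, the soft weights approach the hard indicator.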

Related papers

Scaling Up Single Image Dehazing Algorithm by Cross-Data Vision Alignment for Richer Representation Learning and Beyond [47.425906124301775]
We propose a novel method of cross-data vision alignment for richer representation learning to improve the existing dehazing methodology. By using cross-data external alignment, the datasets inherit samples from different domains that are firmly aligned. Our approach significantly outperforms other advanced methods in dehazing and produces dehazed images that are closest to real haze-free images.
arXiv Detail & Related papers (2024-07-20T10:00:20Z)
Interactive Image Selection and Training for Brain Tumor Segmentation Network [42.62139206176152]
We employ an interactive method for image selection and training based on Feature Learning from Image Markers (FLIM). The results demonstrated that with our methodology, we could choose a small set of images to train the encoder of a U-shaped network, obtaining performance equal to manual selection and even surpassing the same U-shaped network trained with backpropagation and all training images.
arXiv Detail & Related papers (2024-06-05T13:03:06Z)
CricaVPR: Cross-image Correlation-aware Representation Learning for Visual Place Recognition [73.51329037954866]
We propose a robust global representation method with cross-image correlation awareness for visual place recognition. Our method uses the attention mechanism to correlate multiple images within a batch. Our method outperforms state-of-the-art methods by a large margin with significantly less training time.
arXiv Detail & Related papers (2024-02-29T15:05:11Z)
Scalable Federated Learning for Clients with Different Input Image Sizes and Numbers of Output Categories [34.22635158366194]
Federated learning is a privacy-preserving training method which consists of training from a plurality of clients but without sharing their confidential data. We propose an effective federated learning method named ScalableFL, where the depths and widths of the local models for each client are adjusted according to the clients' input image size and the numbers of output categories.
arXiv Detail & Related papers (2023-11-15T05:43:14Z)
Learning Enriched Features for Fast Image Restoration and Enhancement [166.17296369600774]
This paper pursues the holistic goal of maintaining spatially-precise high-resolution representations through the entire network. We learn an enriched set of features that combines contextual information from multiple scales, while simultaneously preserving the high-resolution spatial details. Our approach achieves state-of-the-art results for a variety of image processing tasks, including defocus deblurring, image denoising, super-resolution, and image enhancement.
arXiv Detail & Related papers (2022-04-19T17:59:45Z)
Learning to Focus: Cascaded Feature Matching Network for Few-shot Image Recognition [38.49419948988415]
Deep networks can learn to accurately recognize objects of a category by training on a large number of images. A meta-learning challenge known as a low-shot image recognition task comes when only a few images with annotations are available for learning a recognition model for one category. Our method, called Cascaded Feature Matching Network (CFMN), is proposed to solve this problem. Experiments for few-shot learning on two standard datasets, miniImageNet and Omniglot, have confirmed the effectiveness of our method.
arXiv Detail & Related papers (2021-01-13T11:37:28Z)
Resolution Switchable Networks for Runtime Efficient Image Recognition [46.09537029831355]
We propose a general method to train a single convolutional neural network which is capable of switching image resolutions at inference. Networks trained with the proposed method are named Resolution Switchable Networks (RS-Nets).
arXiv Detail & Related papers (2020-07-19T02:12:59Z)
Learning to Learn Parameterized Classification Networks for Scalable Input Images [76.44375136492827]
Convolutional Neural Networks (CNNs) do not have a predictable recognition behavior with respect to the input resolution change. We employ meta learners to generate convolutional weights of main networks for various input scales. We further utilize knowledge distillation on the fly over model predictions based on different input resolutions.
arXiv Detail & Related papers (2020-07-13T04:27:25Z)
ResNeSt: Split-Attention Networks [86.25490825631763]
We present a modularized architecture, which applies the channel-wise attention on different network branches to leverage their success in capturing cross-feature interactions and learning diverse representations. Our model, named ResNeSt, outperforms EfficientNet in accuracy and latency trade-off on image classification.
arXiv Detail & Related papers (2020-04-19T20:40:31Z)
Learning Enriched Features for Real Image Restoration and Enhancement [166.17296369600774]
Convolutional neural networks (CNNs) have achieved dramatic improvements over conventional approaches for image restoration tasks. We present a novel architecture with the collective goals of maintaining spatially-precise high-resolution representations through the entire network. Our approach learns an enriched set of features that combines contextual information from multiple scales, while simultaneously preserving the high-resolution spatial details.
arXiv Detail & Related papers (2020-03-15T11:04:30Z)

This list is automatically generated from the titles and abstracts of the papers on this site.