DeepEMD: Differentiable Earth Mover's Distance for Few-Shot Learning
- URL: http://arxiv.org/abs/2003.06777v5
- Date: Thu, 30 Mar 2023 10:48:54 GMT
- Title: DeepEMD: Differentiable Earth Mover's Distance for Few-Shot Learning
- Authors: Chi Zhang, Yujun Cai, Guosheng Lin, Chunhua Shen
- Abstract summary: We develop methods for few-shot image classification from a new perspective of optimal matching between image regions.
We employ the Earth Mover's Distance (EMD) as a metric to compute a structural distance between dense image representations.
To generate the importance weights of elements in the EMD formulation, we design a cross-reference mechanism.
- Score: 122.51237307910878
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we develop methods for few-shot image classification from a new
perspective of optimal matching between image regions. We employ the Earth
Mover's Distance (EMD) as a metric to compute a structural distance between
dense image representations to determine image relevance. The EMD generates the
optimal matching flows between structural elements that have the minimum
matching cost, which is used to calculate the image distance for
classification. To generate the importance weights of elements in the EMD
formulation, we design a cross-reference mechanism, which can effectively
alleviate the adverse impact caused by the cluttered background and large
intra-class appearance variations. To implement k-shot classification, we
propose to learn a structured fully connected layer that can directly classify
dense image representations with the EMD. Based on the implicit function
theorem, the EMD can be inserted as a layer into the network for end-to-end
training. Our extensive experiments validate the effectiveness of our algorithm,
which outperforms state-of-the-art methods by a significant margin on five
widely used few-shot classification benchmarks, namely, miniImageNet,
tieredImageNet, Fewshot-CIFAR100 (FC100), Caltech-UCSD Birds-200-2011 (CUB),
and CIFAR-FewShot (CIFAR-FS). We also demonstrate the effectiveness of our
method on the image retrieval task in our experiments.
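To make the matching step described in the abstract more concrete, here is a minimal NumPy/SciPy sketch of an EMD between two sets of local features, for example the flattened cells of two feature maps. Several details are assumptions, not taken from the abstract. The cost is taken as 1 minus cosine similarity. The node weights use a clipped dot product with the other image's mean feature, which is one possible reading of the cross-reference mechanism. The transport problem is solved with SciPy's `linprog` instead of a differentiable layer, so no gradients, and no implicit-function-theorem backward pass, are shown. The helper names `cross_reference_weights` and `emd` are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog


def cross_reference_weights(u, v, eps=1e-6):
    """Weight each local feature in u by its agreement with the mean feature of v.

    Assumed form (not from the abstract): s_i = max(u_i . mean(v), 0) + eps,
    normalized to sum to 1 so both sides carry the same total mass.
    """
    s = np.maximum(u @ v.mean(axis=0), 0.0) + eps
    return s / s.sum()


def emd(u, v):
    """EMD between local feature sets u (m, c) and v (n, c).

    Returns (distance, flow), where flow is the (m, n) optimal matching flow.
    """
    m, n = len(u), len(v)
    # Assumed cost: 1 - cosine similarity between every pair of local features.
    un = u / np.linalg.norm(u, axis=1, keepdims=True)
    vn = v / np.linalg.norm(v, axis=1, keepdims=True)
    cost = 1.0 - un @ vn.T

    s = cross_reference_weights(u, v)  # mass supplied by each node of u
    d = cross_reference_weights(v, u)  # mass demanded by each node of v

    # Equality constraints on the row-major flattened flow x[i * n + j].
    a_eq = np.zeros((m + n, m * n))
    for i in range(m):
        a_eq[i, i * n:(i + 1) * n] = 1.0  # sum_j x_ij = s_i
    for j in range(n):
        a_eq[m + j, j::n] = 1.0  # sum_i x_ij = d_j
    b_eq = np.concatenate([s, d])

    # Plain (non-differentiable) LP solve; flows are constrained to be non-negative.
    res = linprog(cost.ravel(), A_eq=a_eq, b_eq=b_eq, bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    flow = res.x.reshape(m, n)
    return float((cost * flow).sum()), flow
```

For example, with `rng = np.random.default_rng(0)`, calling `emd(rng.normal(size=(25, 64)), rng.normal(size=(25, 64)))` compares two 5x5 maps of 64-dimensional features. Comparing a set with itself gives a distance of essentially zero, because the diagonal flow is feasible and has zero cost.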
Related papers
- Annotation Cost-Efficient Active Learning for Deep Metric Learning Driven Remote Sensing Image Retrieval [3.2109665109975696]
ANNEAL aims to create a small but informative training set made up of similar and dissimilar image pairs.
The informativeness of image pairs is evaluated by combining uncertainty and diversity criteria.
This way of annotating images significantly reduces the annotation cost compared to annotating images with land-use land-cover class labels.
(2024-06-14T15:08:04Z)
- MESA: Matching Everything by Segmenting Anything [16.16319526547664]
MESA is a novel approach to establish precise area (or region) matches for efficient matching redundancy reduction.
We show that MESA yields substantial precision improvement for multiple point matchers in indoor and outdoor downstream tasks.
(2024-01-30T04:39:32Z)
- Object Detection in Aerial Images in Scarce Data Regimes [0.0]
Small objects, which are more numerous in aerial images, are the cause of the apparent performance gap between natural and aerial images.
We propose a scale-adaptive box similarity criterion that improves the training and evaluation of few-shot object detection (FSOD) methods.
We also contribute to generic FSOD with two distinct approaches based on metric learning and fine-tuning.
(2023-10-16T14:16:47Z)
- Improving Human-Object Interaction Detection via Virtual Image Learning [68.56682347374422]
Human-Object Interaction (HOI) detection aims to understand the interactions between humans and objects.
In this paper, we propose to alleviate the impact of the unbalanced (long-tailed) distribution of interaction-object pair categories via Virtual Image Learning (VIL).
A novel label-to-image approach, Multiple Steps Image Creation (MUSIC), is proposed to create a high-quality dataset that has a consistent distribution with real images.
(2023-08-04T10:28:48Z)
- CSP: Self-Supervised Contrastive Spatial Pre-Training for Geospatial-Visual Representations [90.50864830038202]
We present Contrastive Spatial Pre-Training (CSP), a self-supervised learning framework for geo-tagged images.
We use a dual-encoder to separately encode the images and their corresponding geo-locations, and use contrastive objectives to learn effective location representations from images.
CSP significantly boosts model performance, with 10-34% relative improvement across various labeled training data sampling ratios.
(2023-05-01T23:11:18Z)
- A Model-data-driven Network Embedding Multidimensional Features for Tomographic SAR Imaging [5.489791364472879]
We propose a new model-data-driven network to achieve tomoSAR imaging based on multi-dimensional features.
We add two 2D processing modules, both convolutional encoder-decoder structures, to enhance multi-dimensional features of the imaging scene effectively.
Compared with the conventional CS-based FISTA method and the DL-based gamma-Net method, our proposed method achieves better completeness while maintaining decent imaging accuracy.
(2022-11-28T02:01:43Z)
- Image-specific Convolutional Kernel Modulation for Single Image Super-resolution [85.09413241502209]
In this paper, we propose a novel image-specific convolutional kernel modulation (IKM) method.
We exploit the global contextual information of the image or feature map to generate an attention weight for adaptively modulating the convolutional kernels.
Experiments on single image super-resolution show that the proposed methods achieve superior performance compared with state-of-the-art methods.
(2021-11-16T11:05:10Z)
- Transductive Few-Shot Classification on the Oblique Manifold [5.115651633703363]
Few-shot learning attempts to learn with limited data.
In this work, we perform feature extraction in Euclidean space and compute a geodesic distance metric on the Oblique Manifold (OM).
We also propose a non-parametric Region Self-attention with Spatial Pyramid Pooling.
(2021-08-09T13:01:03Z)
- Learning Deformable Image Registration from Optimization: Perspective, Modules, Bilevel Training and Beyond [62.730497582218284]
We develop a new deep learning based framework to optimize a diffeomorphic model via multi-scale propagation.
We conduct two groups of image registration experiments on 3D volume datasets including image-to-atlas registration on brain MRI data and image-to-image registration on liver CT data.
(2020-04-30T03:23:45Z)
- Gradient-Induced Co-Saliency Detection [81.54194063218216]
Co-saliency detection (Co-SOD) aims to segment the common salient foreground in a group of relevant images.
In this paper, inspired by human behavior, we propose a gradient-induced co-saliency detection method.
(2020-04-28T08:40:55Z)