PADS: Policy-Adapted Sampling for Visual Similarity Learning
- URL: http://arxiv.org/abs/2003.11113v2
- Date: Sat, 28 Mar 2020 12:56:16 GMT
- Title: PADS: Policy-Adapted Sampling for Visual Similarity Learning
- Authors: Karsten Roth, Timo Milbich, Björn Ommer
- Abstract summary: Learning visual similarity requires learning relations, typically between triplets of images.
Currently, the prominent paradigm consists of fixed or curriculum sampling strategies that are predefined before training starts.
We employ reinforcement learning and have a teacher network adjust the sampling distribution based on the current state of the learner network.
- Score: 19.950682531209154
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning visual similarity requires learning relations, typically
between triplets of images. Although triplet approaches are powerful, their
computational complexity mostly limits training to only a subset of all
possible training triplets. Thus, sampling strategies that decide when to use
which training sample during learning are crucial. Currently, the prominent
paradigm consists of fixed or curriculum sampling strategies predefined before
training starts. However, the problem truly calls for a sampling process that
adjusts based on the actual state of the similarity representation during
training. We, therefore, employ reinforcement learning and have a teacher
network adjust the sampling distribution based on the current state of the
learner network, which represents visual similarity. Experiments on benchmark
datasets using standard triplet-based losses show that our adaptive sampling
strategy significantly outperforms fixed sampling strategies. Moreover,
although our adaptive sampling is applied only on top of basic triplet-learning
frameworks, we achieve results competitive with state-of-the-art approaches that
employ diverse additional learning signals or strong ensemble architectures.
Code can be found under https://github.com/Confusezius/CVPR2020_PADS.
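To make the sampling mechanism concrete, below is a minimal, hypothetical sketch of a triplet sampler whose negative-sampling distribution over anchor-negative distance bins can be reshaped by an external policy, as a teacher network could do for a learner trained with a standard triplet margin loss max(0, d(a,p) - d(a,n) + margin). The class and method names (AdaptiveNegativeSampler, update, sample_negative) and the distance-bin parameterization are illustrative assumptions, not the authors' implementation; the official code lives at the repository linked above.

```python
# Hypothetical sketch of policy-adapted negative sampling; not the official PADS code
# (see https://github.com/Confusezius/CVPR2020_PADS for the authors' implementation).
import numpy as np


class AdaptiveNegativeSampler:
    """Draws negatives from distance bins whose probabilities a teacher policy can adjust."""

    def __init__(self, n_bins=10):
        self.n_bins = n_bins
        # Start from a uniform sampling distribution over anchor-negative distance bins.
        self.bin_probs = np.full(n_bins, 1.0 / n_bins)

    def update(self, policy_adjustments):
        """Apply multiplicative adjustments proposed by the policy, then renormalize."""
        self.bin_probs *= np.asarray(policy_adjustments)
        self.bin_probs /= self.bin_probs.sum()

    def sample_negative(self, anchor_embedding, candidate_embeddings):
        """Sample a distance bin according to bin_probs, then a candidate index inside it."""
        dists = np.linalg.norm(candidate_embeddings - anchor_embedding, axis=1)
        edges = np.linspace(dists.min(), dists.max(), self.n_bins + 1)
        bins = np.clip(np.digitize(dists, edges) - 1, 0, self.n_bins - 1)
        occupied = np.unique(bins)  # only bins that actually contain candidates
        probs = self.bin_probs[occupied] / self.bin_probs[occupied].sum()
        chosen_bin = np.random.choice(occupied, p=probs)
        return int(np.random.choice(np.flatnonzero(bins == chosen_bin)))


# Usage: the teacher periodically nudges the distribution toward harder (closer) negatives.
sampler = AdaptiveNegativeSampler(n_bins=5)
sampler.update([1.5, 1.2, 1.0, 0.8, 0.5])  # stand-in for one learned policy step
anchor = np.random.randn(128)
candidates = np.random.randn(64, 128)
neg_idx = sampler.sample_negative(anchor, candidates)
```

In a full training loop, a reinforcement-learning teacher would compute the policy adjustments from the current state of the learner (e.g., its validation recall), which is the role the abstract assigns to the teacher network.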
Related papers
- Unified Speech Recognition: A Single Model for Auditory, Visual, and Audiovisual Inputs [73.74375912785689]
This paper proposes unified training strategies for speech recognition systems.
We demonstrate that training a single model for all three tasks (auditory, visual, and audiovisual speech recognition) enhances VSR and AVSR performance.
We also introduce a greedy pseudo-labelling approach to more effectively leverage unlabelled samples.
arXiv Detail & Related papers (2024-11-04T16:46:53Z)
- Rethinking the Key Factors for the Generalization of Remote Sensing Stereo Matching Networks [15.456986824737067]
The stereo matching task relies on expensive airborne LiDAR data.
In this paper, we study key training factors from three perspectives.
We present an unsupervised stereo matching network with good generalization performance.
arXiv Detail & Related papers (2024-08-14T15:26:10Z)
- ALP: Action-Aware Embodied Learning for Perception [60.64801970249279]
We introduce Action-Aware Embodied Learning for Perception (ALP).
ALP incorporates action information into representation learning through a combination of optimizing a reinforcement learning policy and an inverse dynamics prediction objective.
We show that ALP outperforms existing baselines in several downstream perception tasks.
arXiv Detail & Related papers (2023-06-16T21:51:04Z)
- Sampling Through the Lens of Sequential Decision Making [9.101505546901999]
We propose a reward-guided sampling strategy called Adaptive Sample with Reward (ASR).
Our approach adaptively adjusts the sampling process to achieve optimal performance.
Empirical results in information retrieval and clustering demonstrate ASR's superb performance across different datasets.
arXiv Detail & Related papers (2022-08-17T04:01:29Z)
- Partner-Assisted Learning for Few-Shot Image Classification [54.66864961784989]
Few-shot learning has been studied to mimic human visual capabilities and learn effective models without the need for exhaustive human annotation.
In this paper, we focus on the design of training strategy to obtain an elemental representation such that the prototype of each novel class can be estimated from a few labeled samples.
We propose a two-stage training scheme, which first trains a partner encoder to model pair-wise similarities and extract features serving as soft-anchors, and then trains a main encoder by aligning its outputs with soft-anchors while attempting to maximize classification performance.
arXiv Detail & Related papers (2021-09-15T22:46:19Z)
- Improving speech recognition models with small samples for air traffic control systems [9.322392779428505]
In this work, a novel training approach based on pretraining and transfer learning is proposed to address the issue of small training samples.
Three real ATC datasets are used to validate the proposed ASR model and training strategies.
The experimental results demonstrate that the ASR performance is significantly improved on all three datasets.
arXiv Detail & Related papers (2021-02-16T08:28:52Z)
- Region Comparison Network for Interpretable Few-shot Image Classification [97.97902360117368]
Few-shot image classification has been proposed to effectively use only a limited number of labeled examples to train models for new classes.
We propose a metric learning based method named Region Comparison Network (RCN), which is able to reveal how few-shot learning works.
We also present a new way to generalize the interpretability from the level of tasks to categories.
arXiv Detail & Related papers (2020-09-08T07:29:05Z)
- MetricUNet: Synergistic Image- and Voxel-Level Learning for Precise CT Prostate Segmentation via Online Sampling [66.01558025094333]
We propose a two-stage framework, with the first stage to quickly localize the prostate region and the second stage to precisely segment the prostate.
We introduce a novel online metric learning module through voxel-wise sampling in the multi-task network.
Our method learns more representative voxel-level features than conventional methods trained with cross-entropy or Dice loss.
arXiv Detail & Related papers (2020-05-15T10:37:02Z)
- DiVA: Diverse Visual Feature Aggregation for Deep Metric Learning [83.48587570246231]
Visual similarity plays an important role in many computer vision applications.
Deep metric learning (DML) is a powerful framework for learning such similarities.
We propose and study multiple complementary learning tasks, targeting conceptually different data relationships.
We learn a single model to aggregate their training signals, resulting in strong generalization and state-of-the-art performance.
arXiv Detail & Related papers (2020-04-28T12:26:50Z)
- Dynamic Sampling for Deep Metric Learning [7.010669841466896]
Deep metric learning maps visually similar images to nearby locations and visually dissimilar images far apart in an embedding manifold.
A dynamic sampling strategy is proposed to organize the training pairs in an easy-to-hard order to feed into the network.
It allows the network to learn general boundaries between categories from the easy training pairs in its early stages and to refine the details of the model mainly using the hard training samples later on.
arXiv Detail & Related papers (2020-04-24T09:47:23Z)
- Efficient Deep Representation Learning by Adaptive Latent Space Sampling [16.320898678521843]
Supervised deep learning requires a large number of annotated training samples, which are expensive and time-consuming to obtain.
We propose a novel training framework which adaptively selects informative samples that are fed to the training process.
arXiv Detail & Related papers (2020-03-19T22:17:02Z)