Represent, Compare, and Learn: A Similarity-Aware Framework for
Class-Agnostic Counting
- URL: http://arxiv.org/abs/2203.08354v1
- Date: Wed, 16 Mar 2022 02:24:25 GMT
- Title: Represent, Compare, and Learn: A Similarity-Aware Framework for
Class-Agnostic Counting
- Authors: Min Shi, Hao Lu, Chen Feng, Chengxin Liu, Zhiguo Cao
- Abstract summary: Class-agnostic counting aims to count all instances in a query image given few exemplars.
Existing methods either adopt a pretrained network to represent features or learn a new one.
We propose a similarity-aware CAC framework that jointly learns representation and similarity metric.
- Score: 30.34585324943777
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Class-agnostic counting (CAC) aims to count all instances in a query image
given few exemplars. A standard pipeline is to extract visual features from
exemplars and match them with query images to infer object counts. Two
essential components in this pipeline are feature representation and similarity
metric. Existing methods either adopt a pretrained network to represent
features or learn a new one, while applying a naive similarity metric with
fixed inner product. We find this paradigm leads to noisy similarity matching
and hence harms counting performance. In this work, we propose a
similarity-aware CAC framework that jointly learns representation and
similarity metric. We first instantiate our framework with a naive baseline
called Bilinear Matching Network (BMNet), whose key component is a learnable
bilinear similarity metric. To further embody the core of our framework, we
extend BMNet to BMNet+ that models similarity from three aspects: 1)
representing the instances via their self-similarity to enhance feature
robustness against intra-class variations; 2) comparing the similarity
dynamically to focus on the key patterns of each exemplar; 3) learning from a
supervision signal to impose explicit constraints on matching results.
Extensive experiments on a recent CAC dataset FSC147 show that our models
significantly outperform state-of-the-art CAC approaches. In addition, we also
validate the cross-dataset generality of BMNet and BMNet+ on a car counting
dataset CARPK. Code is at tiny.one/BMNet
Related papers
- The Triangle of Similarity: A Multi-Faceted Framework for Comparing Neural Network Representations [5.415604247164019]
We propose the Triangle of Similarity, a framework that combines three complementary perspectives. Architectural family is a primary determinant of representational similarity, forming distinct clusters. For some model pairs, pruning appears to regularize representations, exposing a shared computational core.
arXiv Detail & Related papers (2026-01-23T12:15:43Z) - Improving Contrastive Learning for Referring Expression Counting [35.979549843591926]
C-REX is a novel contrastive learning framework based on supervised contrastive learning. It operates entirely within the image space, avoiding the misalignment issues of image-text contrastive learning. C-REX achieves state-of-the-art results in Referring Expression Counting.
arXiv Detail & Related papers (2025-05-28T20:33:42Z) - Soft Neighbors are Positive Supporters in Contrastive Visual
Representation Learning [35.53729744330751]
Contrastive learning methods train visual encoders by comparing views from one instance to others.
This binary instance discrimination is studied extensively to improve feature representations in self-supervised learning.
In this paper, we rethink the instance discrimination framework and find the binary instance labeling insufficient to measure correlations between different samples.
arXiv Detail & Related papers (2023-03-30T04:22:07Z) - Not All Instances Contribute Equally: Instance-adaptive Class
Representation Learning for Few-Shot Visual Recognition [94.04041301504567]
Few-shot visual recognition refers to recognizing novel visual concepts from a few labeled instances.
We propose a novel metric-based meta-learning framework termed instance-adaptive class representation learning network (ICRL-Net) for few-shot visual recognition.
arXiv Detail & Related papers (2022-09-07T10:00:18Z) - Attributable Visual Similarity Learning [90.69718495533144]
This paper proposes an attributable visual similarity learning (AVSL) framework for a more accurate and explainable similarity measure between images.
Motivated by the human semantic similarity cognition, we propose a generalized similarity learning paradigm to represent the similarity between two images with a graph.
Experiments on the CUB-200-2011, Cars196, and Stanford Online Products datasets demonstrate significant improvements over existing deep similarity learning methods.
arXiv Detail & Related papers (2022-03-28T17:35:31Z) - CAD: Co-Adapting Discriminative Features for Improved Few-Shot
Classification [11.894289991529496]
Few-shot classification is a challenging problem that aims to learn a model that can adapt to unseen classes given a few labeled samples.
Recent approaches pre-train a feature extractor, and then fine-tune for episodic meta-learning.
We propose a strategy to cross-attend and re-weight discriminative features for few-shot classification.
arXiv Detail & Related papers (2022-03-25T06:14:51Z) - Multi-similarity based Hyperrelation Network for few-shot segmentation [2.306100133614193]
Few-shot semantic segmentation aims at recognizing the object regions of unseen categories with only a few examples as supervision.
We propose an effective Multi-similarity Hyperrelation Network (MSHNet) to tackle the few-shot semantic segmentation problem.
arXiv Detail & Related papers (2022-03-17T18:16:52Z) - Contextualizing Meta-Learning via Learning to Decompose [125.76658595408607]
We propose Learning to Decompose Network (LeadNet) to contextualize the meta-learned "support-to-target" strategy.
LeadNet learns to automatically select the right strategy by incorporating the change of comparison across contexts with polysemous embeddings.
arXiv Detail & Related papers (2021-06-15T13:10:56Z) - Unsupervised Feature Learning by Cross-Level Instance-Group
Discrimination [68.83098015578874]
We integrate between-instance similarity into contrastive learning, not directly by instance grouping, but by cross-level discrimination (CLD).
CLD effectively brings unsupervised learning closer to natural data and real-world applications.
New state-of-the-art on self-supervision, semi-supervision, and transfer learning benchmarks, beating MoCo v2 and SimCLR on every reported benchmark.
arXiv Detail & Related papers (2020-08-09T21:13:13Z) - ReMarNet: Conjoint Relation and Margin Learning for Small-Sample Image
Classification [49.87503122462432]
We introduce a novel neural network termed Relation-and-Margin learning Network (ReMarNet).
Our method assembles two networks of different backbones so as to learn the features that can perform excellently in both of the aforementioned two classification mechanisms.
Experiments on four image datasets demonstrate that our approach is effective in learning discriminative features from a small set of labeled samples.
arXiv Detail & Related papers (2020-06-27T13:50:20Z) - Memory-Augmented Relation Network for Few-Shot Learning [114.47866281436829]
In this work, we investigate a new metric-learning method, Memory-Augmented Relation Network (MRN)
In MRN, we choose the samples that are visually similar from the working context, and perform weighted information propagation to attentively aggregate helpful information from the chosen ones to enhance the sample's representation.
We empirically demonstrate that MRN yields significant improvement over its ancestor and achieves competitive or even better performance when compared with other few-shot learning approaches.
arXiv Detail & Related papers (2020-05-09T10:09:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences.