Cluster and Aggregate: Face Recognition with Large Probe Set
- URL: http://arxiv.org/abs/2210.10864v1
- Date: Wed, 19 Oct 2022 20:01:15 GMT
- Title: Cluster and Aggregate: Face Recognition with Large Probe Set
- Authors: Minchul Kim, Feng Liu, Anil Jain, Xiaoming Liu
- Abstract summary: We propose a two-stage feature fusion paradigm, Cluster and Aggregate, that can both scale to large $N$ and maintain the ability to perform sequential inference with order invariance.
Experiments on IJB-B and IJB-S benchmark datasets show the superiority of the proposed two-stage paradigm in unconstrained face recognition.
- Score: 18.662943303044315
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Feature fusion plays a crucial role in unconstrained face recognition where
inputs (probes) comprise a set of $N$ low-quality images whose individual
qualities vary. Advances in attention and recurrent modules have led to feature
fusion that can model the relationship among the images in the input set.
However, attention mechanisms cannot scale to large $N$ due to their quadratic
complexity and recurrent modules suffer from input order sensitivity. We
propose a two-stage feature fusion paradigm, Cluster and Aggregate, that can
both scale to large $N$ and maintain the ability to perform sequential
inference with order invariance. Specifically, the Cluster stage is a linear
assignment of $N$ inputs to $M$ global cluster centers, and the Aggregation stage
is a fusion over $M$ clustered features. The clustered features play an
integral role when the inputs are sequential as they can serve as a
summarization of past features. By leveraging the order invariance of the
incremental averaging operation, we design an update rule that achieves
batch-order invariance, which guarantees that the contributions of early images
in the sequence do not diminish as time steps increase. Experiments on IJB-B
and IJB-S benchmark datasets show the superiority of the proposed two-stage
paradigm in unconstrained face recognition. Code and pretrained models are
available at https://github.com/mk-minchul/caface
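The two-stage pipeline described above can be sketched in a few lines. This is a minimal NumPy illustration, not the paper's actual layers: the soft-assignment rule, the per-cluster weight normalization, and the running-mean update are assumptions chosen to show the structure (linear-cost assignment of $N$ inputs to $M$ centers, then fusion over $M$ clustered features, with an order-invariant incremental mean for sequential inputs).

```python
import numpy as np

def cluster_and_aggregate(features, centers):
    """Sketch of the two-stage fusion: softly assign N feature vectors to M
    global cluster centers, then pool each cluster into one fused feature.
    Cost is O(N*M) rather than the O(N^2) of all-pairs attention."""
    # Cluster stage: similarity of each input to each center, softmax over M.
    sims = features @ centers.T                       # (N, M)
    sims -= sims.max(axis=1, keepdims=True)           # numerical stability
    assign = np.exp(sims)
    assign /= assign.sum(axis=1, keepdims=True)       # soft assignment per input
    # Aggregation stage: weighted mean of the inputs assigned to each center.
    weights = assign / (assign.sum(axis=0, keepdims=True) + 1e-8)
    return weights.T @ features                       # (M, D) clustered features

def incremental_mean(mean, count, batch):
    """Order-invariant running mean over batches: because addition commutes,
    early batches keep exactly their proportional contribution."""
    n = batch.shape[0]
    new_mean = (mean * count + batch.sum(axis=0)) / (count + n)
    return new_mean, count + n
```

Because `incremental_mean` reduces every batch through a commutative sum, feeding the same batches in any order yields an identical result, which is the batch-order invariance the abstract refers to.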
Related papers
- Retain, Blend, and Exchange: A Quality-aware Spatial-Stereo Fusion Approach for Event Stream Recognition [57.74076383449153]
We propose a novel dual-stream framework for event stream-based pattern recognition via differentiated fusion, termed EFV++.
It models two common event representations simultaneously, i.e., event images and event voxels.
We achieve new state-of-the-art performance on the Bullying10k dataset, i.e., $90.51\%$, which exceeds the second place by $+2.21\%$.
arXiv Detail & Related papers (2024-06-27T02:32:46Z) - TOPIC: A Parallel Association Paradigm for Multi-Object Tracking under
Complex Motions and Diverse Scenes [17.913501787851356]
We introduce a new dataset called BEE23 to highlight complex motions.
We propose a parallel paradigm and present the Two rOund Parallel matchIng meChanism (TOPIC) to implement it.
Our approach achieves state-of-the-art performance on four public datasets and BEE23.
arXiv Detail & Related papers (2023-08-22T03:30:22Z) - M$^3$Net: Multi-view Encoding, Matching, and Fusion for Few-shot
Fine-grained Action Recognition [80.21796574234287]
M$^3$Net is a matching-based framework for few-shot fine-grained (FS-FG) action recognition.
It incorporates multi-view encoding, multi-view matching, and multi-view fusion to facilitate embedding encoding, similarity matching, and decision making.
Explainable visualizations and experimental results demonstrate the superiority of M$3$Net in capturing fine-grained action details.
arXiv Detail & Related papers (2023-08-06T09:15:14Z) - Unsupervised Gait Recognition with Selective Fusion [10.414364995179556]
We propose a new task: Unsupervised Gait Recognition (UGR).
We introduce a new cluster-based baseline to solve UGR with cluster-level contrastive learning.
We propose a Selective Fusion method, which includes Selective Cluster Fusion (SCF) and Selective Sample Fusion (SSF).
arXiv Detail & Related papers (2023-03-19T21:34:20Z) - HyRSM++: Hybrid Relation Guided Temporal Set Matching for Few-shot
Action Recognition [51.2715005161475]
We propose a novel Hybrid Relation guided temporal Set Matching approach for few-shot action recognition.
The core idea of HyRSM++ is to integrate all videos within the task to learn discriminative representations.
We show that our method achieves state-of-the-art performance under various few-shot settings.
arXiv Detail & Related papers (2023-01-09T13:32:50Z) - ClusTR: Exploring Efficient Self-attention via Clustering for Vision
Transformers [70.76313507550684]
We propose a content-based sparse attention method, as an alternative to dense self-attention.
Specifically, we cluster and then aggregate key and value tokens, as a content-based method of reducing the total token count.
The resulting clustered-token sequence retains the semantic diversity of the original signal, but can be processed at a lower computational cost.
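The cluster-then-aggregate idea in ClusTR can be sketched as follows. This is an illustrative NumPy version under stated assumptions: nearest-center hard assignment and mean pooling stand in for whatever clustering rule the paper actually uses; the point is that attention then runs over $M$ pooled tokens instead of $N$ originals.

```python
import numpy as np

def clustered_attention(q, k, v, centers):
    """Content-based sparse attention sketch: pool N key/value tokens into M
    clusters, then run dense attention over the M pooled tokens, reducing the
    attention cost from O(N^2) to O(N*M)."""
    # Assign each key token to its nearest cluster center (hard assignment).
    d2 = ((k[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)  # (N, M)
    labels = d2.argmin(axis=1)
    M, D = centers.shape
    k_c = np.zeros((M, D))
    v_c = np.zeros((M, D))
    for m in range(M):
        mask = labels == m
        if mask.any():                  # empty clusters stay as zero tokens
            k_c[m] = k[mask].mean(axis=0)
            v_c[m] = v[mask].mean(axis=0)
    # Dense attention over the M clustered key/value tokens.
    scores = q @ k_c.T / np.sqrt(D)
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    return w @ v_c
```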
arXiv Detail & Related papers (2022-08-28T04:18:27Z) - API: Boosting Multi-Agent Reinforcement Learning via
Agent-Permutation-Invariant Networks [35.63476630248861]
Multi-agent reinforcement learning suffers from poor sample efficiency due to the exponential growth of the state-action space.
We propose two novel designs to achieve permutation invariance (PI).
The first design permutes the same but differently ordered inputs back to the same order and the downstream networks only need to learn function mapping over fixed-ordering inputs.
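That first design reduces to sorting inputs into one canonical order before the network sees them. A tiny sketch, with the sort key and downstream network purely hypothetical:

```python
def permutation_invariant(inputs, net, key=None):
    """Canonical-ordering trick: sort the set of inputs by a fixed key so the
    downstream network only ever learns a mapping over fixed-ordering inputs.
    Any permutation of the same inputs yields the same network output."""
    canonical = sorted(inputs, key=key)   # e.g. lexicographic order
    return net(canonical)
```

The downstream `net` needs no special architecture; invariance comes entirely from the deterministic reordering.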
arXiv Detail & Related papers (2022-03-10T11:00:53Z) - Multi-scale Interactive Network for Salient Object Detection [91.43066633305662]
We propose the aggregate interaction modules to integrate the features from adjacent levels.
To obtain more efficient multi-scale features, the self-interaction modules are embedded in each decoder unit.
Experimental results on five benchmark datasets demonstrate that the proposed method without any post-processing performs favorably against 23 state-of-the-art approaches.
arXiv Detail & Related papers (2020-07-17T15:41:37Z) - $O(n)$ Connections are Expressive Enough: Universal Approximability of
Sparse Transformers [71.31712741938837]
We show that sparse Transformers with only $O(n)$ connections per attention layer can approximate the same function class as the dense model with $n^2$ connections.
We also present experiments comparing different patterns/levels of sparsity on standard NLP tasks.
arXiv Detail & Related papers (2020-06-08T18:30:12Z) - GATCluster: Self-Supervised Gaussian-Attention Network for Image
Clustering [9.722607434532883]
We propose a self-supervised clustering network for image clustering (GATCluster).
Rather than extracting intermediate features first and then performing traditional clustering, GATCluster directly outputs semantic cluster labels without further post-processing.
We develop a two-step learning algorithm that is memory-efficient for clustering large-size images.
arXiv Detail & Related papers (2020-02-27T00:57:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.