FaceCoresetNet: Differentiable Coresets for Face Set Recognition
- URL: http://arxiv.org/abs/2308.14075v2
- Date: Wed, 13 Dec 2023 12:29:20 GMT
- Title: FaceCoresetNet: Differentiable Coresets for Face Set Recognition
- Authors: Gil Shapira and Yosi Keller
- Abstract summary: A discriminative descriptor balances two policies when aggregating information from a given set.
This work frames face-set representation as a differentiable coreset selection problem.
We set a new state of the art (SOTA) for set-based face verification on the IJB-B and IJB-C datasets.
- Score: 16.879093388124964
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In set-based face recognition, we aim to compute the most discriminative
descriptor from an unbounded set of images and videos showing a single person.
A discriminative descriptor balances two policies when aggregating information
from a given set. The first is a quality-based policy: emphasizing high-quality
and down-weighting low-quality images. The second is a diversity-based policy:
emphasizing unique images in the set and down-weighting multiple occurrences of
similar images as found in video clips which can overwhelm the set
representation. This work frames face-set representation as a differentiable
coreset selection problem. Our model learns how to select a small coreset of
the input set that balances quality and diversity policies using a learned
metric parameterized by the face quality, optimized end-to-end. The selection
process is a differentiable farthest-point sampling (FPS) realized by
approximating the non-differentiable Argmax operation with differentiable
sampling from the Gumbel-Softmax distribution of distances. The small coreset
is later used as queries in a self- and cross-attention architecture to enrich
the descriptor with information from the whole set. Our model is
order-invariant and linear in the input set size. We set a new state of the art
(SOTA) for set-based face verification on the IJB-B and IJB-C datasets. Our code
is publicly available.
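
The selection step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it runs only the forward pass in plain Python (gradients would need an autodiff framework), and it substitutes plain squared Euclidean distance for the paper's learned, quality-parameterized metric. The names `soft_fps`, `gumbel_softmax`, and the temperature `tau` are assumptions made for this illustration.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def gumbel_softmax(logits, tau, rng):
    """Relaxed one-hot sample: softmax((logits + Gumbel noise) / tau)."""
    noisy = []
    for logit in logits:
        # Clamp u away from 0 and 1 so both logarithms stay finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))
        noisy.append((logit + gumbel) / tau)
    peak = max(noisy)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def soft_fps(points, k, tau=0.1, seed=0):
    """Farthest-point sampling with the argmax replaced by a soft selection.

    Returns the k selected (soft) points and the weight vector used at
    each selection step. Starting from points[0] is an arbitrary choice
    made for this sketch.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    selected = [list(points[0])]
    # Distance from every point to its nearest already-selected point.
    min_dist = [sq_dist(p, selected[0]) for p in points]
    history = []
    for _ in range(k - 1):
        # Hard FPS would take argmax(min_dist); here we sample soft weights.
        weights = gumbel_softmax(min_dist, tau, rng)
        soft_point = [
            sum(weights[i] * points[i][d] for i in range(n)) for d in range(dim)
        ]
        selected.append(soft_point)
        min_dist = [
            min(min_dist[i], sq_dist(points[i], soft_point)) for i in range(n)
        ]
        history.append(weights)
    return selected, history


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.1, 0.0), (10.0, 0.0)]
    coreset, weights = soft_fps(pts, k=2, tau=0.01)
    print(coreset)
    print(weights)
```

With a small `tau` and well-separated points, the weights concentrate on the farthest point, so the soft selection approaches the hard argmax of standard FPS; a larger `tau` spreads the weights and yields a blend of candidates.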
Related papers
- A Global Depth-Range-Free Multi-View Stereo Transformer Network with Pose Embedding [76.44979557843367]
We propose a novel multi-view stereo (MVS) framework that gets rid of the depth range prior.
We introduce a Multi-view Disparity Attention (MDA) module to aggregate long-range context information.
We explicitly estimate the quality of the current pixel corresponding to sampled points on the epipolar line of the source image.
arXiv Detail & Related papers (2024-11-04T08:50:16Z)
- Learning Invariant Inter-pixel Correlations for Superpixel Generation [12.605604620139497]
Learnable features exhibit constrained discriminative capability, resulting in unsatisfactory pixel grouping performance.
We propose the Content Disentangle Superpixel algorithm to selectively separate the invariant inter-pixel correlations and statistical properties.
The experimental results on four benchmark datasets demonstrate the superiority of our approach to existing state-of-the-art methods.
arXiv Detail & Related papers (2024-02-28T09:46:56Z)
- Interpolating between Images with Diffusion Models [2.6027967363792865]
Interpolating between two input images is a task missing from image generation pipelines.
We propose a method for zero-shot interpolation using latent diffusion models.
For greater consistency, or to specify additional criteria, we can generate several candidates and use CLIP to select the highest quality image.
arXiv Detail & Related papers (2023-07-24T07:03:22Z)
- Parameter Efficient Local Implicit Image Function Network for Face Segmentation [13.124513975412254]
Face parsing is defined as the per-pixel labeling of images containing human faces.
We make use of the structural consistency of the human face to propose a lightweight face-parsing method.
arXiv Detail & Related papers (2023-03-27T11:50:27Z)
- Describing Sets of Images with Textual-PCA [89.46499914148993]
We seek to semantically describe a set of images, capturing both the attributes of single images and the variations within the set.
Our procedure is analogous to Principal Component Analysis, in which the role of projection vectors is replaced with generated phrases.
arXiv Detail & Related papers (2022-10-21T17:10:49Z)
- Matching Feature Sets for Few-Shot Image Classification [22.84472344406448]
We argue that a set-based representation intrinsically builds a richer representation of images from the base classes.
Our approach, dubbed SetFeat, embeds shallow self-attention mechanisms inside existing encoder architectures.
arXiv Detail & Related papers (2022-04-02T22:42:54Z)
- Permuted AdaIN: Reducing the Bias Towards Global Statistics in Image Classification [97.81205777897043]
Recent work has shown that convolutional neural network classifiers overly rely on texture at the expense of shape cues.
We make a similar but different distinction between shape and local image cues, on the one hand, and global image statistics, on the other.
Our method, called Permuted Adaptive Instance Normalization (pAdaIN), reduces the representation of global statistics in the hidden layers of image classifiers.
arXiv Detail & Related papers (2020-10-09T16:38:38Z)
- Locally Masked Convolution for Autoregressive Models [107.4635841204146]
LMConv is a simple modification to the standard 2D convolution that allows arbitrary masks to be applied to the weights at each location in the image.
We learn an ensemble of distribution estimators that share parameters but differ in generation order, achieving improved performance on whole-image density estimation.
arXiv Detail & Related papers (2020-06-22T17:59:07Z)
- Multiscale Deep Equilibrium Models [162.15362280927476]
We propose a new class of implicit networks, the multiscale deep equilibrium model (MDEQ).
An MDEQ directly solves for and backpropagates through the equilibrium points of multiple feature resolutions simultaneously.
We illustrate the effectiveness of this approach on two large-scale vision tasks: ImageNet classification and semantic segmentation on high-resolution images from the Cityscapes dataset.
arXiv Detail & Related papers (2020-06-15T18:07:44Z)
- RANSAC-Flow: generic two-stage image alignment [53.11926395028508]
We show that a simple unsupervised approach performs surprisingly well across a range of tasks.
Despite its simplicity, our method shows competitive results on a range of tasks and datasets.
arXiv Detail & Related papers (2020-04-03T12:37:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.