Automatic Face Understanding: Recognizing Families in Photos
- URL: http://arxiv.org/abs/2102.08941v1
- Date: Sun, 10 Jan 2021 22:37:25 GMT
- Title: Automatic Face Understanding: Recognizing Families in Photos
- Authors: Joseph P Robinson
- Abstract summary: We build the largest database for kinship recognition.
Video dynamics, audio, and text captions can be used in the decision making of kinship recognition systems.
- Score: 6.131589026706621
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We built the largest database for kinship recognition, Families In
the Wild (FIW). The data were labeled using a novel clustering algorithm that
used label proposals as side information to guide more accurate clusters, which
greatly reduced the time and human input required. Statistically, FIW shows
large gains over its predecessors. We provide benchmarks for kinship
verification, family classification, tri-subject verification, and large-scale
search and retrieval. We also trained CNNs on FIW and deployed the model on the
renowned KinWild I and II to reach state-of-the-art (SOTA) performance. Most
recently, we further augmented FIW with multimedia (MM) data, so video dynamics,
audio, and text captions can now be used in the decision making of kinship
recognition systems. We expect FIW to significantly impact both research and
practice. Additionally, we tackled the classic problem of facial landmark
localization. Most landmark localization networks are trained with objectives
based on L1 or L2 norms, which inherit several disadvantages. Landmark locations
are inferred from generated heatmaps, and the predicted locations are penalized
without accounting for the heatmap's spread, even though high scatter
corresponds to low confidence and vice versa. To address this, we introduced an
objective that penalizes low confidence. Another issue is the dependency on
labeled data, which is expensive to collect and susceptible to error. We
addressed both issues by proposing an adversarial training framework that
leverages unlabeled data to improve model performance. Our method achieves SOTA
results on renowned benchmarks. Furthermore, our model remains robust at reduced
size: a version with 1/8 the number of channels performs comparably to SOTA
while running in real time on a CPU. Finally, we built Balanced Faces in the
Wild (BFW) to serve as a proxy for measuring bias across ethnicity and gender
subgroups, allowing us to characterize facial recognition (FR) performance per
subgroup. We show that performance is suboptimal when a single threshold is used
to determine whether sample pairs are genuine.
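The point about heatmap spread in the landmark-localization passage can be made concrete with a small numerical sketch. It assumes one 2-D heatmap per landmark, reads the location off with a soft-argmax, and measures spread as the expected squared distance from that location. The functions `heatmap_stats`, `confidence_aware_loss`, and `gaussian_heatmap` and the penalty `weight` are hypothetical illustrations of "penalizing low confidence", not the objective introduced in this work.

```python
import numpy as np

def heatmap_stats(heatmap):
    """Return the soft-argmax location (x, y) and the spread of a 2-D heatmap."""
    h = np.clip(heatmap, 0.0, None)
    p = h / h.sum()  # normalize to a probability map
    rows, cols = np.indices(p.shape)
    mu_y = (p * rows).sum()
    mu_x = (p * cols).sum()
    # Expected squared distance from the mean: large when the mass is scattered.
    spread = (p * ((rows - mu_y) ** 2 + (cols - mu_x) ** 2)).sum()
    return np.array([mu_x, mu_y]), spread

def confidence_aware_loss(heatmap, target_xy, weight=0.1):
    """Hypothetical objective: squared L2 location error plus a spread penalty."""
    loc, spread = heatmap_stats(heatmap)
    l2 = np.sum((loc - np.asarray(target_xy, dtype=float)) ** 2)
    return l2 + weight * spread  # high spread (low confidence) costs more

def gaussian_heatmap(size, center, sigma):
    """Toy isotropic Gaussian heatmap on a size x size grid."""
    rows, cols = np.indices((size, size))
    cx, cy = center
    return np.exp(-((cols - cx) ** 2 + (rows - cy) ** 2) / (2 * sigma ** 2))

# Two toy heatmaps centered on the same target: one confident, one scattered.
sharp = gaussian_heatmap(33, (16, 16), sigma=1.0)
broad = gaussian_heatmap(33, (16, 16), sigma=5.0)
print(confidence_aware_loss(sharp, (16, 16)))
print(confidence_aware_loss(broad, (16, 16)))
```

Both toy heatmaps are centered on the target, so an L2 term on the soft-argmax location alone scores them equally; only the spread term separates the confident (sharp) map from the scattered (broad) one.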
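The single-threshold finding can be illustrated with a toy calibration. The sketch below assumes synthetic, normally distributed impostor similarity scores for two hypothetical subgroups `A` and `B` (not BFW data) and a target false positive rate of 1%. It compares one threshold fit on the pooled scores against one threshold per subgroup; the helper names are illustrative only.

```python
import numpy as np

def threshold_at_fpr(impostor_scores, target_fpr):
    """Threshold above which roughly `target_fpr` of impostor pairs fall."""
    return float(np.quantile(np.asarray(impostor_scores, dtype=float), 1.0 - target_fpr))

def false_positive_rate(impostor_scores, threshold):
    """Fraction of impostor pairs wrongly accepted as genuine (score > threshold)."""
    return float(np.mean(np.asarray(impostor_scores, dtype=float) > threshold))

# Synthetic impostor scores for two hypothetical subgroups (NOT BFW data):
# subgroup B's impostors score higher, i.e., they are harder to tell apart.
rng = np.random.default_rng(0)
impostors = {
    "A": rng.normal(0.20, 0.10, 10_000),
    "B": rng.normal(0.35, 0.10, 10_000),
}
target_fpr = 0.01

# One global threshold, calibrated on the pooled scores.
pooled = np.concatenate(list(impostors.values()))
global_t = threshold_at_fpr(pooled, target_fpr)
fpr_global = {g: false_positive_rate(s, global_t) for g, s in impostors.items()}

# One threshold per subgroup, each calibrated to the same target FPR.
per_group_t = {g: threshold_at_fpr(s, target_fpr) for g, s in impostors.items()}
fpr_per_group = {g: false_positive_rate(s, per_group_t[g]) for g, s in impostors.items()}

print("global threshold:", fpr_global)
print("per-subgroup thresholds:", fpr_per_group)
```

In this toy setup the pooled threshold leaves subgroup B above the 1% target and subgroup A below it, while per-subgroup thresholds put both at the target; how large the gap is in practice depends on the real score distributions.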
Related papers
- Semi-Supervised Crowd Counting with Contextual Modeling: Facilitating Holistic Understanding of Crowd Scenes [19.987151025364067]
This paper presents a new semi-supervised method for training a reliable crowd counting model.
We foster the model's intrinsic 'subitizing' capability, which allows it to accurately estimate the count in regions.
Our method achieves the state-of-the-art performance, surpassing previous approaches by a large margin on challenging benchmarks.
(2023-10-16T12:42:43Z)
- Spuriosity Rankings: Sorting Data to Measure and Mitigate Biases [62.54519787811138]
We present a simple but effective method to measure and mitigate model biases caused by reliance on spurious cues.
We rank images within their classes based on spuriosity, proxied via deep neural features of an interpretable network.
Our results suggest that model bias due to spurious feature reliance is influenced far more by what the model is trained on than how it is trained.
(2022-12-05T23:15:43Z)
- Incorporating Crowdsourced Annotator Distributions into Ensemble Modeling to Improve Classification Trustworthiness for Ancient Greek Papyri [3.870354915766567]
Two issues which complicate the problem on such datasets are class imbalance and ground-truth uncertainty in labeling.
The application of ensemble modeling to such datasets can help identify images where the ground-truth is questionable and quantify the trustworthiness of those samples.
(2022-10-28T19:39:14Z)
- Securing Federated Learning against Overwhelming Collusive Attackers [7.587927338603662]
We propose two graph theoretic algorithms, based on Minimum Spanning Tree and k-Densest graph, by leveraging correlations between local models.
Our FL model can nullify the influence of attackers even when they are up to 70% of all the clients.
We establish the superiority of our algorithms over the existing ones using accuracy, attack success rate, and early detection round.
(2022-09-28T13:41:04Z)
- A Closer Look at Debiased Temporal Sentence Grounding in Videos: Dataset, Metric, and Approach [53.727460222955266]
Temporal Sentence Grounding in Videos (TSGV) aims to ground a natural language sentence in an untrimmed video.
Recent studies have found that current benchmark datasets may have obvious moment annotation biases.
We introduce a new evaluation metric "dR@n,IoU@m" that discounts the basic recall scores to alleviate the inflating evaluation caused by biased datasets.
(2022-03-10T08:58:18Z)
- Newer is not always better: Rethinking transferability metrics, their peculiarities, stability and performance [5.650647159993238]
Fine-tuning of large pre-trained image and language models on small customized datasets has become increasingly popular.
We show that the statistical problems with covariance estimation drive the poor performance of H-score.
We propose a correction and recommend measuring correlation performance against relative accuracy in such settings.
(2021-10-13T17:24:12Z)
- Efficient Person Search: An Anchor-Free Approach [86.45858994806471]
Person search aims to simultaneously localize and identify a query person from realistic, uncropped images.
To achieve this goal, state-of-the-art models typically add a re-id branch upon two-stage detectors like Faster R-CNN.
In this work, we present an anchor-free approach to efficiently tackling this challenging task, by introducing the following dedicated designs.
(2021-09-01T07:01:33Z)
- Video-based Person Re-identification without Bells and Whistles [49.51670583977911]
Video-based person re-identification (Re-ID) aims at matching the video tracklets with cropped video frames for identifying the pedestrians under different cameras.
There exists severe spatial and temporal misalignment for those cropped tracklets due to the imperfect detection and tracking results generated with obsolete methods.
We present a simple re-Detect and Link (DL) module that effectively reduces such unexpected noise by applying deep learning-based detection and tracking to the cropped tracklets.
(2021-05-22T10:17:38Z)
- Balancing Biases and Preserving Privacy on Balanced Faces in the Wild [50.915684171879036]
There are demographic biases present in current facial recognition (FR) models.
We introduce our Balanced Faces in the Wild dataset to measure these biases across different ethnic and gender subgroups.
We find that relying on a single score threshold to differentiate between genuine and impostor sample pairs leads to suboptimal results.
We propose a novel domain adaptation learning scheme that uses facial features extracted from state-of-the-art neural networks.
(2021-03-16T15:05:49Z)
- Improving Semi-supervised Federated Learning by Reducing the Gradient Diversity of Models [67.66144604972052]
Federated learning (FL) is a promising way to use the computing power of mobile devices while maintaining privacy of users.
We show that a critical issue that affects the test accuracy is the large gradient diversity of the models from different users.
We propose a novel grouping-based model averaging method to replace the FedAvg averaging method.
(2020-08-26T03:36:07Z)