Seeking the Shape of Sound: An Adaptive Framework for Learning
Voice-Face Association
- URL: http://arxiv.org/abs/2103.07293v1
- Date: Fri, 12 Mar 2021 14:10:48 GMT
- Title: Seeking the Shape of Sound: An Adaptive Framework for Learning
Voice-Face Association
- Authors: Peisong Wen, Qianqian Xu, Yangbangyan Jiang, Zhiyong Yang, Yuan He and
Qingming Huang
- Abstract summary: We propose a novel framework that jointly addresses two issues in prior voice-face association work.
We introduce a global loss, driven by identity classification, into the modality alignment process.
The proposed method outperforms previous methods in voice-face matching, verification and retrieval.
- Score: 94.7030305679589
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent years have witnessed early progress on automatically learning
the association between voice and face, which has brought a new wave of studies
to the computer vision community. However, most prior work along this line (a)
merely adopts local information to perform modality alignment and (b) ignores
the diversity of learning difficulty across different subjects. In this paper,
we propose a novel framework that jointly addresses both issues. To address
(a), we propose a two-level modality alignment loss in which both global and
local information are considered. Compared with existing methods, we introduce
a global loss into the modality alignment process, driven by identity
classification. Theoretically, we show that minimizing this loss maximizes the
distance between embeddings of different identities while minimizing the
distance between embeddings of the same identity, in a global sense (rather
than within a mini-batch). To address (b), we propose a dynamic reweighting
scheme that better explores hard but valuable identities while filtering out
unlearnable ones. Experiments show that the proposed method outperforms
previous methods in multiple settings, including voice-face matching,
verification and retrieval.
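
To make the two-level alignment concrete, here is a minimal PyTorch sketch of such a loss. It is illustrative only and fills in details not stated in the abstract: a classifier shared across both modalities implements the global, identity-classification-driven term, and an InfoNCE-style contrastive term over the mini-batch stands in for the local alignment. The class name `TwoLevelAlignmentLoss` and all hyperparameters are hypothetical, not the paper's code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoLevelAlignmentLoss(nn.Module):
    """Sketch of a global + local modality alignment loss (assumed form)."""

    def __init__(self, embed_dim: int, num_identities: int, temperature: float = 0.07):
        super().__init__()
        # Shared identity classifier: pulls voice and face embeddings of the
        # same identity toward one class prototype, globally across the dataset.
        self.classifier = nn.Linear(embed_dim, num_identities)
        self.temperature = temperature

    def forward(self, voice_emb, face_emb, identity_labels):
        voice_emb = F.normalize(voice_emb, dim=-1)
        face_emb = F.normalize(face_emb, dim=-1)

        # Global term: identity classification applied to both modalities.
        logits = self.classifier(torch.cat([voice_emb, face_emb], dim=0))
        labels = torch.cat([identity_labels, identity_labels], dim=0)
        global_loss = F.cross_entropy(logits, labels)

        # Local term: contrastive matching within the mini-batch, where the
        # i-th voice should match the i-th face.
        sim = voice_emb @ face_emb.t() / self.temperature
        targets = torch.arange(voice_emb.size(0), device=voice_emb.device)
        local_loss = 0.5 * (F.cross_entropy(sim, targets)
                            + F.cross_entropy(sim.t(), targets))

        return global_loss + local_loss
```

Because the global term is a classification over all identities, its gradient pushes embeddings toward fixed per-identity prototypes regardless of which identities appear in the current mini-batch, which is one way to realize the "global sense" the abstract refers to.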
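
Likewise, the dynamic reweighting scheme can be sketched as a per-identity running-loss tracker; the paper's exact formulation may differ. In this assumed version, identities with a higher running loss are treated as hard but valuable and upweighted, while identities whose loss exceeds an assumed cutoff are considered unlearnable and masked out. The class `IdentityReweighter` and its parameters are hypothetical.

```python
import torch

class IdentityReweighter:
    """Sketch of dynamic per-identity reweighting (assumed form)."""

    def __init__(self, num_identities: int, momentum: float = 0.9, cutoff: float = 5.0):
        self.running_loss = torch.zeros(num_identities)  # kept on CPU for simplicity
        self.momentum = momentum
        self.cutoff = cutoff  # assumed threshold beyond which an identity is filtered

    def update(self, identity_labels, per_sample_loss):
        # Exponential moving average of each identity's loss.
        for idx, loss in zip(identity_labels.tolist(),
                             per_sample_loss.detach().tolist()):
            self.running_loss[idx] = (self.momentum * self.running_loss[idx]
                                      + (1 - self.momentum) * loss)

    def weights(self, identity_labels):
        loss = self.running_loss[identity_labels.cpu()]
        w = 1.0 + loss               # harder identities receive larger weights
        w[loss > self.cutoff] = 0.0  # filter out unlearnable identities
        # Normalize so the overall loss scale stays stable across batches.
        return (w / w.mean().clamp(min=1e-8)).to(identity_labels.device)
```

In a training loop, one would multiply each sample's loss by `reweighter.weights(labels)` before averaging, and call `reweighter.update(labels, per_sample_loss)` after each step.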
Related papers
- Pose-Transformation and Radial Distance Clustering for Unsupervised Person Re-identification [5.522856885199346]
Person re-identification (re-ID) aims to tackle the problem of matching identities across non-overlapping cameras.
Supervised approaches require identity information that may be difficult to obtain and are inherently biased towards the dataset they are trained on.
We propose an unsupervised approach to the person re-ID setup. Having zero knowledge of true labels, our proposed method enhances the discriminating ability of the learned features.
arXiv Detail & Related papers (2024-11-06T20:55:30Z)
- Feature Diversity Learning with Sample Dropout for Unsupervised Domain Adaptive Person Re-identification [0.0]
This paper proposes a new approach to learn the feature representation with better generalization ability through limiting noisy pseudo labels.
We put forward a new method, referred to as Feature Diversity Learning (FDL), under the classic mutual-teaching architecture.
Experimental results show that our proposed FDL-SD achieves the state-of-the-art performance on multiple benchmark datasets.
arXiv Detail & Related papers (2022-01-25T10:10:48Z)
- Learning from Self-Discrepancy via Multiple Co-teaching for Cross-Domain Person Re-Identification [12.106894735305714]
We propose a multiple co-teaching framework for domain adaptive person re-ID.
Our method achieves competitive performance compared with state-of-the-art approaches.
arXiv Detail & Related papers (2021-04-06T03:12:11Z)
- Decoupled and Memory-Reinforced Networks: Towards Effective Feature Learning for One-Step Person Search [65.51181219410763]
One-step methods have been developed to handle pedestrian detection and identification sub-tasks using a single network.
There are two major challenges in the current one-step approaches.
We propose a decoupled and memory-reinforced network (DMRNet) to overcome these problems.
arXiv Detail & Related papers (2021-02-22T06:19:45Z)
- Cross-Modal Generalization: Learning in Low Resource Modalities via Meta-Alignment [99.29153138760417]
Cross-modal generalization is a learning paradigm to train a model that can quickly perform new tasks in a target modality.
We study a key research question: how can we ensure generalization across modalities despite using separate encoders for different source and target modalities?
Our solution is based on meta-alignment, a novel method to align representation spaces using strongly and weakly paired cross-modal data.
arXiv Detail & Related papers (2020-12-04T19:27:26Z)
- Dynamic Dual-Attentive Aggregation Learning for Visible-Infrared Person Re-Identification [208.1227090864602]
Visible-infrared person re-identification (VI-ReID) is a challenging cross-modality pedestrian retrieval problem.
Existing VI-ReID methods tend to learn global representations, which have limited discriminability and weak robustness to noisy images.
We propose a novel dynamic dual-attentive aggregation (DDAG) learning method by mining both intra-modality part-level and cross-modality graph-level contextual cues for VI-ReID.
arXiv Detail & Related papers (2020-07-18T03:08:13Z)
- Towards Universal Representation Learning for Deep Face Recognition [106.21744671876704]
We propose a universal representation learning framework that can deal with larger variation unseen in the given training data without leveraging target domain knowledge.
Experiments show that our method achieves top performance on general face recognition datasets such as LFW and MegaFace.
arXiv Detail & Related papers (2020-02-26T23:29:57Z)
- Adaptive Deep Metric Embeddings for Person Re-Identification under Occlusions [17.911512103472727]
We propose a novel person ReID method that learns the spatial dependencies between local regions and extracts a discriminative feature representation of the pedestrian image based on Long Short-Term Memory (LSTM).
The proposed loss enables the deep neural network to adaptively learn discriminative metric embeddings, which significantly improve the capability of recognizing unseen person identities.
arXiv Detail & Related papers (2020-02-07T03:18:10Z)
- Unsupervised Domain Adaptation in Person re-ID via k-Reciprocal Clustering and Large-Scale Heterogeneous Environment Synthesis [76.46004354572956]
We introduce an unsupervised domain adaptation approach for person re-identification.
Experimental results show that the proposed ktCUDA and SHRED approach achieves an average improvement of +5.7 mAP in re-identification performance.
arXiv Detail & Related papers (2020-01-14T17:43:52Z)