Relation-Aware Distribution Representation Network for Person Clustering
with Multiple Modalities
- URL: http://arxiv.org/abs/2308.00588v1
- Date: Tue, 1 Aug 2023 15:04:56 GMT
- Title: Relation-Aware Distribution Representation Network for Person Clustering
with Multiple Modalities
- Authors: Kaijian Liu, Shixiang Tang, Ziyue Li, Zhishuai Li, Lei Bai, Feng Zhu,
Rui Zhao
- Abstract summary: Person clustering with multi-modal clues, including faces, bodies, and voices, is critical for various tasks.
We propose a Relation-Aware Distribution representation Network (RAD-Net) to generate a distribution representation for multi-modal clues.
Our method achieves substantial improvements of +6% and +8.2% in F-score on the Video Person-Clustering Dataset (VPCD) and the VoxCeleb2 multi-view clustering dataset, respectively.
- Score: 17.569843539515734
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Person clustering with multi-modal clues, including faces, bodies, and
voices, is critical for various tasks, such as movie parsing and identity-based
movie editing. Related methods such as multi-view clustering mainly project
multi-modal features into a joint feature space. However, multi-modal clue
features are usually rather weakly correlated due to the semantic gap from the
modality-specific uniqueness. As a result, these methods are not suitable for
person clustering. In this paper, we propose a Relation-Aware Distribution
representation Network (RAD-Net) to generate a distribution representation for
multi-modal clues. The distribution representation of a clue is a vector
consisting of the relation between this clue and all other clues from all
modalities, thus being modality agnostic and good for person clustering.
Accordingly, we introduce a graph-based method to construct distribution
representation and employ a cyclic update policy to refine distribution
representation progressively. Our method achieves substantial improvements of
+6% and +8.2% in F-score on the Video Person-Clustering Dataset (VPCD) and
the VoxCeleb2 multi-view clustering dataset, respectively. Code will be released
publicly upon acceptance.
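The idea of a distribution representation described in the abstract can be illustrated with a small sketch. This is not the authors' RAD-Net implementation: the function names (`cosine`, `initial_relations`, `refine`), the use of cosine similarity within a modality, the `same_track` co-occurrence links across modalities, and the simple "recompute relations from distributions" loop standing in for the cyclic update are all assumptions made for illustration. Row `i` of the relation matrix plays the role of the distribution representation of clue `i`.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def initial_relations(clues, same_track):
    """Build an n x n relation matrix.

    clues: list of (modality, feature_vector) pairs.
    same_track: set of frozenset({i, j}) pairs assumed to co-occur
                (hypothetical cross-modal links, e.g. a face and a voice
                from the same video track).
    """
    n = len(clues)
    rel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                rel[i][j] = 1.0
            elif clues[i][0] == clues[j][0]:
                # Same modality: features are directly comparable.
                rel[i][j] = max(0.0, cosine(clues[i][1], clues[j][1]))
            elif frozenset((i, j)) in same_track:
                # Different modality: rely on the assumed co-occurrence link.
                rel[i][j] = 1.0
    return rel


def refine(rel, rounds=2):
    """Alternately read distributions (rows) and recompute relations from them."""
    for _ in range(rounds):
        rel = [[cosine(ri, rj) for rj in rel] for ri in rel]
    return rel


# Toy data: three face clues and two voice clues with unrelated feature spaces.
clues = [
    ("face", (1.0, 0.0)),
    ("face", (0.9, 0.1)),
    ("face", (0.0, 1.0)),
    ("voice", (1.0, 0.0, 0.0)),
    ("voice", (0.0, 0.0, 1.0)),
]
same_track = {frozenset((0, 3)), frozenset((2, 4))}

initial = initial_relations(clues, same_track)
refined = refine(initial, rounds=2)
```

In this toy setup, voice clue 3 starts with no relation to face clue 1, but after refinement it is related more strongly to face clue 1 than to face clue 2, because it shares a link with face clue 0, which resembles face clue 1. Every clue is described in the same space of relations to all clues, whatever its modality, so the rows of `refined` could be passed to any standard clustering routine.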
Related papers
- Neural Clustering based Visual Representation Learning [61.72646814537163]
Clustering is one of the most classic approaches in machine learning and data analysis.
We propose feature extraction with clustering (FEC), which views feature extraction as a process of selecting representatives from data.
FEC alternates between grouping pixels into individual clusters to abstract representatives and updating the deep features of pixels with current representatives.
arXiv Detail & Related papers (2024-03-26T06:04:50Z)
- One-step Multi-view Clustering with Diverse Representation [47.41455937479201]
We propose a one-step multi-view clustering with diverse representation method, which incorporates multi-view learning and $k$-means into a unified framework.
We develop an efficient optimization algorithm with proven convergence to solve the resultant problem.
arXiv Detail & Related papers (2023-06-08T02:52:24Z)
- A Clustering-guided Contrastive Fusion for Multi-view Representation Learning [7.630965478083513]
We propose a deep fusion network to fuse view-specific representations into the view-common representation.
We also design an asymmetrical contrastive strategy that aligns the view-common representation and each view-specific representation.
In the incomplete-view scenario, our proposed method resists noise interference better than competing methods.
arXiv Detail & Related papers (2022-12-28T07:21:05Z)
- MORI-RAN: Multi-view Robust Representation Learning via Hybrid Contrastive Fusion [4.36488705757229]
Multi-view representation learning is essential for many multi-view tasks, such as clustering and classification.
We propose a hybrid contrastive fusion algorithm to extract robust view-common representation from unlabeled data.
Experimental results demonstrated that the proposed method outperforms 12 competitive multi-view methods on four real-world datasets.
arXiv Detail & Related papers (2022-08-26T09:58:37Z)
- Beyond the Prototype: Divide-and-conquer Proxies for Few-shot Segmentation [63.910211095033596]
Few-shot segmentation aims to segment unseen-class objects given only a handful of densely labeled samples.
We propose a simple yet versatile framework in the spirit of divide-and-conquer.
Our proposed approach, named divide-and-conquer proxies (DCP), allows for the development of appropriate and reliable information.
arXiv Detail & Related papers (2022-04-21T06:21:14Z)
- Learning Statistical Representation with Joint Deep Embedded Clustering [2.1267423178232407]
StatDEC is an unsupervised framework for joint statistical representation learning and clustering.
Our experiments show that using these representations, one can considerably improve results on imbalanced image clustering across a variety of image datasets.
arXiv Detail & Related papers (2021-09-11T09:26:52Z)
- Face, Body, Voice: Video Person-Clustering with Multiple Modalities [85.0282742801264]
Previous methods focus on the narrower task of face-clustering.
Most current datasets evaluate only the task of face-clustering, rather than person-clustering.
We introduce a Video Person-Clustering dataset, for evaluating multi-modal person-clustering.
arXiv Detail & Related papers (2021-05-20T17:59:40Z)
- Deep Incomplete Multi-View Multiple Clusterings [41.43164409639238]
We introduce a deep incomplete multi-view multiple clusterings framework, which achieves the completion of data view and multiple shared representations simultaneously.
Experiments on benchmark datasets confirm that DiMVMC outperforms the state-of-the-art competitors in generating multiple clusterings with high diversity and quality.
arXiv Detail & Related papers (2020-10-02T08:01:24Z)
- Unsupervised Multi-view Clustering by Squeezing Hybrid Knowledge from Cross View and Each View [68.88732535086338]
This paper proposes a new multi-view clustering method, low-rank subspace multi-view clustering based on adaptive graph regularization.
Experimental results for five widely used multi-view benchmarks show that our proposed algorithm surpasses other state-of-the-art methods by a clear margin.
arXiv Detail & Related papers (2020-08-23T08:25:06Z)
- Generative Partial Multi-View Clustering [133.36721417531734]
We propose a generative partial multi-view clustering model, named GP-MVC, to address the incomplete multi-view problem.
First, multi-view encoder networks are trained to learn common low-dimensional representations, followed by a clustering layer to capture the consistent cluster structure across multiple views.
Second, view-specific generative adversarial networks are developed to generate the missing data of one view, conditioned on the shared representation given by the other views.
arXiv Detail & Related papers (2020-03-29T17:48:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.