Large-scale Fully-Unsupervised Re-Identification
- URL: http://arxiv.org/abs/2307.14278v1
- Date: Wed, 26 Jul 2023 16:19:19 GMT
- Title: Large-scale Fully-Unsupervised Re-Identification
- Authors: Gabriel Bertocco, Fernanda Andaló, Terrance E. Boult, and Anderson Rocha
- Abstract summary: We propose two strategies to learn from large-scale unlabeled data.
The first strategy performs a local neighborhood sampling to reduce the dataset size in each iteration without violating neighborhood relationships.
A second strategy leverages a novel Re-Ranking technique, which has a lower time upper bound complexity and reduces the memory complexity from O(n^2) to O(kn) with k << n.
- Score: 78.47108158030213
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Fully-unsupervised Person and Vehicle Re-Identification have received
increasing attention due to their broad applicability in surveillance,
forensics, event understanding, and smart cities, without requiring any manual
annotation. However, most of the prior art has been evaluated in datasets that
have just a couple thousand samples. Such small-data setups often allow the use
of costly techniques in time and memory footprints, such as Re-Ranking, to
improve clustering results. Moreover, some previous work even pre-selects the
best clustering hyper-parameters for each dataset, which is unrealistic in a
large-scale fully-unsupervised scenario. In this context, this work tackles a
more realistic scenario and proposes two strategies to learn from large-scale
unlabeled data. The first strategy performs a local neighborhood sampling to
reduce the dataset size in each iteration without violating neighborhood
relationships. A second strategy leverages a novel Re-Ranking technique, which
has a lower time upper bound complexity and reduces the memory complexity from
O(n^2) to O(kn) with k << n. To avoid the pre-selection of specific
hyper-parameter values for the clustering algorithm, we also present a novel
scheduling algorithm that adjusts the density parameter during training, to
leverage the diversity of samples and keep the learning robust to noisy
labeling. Finally, due to the complementary knowledge learned by different
models, we also introduce a co-training strategy that relies upon the
permutation of predicted pseudo-labels, among the backbones, with no need for
any hyper-parameters or weighting optimization. The proposed methodology
outperforms the state-of-the-art methods in well-known benchmarks and in the
challenging large-scale Veri-Wild dataset, with a faster and memory-efficient
Re-Ranking strategy, and a large-scale, noisy-robust, and ensemble-based
learning approach.
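The memory claim in the abstract (O(n^2) reduced to O(kn) with k << n) can be illustrated with a small sketch. This is not the paper's Re-Ranking algorithm, whose details are not given here. It only shows the general idea of keeping a k-entry neighbour list per sample instead of a full n x n distance matrix, followed by a reciprocal-neighbour check of the kind used in k-reciprocal re-ranking. The function names `knn_lists` and `mutual_neighbours`, the sizes, and the random features are all assumptions made for illustration.

```python
import heapq
import math
import random


def knn_lists(features, k):
    """Return, for each sample, its k nearest neighbours as (distance, index) pairs.

    Only k entries per sample are kept, so the stored structure has n * k
    entries instead of the n * n entries of a full distance matrix.
    """
    neighbours = []
    for i, query in enumerate(features):
        # Distances from sample i to every other sample. The generator is
        # consumed lazily, so no full row (or matrix) is ever stored.
        row = ((math.dist(query, other), j)
               for j, other in enumerate(features) if j != i)
        neighbours.append(heapq.nsmallest(k, row))
    return neighbours


def mutual_neighbours(neighbours):
    """Keep only reciprocal pairs: j is kept for i iff i is also in j's list."""
    index_sets = [{j for _, j in row} for row in neighbours]
    return [sorted(j for j in index_sets[i] if i in index_sets[j])
            for i in range(len(neighbours))]


random.seed(0)
n, dim, k = 200, 8, 5  # illustrative sizes, not taken from the paper
features = [[random.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]

neighbours = knn_lists(features, k)
reciprocal = mutual_neighbours(neighbours)

stored_entries = sum(len(row) for row in neighbours)
print(stored_entries, n * n)  # prints "1000 40000"
```

The sketch still spends O(n^2) time computing distances. Only the stored state shrinks to n * k entries, which is the part of the abstract's claim it is meant to make concrete.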
Related papers
- Boosting K-means for Big Data by Fusing Data Streaming with Global Optimization [0.3069335774032178]
K-means clustering is a cornerstone of data mining, but its efficiency deteriorates when confronted with massive datasets.
We propose a novel algorithm that leverages the Variable Neighborhood Search (VNS) metaheuristic to optimize K-means clustering for big data.
arXiv Detail & Related papers (2024-10-18T15:43:34Z)
- Transfer Learning in $\ell_1$ Regularized Regression: Hyperparameter Selection Strategy based on Sharp Asymptotic Analysis [4.178980693837599]
Transfer learning techniques aim to leverage information from multiple related datasets to enhance prediction quality against a target dataset.
Some Lasso-based algorithms have been invented: Trans-Lasso and Pretraining Lasso.
We conduct a thorough, precise study of the algorithm in a high-dimensional setting via an analysis using the replica method.
Our approach reveals a surprisingly simple behavior of the algorithm: Ignoring one of the two types of information transferred to the fine-tuning stage has little effect on generalization performance.
arXiv Detail & Related papers (2024-09-26T10:20:59Z)
- Distributed Dynamic Safe Screening Algorithms for Sparse Regularization [73.85961005970222]
We propose a new distributed dynamic safe screening (DDSS) method for sparsity regularized models and apply it on shared-memory and distributed-memory architecture respectively.
We prove that the proposed method achieves the linear convergence rate with lower overall complexity and can eliminate almost all the inactive features in a finite number of iterations almost surely.
arXiv Detail & Related papers (2022-04-23T02:45:55Z)
- Leveraging Ensembles and Self-Supervised Learning for Fully-Unsupervised Person Re-Identification and Text Authorship Attribution [77.85461690214551]
Learning from fully-unlabeled data is challenging in Multimedia Forensics problems, such as Person Re-Identification and Text Authorship Attribution.
Recent self-supervised learning methods have been shown to be effective when dealing with fully-unlabeled data in cases where the underlying classes have significant semantic differences.
We propose a strategy to tackle Person Re-Identification and Text Authorship Attribution by enabling learning from unlabeled data even when samples from different classes are not prominently diverse.
arXiv Detail & Related papers (2022-02-07T13:08:11Z)
- Meta Clustering Learning for Large-scale Unsupervised Person Re-identification [124.54749810371986]
We propose a "small data for big task" paradigm dubbed Meta Clustering Learning (MCL).
MCL only pseudo-labels a subset of the entire unlabeled data via clustering to save computing for the first-phase training.
Our method significantly saves computational cost while achieving a comparable or even better performance compared to prior works.
arXiv Detail & Related papers (2021-11-19T04:10:18Z)
- StreaMRAK a Streaming Multi-Resolution Adaptive Kernel Algorithm [60.61943386819384]
Existing implementations of KRR require that all the data is stored in the main memory.
We propose StreaMRAK - a streaming version of KRR.
We present a showcase study on two synthetic problems and the prediction of the trajectory of a double pendulum.
arXiv Detail & Related papers (2021-08-23T21:03:09Z)
- Transductive Few-Shot Learning: Clustering is All You Need? [31.21306826132773]
We investigate a general formulation for transductive few-shot learning, which integrates prototype-based objectives.
We find that our method yields competitive performance, in terms of accuracy and optimization, while scaling up to large problems.
Surprisingly, we find that our general model already achieves competitive performance in comparison to state-of-the-art methods.
arXiv Detail & Related papers (2021-06-16T16:14:01Z)
- Unsupervised and self-adaptative techniques for cross-domain person re-identification [82.54691433502335]
Person Re-Identification (ReID) across non-overlapping cameras is a challenging task.
Unsupervised Domain Adaptation (UDA) is a promising alternative, as it performs feature-learning adaptation from a model trained on a source to a target domain without identity-label annotation.
In this paper, we propose a novel UDA-based ReID method that takes advantage of triplets of samples created by a new offline strategy.
arXiv Detail & Related papers (2021-03-21T23:58:39Z)
- SDCOR: Scalable Density-based Clustering for Local Outlier Detection in Massive-Scale Datasets [0.0]
This paper presents a batch-wise density-based clustering approach for local outlier detection in massive-scale datasets.
Evaluations on real-life and synthetic datasets demonstrate that the proposed method has a low linear time complexity.
arXiv Detail & Related papers (2020-06-13T11:07:37Z)