Related papers: Joint Multi-Condition Representation Modelling via Matrix Factorisation for Visual Place Recognition

URL: http://arxiv.org/abs/2510.17739v1
Date: Mon, 20 Oct 2025 16:50:03 GMT
Title: Joint Multi-Condition Representation Modelling via Matrix Factorisation for Visual Place Recognition
Authors: Timur Ismagilov, Shakaiba Majeed, Michael Milford, Tan Viet Tuyen Nguyen, Sarvapali D. Ramchurn, Shoaib Ehsan
Abstract summary: We address multi-reference visual place recognition (VPR), where reference sets captured under varying conditions are used to improve localisation performance. We propose a training-free, descriptor-agnostic approach that jointly models places using multiple reference descriptors via matrix decomposition into basis representations. On multi-appearance data, our method improves Recall@1 by up to ~18% over single-reference and outperforms multi-reference baselines across appearance and viewpoint changes.
Score: 14.020214078011515
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We address multi-reference visual place recognition (VPR), where reference sets captured under varying conditions are used to improve localisation performance. While deep learning with large-scale training improves robustness, increasing data diversity and model complexity incur extensive computational cost during training and deployment. Descriptor-level fusion via voting or aggregation avoids training, but often targets multi-sensor setups or relies on heuristics with limited gains under appearance and viewpoint change. We propose a training-free, descriptor-agnostic approach that jointly models places using multiple reference descriptors via matrix decomposition into basis representations, enabling projection-based residual matching. We also introduce SotonMV, a structured benchmark for multi-viewpoint VPR. On multi-appearance data, our method improves Recall@1 by up to ~18% over single-reference and outperforms multi-reference baselines across appearance and viewpoint changes, with gains of ~5% on unstructured data, demonstrating strong generalisation while remaining lightweight.
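The idea in the abstract (several reference descriptors per place, factorised into a basis, with queries matched by projection residual) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the factorisation, so a truncated SVD is assumed here, and the function names (place_basis, residual, match) and the rank parameter are hypothetical.

```python
import numpy as np

def place_basis(refs, rank):
    """Return a (d, rank) orthonormal basis for one place.

    refs: (k, d) array holding k reference descriptors of the same place.
    Assumption: a truncated SVD stands in for the paper's decomposition.
    """
    # Rows of vt are orthonormal directions in descriptor space,
    # ordered by how much of the reference set they explain.
    _, _, vt = np.linalg.svd(refs, full_matrices=False)
    return vt[:rank].T

def residual(query, basis):
    """Distance between a query descriptor and its projection onto the basis."""
    projection = basis @ (basis.T @ query)
    return float(np.linalg.norm(query - projection))

def match(query, bases):
    """Index of the place whose basis leaves the smallest residual."""
    return int(np.argmin([residual(query, b) for b in bases]))

# Toy usage: two places, each with two 4-dimensional reference descriptors.
place_a = np.array([[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]])
place_b = np.array([[0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]])
bases = [place_basis(place_a, rank=2), place_basis(place_b, rank=2)]
query = np.array([0.6, 0.8, 0.0, 0.0])
print(match(query, bases))
```

In this sketch a query is assigned to the place whose reference subspace reconstructs it best, so no training is involved and any fixed-length descriptor can be used. How the rank is chosen and whether descriptors are normalised first are left open here, since the abstract does not say.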

Related papers

Robust Multi-View Learning via Representation Fusion of Sample-Level Attention and Alignment of Simulated Perturbation [61.64052577026623]
Real-world multi-view datasets are often heterogeneous and imperfect. We propose a novel robust MVL method (namely RML) with simultaneous representation fusion and alignment. Our RML is self-supervised and can also be applied for downstream tasks as a regularization.
arXiv Detail & Related papers (2025-03-06T07:01:08Z)
Structure-guided Deep Multi-View Clustering [13.593229506936682]
Deep multi-view clustering seeks to utilize the abundant information from multiple views to improve clustering performance. Most existing clustering methods neglect to fully mine multi-view structural information. We propose a structure-guided deep multi-view clustering model to explore the distribution of multi-view data.
arXiv Detail & Related papers (2025-01-17T12:42:30Z)
Balanced Multi-view Clustering [56.17836963920012]
Multi-view clustering (MvC) aims to integrate information from different views to enhance the capability of the model in capturing the underlying data structures. The widely used joint training paradigm in MvC potentially does not fully leverage the multi-view information. We propose a novel balanced multi-view clustering (BMvC) method, which introduces a view-specific contrastive regularization (VCR) to modulate the optimization of each view.
arXiv Detail & Related papers (2025-01-05T14:42:47Z)
One for all: A novel Dual-space Co-training baseline for Large-scale Multi-View Clustering [42.92751228313385]
We propose a novel multi-view clustering model, named Dual-space Co-training Large-scale Multi-view Clustering (DSCMC). The main objective of our approach is to enhance the clustering performance by leveraging co-training in two distinct spaces. Our algorithm has approximately linear computational complexity, which guarantees its successful application on large-scale datasets.
arXiv Detail & Related papers (2024-01-28T16:30:13Z)
ClusVPR: Efficient Visual Place Recognition with Clustering-based Weighted Transformer [13.0858576267115]
We present ClusVPR, a novel approach that tackles the specific issues of redundant information in duplicate regions and representations of small objects. ClusVPR introduces a unique paradigm called Clustering-based weighted Transformer Network (CWTNet). We also introduce the optimized-VLAD layer that significantly reduces the number of parameters and enhances model efficiency.
arXiv Detail & Related papers (2023-10-06T09:01:15Z)
Semi-supervised multi-view concept decomposition [30.699496411869834]
Concept Factorization (CF) has demonstrated superior performance in multi-view clustering tasks. We propose a novel semi-supervised multi-view concept factorization model, named SMVCF. We conduct experiments on four diverse datasets to evaluate the performance of SMVCF.
arXiv Detail & Related papers (2023-07-03T10:50:44Z)
Vector Quantized Wasserstein Auto-Encoder [57.29764749855623]
We study learning deep discrete representations from the generative viewpoint. We endow discrete distributions over sequences of codewords and learn a deterministic decoder that transports the distribution over the sequences of codewords to the data distribution. We develop further theories to connect it with the clustering viewpoint of WS distance, allowing us to have a better and more controllable clustering solution.
arXiv Detail & Related papers (2023-02-12T13:51:36Z)
Learning Visual Representation from Modality-Shared Contrastive Language-Image Pre-training [88.80694147730883]
We investigate a variety of Modality-Shared Contrastive Language-Image Pre-training (MS-CLIP) frameworks. In the studied conditions, we observe that a mostly unified encoder for vision and language signals outperforms all other variations that separate more parameters. Our approach outperforms vanilla CLIP by 1.6 points in linear probing on a collection of 24 downstream vision tasks.
arXiv Detail & Related papers (2022-07-26T05:19:16Z)
Adversarial Feature Augmentation and Normalization for Visual Recognition [109.6834687220478]
Recent advances in computer vision take advantage of adversarial data augmentation to ameliorate the generalization ability of classification models. Here, we present an effective and efficient alternative that advocates adversarial augmentation on intermediate feature embeddings. We validate the proposed approach across diverse visual recognition tasks with representative backbone networks.
arXiv Detail & Related papers (2021-03-22T20:36:34Z)
Unsupervised Multi-view Clustering by Squeezing Hybrid Knowledge from Cross View and Each View [68.88732535086338]
This paper proposes a new multi-view clustering method, low-rank subspace multi-view clustering based on adaptive graph regularization. Experimental results for five widely used multi-view benchmarks show that our proposed algorithm surpasses other state-of-the-art methods by a clear margin.
arXiv Detail & Related papers (2020-08-23T08:25:06Z)

This list is automatically generated from the titles and abstracts of the papers on this site.