Related papers: Self-Organizing Visual Prototypes for Non-Parametric Representation Learning

Self-Organizing Visual Prototypes for Non-Parametric Representation Learning

URL: http://arxiv.org/abs/2505.21533v1
Date: Fri, 23 May 2025 20:12:07 GMT
Title: Self-Organizing Visual Prototypes for Non-Parametric Representation Learning
Authors: Thalles Silva, Helio Pedrini, Adín Ramírez Rivera,
Abstract summary: We present Self-Organizing Visual Prototypes (SOP), a new training technique for unsupervised visual feature learning.<n>In this strategy, a prototype is represented by many semantically similar representations, or support embeddings (SEs), each containing a complementary set of features.<n>We evaluate the representations learned using the SOP strategy on a range of benchmarks, including retrieval, linear evaluation, fine-tuning, and object detection.
Score: 6.096888891865663
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We present Self-Organizing Visual Prototypes (SOP), a new training technique for unsupervised visual feature learning. Unlike existing prototypical self-supervised learning (SSL) methods that rely on a single prototype to encode all relevant features of a hidden cluster in the data, we propose the SOP strategy. In this strategy, a prototype is represented by many semantically similar representations, or support embeddings (SEs), each containing a complementary set of features that together better characterize their region in space and maximize training performance. We reaffirm the feasibility of non-parametric SSL by introducing novel non-parametric adaptations of two loss functions that implement the SOP strategy. Notably, we introduce the SOP Masked Image Modeling (SOP-MIM) task, where masked representations are reconstructed from the perspective of multiple non-parametric local SEs. We comprehensively evaluate the representations learned using the SOP strategy on a range of benchmarks, including retrieval, linear evaluation, fine-tuning, and object detection. Our pre-trained encoders achieve state-of-the-art performance on many retrieval benchmarks and demonstrate increasing performance gains with more complex encoders.

Related papers

CLIP-Guided Adaptable Self-Supervised Learning for Human-Centric Visual Tasks [76.00315860962885]
We propose CLASP (CLIP-guided Adaptable Self-suPervised learning), a novel framework for unsupervised pre-training in human-centric visual tasks.<n> CLASP leverages the powerful vision-language model CLIP to generate both low-level (e.g., body parts) and high-level (e.g., attributes) semantic pseudo-labels.<n>MoE dynamically adapts feature extraction based on task-specific prompts, mitigating potential feature conflicts and enhancing transferability.
arXiv Detail & Related papers (2026-01-19T15:19:28Z)
Few-Shot Inspired Generative Zero-Shot Learning [14.66239393852298]
Generative zero-shot learning (ZSL) methods typically synthesize visual features for unseen classes.<n>We propose FSIGenZ, a few-shot-inspired generative ZSL framework that reduces reliance on large-scale feature synthesis.<n>Experiments on SUN, AwA2, and CUB benchmarks demonstrate that FSIGenZ achieves competitive performance using far fewer synthetic features.
arXiv Detail & Related papers (2025-06-18T02:39:36Z)
Topology-Aware CLIP Few-Shot Learning [0.0]
We introduce a topology-aware tuning approach integrating Representation Topology Divergence into the Task Residual framework.<n>By explicitly aligning the topological structures of visual and text representations using a combined RTD and Cross-Entropy loss, our method enhances few-shot performance.
arXiv Detail & Related papers (2025-05-03T04:58:29Z)
CSE-SFP: Enabling Unsupervised Sentence Representation Learning via a Single Forward Pass [3.0566617373924325]
Recent advances in pre-trained language models (PLMs) have driven remarkable progress in this field.<n>We propose CSE-SFP, an innovative method that exploits the structural characteristics of generative models.<n>We show that CSE-SFP not only produces higher-quality embeddings but also significantly reduces both training time and memory consumption.
arXiv Detail & Related papers (2025-05-01T08:27:14Z)
Envisioning Class Entity Reasoning by Large Language Models for Few-shot Learning [13.68867780184022]
Few-shot learning aims to recognize new concepts using a limited number of visual samples. Our framework incorporates both the abstract class semantics and the concrete class entities extracted from Large Language Models (LLMs) For the challenging one-shot setting, our approach, utilizing the ResNet-12 backbone, achieves an average improvement of 1.95% over the second-best competitor.
arXiv Detail & Related papers (2024-08-22T15:10:20Z)
Masked Momentum Contrastive Learning for Zero-shot Semantic Understanding [39.424931953675994]
Self-supervised pretraining (SSP) has emerged as a popular technique in machine learning, enabling the extraction of meaningful feature representations without labelled data. This study endeavours to evaluate the effectiveness of pure self-supervised learning (SSL) techniques in computer vision tasks.
arXiv Detail & Related papers (2023-08-22T13:55:57Z)
MOCA: Self-supervised Representation Learning by Predicting Masked Online Codebook Assignments [72.6405488990753]
Self-supervised learning can be used for mitigating the greedy needs of Vision Transformer networks. We propose a single-stage and standalone method, MOCA, which unifies both desired properties. We achieve new state-of-the-art results on low-shot settings and strong experimental results in various evaluation protocols.
arXiv Detail & Related papers (2023-07-18T15:46:20Z)
RefSAM: Efficiently Adapting Segmenting Anything Model for Referring Video Object Segmentation [53.4319652364256]
This paper presents the RefSAM model, which explores the potential of SAM for referring video object segmentation. Our proposed approach adapts the original SAM model to enhance cross-modality learning by employing a lightweight Cross-RValModal. We employ a parameter-efficient tuning strategy to align and fuse the language and vision features effectively.
arXiv Detail & Related papers (2023-07-03T13:21:58Z)
Generalization Properties of Retrieval-based Models [50.35325326050263]
Retrieval-based machine learning methods have enjoyed success on a wide range of problems. Despite growing literature showcasing the promise of these models, the theoretical underpinning for such models remains underexplored. We present a formal treatment of retrieval-based models to characterize their generalization ability.
arXiv Detail & Related papers (2022-10-06T00:33:01Z)
Dual Prototypical Contrastive Learning for Few-shot Semantic Segmentation [55.339405417090084]
We propose a dual prototypical contrastive learning approach tailored to the few-shot semantic segmentation (FSS) task. The main idea is to encourage the prototypes more discriminative by increasing inter-class distance while reducing intra-class distance in prototype feature space. We demonstrate that the proposed dual contrastive learning approach outperforms state-of-the-art FSS methods on PASCAL-5i and COCO-20i datasets.
arXiv Detail & Related papers (2021-11-09T08:14:50Z)
Few-Shot Named Entity Recognition: A Comprehensive Study [92.40991050806544]
We investigate three schemes to improve the model generalization ability for few-shot settings. We perform empirical comparisons on 10 public NER datasets with various proportions of labeled data. We create new state-of-the-art results on both few-shot and training-free settings.
arXiv Detail & Related papers (2020-12-29T23:43:16Z)
Prototypical Contrastive Learning of Unsupervised Representations [171.3046900127166]
Prototypical Contrastive Learning (PCL) is an unsupervised representation learning method. PCL implicitly encodes semantic structures of the data into the learned embedding space. PCL outperforms state-of-the-art instance-wise contrastive learning methods on multiple benchmarks.
arXiv Detail & Related papers (2020-05-11T09:53:36Z)

This list is automatically generated from the titles and abstracts of the papers in this site.