Related papers: Enhancing Self-Supervised Learning with Semantic Pairs A New Dataset and Empirical Study

URL: http://arxiv.org/abs/2510.08722v2
Date: Mon, 13 Oct 2025 09:09:06 GMT
Title: Enhancing Self-Supervised Learning with Semantic Pairs A New Dataset and Empirical Study
Authors: Mohammad Alkhalefi, Georgios Leontidis, Mingjun Zhong
Abstract summary: Instance discrimination is a self-supervised representation learning paradigm wherein individual instances within a dataset are treated as distinct classes. We provide the technical foundation for leveraging semantic pairs to enhance the generalizability of the model's representation.
Score: 2.4405762029252465
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Instance discrimination is a self-supervised representation learning paradigm wherein individual instances within a dataset are treated as distinct classes. This is typically achieved by generating two disparate views of each instance by applying stochastic transformations, encouraging the model to learn representations invariant to the common underlying object across these views. While this approach facilitates the acquisition of invariant representations for dataset instances under various handcrafted transformations (e.g., random cropping, colour jittering), an exclusive reliance on such data transformations for achieving invariance may inherently limit the model's generalizability to unseen datasets and diverse downstream tasks. The inherent limitation stems from the fact that the finite set of transformations within the data processing pipeline is unable to encompass the full spectrum of potential data variations. In this study, we provide the technical foundation for leveraging semantic pairs to enhance the generalizability of the model's representation and empirically demonstrate that incorporating semantic pairs mitigates the issue of limited transformation coverage. Specifically, we propose that by exposing the model to semantic pairs (i.e., two instances belonging to the same semantic category), we introduce varied real-world scene contexts, thereby fostering the development of more generalizable object representations. To validate this hypothesis, we constructed and released a novel dataset comprising curated semantic pairs and conducted extensive experimentation to empirically establish that their inclusion enables the model to learn more general representations, ultimately leading to improved performance across diverse downstream tasks.
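As a rough illustration of the distinction the abstract draws, the sketch below contrasts an augmentation pair (two stochastic views of the same instance) with a semantic pair (two different instances of the same category). It is a toy sketch under stated assumptions, not the authors' pipeline: the helper names `augment`, `augmentation_pair`, and `semantic_pairs` are hypothetical, the "transformation" is simple additive noise on a feature vector, and category labels are used only to form the pairs, since the abstract does not say how the released dataset was curated.

```python
import random
from collections import defaultdict


def augment(x, rng):
    """Toy stand-in for a stochastic transformation (e.g. crop, colour jitter).

    Assumption: an instance is a list of floats and the transformation
    adds uniform noise in [-0.1, 0.1] to each feature.
    """
    return [v + rng.uniform(-0.1, 0.1) for v in x]


def augmentation_pair(x, rng):
    """Return two independently transformed views of the SAME instance."""
    return augment(x, rng), augment(x, rng)


def semantic_pairs(labels, rng):
    """Pair each instance index with a DIFFERENT index of the same category.

    Instances whose category has no other member are skipped.
    Returns a list of (anchor_index, partner_index) tuples.
    """
    by_label = defaultdict(list)
    for idx, lab in enumerate(labels):
        by_label[lab].append(idx)

    pairs = []
    for idx, lab in enumerate(labels):
        candidates = [j for j in by_label[lab] if j != idx]
        if candidates:
            pairs.append((idx, rng.choice(candidates)))
    return pairs


if __name__ == "__main__":
    rng = random.Random(0)
    view_a, view_b = augmentation_pair([1.0, 2.0, 3.0], rng)
    print("augmentation pair:", view_a, view_b)
    print("semantic pairs:", semantic_pairs(["cat", "dog", "cat", "dog"], rng))
```

In the augmentation case both views share the same underlying scene, so the only variation comes from the transformation set; in the semantic case the two members differ in background and context as well, which is the extra variation the abstract argues for.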

Related papers

Exploring Transferable Homogeneous Groups for Compositional Zero-Shot Learning [10.687828416652929]
Homogeneous Group Representation Learning (HGRL) is a new perspective that formulates state (object) representation learning as multiple homogeneous sub-group representation learning. Our method integrates three core components designed to simultaneously enhance both the visual and prompt representation capabilities of the model.
arXiv Detail & Related papers (2025-01-18T08:19:48Z)
Learning Divergence Fields for Shift-Robust Graph Representations [73.11818515795761]
In this work, we propose a geometric diffusion model with learnable divergence fields for the challenging problem with interdependent data. We derive a new learning objective through causal inference, which can guide the model to learn generalizable patterns of interdependence that are insensitive across domains.
arXiv Detail & Related papers (2024-06-07T14:29:21Z)
In-Context Symmetries: Self-Supervised Learning through Contextual World Models [41.61360016455319]
We propose to learn a general representation that can adapt to be invariant or equivariant to different transformations by paying attention to context. Our proposed algorithm, Contextual Self-Supervised Learning (ContextSSL), learns equivariance to all transformations.
arXiv Detail & Related papers (2024-05-28T14:03:52Z)
Task Groupings Regularization: Data-Free Meta-Learning with Heterogeneous Pre-trained Models [83.02797560769285]
Data-Free Meta-Learning (DFML) aims to derive knowledge from a collection of pre-trained models without accessing their original data. Current methods often overlook the heterogeneity among pre-trained models, which leads to performance degradation due to task conflicts.
arXiv Detail & Related papers (2024-05-26T13:11:55Z)
A Probabilistic Model Behind Self-Supervised Learning [53.64989127914936]
In self-supervised learning (SSL), representations are learned via an auxiliary task without annotated labels. We present a generative latent variable model for self-supervised learning. We show that several families of discriminative SSL, including contrastive methods, induce a comparable distribution over representations.
arXiv Detail & Related papers (2024-02-02T13:31:17Z)
Flow Factorized Representation Learning [109.51947536586677]
We introduce a generative model which specifies a distinct set of latent probability paths that define different input transformations. We show that our model achieves higher likelihoods on standard representation learning benchmarks while simultaneously being closer to approximately equivariant models.
arXiv Detail & Related papers (2023-09-22T20:15:37Z)
Domain Generalization In Robust Invariant Representation [10.132611239890345]
In this paper, we investigate the generalization of invariant representations on out-of-distribution data. We show that the invariant model learns unstructured latent representations that are robust to distribution shifts.
arXiv Detail & Related papers (2023-04-07T00:58:30Z)
DIFFormer: Scalable (Graph) Transformers Induced by Energy Constrained Diffusion [66.21290235237808]
We introduce an energy constrained diffusion model which encodes a batch of instances from a dataset into evolutionary states. We provide rigorous theory that implies closed-form optimal estimates for the pairwise diffusion strength among arbitrary instance pairs. Experiments highlight the wide applicability of our model as a general-purpose encoder backbone with superior performance in various tasks.
arXiv Detail & Related papers (2023-01-23T15:18:54Z)
Generalization Properties of Retrieval-based Models [50.35325326050263]
Retrieval-based machine learning methods have enjoyed success on a wide range of problems. Despite growing literature showcasing the promise of these models, the theoretical underpinning for such models remains underexplored. We present a formal treatment of retrieval-based models to characterize their generalization ability.
arXiv Detail & Related papers (2022-10-06T00:33:01Z)
Not All Instances Contribute Equally: Instance-adaptive Class Representation Learning for Few-Shot Visual Recognition [94.04041301504567]
Few-shot visual recognition refers to recognizing novel visual concepts from a few labeled instances. We propose a novel metric-based meta-learning framework termed instance-adaptive class representation learning network (ICRL-Net) for few-shot visual recognition.
arXiv Detail & Related papers (2022-09-07T10:00:18Z)
Towards Robust and Adaptive Motion Forecasting: A Causal Representation Perspective [72.55093886515824]
We introduce a causal formalism of motion forecasting, which casts the problem as a dynamic process with three groups of latent variables. We devise a modular architecture that factorizes the representations of invariant mechanisms and style confounders to approximate a causal graph. Experiment results on synthetic and real datasets show that our three proposed components significantly improve the robustness and reusability of the learned motion representations.
arXiv Detail & Related papers (2021-11-29T18:59:09Z)
Discriminative Multimodal Learning via Conditional Priors in Generative Models [21.166519800652047]
This research studies the realistic scenario in which all modalities and class labels are available for model training. We show, in this scenario, that the variational lower bound limits mutual information between joint representations and missing modalities.
arXiv Detail & Related papers (2021-10-09T17:22:24Z)
Self-Supervised Learning with Data Augmentations Provably Isolates Content from Style [32.20957709045773]
We formulate the augmentation process as a latent variable model. We study the identifiability of the latent representation based on pairs of views of the observations. We introduce Causal3DIdent, a dataset of high-dimensional, visually complex images with rich causal dependencies.
arXiv Detail & Related papers (2021-06-08T18:18:09Z)
Accounting for Unobserved Confounding in Domain Generalization [107.0464488046289]
This paper investigates the problem of learning robust, generalizable prediction models from a combination of datasets. Part of the challenge of learning robust models lies in the influence of unobserved confounders. We demonstrate the empirical performance of our approach on healthcare data from different modalities.
arXiv Detail & Related papers (2020-07-21T08:18:06Z)

This list is automatically generated from the titles and abstracts of the papers on this site.