Dual Perspectives on Non-Contrastive Self-Supervised Learning
- URL: http://arxiv.org/abs/2507.01028v1
- Date: Wed, 18 Jun 2025 07:46:51 GMT
- Title: Dual Perspectives on Non-Contrastive Self-Supervised Learning
- Authors: Jean Ponce, Martial Hebert, Basile Terver
- Abstract summary: Stop gradient and exponential moving average iterative procedures are commonly used to avoid representation collapse. This presentation investigates these procedures from the dual theoretical viewpoints of optimization and dynamical systems.
- Score: 40.79287810164605
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The objective of non-contrastive approaches to self-supervised learning is to train, on pairs of different views of the data, an encoder and a predictor that minimize the mean discrepancy between the code predicted from the embedding of the first view and the embedding of the second one. In this setting, the stop gradient and exponential moving average iterative procedures are commonly used to avoid representation collapse, with excellent performance in downstream supervised applications. This presentation investigates these procedures from the dual theoretical viewpoints of optimization and dynamical systems. We first show that, in general, although they do not optimize the original objective, or for that matter, any other smooth function, they do avoid collapse. Following Tian et al. [2021], but without any of the extra assumptions used in their proofs, we then show using a dynamical system perspective that, in the linear case, minimizing the original objective function without the use of a stop gradient or exponential moving average always leads to collapse. Conversely, we finally show that the limit points of the dynamical systems associated with these two procedures are, in general, asymptotically stable equilibria, with no risk of degenerating to trivial solutions.
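As a concrete illustration of the training procedure the abstract describes, here is a minimal PyTorch-style sketch of one non-contrastive update with a stop gradient and an exponential moving average (EMA) target encoder, in the spirit of BYOL/SimSiam. This is not the authors' code; the architectures, dimensions, learning rate, and names (`encoder`, `predictor`, `target_encoder`, `ema_decay`) are illustrative assumptions.

```python
# Minimal sketch (not the authors' implementation) of a non-contrastive update
# with a stop gradient and an EMA target encoder. All names and sizes are
# illustrative assumptions.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 32))   # online encoder f
predictor = nn.Sequential(nn.Linear(32, 32), nn.ReLU(), nn.Linear(32, 32))  # predictor h
target_encoder = copy.deepcopy(encoder)  # EMA copy; never receives gradients
for p in target_encoder.parameters():
    p.requires_grad_(False)

optimizer = torch.optim.SGD(
    list(encoder.parameters()) + list(predictor.parameters()), lr=0.05)
ema_decay = 0.99  # ema_decay = 0 recovers the plain stop-gradient variant

def branch_loss(view_a, view_b):
    # Predict the target embedding of view_b from the online embedding of view_a.
    prediction = predictor(encoder(view_a))
    with torch.no_grad():                 # stop gradient on the target branch
        target = target_encoder(view_b)
    # Mean discrepancy between the prediction and the (detached) target embedding.
    return F.mse_loss(prediction, target)

def training_step(view1, view2):
    loss = 0.5 * (branch_loss(view1, view2) + branch_loss(view2, view1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # EMA update of the target encoder from the online encoder.
    with torch.no_grad():
        for t, o in zip(target_encoder.parameters(), encoder.parameters()):
            t.mul_(ema_decay).add_(o, alpha=1.0 - ema_decay)
    return loss.item()

# Example usage with two random "views" of the same batch.
x1, x2 = torch.randn(8, 128), torch.randn(8, 128)
print(training_step(x1, x2))
```

Setting `ema_decay` to 0 makes the target branch a detached copy of the online encoder, i.e., the plain stop-gradient variant; the paper's point is that such updates do not minimize the stated objective (or any other smooth function), yet still avoid representation collapse.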
Related papers
- On exploration of an interior mirror descent flow for stochastic nonconvex constrained problem [3.4376560669160394]
We show that the Hessian barrier method and the mirror descent scheme can be interpreted as discrete approximations of the continuous flow. We provide two sufficient conditions under which spurious stationary points can be avoided if the strict complementarity condition holds.
arXiv Detail & Related papers (2025-07-21T05:58:52Z) - Contrastive Matrix Completion with Denoising and Augmented Graph Views for Robust Recommendation [1.0128808054306186]
Matrix completion is a widely adopted framework in recommender systems. We propose a novel method called Matrix Completion using Contrastive Learning (MCCL). Our approach not only improves the numerical accuracy of the predicted scores but also produces superior rankings, with improvements of up to 36% in ranking metrics.
arXiv Detail & Related papers (2025-06-12T12:47:35Z) - Long-Sequence Recommendation Models Need Decoupled Embeddings [49.410906935283585]
We identify and characterize a neglected deficiency in existing long-sequence recommendation models. A single set of embeddings struggles with learning both attention and representation, leading to interference between these two processes. We propose the Decoupled Attention and Representation Embeddings (DARE) model, where two distinct embedding tables are learned separately to fully decouple attention and representation.
arXiv Detail & Related papers (2024-10-03T15:45:15Z) - Visual Prompt Tuning in Null Space for Continual Learning [51.96411454304625]
Existing prompt-tuning methods have demonstrated impressive performance in continual learning (CL).
This paper aims to learn each task by tuning the prompts in the direction orthogonal to the subspace spanned by previous tasks' features.
In practice, an effective null-space-based approximation solution has been proposed to implement the prompt gradient projection.
arXiv Detail & Related papers (2024-06-09T05:57:40Z) - Double Duality: Variational Primal-Dual Policy Optimization for Constrained Reinforcement Learning [132.7040981721302]
We study the Constrained Convex Markov Decision Process (MDP), where the goal is to minimize a convex functional of the visitation measure.
Designing algorithms for a constrained convex MDP faces several challenges, including handling the large state space.
arXiv Detail & Related papers (2024-02-16T16:35:18Z) - Private (Stochastic) Non-Convex Optimization Revisited: Second-Order Stationary Points and Excess Risks [34.79650838578354]
We introduce a new framework that utilizes two different kinds of oracles.
We show that the exponential mechanism can achieve a good population risk bound, and provide a nearly matching lower bound.
arXiv Detail & Related papers (2023-02-20T00:11:19Z) - Beyond the Edge of Stability via Two-step Gradient Updates [49.03389279816152]
Gradient Descent (GD) is a powerful workhorse of modern machine learning.
GD's ability to find local minimisers is only guaranteed for losses with Lipschitz gradients.
This work focuses on simple, yet representative, learning problems via analysis of two-step gradient updates.
arXiv Detail & Related papers (2022-06-08T21:32:50Z) - Efficient Iterative Amortized Inference for Learning Symmetric and Disentangled Multi-Object Representations [8.163697683448811]
We introduce EfficientMORL, an efficient framework for the unsupervised learning of object-centric representations.
We show that optimization challenges caused by requiring both symmetry and disentanglement can be addressed by high-cost iterative amortized inference.
We demonstrate strong object decomposition and disentanglement on the standard multi-object benchmark while achieving nearly an order of magnitude faster training and test time inference.
arXiv Detail & Related papers (2021-06-07T14:02:49Z) - On dissipative symplectic integration with applications to gradient-based optimization [77.34726150561087]
We propose a geometric framework in which discretizations can be realized systematically.
We show that a generalization of symplectic integrators to nonconservative and, in particular, dissipative Hamiltonian systems is able to preserve rates of convergence up to a controlled error.
arXiv Detail & Related papers (2020-04-15T00:36:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.