Contrastive Mutual Information Learning: Toward Robust Representations without Positive-Pair Augmentations
- URL: http://arxiv.org/abs/2509.21511v1
- Date: Thu, 25 Sep 2025 20:03:24 GMT
- Title: Contrastive Mutual Information Learning: Toward Robust Representations without Positive-Pair Augmentations
- Authors: Micha Livne
- Abstract summary: We introduce the contrastive Mutual Information Machine (cMIM), a probabilistic framework that extends the Mutual Information Machine (MIM) with a contrastive objective. cMIM addresses this gap by imposing global discriminative structure while retaining MIM's generative fidelity. We provide empirical evidence across vision and molecular benchmarks showing that cMIM outperforms MIM and InfoNCE on classification and regression tasks.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning representations that transfer well to diverse downstream tasks remains a central challenge in representation learning. Existing paradigms -- contrastive learning, self-supervised masking, and denoising auto-encoders -- balance this challenge with different trade-offs. We introduce the {contrastive Mutual Information Machine} (cMIM), a probabilistic framework that extends the Mutual Information Machine (MIM) with a contrastive objective. While MIM maximizes mutual information between inputs and latents and promotes clustering of codes, it falls short on discriminative tasks. cMIM addresses this gap by imposing global discriminative structure while retaining MIM's generative fidelity. Our contributions are threefold. First, we propose cMIM, a contrastive extension of MIM that removes the need for positive data augmentation and is substantially less sensitive to batch size than InfoNCE. Second, we introduce {informative embeddings}, a general technique for extracting enriched features from encoder-decoder models that boosts discriminative performance without additional training and applies broadly beyond MIM. Third, we provide empirical evidence across vision and molecular benchmarks showing that cMIM outperforms MIM and InfoNCE on classification and regression tasks while preserving competitive reconstruction quality. These results position cMIM as a unified framework for representation learning, advancing the goal of models that serve both discriminative and generative applications effectively.
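For context, the InfoNCE baseline that the abstract compares against can be sketched in a few lines. This is a minimal NumPy version of the standard InfoNCE loss, not the cMIM objective itself (the abstract does not spell out cMIM's loss); note how it requires a paired "positive" view for every input and scores it against all other batch elements, which is exactly the dependence on positive-pair augmentations and batch size that cMIM claims to remove:

```python
import numpy as np

def info_nce(z1, z2, temperature=0.1):
    """Standard InfoNCE loss over a batch of paired embeddings.

    z1[i] and z2[i] are a positive pair (e.g. embeddings of two augmented
    views of the same input); all other rows in the batch act as negatives.
    """
    # L2-normalize so dot products become cosine similarities
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature             # (B, B) similarity matrix
    # Cross-entropy with the diagonal (the true pair) as the target class
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))
```

Because the negatives come from the rest of the mini-batch, the quality of this estimator degrades at small batch sizes, which motivates the abstract's claim that cMIM is substantially less sensitive to batch size.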
Related papers
- A Data-Centric Revisit of Pre-Trained Vision Models for Robot Learning [67.72413262980272]
Pre-trained vision models (PVMs) are fundamental to modern robotics, yet their optimal configuration remains unclear. We develop SlotMIM, a method that induces object-centric representations by introducing a semantic bottleneck. Our approach achieves significant improvements over prior work in image recognition, scene understanding, and robot learning evaluations.
arXiv Detail & Related papers (2025-03-10T06:18:31Z) - Learning Mask Invariant Mutual Information for Masked Image Modeling [35.63719638508299]
Masked autoencoders (MAEs) represent a prominent self-supervised learning paradigm in computer vision. Recent studies have attempted to elucidate the functioning of MAEs through contrastive learning and feature representation analysis. We propose a new perspective for understanding MAEs by leveraging the information bottleneck principle in information theory.
arXiv Detail & Related papers (2025-02-27T03:19:05Z) - Contrastive MIM: A Contrastive Mutual Information Framework for Unified Generative and Discriminative Representation Learning [0.0]
We introduce the contrastive Mutual Information Machine (cMIM), a probabilistic framework that augments the Mutual Information Machine (MIM) with a novel contrastive objective. cMIM addresses this limitation by enforcing global discriminative structure while retaining MIM's generative strengths. We present two main contributions: (1) we propose cMIM, a contrastive extension of MIM that eliminates the need for positive data augmentation and is robust to batch size, unlike InfoNCE-based methods; (2) we introduce informative embeddings, a general technique for extracting enriched representations from encoder-decoder models.
arXiv Detail & Related papers (2025-02-27T00:23:40Z) - BioNeMo Framework: a modular, high-performance library for AI model development in drug discovery [79.52947133303498]
We introduce the BioNeMo Framework to facilitate the training of computational biology and chemistry AI models. On 256 NVIDIA A100s, BioNeMo Framework trains a three billion parameter BERT-based pLM on over one trillion tokens in 4.2 days. The BioNeMo Framework is open-source and free for everyone to use.
arXiv Detail & Related papers (2024-11-15T19:46:16Z) - MICM: Rethinking Unsupervised Pretraining for Enhanced Few-shot Learning [18.152453141040464]
Unsupervised Few-Shot Learning seeks to bridge this divide by reducing reliance on annotated datasets during initial training phases.
We first quantitatively assess the impacts of Masked Image Modeling (MIM) and Contrastive Learning (CL) on few-shot learning tasks.
To address these trade-offs between generalization and discriminability in unsupervised pretraining, we introduce a novel paradigm named Masked Image Contrastive Modeling (MICM).
arXiv Detail & Related papers (2024-08-23T21:32:53Z) - Constructing Enhanced Mutual Information for Online Class-Incremental Learning [11.555090963348595]
Online Class-Incremental Learning (OCIL) addresses the challenge of continuously learning from a single-channel data stream.
Existing Mutual Information (MI)-based methods treat various knowledge components in isolation, ignoring the knowledge confusion across tasks.
We propose an Enhanced Mutual Information (EMI) method based on knowledge decoupling.
arXiv Detail & Related papers (2024-07-26T06:16:11Z) - On the Role of Discrete Tokenization in Visual Representation Learning [35.10829554701771]
Masked image modeling (MIM) has gained popularity alongside contrastive learning methods.
Some MIM methods use discrete tokens as the reconstruction target, but the theoretical underpinnings of this choice remain underexplored.
We provide a comprehensive theoretical understanding on how discrete tokenization affects the model's generalization capabilities.
We propose a novel metric named TCAS, which is specifically designed to assess the effectiveness of discrete tokens within the MIM framework.
arXiv Detail & Related papers (2024-07-12T08:25:31Z) - MA2CL: Masked Attentive Contrastive Learning for Multi-Agent
Reinforcement Learning [128.19212716007794]
We propose an effective framework called Multi-Agent Masked Attentive Contrastive Learning (MA2CL).
MA2CL encourages learning representation to be both temporal and agent-level predictive by reconstructing the masked agent observation in latent space.
Our method significantly improves the performance and sample efficiency of different MARL algorithms and outperforms other methods in various vision-based and state-based scenarios.
arXiv Detail & Related papers (2023-06-03T05:32:19Z) - Mixed Autoencoder for Self-supervised Visual Representation Learning [95.98114940999653]
Masked Autoencoder (MAE) has demonstrated superior performance on various vision tasks via randomly masking image patches and reconstruction.
This paper studies the prevailing mixing augmentation for MAE.
arXiv Detail & Related papers (2023-03-30T05:19:43Z) - Correlation Information Bottleneck: Towards Adapting Pretrained
Multimodal Models for Robust Visual Question Answering [63.87200781247364]
Correlation Information Bottleneck (CIB) seeks a tradeoff between compression and redundancy in representations.
We derive a tight theoretical upper bound for the mutual information between multimodal inputs and representations.
arXiv Detail & Related papers (2022-09-14T22:04:10Z) - Multi-Modal Mutual Information Maximization: A Novel Approach for
Unsupervised Deep Cross-Modal Hashing [73.29587731448345]
We propose a novel method, dubbed Cross-Modal Info-Max Hashing (CMIMH).
We learn informative representations that can preserve both intra- and inter-modal similarities.
The proposed method consistently outperforms other state-of-the-art cross-modal retrieval methods.
arXiv Detail & Related papers (2021-12-13T08:58:03Z) - What Makes for Good Views for Contrastive Learning? [90.49736973404046]
We argue that we should reduce the mutual information (MI) between views while keeping task-relevant information intact.
We devise unsupervised and semi-supervised frameworks that learn effective views by aiming to reduce their MI.
As a by-product, we achieve a new state-of-the-art accuracy on unsupervised pre-training for ImageNet classification.
arXiv Detail & Related papers (2020-05-20T17:59:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.