MIO : Mutual Information Optimization using Self-Supervised Binary
Contrastive Learning
- URL: http://arxiv.org/abs/2111.12664v1
- Date: Wed, 24 Nov 2021 17:51:29 GMT
- Title: MIO : Mutual Information Optimization using Self-Supervised Binary
Contrastive Learning
- Authors: Siladittya Manna, Saumik Bhattacharya and Umapada Pal
- Abstract summary: We model contrastive learning as a binary classification problem: predicting whether a pair is positive or not.
The proposed method outperforms state-of-the-art algorithms on benchmark datasets such as STL-10, CIFAR-10, and CIFAR-100.
- Score: 19.5917119072985
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Self-supervised contrastive learning is a domain that has
progressed rapidly over the last few years. Most state-of-the-art
self-supervised algorithms use a large number of negative samples, momentum
updates, specific architectural modifications, or extensive training to learn
good representations. Such arrangements make the overall training process
complex and challenging to analyze. In this paper, we propose a mutual
information optimization based loss function for contrastive learning, in
which we model contrastive learning as a binary classification problem:
predicting whether a pair is positive or not. This formulation not only lets
us treat the problem mathematically but also helps us outperform existing
algorithms. Unlike existing methods that only maximize the mutual information
in a positive pair, the proposed loss function optimizes the mutual
information in both positive and negative pairs. We also present a
mathematical expression for the parameter gradients flowing into the
projector and for the displacement of the feature vectors in the feature
space, which gives mathematical insight into the working principle of
contrastive learning. An additive $L_2$ regularizer is also used to prevent
the feature vectors from diverging and to improve performance. The proposed
method outperforms state-of-the-art algorithms on benchmark datasets such as
STL-10, CIFAR-10, and CIFAR-100. After only 250 epochs of pre-training, the
proposed model achieves best accuracies of 85.44%, 60.75%, and 56.81% on the
CIFAR-10, STL-10, and CIFAR-100 datasets, respectively.
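The binary-classification view described in the abstract can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the sigmoid pairing, the temperature `tau`, and the regularizer weight `reg_weight` are assumptions made for the sake of the example.

```python
import numpy as np

def binary_contrastive_loss(z, pair_labels, tau=0.5, reg_weight=0.01):
    """Treat every pair (i, j) as a binary classification example:
    positive pairs should score high, negative pairs low. An additive
    L2 term on the features discourages them from diverging."""
    # Cosine similarity between all projected features, scaled by a temperature.
    z_unit = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = (z_unit @ z_unit.T) / tau
    # A sigmoid turns each similarity into P(pair is positive).
    p = 1.0 / (1.0 + np.exp(-sim))
    eps = 1e-12
    # Binary cross-entropy over all off-diagonal pairs, so both positive
    # and negative pairs contribute to the objective.
    bce = -(pair_labels * np.log(p + eps) + (1 - pair_labels) * np.log(1 - p + eps))
    off_diag = ~np.eye(len(z), dtype=bool)
    loss = bce[off_diag].mean()
    # Additive L2 regularizer on the raw (pre-normalization) features.
    return loss + reg_weight * (z ** 2).sum(axis=1).mean()
```

With well-aligned positive pairs the loss drops, and misaligned positives raise it, matching the intuition that the objective is optimized over positive and negative pairs alike.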
Related papers
- Unlearning as multi-task optimization: A normalized gradient difference approach with an adaptive learning rate [105.86576388991713]
We introduce a normalized gradient difference (NGDiff) algorithm, enabling us to have better control over the trade-off between the objectives.
We provide a theoretical analysis and empirically demonstrate the superior performance of NGDiff among state-of-the-art unlearning methods on the TOFU and MUSE datasets.
arXiv Detail & Related papers (2024-10-29T14:41:44Z)
- Bayesian Learning-driven Prototypical Contrastive Loss for Class-Incremental Learning [42.14439854721613]
This paper proposes a method to learn an effective representation between previous and newly encountered class prototypes.
We introduce a contrastive loss that incorporates novel classes into the latent representation by reducing intra-class and increasing inter-class distance.
arXiv Detail & Related papers (2024-05-17T19:49:02Z)
- When hard negative sampling meets supervised contrastive learning [17.173114048398947]
We introduce a new supervised contrastive learning objective, SCHaNe, which incorporates hard negative sampling during the fine-tuning phase.
SCHaNe outperforms the strong baseline BEiT-3 in Top-1 accuracy across various benchmarks.
Our proposed objective sets a new state-of-the-art for base models on ImageNet-1k, achieving an 86.14% accuracy.
arXiv Detail & Related papers (2023-08-28T20:30:10Z)
- Patch-Level Contrasting without Patch Correspondence for Accurate and Dense Contrastive Representation Learning [79.43940012723539]
ADCLR is a self-supervised learning framework for learning accurate and dense vision representations.
Our approach achieves new state-of-the-art performance for contrastive methods.
arXiv Detail & Related papers (2023-06-23T07:38:09Z)
- DynamoRep: Trajectory-Based Population Dynamics for Classification of Black-box Optimization Problems [0.755972004983746]
We propose a feature extraction method that describes the trajectories of optimization algorithms using simple statistics.
We demonstrate that the proposed DynamoRep features capture enough information to identify the problem class on which the optimization algorithm is running.
arXiv Detail & Related papers (2023-06-08T06:57:07Z)
- Class Anchor Margin Loss for Content-Based Image Retrieval [97.81742911657497]
We propose a novel repeller-attractor loss that falls within the metric learning paradigm, yet directly optimizes the L2 metric without the need to generate pairs.
We evaluate the proposed objective in the context of few-shot and full-set training on the CBIR task, by using both convolutional and transformer architectures.
arXiv Detail & Related papers (2023-06-01T12:53:10Z)
- Towards Compute-Optimal Transfer Learning [82.88829463290041]
We argue that zero-shot structured pruning of pretrained models allows them to increase compute efficiency with minimal reduction in performance.
Our results show that pruning convolutional filters of pretrained models can lead to more than 20% performance improvement in low computational regimes.
arXiv Detail & Related papers (2023-04-25T21:49:09Z)
- Model Predictive Control with Self-supervised Representation Learning [13.225264876433528]
We propose the use of a reconstruction function within the TD-MPC framework, so that the agent can reconstruct the original observation.
Our proposed addition of another loss term leads to improved performance on both state- and image-based tasks.
arXiv Detail & Related papers (2023-04-14T16:02:04Z)
- Representation Learning with Multi-Step Inverse Kinematics: An Efficient and Optimal Approach to Rich-Observation RL [106.82295532402335]
Existing reinforcement learning algorithms suffer from computational intractability, strong statistical assumptions, and suboptimal sample complexity.
We provide the first computationally efficient algorithm that attains rate-optimal sample complexity with respect to the desired accuracy level.
Our algorithm, MusIK, combines systematic exploration with representation learning based on multi-step inverse kinematics.
arXiv Detail & Related papers (2023-04-12T14:51:47Z)
- Deep Active Ensemble Sampling For Image Classification [8.31483061185317]
Active learning frameworks aim to reduce the cost of data annotation by actively requesting the labeling for the most informative data points.
Proposed approaches include uncertainty-based techniques, geometric methods, and implicit combinations of the two.
We present an innovative integration of recent progress in both uncertainty-based and geometric frameworks to enable an efficient exploration/exploitation trade-off in sample selection strategy.
Our framework provides two advantages: (1) accurate posterior estimation, and (2) a tunable trade-off between computational overhead and higher accuracy.
arXiv Detail & Related papers (2022-10-11T20:20:20Z)
- Improved Algorithms for Neural Active Learning [74.89097665112621]
We improve the theoretical and empirical performance of neural-network(NN)-based active learning algorithms for the non-parametric streaming setting.
We introduce two regret metrics, defined by minimizing the population loss, that are more suitable for active learning than the one used in state-of-the-art (SOTA) related work.
arXiv Detail & Related papers (2022-10-02T05:03:38Z)
- (Certified!!) Adversarial Robustness for Free! [116.6052628829344]
We certify 71% accuracy on ImageNet under adversarial perturbations constrained to be within a 2-norm of 0.5.
We obtain these results using only pretrained diffusion models and image classifiers, without requiring any fine-tuning or retraining of model parameters.
arXiv Detail & Related papers (2022-06-21T17:27:27Z)
- Provable Stochastic Optimization for Global Contrastive Learning: Small Batch Does Not Harm Performance [53.49803579981569]
We consider a global objective for contrastive learning, which contrasts each positive pair with all negative pairs for an anchor point.
Existing methods such as SimCLR require a large batch size in order to achieve a satisfactory result.
We propose a memory-efficient optimization algorithm for solving the Global Contrastive Learning of Representations, named SogCLR.
arXiv Detail & Related papers (2022-02-24T22:16:53Z)
- To be Critical: Self-Calibrated Weakly Supervised Learning for Salient Object Detection [95.21700830273221]
Weakly-supervised salient object detection (WSOD) aims to develop saliency models using image-level annotations.
We propose a self-calibrated training strategy by explicitly establishing a mutual calibration loop between pseudo labels and network predictions.
We prove that even a much smaller dataset with well-matched annotations can facilitate models to achieve better performance as well as generalizability.
arXiv Detail & Related papers (2021-09-04T02:45:22Z)
- With a Little Help from My Friends: Nearest-Neighbor Contrastive Learning of Visual Representations [87.72779294717267]
Using the nearest neighbor as the positive in contrastive losses significantly improves performance on ImageNet classification.
We demonstrate empirically that our method is less reliant on complex data augmentations.
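The nearest-neighbor positive described above can be sketched in a few lines. This is a hypothetical NumPy fragment; treating the support set as a simple array of past embeddings (rather than a maintained queue) is an assumption made for illustration.

```python
import numpy as np

def nearest_neighbor_positive(anchors, support):
    """For each anchor embedding, return its cosine-nearest neighbor
    from a support set; that neighbor serves as the positive in the
    contrastive loss instead of the anchor's own augmented view."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    s = support / np.linalg.norm(support, axis=1, keepdims=True)
    nn_idx = (a @ s.T).argmax(axis=1)  # index of the most similar support vector
    return support[nn_idx]
```

Swapping the anchor's own view for a semantically similar neighbor is what reduces the method's reliance on heavy data augmentation.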
arXiv Detail & Related papers (2021-04-29T17:56:08Z)
- Fast Few-Shot Classification by Few-Iteration Meta-Learning [173.32497326674775]
We introduce a fast optimization-based meta-learning method for few-shot classification.
Our strategy enables important aspects of the base learner objective to be learned during meta-training.
We perform a comprehensive experimental analysis, demonstrating the speed and effectiveness of our approach.
arXiv Detail & Related papers (2020-10-01T15:59:31Z)
- Neural Non-Rigid Tracking [26.41847163649205]
We introduce a novel, end-to-end learnable, differentiable non-rigid tracker.
We employ a convolutional neural network to predict dense correspondences and their confidences.
Compared to state-of-the-art approaches, our algorithm shows improved reconstruction performance.
arXiv Detail & Related papers (2020-06-23T18:00:39Z)
- Supervised Contrastive Learning [42.27949000093086]
We extend the self-supervised batch contrastive approach to the fully-supervised setting.
We analyze two possible versions of the supervised contrastive (SupCon) loss, identifying the best-performing formulation of the loss.
On ResNet-200, we achieve top-1 accuracy of 81.4% on the ImageNet dataset, which is 0.8% above the best number reported for this architecture.
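The SupCon idea of pulling each anchor toward every sample that shares its label can be sketched as follows. This is a minimal NumPy version of one formulation of the loss (averaging log-probabilities over positives); the temperature value is an assumption.

```python
import numpy as np

def supcon_loss(z, labels, tau=0.1):
    """Supervised contrastive loss sketch: for each anchor, average the
    log-probability of picking each same-label sample from among all
    other samples, then negate."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = (z @ z.T) / tau
    n = len(z)
    self_mask = np.eye(n, dtype=bool)
    # Exclude each anchor from its own softmax denominator.
    logits = np.where(self_mask, -np.inf, sim)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Positives: other samples carrying the same label as the anchor.
    pos_mask = (labels[:, None] == labels[None, :]) & ~self_mask
    per_anchor = np.where(pos_mask, log_prob, 0.0).sum(axis=1) / pos_mask.sum(axis=1)
    return -per_anchor.mean()
```

Unlike the self-supervised batch contrastive loss, the positives here come from label information, so any number of same-class samples can attract the anchor.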
arXiv Detail & Related papers (2020-04-23T17:58:56Z)
- Training Binary Neural Networks with Real-to-Binary Convolutions [52.91164959767517]
We show how to train binary networks to within a few percentage points of their full-precision counterparts.
We show how to build a strong baseline, which already achieves state-of-the-art accuracy.
We show that, when putting all of our improvements together, the proposed model beats the current state of the art by more than 5% top-1 accuracy on ImageNet.
arXiv Detail & Related papers (2020-03-25T17:54:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.