Self-supervised Representation Learning with Relative Predictive Coding
- URL: http://arxiv.org/abs/2103.11275v1
- Date: Sun, 21 Mar 2021 01:04:24 GMT
- Title: Self-supervised Representation Learning with Relative Predictive Coding
- Authors: Yao-Hung Hubert Tsai, Martin Q. Ma, Muqiao Yang, Han Zhao,
Louis-Philippe Morency, Ruslan Salakhutdinov
- Abstract summary: Relative Predictive Coding (RPC) is a new contrastive representation learning objective.
RPC maintains a good balance among training stability, minibatch size sensitivity, and downstream task performance.
We empirically verify the effectiveness of RPC on benchmark vision and speech self-supervised learning tasks.
- Score: 102.93854542031396
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper introduces Relative Predictive Coding (RPC), a new contrastive
representation learning objective that maintains a good balance among training
stability, minibatch size sensitivity, and downstream task performance. The key
to the success of RPC is two-fold. First, RPC introduces the relative
parameters to regularize the objective for boundedness and low variance.
Second, RPC contains no logarithm and exponential score functions, which are
the main cause of training instability in prior contrastive objectives. We
empirically verify the effectiveness of RPC on benchmark vision and speech
self-supervised learning tasks. Lastly, we relate RPC with mutual information
(MI) estimation, showing RPC can be used to estimate MI with low variance.
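As a reading aid, here is a minimal PyTorch sketch of an RPC-style objective consistent with the abstract: relative parameters alpha, beta, gamma regularize the score statistics, and no logarithm or exponential appears in the loss. The exact functional form, the function name, and the default parameter values below are assumptions to be checked against the paper, not the authors' code.

```python
import torch

def rpc_loss(pos_scores, neg_scores, alpha=1.0, beta=0.005, gamma=1.0):
    """RPC-style contrastive loss (sketch, not the paper's exact code).

    pos_scores: critic scores f(x, y) on positive pairs (drawn jointly).
    neg_scores: critic scores on negative pairs (drawn independently).
    alpha, beta, gamma are the 'relative parameters'; values here are
    illustrative, not the paper's tuned settings.
    """
    # Maximize E_pos[f] - alpha*E_neg[f] - beta/2*E_pos[f^2] - gamma/2*E_neg[f^2].
    # The squared terms keep the objective bounded with low variance,
    # and the score function involves no log or exp.
    objective = (pos_scores.mean()
                 - alpha * neg_scores.mean()
                 - 0.5 * beta * pos_scores.pow(2).mean()
                 - 0.5 * gamma * neg_scores.pow(2).mean())
    return -objective  # minimize the negative of the objective
```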
Related papers
- Performance Optimization of Ratings-Based Reinforcement Learning [1.6133809033337525]
This paper explores multiple optimization methods to improve the performance of rating-based reinforcement learning (RbRL).
RbRL infers reward functions in reward-free environments for subsequent policy learning via standard reinforcement learning.
arXiv Detail & Related papers (2025-01-13T23:56:24Z)
- ResFlow: Fine-tuning Residual Optical Flow for Event-based High Temporal Resolution Motion Estimation [50.80115710105251]
Event cameras hold significant promise for high-temporal-resolution (HTR) motion estimation.
We propose a residual-based paradigm for estimating HTR optical flow with event data; a toy sketch of the residual idea follows this entry.
arXiv Detail & Related papers (2024-12-12T09:35:47Z)
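Purely as an illustration of a residual paradigm for HTR flow, here is a toy sketch; the module shapes, the event-voxel input, and the refinement design are assumptions, not the paper's architecture.

```python
import torch

class ResidualFlowRefiner(torch.nn.Module):
    """Toy residual refiner: predict only a correction to a coarse flow,
    conditioned on an event representation (hypothetical shapes)."""
    def __init__(self, event_channels=10):
        super().__init__()
        # 2 flow channels (u, v) concatenated with event voxel channels.
        self.residual_net = torch.nn.Conv2d(event_channels + 2, 2,
                                            kernel_size=3, padding=1)

    def forward(self, coarse_flow, event_voxels):
        x = torch.cat([coarse_flow, event_voxels], dim=1)
        return coarse_flow + self.residual_net(x)  # base flow + residual
```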
- Efficient Recurrent Off-Policy RL Requires a Context-Encoder-Specific Learning Rate [4.6659670917171825]
Recurrent reinforcement learning (RL) uses a context encoder based on recurrent neural networks (RNNs) to predict unobservable states.
Previous RL methods face training stability issues due to the gradient instability of RNNs.
We propose Recurrent Off-policy RL with Context-Encoder-Specific Learning Rate (RESeL) to tackle this issue; a minimal sketch follows this entry.
arXiv Detail & Related papers (2024-05-24T09:33:47Z)
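The mechanism named in the title is concrete enough for a short illustration: give the RNN context encoder its own, smaller learning rate via optimizer parameter groups. The module shapes and rates below are placeholders, not RESeL's settings.

```python
import torch

# Placeholder networks standing in for the recurrent actor-critic pieces.
context_encoder = torch.nn.GRU(input_size=32, hidden_size=64, batch_first=True)
policy_head = torch.nn.Linear(64, 4)

# Context-encoder-specific learning rate: a slow rate for the RNN to damp
# its gradient instability, a normal rate for the feed-forward layers.
optimizer = torch.optim.Adam([
    {"params": context_encoder.parameters(), "lr": 1e-5},
    {"params": policy_head.parameters(), "lr": 3e-4},
])
```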
- Prior Constraints-based Reward Model Training for Aligning Large Language Models [58.33118716810208]
This paper proposes a Prior Constraints-based Reward Model (PCRM) training method to mitigate this problem.
PCRM incorporates prior constraints, specifically the length ratio and the cosine similarity between the outputs of each comparison pair, during reward model training to regulate optimization magnitude and control score margins; a hedged sketch follows this entry.
Experimental results demonstrate that PCRM significantly improves alignment performance by effectively constraining reward score scaling.
arXiv Detail & Related papers (2024-04-01T07:49:11Z)
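Below is a hedged sketch of a pairwise reward-model loss with prior constraints as the summary describes them; exactly how the length ratio and cosine similarity enter the margin is an assumption here, not PCRM's published formula.

```python
import torch
import torch.nn.functional as F

def pcrm_style_loss(r_chosen, r_rejected, len_chosen, len_rejected, cos_sim):
    """Pairwise ranking loss whose required margin is set by priors computed
    from the pair itself (sketch; functional forms are assumptions)."""
    length_ratio = (torch.minimum(len_chosen, len_rejected)
                    / torch.maximum(len_chosen, len_rejected))
    # Near-duplicate outputs (high cosine similarity, comparable lengths)
    # demand a smaller score margin, which bounds reward score scaling.
    margin = (1.0 - cos_sim) * length_ratio
    return -F.logsigmoid(r_chosen - r_rejected - margin).mean()
```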
- Directly Attention Loss Adjusted Prioritized Experience Replay [0.07366405857677226]
Prioritized Experience Replay (PER) enables the model to learn more from relatively important samples by artificially changing how frequently they are sampled.
DALAP is proposed, which directly quantifies the extent of the distribution shift through a Parallel Self-Attention network.
arXiv Detail & Related papers (2023-11-24T10:14:05Z)
- Continual Contrastive Finetuning Improves Low-Resource Relation Extraction [34.76128090845668]
Relation extraction (RE) has been particularly challenging in low-resource scenarios and domains.
Recent literature has tackled low-resource RE with self-supervised learning.
We propose to pretrain and finetune the RE model using consistent contrastive learning objectives; a minimal sketch of such an objective follows this entry.
arXiv Detail & Related papers (2022-12-21T07:30:22Z)
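The "consistent objectives" idea suggests reusing one contrastive loss across pretraining and finetuning; below is a standard InfoNCE loss as one plausible instance. How anchors and positives are built from relation representations is the paper's detail and is not shown here.

```python
import torch
import torch.nn.functional as F

def info_nce(anchor, positive, temperature=0.1):
    """Standard InfoNCE over a batch: row i of `anchor` should match row i
    of `positive`; all other rows serve as in-batch negatives."""
    a = F.normalize(anchor, dim=-1)
    p = F.normalize(positive, dim=-1)
    logits = a @ p.t() / temperature            # (N, N) similarities
    labels = torch.arange(a.size(0), device=a.device)
    return F.cross_entropy(logits, labels)      # diagonal = positives
```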
- Adversarial Intrinsic Motivation for Reinforcement Learning [60.322878138199364]
We investigate whether the Wasserstein-1 distance between a policy's state visitation distribution and a target distribution can be utilized effectively for reinforcement learning tasks.
Our approach, termed Adversarial Intrinsic Motivation (AIM), estimates this Wasserstein-1 distance through its dual objective and uses it to compute a supplemental reward function; a sketch of the dual follows this entry.
arXiv Detail & Related papers (2021-05-27T17:51:34Z)
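Here is a sketch of the Wasserstein-1 dual named in the summary: train a potential function to separate target states from policy-visited states under a soft 1-Lipschitz constraint. The gradient-penalty placement and the supplemental-reward form are assumptions.

```python
import torch

def w1_dual_loss(potential, target_states, policy_states, lam=10.0):
    """Dual objective for W1 (sketch): maximize the potential's gap between
    the target and policy state distributions; a gradient penalty acts as
    a soft 1-Lipschitz constraint (penalty location is an assumption)."""
    gap = potential(target_states).mean() - potential(policy_states).mean()
    x = policy_states.detach().clone().requires_grad_(True)
    grad = torch.autograd.grad(potential(x).sum(), x, create_graph=True)[0]
    penalty = ((grad.norm(dim=-1) - 1.0) ** 2).mean()
    return -gap + lam * penalty  # minimizing this maximizes the dual

# The supplemental reward can then score progress along the potential,
# e.g. r(s, s') = potential(s') - potential(s).
```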
- Strategy for Boosting Pair Comparison and Improving Quality Assessment Accuracy [29.849156371902943]
Pair Comparison (PC) has a significant advantage over Absolute Category Rating (ACR) in terms of discriminability.
In this study, we employ a generic model to bridge pair comparison data and ACR data, recovering the variance term so that the obtained information is more complete; a hedged sketch of such a bridge follows this entry.
In this way, the proposed methodology achieves the same accuracy as pair comparison while keeping complexity as low as that of ACR.
arXiv Detail & Related papers (2020-10-01T13:05:09Z)
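One plausible instance of a "generic model" bridging ACR and PC data is a Thurstone-style Gaussian model, sketched below; whether this matches the paper's exact model is an assumption, but it shows how a recovered variance term enters the preference probability.

```python
from math import erf, sqrt

def pc_prob_from_acr(mu_a, sigma_a, mu_b, sigma_b):
    """P(A preferred over B) under a Thurstone-style model where each
    stimulus's opinion score is Gaussian with the given mean and
    (recovered) standard deviation."""
    z = (mu_a - mu_b) / sqrt(sigma_a ** 2 + sigma_b ** 2)
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))  # standard normal CDF at z
```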
- Robust Learning Through Cross-Task Consistency [92.42534246652062]
We propose a broadly applicable and fully computational method for augmenting learning with cross-task consistency.
We observe that learning with cross-task consistency leads to more accurate predictions and better generalization to out-of-distribution inputs; a minimal sketch of a consistency term follows this entry.
arXiv Detail & Related papers (2020-06-07T09:24:33Z)
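A minimal sketch of a cross-task consistency term under assumed names: predictions for task A, mapped into task B's space by a transfer function, should agree with direct task-B predictions.

```python
import torch
import torch.nn.functional as F

def cross_task_consistency(pred_a, pred_b, a_to_b):
    """Penalize disagreement between task-B predictions reached directly
    and via task A (`a_to_b` is a learned or analytic transfer map)."""
    return F.l1_loss(a_to_b(pred_a), pred_b)

# Added to each task's supervised loss, this term couples the tasks:
# total_loss = loss_a + loss_b + lam * cross_task_consistency(pa, pb, g)
```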
- An Information Bottleneck Approach for Controlling Conciseness in Rationale Extraction [84.49035467829819]
We show that the trade-off between rationale conciseness and end-task performance can be better managed by optimizing a bound on the Information Bottleneck (IB) objective.
Our fully unsupervised approach jointly learns an explainer that predicts sparse binary masks over sentences, and an end-task predictor that considers only the extracted rationale; a hedged sketch of the bound follows this entry.
arXiv Detail & Related papers (2020-05-01T23:26:41Z)
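A hedged sketch of the kind of IB bound the summary describes: the end-task loss plus a KL term pushing per-sentence mask probabilities toward a sparse prior. The prior value and weighting below are illustrative, not the paper's settings.

```python
import torch
import torch.nn.functional as F

def ib_rationale_loss(task_logits, labels, mask_probs, prior=0.2, beta=1.0):
    """End-task cross-entropy plus KL(Bernoulli(p) || Bernoulli(prior)),
    averaged over sentences; the KL controls rationale conciseness."""
    task_loss = F.cross_entropy(task_logits, labels)
    p = mask_probs.clamp(1e-6, 1.0 - 1e-6)
    kl = (p * torch.log(p / prior)
          + (1.0 - p) * torch.log((1.0 - p) / (1.0 - prior))).mean()
    return task_loss + beta * kl
```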
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.