Self-supervised Representation Learning with Relative Predictive Coding
- URL: http://arxiv.org/abs/2103.11275v1
- Date: Sun, 21 Mar 2021 01:04:24 GMT
- Title: Self-supervised Representation Learning with Relative Predictive Coding
- Authors: Yao-Hung Hubert Tsai, Martin Q. Ma, Muqiao Yang, Han Zhao,
Louis-Philippe Morency, Ruslan Salakhutdinov
- Abstract summary: Relative Predictive Coding (RPC) is a new contrastive representation learning objective.
RPC maintains a good balance among training stability, minibatch size sensitivity, and downstream task performance.
We empirically verify the effectiveness of RPC on benchmark vision and speech self-supervised learning tasks.
- Score: 102.93854542031396
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper introduces Relative Predictive Coding (RPC), a new contrastive
representation learning objective that maintains a good balance among training
stability, minibatch size sensitivity, and downstream task performance. The key
to the success of RPC is two-fold. First, RPC introduces the relative
parameters to regularize the objective for boundedness and low variance.
Second, RPC contains no logarithm and exponential score functions, which are
the main cause of training instability in prior contrastive objectives. We
empirically verify the effectiveness of RPC on benchmark vision and speech
self-supervised learning tasks. Lastly, we relate RPC with mutual information
(MI) estimation, showing RPC can be used to estimate MI with low variance.
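As a concrete illustration of the objective described above, the following is a minimal PyTorch sketch of an RPC-style contrastive loss, assuming the form E_P[f] - alpha*E_Q[f] - (beta/2)*E_P[f^2] - (gamma/2)*E_Q[f^2] over in-batch positives and negatives; the dot-product critic and the values of the relative parameters alpha, beta, gamma are illustrative placeholders, not the paper's exact setup.

```python
import torch
import torch.nn.functional as F

def rpc_loss(z1, z2, alpha=1.0, beta=0.005, gamma=1.0):
    """RPC-style contrastive loss for a minibatch of paired views (sketch).

    z1, z2: (N, D) embeddings of two views of the same N examples.
    The critic is a simple dot product; diagonal entries of the score
    matrix are positives (joint distribution) and off-diagonal entries
    are negatives (product of marginals). alpha, beta, gamma are the
    relative parameters that keep the objective bounded and low-variance;
    the values here are illustrative placeholders, not the paper's.
    """
    scores = F.normalize(z1, dim=-1) @ F.normalize(z2, dim=-1).t()  # (N, N)
    pos_mask = torch.eye(scores.size(0), dtype=torch.bool, device=scores.device)
    pos = scores[pos_mask]     # f(x, y) on positive pairs
    neg = scores[~pos_mask]    # f(x, y) on negative pairs

    # J_RPC = E_P[f] - alpha*E_Q[f] - beta/2*E_P[f^2] - gamma/2*E_Q[f^2]
    j_rpc = (pos.mean()
             - alpha * neg.mean()
             - 0.5 * beta * pos.pow(2).mean()
             - 0.5 * gamma * neg.pow(2).mean())
    return -j_rpc  # the objective is maximized, so negate for gradient descent
```

In practice z1 and z2 would come from an encoder applied to two views of the same image or utterance; consistent with the abstract, no logarithm or exponential appears in the aggregation of scores.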
Related papers
- Efficient Recurrent Off-Policy RL Requires a Context-Encoder-Specific Learning Rate [4.6659670917171825]
Recurrent reinforcement learning (RL) uses a context encoder based on recurrent neural networks (RNNs) to predict unobservable states.
Previous RL methods face training stability issues due to the gradient instability of RNNs.
We propose Recurrent Off-policy RL with Context-Encoder-Specific Learning Rate (RESeL) to tackle this issue.
arXiv Detail & Related papers (2024-05-24T09:33:47Z)
- Prior Constraints-based Reward Model Training for Aligning Large Language Models [58.33118716810208]
This paper proposes a Prior Constraints-based Reward Model (namely PCRM) training method to mitigate this problem.
PCRM incorporates prior constraints, specifically, length ratio and cosine similarity between outputs of each comparison pair, during reward model training to regulate optimization magnitude and control score margins.
Experimental results demonstrate that PCRM significantly improves alignment performance by effectively constraining reward score scaling.
arXiv Detail & Related papers (2024-04-01T07:49:11Z)
- Directly Attention Loss Adjusted Prioritized Experience Replay [0.07366405857677226]
Prioritized Experience Replay (PER) enables the model to learn more from relatively important samples by artificially changing their access frequency.
DALAP is proposed, which directly quantifies the extent of the distribution shift through a Parallel Self-Attention network.
arXiv Detail & Related papers (2023-11-24T10:14:05Z)
- Statistically Efficient Variance Reduction with Double Policy Estimation for Off-Policy Evaluation in Sequence-Modeled Reinforcement Learning [53.97273491846883]
We propose DPE: an RL algorithm that blends offline sequence modeling and offline reinforcement learning with Double Policy Estimation.
We validate our method on multiple OpenAI Gym tasks from the D4RL benchmarks.
arXiv Detail & Related papers (2023-08-28T20:46:07Z)
- Continual Contrastive Finetuning Improves Low-Resource Relation Extraction [34.76128090845668]
Relation extraction has been particularly challenging in low-resource scenarios and domains.
Recent literature has tackled low-resource RE with self-supervised learning.
We propose to pretrain and finetune the RE model using consistent objectives of contrastive learning.
arXiv Detail & Related papers (2022-12-21T07:30:22Z)
- Adversarial Intrinsic Motivation for Reinforcement Learning [60.322878138199364]
We investigate whether the Wasserstein-1 distance between a policy's state visitation distribution and a target distribution can be utilized effectively for reinforcement learning tasks.
Our approach, termed Adversarial Intrinsic Motivation (AIM), estimates this Wasserstein-1 distance through its dual objective and uses it to compute a supplemental reward function.
arXiv Detail & Related papers (2021-05-27T17:51:34Z)
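The Adversarial Intrinsic Motivation entry above estimates the Wasserstein-1 distance through its dual. As a hedged sketch, the snippet below trains a potential network on the Kantorovich-Rubinstein dual with a gradient penalty standing in for the 1-Lipschitz constraint; the `potential` module, the equal batch shapes, and `gp_coef` are assumptions for illustration, and the step that turns the learned potential into a supplemental reward is omitted.

```python
import torch

def w1_dual_loss(potential, target_states, policy_states, gp_coef=10.0):
    """Kantorovich-Rubinstein dual estimate of W1 (sketch, WGAN-GP style).

    `potential` is a scalar-output network f; the dual value
    E_target[f] - E_policy[f] lower-bounds W1 when f is 1-Lipschitz, which
    is encouraged here with a gradient penalty.  target_states and
    policy_states are assumed to be (N, d) tensors of equal batch size.
    """
    dual = potential(target_states).mean() - potential(policy_states).mean()

    # Gradient penalty on interpolated states to approximate the
    # 1-Lipschitz constraint required by the dual formulation.
    eps = torch.rand(policy_states.size(0), 1, device=policy_states.device)
    mixed = (eps * target_states + (1 - eps) * policy_states).requires_grad_(True)
    grad = torch.autograd.grad(potential(mixed).sum(), mixed, create_graph=True)[0]
    gp = ((grad.norm(dim=-1) - 1.0) ** 2).mean()

    return -(dual - gp_coef * gp)  # minimize to maximize the penalized dual
```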
- RMIX: Learning Risk-Sensitive Policies for Cooperative Reinforcement Learning Agents [40.51184157538392]
We propose a novel MARL method with the Conditional Value at Risk (CVaR) measure over the learned distributions of individuals' Q values.
We show that our method significantly outperforms state-of-the-art methods on challenging StarCraft II tasks.
arXiv Detail & Related papers (2021-02-16T13:58:25Z)
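The RMIX entry above optimizes a Conditional Value at Risk (CVaR) measure over learned distributions of individual Q values. One common way to compute CVaR when such a distribution is represented by quantile samples is sketched below; the quantile representation, function name, and risk level are illustrative assumptions, not the paper's implementation.

```python
import torch

def cvar_from_quantiles(quantiles, risk_level=0.1):
    """CVaR of a return distribution represented by quantile samples (sketch).

    quantiles: (..., n_quantiles) quantile estimates of an agent's Q
    distribution.  CVaR at `risk_level` averages the worst `risk_level`
    fraction of outcomes; risk_level=1.0 recovers the ordinary mean.
    """
    sorted_q, _ = torch.sort(quantiles, dim=-1)
    k = max(1, int(risk_level * sorted_q.size(-1)))
    return sorted_q[..., :k].mean(dim=-1)
```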
- Strategy for Boosting Pair Comparison and Improving Quality Assessment Accuracy [29.849156371902943]
Pair Comparison (PC) has a significant advantage over Absolute Category Rating (ACR) in terms of discriminability.
In this study, we employ a generic model to bridge pair comparison data and ACR data, so that the variance term can be recovered and the obtained information is more complete.
In this way, the proposed methodology can achieve the same accuracy as pair comparison while keeping the complexity as low as that of ACR.
arXiv Detail & Related papers (2020-10-01T13:05:09Z)
- Robust Learning Through Cross-Task Consistency [92.42534246652062]
We propose a broadly applicable and fully computational method for augmenting learning with Cross-Task Consistency.
We observe that learning with cross-task consistency leads to more accurate predictions and better generalization to out-of-distribution inputs.
arXiv Detail & Related papers (2020-06-07T09:24:33Z)
- An Information Bottleneck Approach for Controlling Conciseness in Rationale Extraction [84.49035467829819]
We show that it is possible to better manage this trade-off by optimizing a bound on the Information Bottleneck (IB) objective.
Our fully unsupervised approach jointly learns an explainer that predicts sparse binary masks over sentences, and an end-task predictor that considers only the extracted rationale.
arXiv Detail & Related papers (2020-05-01T23:26:41Z)
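The last entry above bounds an Information Bottleneck objective while learning sparse binary masks over sentences. A rough sketch of one such bound is given below: a task loss plus a KL term that pulls each per-sentence keep probability toward a sparse Bernoulli prior, which controls conciseness; the function name, prior, and weighting are illustrative assumptions rather than the paper's exact formulation.

```python
import math
import torch

def ib_rationale_loss(mask_logits, task_loss, prior_pi=0.2, beta=1.0):
    """Sparsity-controlled rationale objective via an IB-style bound (sketch).

    mask_logits: (batch, n_sentences) logits from an explainer deciding which
    sentences to keep; task_loss: loss of an end-task predictor that only
    sees the kept sentences.  The KL term pulls each keep probability toward
    a sparse Bernoulli prior `prior_pi`, which controls conciseness.
    prior_pi and beta are illustrative hyper-parameters.
    """
    p = torch.sigmoid(mask_logits)
    kl = (p * (torch.log(p + 1e-8) - math.log(prior_pi))
          + (1 - p) * (torch.log(1 - p + 1e-8) - math.log(1 - prior_pi)))
    return task_loss + beta * kl.sum(dim=-1).mean()
```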
This list is automatically generated from the titles and abstracts of the papers on this site.