Implicit Interpretation of Importance Weight Aware Updates
- URL: http://arxiv.org/abs/2307.11955v1
- Date: Sat, 22 Jul 2023 01:37:52 GMT
- Title: Implicit Interpretation of Importance Weight Aware Updates
- Authors: Keyi Chen and Francesco Orabona
- Abstract summary: Subgradient descent is one of the most widely used optimization algorithms in convex machine learning.
We show for the first time that IWA updates have a strictly better regret upper bound than plain gradient updates.
- Score: 15.974402990630402
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Due to its speed and simplicity, subgradient descent is one of the most widely used optimization algorithms in convex machine learning. However, tuning its learning rate is probably its most severe bottleneck to achieving consistently good performance. A common way to reduce the dependency on the learning rate is to use implicit/proximal updates. One such variant is the Importance Weight Aware (IWA) update, which consists of infinitely many infinitesimal updates on each loss function. However, the empirical success of IWA updates is not completely explained by their theory. In this paper, we show for the first time that IWA updates have a strictly better regret upper bound than plain gradient updates in the online learning setting. Our analysis is based on the recently introduced framework of generalized implicit Follow-the-Regularized-Leader (FTRL) (Chen and Orabona, 2023), which analyzes generalized implicit updates using a dual formulation. In particular, our results imply that IWA updates can be considered approximate implicit/proximal updates.
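To make the three kinds of updates concrete, here is a minimal sketch, not taken from the paper, specialized to the squared loss on a linear predictor. In this special case the plain gradient step, the IWA step (obtained by integrating infinitely many infinitesimal gradient steps over the importance weight), and the exact implicit/proximal step all have closed forms. The squared-loss specialization and the function names are our own illustration.

```python
import numpy as np

def plain_gradient_update(w, x, y, eta, h):
    """Plain (sub)gradient step on h * 0.5 * (w@x - y)**2: w - eta * h * grad."""
    return w - eta * h * (w @ x - y) * x

def iwa_update(w, x, y, eta, h):
    """Importance Weight Aware step for the squared loss.

    Integrating the gradient flow dw/dk = -eta * x * (w(k)@x - y) from k = 0
    to k = h (i.e. infinitely many infinitesimal updates) gives the closed
    form below.
    """
    xx = x @ x
    scale = (1.0 - np.exp(-eta * h * xx)) / xx
    return w - scale * (w @ x - y) * x

def proximal_update(w, x, y, eta, h):
    """Exact implicit/proximal step:
    argmin_u  h * 0.5 * (u@x - y)**2 + ||u - w||**2 / (2 * eta),
    which also has a closed form for the squared loss."""
    xx = x @ x
    scale = eta * h / (1.0 + eta * h * xx)
    return w - scale * (w @ x - y) * x

# Tiny check: for a small effective step eta*h the three updates nearly
# coincide; for a large step the plain update overshoots while the IWA and
# proximal updates stay bounded.
rng = np.random.default_rng(0)
w, x, y = rng.normal(size=3), rng.normal(size=3), 1.0
for eta_h in (0.01, 10.0):
    print(eta_h,
          plain_gradient_update(w, x, y, eta_h, 1.0),
          iwa_update(w, x, y, eta_h, 1.0),
          proximal_update(w, x, y, eta_h, 1.0))
```

For a small effective step eta*h all three updates agree to first order. For a large step the plain gradient step can overshoot the label, whereas both the IWA and the proximal steps keep the new prediction between the old prediction and the label, which illustrates in what sense IWA updates behave like approximate implicit/proximal updates.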
Related papers
- Exact, Tractable Gauss-Newton Optimization in Deep Reversible Architectures Reveal Poor Generalization [52.16435732772263]
Second-order optimization has been shown to accelerate the training of deep neural networks in many applications.
However, generalization properties of second-order methods are still being debated.
We show for the first time that exact Gauss-Newton (GN) updates take on a tractable form in a class of deep architectures.
arXiv Detail & Related papers (2024-11-12T17:58:40Z)
- Knowledge Editing in Language Models via Adapted Direct Preference Optimization [50.616875565173274]
Large Language Models (LLMs) can become outdated over time.
Knowledge Editing aims to overcome this challenge using weight updates that do not require expensive retraining.
arXiv Detail & Related papers (2024-06-14T11:02:21Z)
- CLASSP: a Biologically-Inspired Approach to Continual Learning through Adjustment Suppression and Sparsity Promotion [0.0]
This paper introduces a new training method named Continual Learning through Adjustment Suppression and Sparsity Promotion (CLASSP).
CLASSP is based on two main principles observed in neuroscience, particularly in the context of synaptic transmission and Long-Term Potentiation.
When compared with Elastic Weight Consolidation (EWC), CLASSP demonstrates superior performance in terms of accuracy and memory footprint.
arXiv Detail & Related papers (2024-04-29T13:31:00Z)
- Multiplicative update rules for accelerating deep learning training and increasing robustness [69.90473612073767]
We propose an optimization framework that fits a wide range of machine learning algorithms and enables one to apply alternative update rules.
We claim that the proposed framework accelerates training while leading to more robust models than the traditionally used additive update rule.
arXiv Detail & Related papers (2023-07-14T06:44:43Z)
- Towards Constituting Mathematical Structures for Learning to Optimize [101.80359461134087]
Learning to Optimize (L2O), which uses machine learning to learn an optimization algorithm automatically from data, has gained increasing attention in recent years.
A generic L2O approach parameterizes the iterative update rule and learns the update direction as a black-box network.
While the generic approach is widely applicable, the learned model can overfit and may not generalize well to out-of-distribution test sets.
We propose a novel L2O model with a mathematics-inspired structure that is broadly applicable and generalizes well to out-of-distribution problems.
arXiv Detail & Related papers (2023-05-29T19:37:28Z)
- Plug-and-Play Adaptation for Continuously-updated QA [21.665681980293137]
Language models (LMs) have shown great potential as implicit knowledge bases (KBs).
For their practical use, knowledge in LMs needs to be updated periodically.
We propose a novel task, Continuously-updated QA, in which multiple large-scale updates are made to LMs.
arXiv Detail & Related papers (2022-04-27T09:11:16Z)
- New Insights on Reducing Abrupt Representation Change in Online Continual Learning [69.05515249097208]
We focus on the change in representations of observed data that arises when previously unobserved classes appear in the incoming data stream.
We show that applying Experience Replay causes the newly added classes' representations to overlap significantly with the previous classes.
We propose a new method which mitigates this issue by shielding the learned representations from drastic adaptation to accommodate new classes.
arXiv Detail & Related papers (2022-03-08T01:37:00Z)
- Correcting Momentum in Temporal Difference Learning [95.62766731469671]
We argue that momentum in Temporal Difference (TD) learning accumulates gradients that become doubly stale.
We show that this phenomenon exists, and then propose a first-order correction term to momentum.
An important insight of this work is that deep RL methods are not always best served by directly importing techniques from the supervised setting.
arXiv Detail & Related papers (2021-06-07T20:41:15Z)
- t-Soft Update of Target Network for Deep Reinforcement Learning [8.071506311915396]
This paper proposes a new robust update rule for the target network in deep reinforcement learning (DRL).
A t-soft update method is derived with reference to the analogy between the exponential moving average and the normal distribution.
In PyBullet robotics simulations for DRL, an online actor-critic algorithm with the t-soft update outperformed the conventional methods in terms of the obtained return and/or its variance.
arXiv Detail & Related papers (2020-08-25T07:41:47Z)
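For context on the entry above: the conventional "soft" target-network update it builds on is a simple exponential moving average of the online parameters. The sketch below shows only that conventional baseline, with hypothetical parameter names of our own; the paper's t-soft variant, which adapts the mixing rate, is not reproduced here.

```python
import copy

def soft_update(target_params, online_params, tau=0.005):
    """Conventional soft (exponential-moving-average) target-network update:
    theta_target <- (1 - tau) * theta_target + tau * theta_online.
    The t-soft rule discussed above adapts this fixed rate tau; that variant
    is not shown here.
    """
    return {k: (1.0 - tau) * target_params[k] + tau * online_params[k]
            for k in target_params}

# Usage sketch with plain Python floats standing in for parameter tensors.
online = {"w": 1.0, "b": -0.5}
target = copy.deepcopy(online)
online = {"w": 2.0, "b": 0.0}         # pretend a training step moved the online net
target = soft_update(target, online)  # target drifts slowly toward the online net
print(target)
```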