Kronecker-factored Approximate Curvature (KFAC) From Scratch
- URL: http://arxiv.org/abs/2507.05127v1
- Date: Tue, 01 Jul 2025 15:21:08 GMT
- Title: Kronecker-factored Approximate Curvature (KFAC) From Scratch
- Authors: Felix Dangel, Bálint Mucsányi, Tobias Weber, Runa Eschenhagen
- Abstract summary: This tutorial is meant as a ground-up introduction to Kronecker-factored approximate curvature (KFAC). It presents math and code side-by-side and provides test cases based on the latest insights into KFAC.
- Score: 4.389123177083446
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Kronecker-factored approximate curvature (KFAC) is arguably one of the most prominent curvature approximations in deep learning. Its applications range from optimization to Bayesian deep learning, training data attribution with influence functions, and model compression or merging. While the intuition behind KFAC is easy to understand, its implementation is tedious: It comes in many flavours, has common pitfalls when translating the math to code, and is challenging to test, which complicates ensuring a properly functioning implementation. Some of the authors themselves have dealt with these challenges and experienced the discomfort of not being able to fully test their code. Thanks to recent advances in understanding KFAC, we are now able to provide test cases and a recipe for a reliable KFAC implementation. This tutorial is meant as a ground-up introduction to KFAC. In contrast to the existing work, our focus lies on providing both math and code side-by-side and providing test cases based on the latest insights into KFAC that are scattered throughout the literature. We hope this tutorial provides a contemporary view of KFAC that allows beginners to gain a deeper understanding of this curvature approximation while lowering the barrier to its implementation, extension, and usage in practice.
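To make the object of the tutorial concrete: the sketch below estimates the two Kronecker factors of a single linear layer's curvature block and applies their inverse to a gradient, which is the core manipulation KFAC is built on. It is a minimal NumPy illustration of the general idea, not code from the tutorial; the factor names, the A-before-G Kronecker ordering, and the damping value are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setting: a linear layer with d_in inputs and d_out outputs,
# observed over a batch of N samples.
N, d_in, d_out = 128, 20, 10
a = rng.normal(size=(N, d_in))   # layer inputs (activations)
g = rng.normal(size=(N, d_out))  # gradients of the loss w.r.t. the layer outputs

# KFAC's two small factors: input second moment and output-gradient second moment.
A = a.T @ a / N                  # (d_in, d_in)
G = g.T @ g / N                  # (d_out, d_out)

# The curvature block for the layer's weight matrix is approximated by the
# Kronecker product A ⊗ G (formed here only for illustration; ordering
# conventions differ across papers and codebases).
kfac_block = np.kron(A, G)       # (d_in * d_out, d_in * d_out)

# Preconditioning a weight gradient V (shape d_out x d_in) never needs the big
# Kronecker matrix: (A ⊗ G)^{-1} vec(V) corresponds to G^{-1} V A^{-1}.
damping = 1e-3                   # illustrative damping to keep the factors invertible
V = rng.normal(size=(d_out, d_in))
precond_V = np.linalg.solve(G + damping * np.eye(d_out), V) @ \
            np.linalg.inv(A + damping * np.eye(d_in))
print(precond_V.shape)           # (d_out, d_in)
```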
Related papers
- Fractured Chain-of-Thought Reasoning [61.647243580650446]
We introduce Fractured Sampling, a unified inference-time strategy that interpolates between full CoT and solution-only sampling. We show that Fractured Sampling consistently achieves superior accuracy-cost trade-offs, yielding steep log-linear scaling gains in Pass@k versus token budget.
arXiv Detail & Related papers (2025-05-19T11:30:41Z)
- Accurate Forgetting for Heterogeneous Federated Continual Learning [89.08735771893608]
We propose a new concept, accurate forgetting (AF), and develop a novel generative-replay method which selectively utilizes previous knowledge in federated networks. We employ a probabilistic framework based on a normalizing flow model to quantify the credibility of previous knowledge.
arXiv Detail & Related papers (2025-02-20T02:35:17Z)
- LP++: A Surprisingly Strong Linear Probe for Few-Shot CLIP [20.86307407685542]
Linear Probe (LP) has often been reported as a weak baseline for few-shot CLIP adaptation.
In this work, we examine, from a convex-optimization perspective, a generalization of the standard LP baseline.
Our image-language objective function, along with these non-trivial optimization insights and ingredients, yields, surprisingly, highly competitive few-shot CLIP performance.
arXiv Detail & Related papers (2024-04-02T20:23:10Z)
- Normalizing Flow-based Neural Process for Few-Shot Knowledge Graph Completion [69.55700751102376]
Few-shot knowledge graph completion (FKGC) aims to predict missing facts for unseen relations with few-shot associated facts.
Existing FKGC methods are based on metric learning or meta-learning, which often suffer from the out-of-distribution and overfitting problems.
In this paper, we propose a normalizing flow-based neural process for few-shot knowledge graph completion (NP-FKGC).
arXiv Detail & Related papers (2023-04-17T11:42:28Z)
- QLABGrad: a Hyperparameter-Free and Convergence-Guaranteed Scheme for Deep Learning [6.555832619920502]
We propose a novel learning rate adaptation scheme called QLABGrad.
QLABGrad automatically determines the learning rate by optimizing the Quadratic Loss Approximation-Based (QLAB) function for a given gradient descent direction.
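The summary does not spell out the QLAB function itself, but the underlying idea of choosing a step size from a quadratic model of the loss along the descent direction can be sketched generically. The snippet below fits a one-dimensional quadratic from the current loss, its directional derivative, and one trial evaluation, then steps to the quadratic's minimizer; it is a hedged illustration of quadratic-approximation line search, not the authors' exact scheme.

```python
import numpy as np

def quadratic_step_size(loss_fn, w, grad, trial_alpha=1e-2):
    """Pick a step size from a 1-D quadratic model of the loss along -grad.

    Generic sketch: phi(alpha) = loss(w - alpha * grad) is modelled as
    phi(0) + phi'(0) * alpha + 0.5 * c * alpha^2, with the curvature c
    estimated from one extra loss evaluation at `trial_alpha`.
    """
    phi0 = loss_fn(w)
    dphi0 = -np.dot(grad, grad)                    # directional derivative along -grad
    phi_trial = loss_fn(w - trial_alpha * grad)
    # Solve for the quadratic's curvature from the trial point.
    c = 2.0 * (phi_trial - phi0 - dphi0 * trial_alpha) / trial_alpha**2
    if c <= 0:                                     # no useful curvature: fall back
        return trial_alpha
    return -dphi0 / c                              # minimizer of the quadratic model

# Toy usage on a least-squares objective.
rng = np.random.default_rng(0)
X, y = rng.normal(size=(50, 5)), rng.normal(size=50)
loss = lambda w: 0.5 * np.mean((X @ w - y) ** 2)
w = np.zeros(5)
for _ in range(20):
    grad = X.T @ (X @ w - y) / len(y)
    w -= quadratic_step_size(loss, w, grad) * grad
print(f"final loss: {loss(w):.4f}")
```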
arXiv Detail & Related papers (2023-02-01T05:29:10Z)
- Brand New K-FACs: Speeding up K-FAC with Online Decomposition Updates [0.0]
We exploit the exponential-average construction paradigm of the K-factors, and use online numerical linear algebra techniques.
We propose a K-factor inverse update which scales linearly in layer size.
We also propose an inverse application procedure which scales linearly as well.
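As a rough picture of what an online K-factor update can look like, the sketch below maintains the usual exponential average of a factor and refreshes its inverse with a rank-one Sherman-Morrison correction instead of re-inverting from scratch. This is a generic online numerical-linear-algebra construction under assumed notation; the paper's own linear-scaling updates may differ.

```python
import numpy as np

def ema_factor_and_inverse(A, A_inv, a, beta=0.95):
    """One online update of a K-factor and its inverse.

    The factor follows the usual exponential-average construction
        A_new = beta * A + (1 - beta) * a a^T,
    and its inverse is refreshed via the Sherman-Morrison identity applied to
    the rank-one term, instead of re-inverting from scratch. Generic sketch,
    not necessarily the paper's exact update.
    """
    A_new = beta * A + (1.0 - beta) * np.outer(a, a)
    # A_new = beta * (A + c * a a^T) with c = (1 - beta) / beta, so
    # A_new^{-1} = (1 / beta) * ShermanMorrison(A, c * a).
    c = (1.0 - beta) / beta
    Ainv_a = A_inv @ a
    A_inv_new = (A_inv - c * np.outer(Ainv_a, Ainv_a) / (1.0 + c * a @ Ainv_a)) / beta
    return A_new, A_inv_new

# Toy check that the running inverse stays consistent with the running factor.
rng = np.random.default_rng(0)
d = 8
A, A_inv = np.eye(d), np.eye(d)        # initialise factor and inverse at identity
for _ in range(200):
    a = rng.normal(size=d)
    A, A_inv = ema_factor_and_inverse(A, A_inv, a)
print(np.max(np.abs(A @ A_inv - np.eye(d))))  # should be close to machine precision
```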
arXiv Detail & Related papers (2022-10-16T09:41:23Z)
- FARE: Provably Fair Representation Learning with Practical Certificates [9.242965489146398]
We introduce FARE, the first FRL method with practical fairness certificates.
FARE is based on our key insight that restricting the representation space of the encoder enables the derivation of practical guarantees.
We show that FARE produces practical certificates that are tight and often even comparable with purely empirical results.
arXiv Detail & Related papers (2022-10-13T17:40:07Z)
- Confident Sinkhorn Allocation for Pseudo-Labeling [40.883130133661304]
Semi-supervised learning is a critical tool in reducing machine learning's dependence on labeled data.
This paper theoretically studies the role of uncertainty in pseudo-labeling and proposes Confident Sinkhorn Allocation (CSA).
CSA identifies the best pseudo-label allocation via optimal transport, assigning labels only to samples with high confidence scores.
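For readers unfamiliar with Sinkhorn-style allocation, the sketch below balances a matrix of predicted class probabilities with alternating row and column normalizations and keeps only confident assignments. It conveys the optimal-transport flavour of such pseudo-labeling with an illustrative confidence threshold; it is not the exact CSA procedure from the paper.

```python
import numpy as np

def sinkhorn_pseudo_labels(probs, n_iters=50, conf_threshold=0.5):
    """Assign pseudo-labels by Sinkhorn normalization under class balance.

    Generic sketch: iteratively rescale rows (one unit of mass per sample)
    and columns (equal mass per class) of the model's class probabilities,
    then keep only samples whose resulting assignment is confident.
    Illustrative only; not the paper's exact CSA algorithm.
    """
    P = np.asarray(probs, dtype=float)
    n, k = P.shape
    for _ in range(n_iters):
        P = P / P.sum(axis=1, keepdims=True)             # each sample spreads 1 unit
        P = P / P.sum(axis=0, keepdims=True) * (n / k)   # each class receives n/k units
    P = P / P.sum(axis=1, keepdims=True)
    labels = P.argmax(axis=1)
    confident = P.max(axis=1) >= conf_threshold          # keep only high-confidence rows
    return labels, confident

# Toy usage with random softmax outputs for 200 unlabeled samples and 5 classes.
rng = np.random.default_rng(0)
logits = rng.normal(size=(200, 5))
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
labels, confident = sinkhorn_pseudo_labels(probs)
print(labels[confident][:10], confident.mean())
```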
arXiv Detail & Related papers (2022-06-13T02:16:26Z)
- Gradient Descent on Neurons and its Link to Approximate Second-Order Optimization [0.913755431537592]
We show that Kronecker-Factored, block-diagonal curvature estimates (KFAC) significantly outperform true second-order updates.
We also show that KFAC approximates a first-order gradient algorithm that performs gradient descent on neurons rather than weights.
arXiv Detail & Related papers (2022-01-28T17:06:26Z)
- Revisiting Deep Local Descriptor for Improved Few-Shot Classification [56.74552164206737]
We show how one can improve the quality of embeddings by leveraging Dense Classification and Attentive Pooling.
We suggest pooling feature maps with attentive pooling instead of the widely used global average pooling (GAP) to prepare embeddings for few-shot classification.
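The contrast between GAP and attentive pooling is easy to state in code. The sketch below pools a toy feature map either uniformly or with softmax weights produced by a hypothetical learned query vector; it is a generic illustration of attentive pooling, not the paper's specific module.

```python
import numpy as np

def global_average_pooling(fmap):
    """fmap: (H, W, C) feature map -> (C,) embedding by uniform averaging."""
    return fmap.mean(axis=(0, 1))

def attentive_pooling(fmap, query):
    """Pool a (H, W, C) feature map with softmax attention over locations.

    Each spatial position gets a score from a (hypothetical, learnable)
    query vector of size C; the embedding is the score-weighted average.
    Generic sketch of attentive pooling, not the paper's exact mechanism.
    """
    H, W, C = fmap.shape
    flat = fmap.reshape(H * W, C)
    scores = flat @ query                          # (H*W,) per-location relevance
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                       # softmax over spatial positions
    return weights @ flat                          # (C,) weighted average

rng = np.random.default_rng(0)
fmap = rng.normal(size=(7, 7, 64))                 # toy 7x7x64 feature map
query = rng.normal(size=64)                        # stands in for a learned parameter
print(global_average_pooling(fmap).shape, attentive_pooling(fmap, query).shape)
```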
arXiv Detail & Related papers (2021-03-30T00:48:28Z)
- An iterative K-FAC algorithm for Deep Learning [0.0]
The key of K-FAC is to approximate the Fisher information matrix (FIM) as a block-diagonal matrix whose blocks are inverted via small Kronecker factors.
In this short note, we present CG-FAC, a new iterative K-FAC algorithm.
We prove that the time and memory complexity of iterative CG-FAC is much less than that of the standard K-FAC algorithm.
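The appeal of an iterative scheme is that the Kronecker-factored block never has to be formed or inverted explicitly: matrix-vector products with A ⊗ G reduce to two small matrix multiplies. The sketch below runs plain conjugate gradients on such a block using only those products; it is a generic matrix-free CG illustration, not the authors' CG-FAC algorithm.

```python
import numpy as np

def kron_matvec(A, G, V):
    """Product with the Kronecker block (A ⊗ G) without ever forming it.

    For a weight-shaped matrix V (d_out x d_in), (A ⊗ G) vec(V) = vec(G V A)
    (A and G are symmetric here, so transposes can be dropped).
    """
    return G @ V @ A

def cg_solve_kron(A, G, B, n_iters=50, tol=1e-10):
    """Solve (A ⊗ G) vec(X) = vec(B) by conjugate gradients, matrix-free.

    Generic CG on the Kronecker-factored block, sketching why iterative
    solvers avoid explicit inverses; not the paper's exact CG-FAC.
    """
    X = np.zeros_like(B)
    R = B - kron_matvec(A, G, X)       # residual
    P = R.copy()
    rs = np.sum(R * R)
    for _ in range(n_iters):
        AP = kron_matvec(A, G, P)
        alpha = rs / np.sum(P * AP)
        X += alpha * P
        R -= alpha * AP
        rs_new = np.sum(R * R)
        if rs_new < tol:
            break
        P = R + (rs_new / rs) * P
        rs = rs_new
    return X

# Toy symmetric positive definite factors and a gradient-shaped right-hand side.
rng = np.random.default_rng(0)
d_in, d_out = 6, 4
MA, MG = rng.normal(size=(d_in, d_in)), rng.normal(size=(d_out, d_out))
A = np.eye(d_in) + 0.1 * MA @ MA.T
G = np.eye(d_out) + 0.1 * MG @ MG.T
B = rng.normal(size=(d_out, d_in))
X = cg_solve_kron(A, G, B)
print(np.max(np.abs(kron_matvec(A, G, X) - B)))   # residual should be tiny
```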
arXiv Detail & Related papers (2021-01-01T12:04:01Z)
- Provably Efficient Reward-Agnostic Navigation with Linear Value Iteration [143.43658264904863]
We show how value iteration under a more standard notion of low inherent Bellman error, typically employed in least-squares value-iteration-style algorithms, can provide strong PAC guarantees on learning a near-optimal value function.
We present a computationally tractable algorithm for the reward-free setting and show how it can be used to learn a near-optimal policy for any (linear) reward function.
arXiv Detail & Related papers (2020-08-18T04:34:21Z)
- DisCor: Corrective Feedback in Reinforcement Learning via Distribution Correction [96.90215318875859]
We show that bootstrapping-based Q-learning algorithms do not necessarily benefit from corrective feedback.
We propose a new algorithm, DisCor, which computes an approximation to this optimal distribution and uses it to re-weight the transitions used for training.
arXiv Detail & Related papers (2020-03-16T16:18:52Z)