Related papers: Stochastic Parameter Decomposition

URL: http://arxiv.org/abs/2506.20790v1
Date: Wed, 25 Jun 2025 19:26:31 GMT
Title: Stochastic Parameter Decomposition
Authors: Lucius Bushnaq, Dan Braun, Lee Sharkey
Abstract summary: A key step in reverse engineering neural networks is to decompose them into simpler parts that can be studied in relative isolation. The current main method in the linear parameter decomposition framework, Attribution-based Parameter Decomposition (APD), is impractical on account of its computational cost. We introduce Stochastic Parameter Decomposition (SPD), a method that is more scalable and more robust to hyperparameters than APD.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: A key step in reverse engineering neural networks is to decompose them into simpler parts that can be studied in relative isolation. Linear parameter decomposition -- a framework that has been proposed to resolve several issues with current decomposition methods -- decomposes neural network parameters into a sum of sparsely used vectors in parameter space. However, the current main method in this framework, Attribution-based Parameter Decomposition (APD), is impractical on account of its computational cost and sensitivity to hyperparameters. In this work, we introduce Stochastic Parameter Decomposition (SPD), a method that is more scalable and robust to hyperparameters than APD, which we demonstrate by decomposing models that are slightly larger and more complex than was possible to decompose with APD. We also show that SPD avoids other issues, such as shrinkage of the learned parameters, and better identifies ground truth mechanisms in toy models. By bridging causal mediation analysis and network decomposition methods, this demonstration opens up new research possibilities in mechanistic interpretability by removing barriers to scaling linear parameter decomposition methods to larger models. We release a library for running SPD and reproducing our experiments at https://github.com/goodfire-ai/spd.

Related papers

Generalized Tensor-based Parameter-Efficient Fine-Tuning via Lie Group Transformations [50.010924231754856]
Adapting pre-trained foundation models for diverse downstream tasks is a core practice in artificial intelligence. To overcome this, parameter-efficient fine-tuning (PEFT) methods like LoRA have emerged and are becoming a growing research focus. We propose a generalization that extends matrix-based PEFT methods to higher-dimensional parameter spaces without compromising their structural properties.
arXiv Detail & Related papers (2025-04-01T14:36:45Z)
Interpretability in Parameter Space: Minimizing Mechanistic Description Length with Attribution-based Parameter Decomposition [0.0]
We introduce a conceptual foundation for Attribution-based Parameter Decomposition (APD). APD directly decomposes a neural network's parameters into components that are faithful to the parameters of the original network. We demonstrate APD's effectiveness by successfully identifying ground truth mechanisms in toy experimental settings.
arXiv Detail & Related papers (2025-01-24T21:31:12Z)
ALoRE: Efficient Visual Adaptation via Aggregating Low Rank Experts [71.91042186338163]
ALoRE is a novel PETL method that reuses the hypercomplex parameterized space constructed by Kronecker product to Aggregate Low Rank Experts. Thanks to the artful design, ALoRE maintains negligible extra parameters and can be effortlessly merged into the frozen backbone.
arXiv Detail & Related papers (2024-12-11T12:31:30Z)
SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models [85.67096251281191]
We present an innovative approach to model fusion called zero-shot Sparse MIxture of Low-rank Experts (SMILE) construction. SMILE allows for the upscaling of source models into an MoE model without extra data or further training. We conduct extensive experiments across diverse scenarios, such as image classification and text generation tasks, using full fine-tuning and LoRA fine-tuning.
arXiv Detail & Related papers (2024-08-19T17:32:15Z)
Activated Parameter Locating via Causal Intervention for Model Merging [26.98015572633289]
Model merging combines multiple models into one model, achieving convincing generalization without the necessity of additional training. Existing models have demonstrated that dropping a portion of delta parameters can alleviate conflicts while maintaining performance. We propose an Activated Parameter Locating (APL) method that utilizes causal intervention to estimate importance, enabling more precise parameter drops and better conflict mitigation.
arXiv Detail & Related papers (2024-08-18T14:00:00Z)
Scaling Exponents Across Parameterizations and Optimizers [94.54718325264218]
We propose a new perspective on parameterization by investigating a key assumption in prior work. Our empirical investigation includes tens of thousands of models trained with all combinations of threes. We find that the best learning rate scaling prescription would often have been excluded by the assumptions in prior work.
arXiv Detail & Related papers (2024-07-08T12:32:51Z)
A Framework for Fast and Stable Representations of Multiparameter Persistent Homology Decompositions [2.76240219662896]
We introduce a new general representation framework that leverages recent results on decompositions of multiparameter persistent homology. We establish theoretical stability guarantees under this framework as well as efficient algorithms for practical computation. We validate our stability results and algorithms with numerical experiments that demonstrate statistical convergence, prediction accuracy, and fast running times on several real data sets.
arXiv Detail & Related papers (2023-06-19T21:28:53Z)
Multilevel CNNs for Parametric PDEs [0.0]
We combine concepts from multilevel solvers for partial differential equations with neural network based deep learning. An in-depth theoretical analysis shows that the proposed architecture is able to approximate multigrid V-cycles to arbitrary precision. We find substantial improvements over state-of-the-art deep learning-based solvers.
arXiv Detail & Related papers (2023-04-01T21:11:05Z)
Numerical Optimizations for Weighted Low-rank Estimation on Language Model [73.12941276331316]
Singular value decomposition (SVD) is one of the most popular compression methods that approximates a target matrix with smaller matrices. Standard SVD treats the parameters within the matrix with equal importance, which is a simple but unrealistic assumption. We show that our method can perform better than current SOTA methods in neural-based language models.
arXiv Detail & Related papers (2022-11-02T00:58:02Z)
Inverting brain grey matter models with likelihood-free inference: a tool for trustable cytoarchitecture measurements [62.997667081978825]
Characterisation of the brain grey matter cytoarchitecture with quantitative sensitivity to soma density and volume remains an unsolved challenge in dMRI. We propose a new forward model, specifically a new system of equations, requiring a few relatively sparse b-shells. We then apply modern tools from Bayesian analysis known as likelihood-free inference (LFI) to invert our proposed model.
arXiv Detail & Related papers (2021-11-15T09:08:27Z)
Parsimony-Enhanced Sparse Bayesian Learning for Robust Discovery of Partial Differential Equations [5.584060970507507]
A Parsimony Enhanced Sparse Bayesian Learning (PeSBL) method is developed for discovering the governing Partial Differential Equations (PDEs) of nonlinear dynamical systems. Results of numerical case studies indicate that the governing PDEs of many canonical dynamical systems can be correctly identified using the proposed PeSBL method.
arXiv Detail & Related papers (2021-07-08T00:56:11Z)
Adaptive Subcarrier, Parameter, and Power Allocation for Partitioned Edge Learning Over Broadband Channels [69.18343801164741]
Partitioned edge learning (PARTEL) implements parameter-server training, a well-known distributed learning method, in a wireless network. We consider the case of deep neural network (DNN) models which can be trained using PARTEL by introducing some auxiliary variables.
arXiv Detail & Related papers (2020-10-08T15:27:50Z)

This list is automatically generated from the titles and abstracts of the papers in this site.