Eigenspectrum Analysis of Neural Networks without Aspect Ratio Bias
- URL: http://arxiv.org/abs/2506.06280v1
- Date: Fri, 06 Jun 2025 17:59:28 GMT
- Title: Eigenspectrum Analysis of Neural Networks without Aspect Ratio Bias
- Authors: Yuanzhe Hu, Kinshuk Goel, Vlad Killiakov, Yaoqing Yang
- Abstract summary: Diagnosing deep neural networks (DNNs) through the eigenspectrum of weight matrices has been an active area of research in recent years. We address the impact of the aspect ratio of weight matrices on estimated heavy-tailness metrics. We propose FARMS, a method that normalizes the weight matrices by subsampling submatrices with a fixed aspect ratio.
- Score: 4.503999875371634
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Diagnosing deep neural networks (DNNs) through the eigenspectrum of weight matrices has been an active area of research in recent years. At a high level, eigenspectrum analysis of DNNs involves measuring the heavy-tailness of the empirical spectral densities (ESD) of weight matrices. It provides insight into how well a model is trained and can guide decisions on assigning better layer-wise training hyperparameters. In this paper, we address a challenge associated with such eigenspectrum methods: the impact of the aspect ratio of weight matrices on estimated heavy-tailness metrics. We demonstrate that matrices of varying sizes (and aspect ratios) introduce a non-negligible bias in estimating heavy-tailness metrics, leading to inaccurate model diagnosis and layer-wise hyperparameter assignment. To overcome this challenge, we propose FARMS (Fixed-Aspect-Ratio Matrix Subsampling), a method that normalizes the weight matrices by subsampling submatrices with a fixed aspect ratio. Instead of measuring the heavy-tailness of the original ESD, we measure the average ESD of these subsampled submatrices. We show that measuring the heavy-tailness of these submatrices with the fixed aspect ratio can effectively mitigate the aspect ratio bias. We validate our approach across various optimization techniques and application domains that involve eigenspectrum analysis of weights, including image classification in computer vision (CV) models, scientific machine learning (SciML) model training, and large language model (LLM) pruning. Our results show that despite its simplicity, FARMS uniformly improves the accuracy of eigenspectrum analysis while enabling more effective layer-wise hyperparameter assignment in these application domains. In one of the LLM pruning experiments, FARMS reduces the perplexity of the LLaMA-7B model by 17.3% when compared with the state-of-the-art method.
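Below is a minimal sketch of the fixed-aspect-ratio subsampling idea, assuming square (aspect-ratio-1) submatrices, a pooled ESD as the "average ESD", and a Hill-type tail estimator as the heavy-tailness metric; the function names, subsample count, and estimator choice are illustrative assumptions, not the paper's exact implementation:

```python
import numpy as np

def esd(W: np.ndarray) -> np.ndarray:
    """Eigenvalues of W^T W (squared singular values): the support of the ESD."""
    return np.linalg.svd(W, compute_uv=False) ** 2

def hill_alpha(eigs: np.ndarray, tail_frac: float = 0.1) -> float:
    """Hill estimator of the tail exponent, fit on the largest eigenvalues."""
    k = max(2, int(tail_frac * len(eigs)))
    tail = np.sort(eigs)[-k:]                 # k largest eigenvalues, ascending
    return 1.0 + k / np.sum(np.log(tail / tail[0]))

def farms_alpha(W: np.ndarray, n_samples: int = 16, seed: int = 0) -> float:
    """Heavy-tailness of the average ESD over fixed-aspect-ratio submatrices."""
    rng = np.random.default_rng(seed)
    n = min(W.shape)                          # square submatrices: aspect ratio 1
    pooled = []
    for _ in range(n_samples):
        rows = rng.choice(W.shape[0], size=n, replace=False)
        cols = rng.choice(W.shape[1], size=n, replace=False)
        pooled.append(esd(W[np.ix_(rows, cols)]))
    # Pooling eigenvalues across subsamples approximates the average ESD.
    return hill_alpha(np.concatenate(pooled))

# Layers with very different aspect ratios can now be compared on equal footing:
print(farms_alpha(np.random.randn(4096, 1024)))
print(farms_alpha(np.random.randn(1024, 1024)))
```

The rationale for fixing the aspect ratio: the random-matrix bulk of the ESD (and therefore any tail fit made against it) depends on the rows-to-columns ratio, so holding that ratio constant across layers removes the shape-dependent component of the bias before heavy-tailness is measured.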
Related papers
- Sparsity and Total Variation Constrained Multilayer Linear Unmixing for Hyperspectral Imagery [2.8516007089651043]
Hyperspectral unmixing aims at estimating material signatures (known as endmembers) and the corresponding proportions (referred to as abundances). This study develops a novel approach called sparsity and total variation (TV) constrained multilayer linear unmixing (STVMLU) for hyperspectral imagery.
arXiv Detail & Related papers (2025-08-05T12:50:55Z)
- Machine Learning Framework for Characterizing Processing-Structure Relationship in Block Copolymer Thin Films [1.4698426549994696]
The morphology of block copolymers (BCPs) critically influences material properties and applications. This work introduces a machine learning (ML)-enabled framework for analyzing grazing incidence small-angle X-ray scattering (GISAXS) data and atomic force microscopy (AFM) images to characterize BCP thin film morphology.
arXiv Detail & Related papers (2025-05-29T04:14:42Z)
- RoSTE: An Efficient Quantization-Aware Supervised Fine-Tuning Approach for Large Language Models [53.571195477043496]
We propose an algorithm named Rotated Straight-Through-Estimator (RoSTE). RoSTE combines quantization-aware supervised fine-tuning (QA-SFT) with an adaptive rotation strategy to reduce activation outliers. Our findings reveal that the prediction error is directly proportional to the quantization error of the converged weights, which can be effectively managed through an optimized rotation configuration.
arXiv Detail & Related papers (2025-02-13T06:44:33Z)
- Loss Landscape Analysis for Reliable Quantized ML Models for Scientific Sensing [41.89148096989836]
We propose a method to perform empirical analysis of the loss landscape of machine learning (ML) models. Our method allows assessing the robustness of ML models to such effects as a function of quantization precision and under different regularization techniques.
arXiv Detail & Related papers (2025-02-12T12:30:49Z)
- Measuring Variable Importance in Heterogeneous Treatment Effects with Confidence [33.12963161545068]
Causal machine learning holds promise for estimating individual treatment effects from complex data. We propose PermuCATE, an algorithm based on the Conditional Permutation Importance (CPI) method. We empirically demonstrate the benefits of PermuCATE in simulated and real-world health datasets.
arXiv Detail & Related papers (2024-08-23T11:44:07Z)
- Understanding Reinforcement Learning-Based Fine-Tuning of Diffusion Models: A Tutorial and Review [63.31328039424469]
This tutorial provides a comprehensive survey of methods for fine-tuning diffusion models to optimize downstream reward functions.
We explain the application of various RL algorithms, including PPO, differentiable optimization, reward-weighted MLE, value-weighted sampling, and path consistency learning.
arXiv Detail & Related papers (2024-07-18T17:35:32Z)
- Data-free Weight Compress and Denoise for Large Language Models [96.68582094536032]
We propose a novel approach termed Data-free Joint Rank-k Approximation for compressing the parameter matrices. We achieve a model pruning of 80% parameters while retaining 93.43% of the original performance without any calibration data. (A generic rank-k truncation sketch follows this entry.)
arXiv Detail & Related papers (2024-02-26T05:51:47Z)
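The entry above turns on rank-k approximation of parameter matrices. The paper's joint, data-free approach is more involved than plain truncation, but a generic Eckart-Young rank-k sketch shows the basic compression step being built upon (names and sizes are illustrative):

```python
import numpy as np

def rank_k_approx(W: np.ndarray, k: int) -> np.ndarray:
    """Best rank-k approximation of W in Frobenius norm (Eckart-Young)."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    return (U[:, :k] * S[:k]) @ Vt[:k]

W = np.random.randn(512, 256)
W_k = rank_k_approx(W, k=64)
# Storing U[:, :k] * S[:k] and Vt[:k] takes k*(m+n) numbers instead of m*n.
```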
- Large-scale gradient-based training of Mixtures of Factor Analyzers [67.21722742907981]
This article contributes both a theoretical analysis and a new method for efficient high-dimensional training of Mixtures of Factor Analyzers (MFA) by gradient descent.
We prove that MFA training and inference/sampling can be performed based on precision matrices, which does not require matrix inversions after training is completed.
Besides the theoretical analysis, we apply MFA to typical image datasets such as SVHN and MNIST, and demonstrate the ability to perform sample generation and outlier detection.
arXiv Detail & Related papers (2023-08-26T06:12:33Z)
- Proximal Symmetric Non-negative Latent Factor Analysis: A Novel Approach to Highly-Accurate Representation of Undirected Weighted Networks [2.1797442801107056]
An Undirected Weighted Network (UWN) is commonly found in big data-related applications.
Existing models fail to model either its intrinsic symmetry or its low data density.
A Proximal Symmetric Non-negative Latent-factor-analysis model is therefore proposed.
arXiv Detail & Related papers (2023-06-06T13:03:24Z)
- Language model compression with weighted low-rank factorization [73.61874728240568]
We introduce Fisher information to weigh the importance of parameters affecting the model prediction.
We find that our resulting task accuracy is much closer to the original model's performance.
Our method can directly compress a task-specific model while achieving better performance than other compact model strategies. (A generic weighted-factorization sketch follows this entry.)
arXiv Detail & Related papers (2022-06-30T21:57:07Z)
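A minimal sketch of the weighted low-rank idea from the entry above, assuming per-row Fisher-information scores are already computed; the paper's exact weighting and factorization details may differ:

```python
import numpy as np

def fisher_weighted_factorize(W, fisher_rows, k):
    """Factor W ~ A @ B, computing the SVD on Fisher-weighted rows so that
    parameters that matter more for the prediction are reconstructed better."""
    s = np.sqrt(fisher_rows)                  # one importance weight per row
    U, S, Vt = np.linalg.svd(s[:, None] * W, full_matrices=False)
    A = (U[:, :k] * S[:k]) / s[:, None]       # undo the row scaling on the left
    B = Vt[:k]
    return A, B

W = np.random.randn(768, 768)
fisher = np.random.rand(768) + 1e-3           # placeholder importance scores
A, B = fisher_weighted_factorize(W, fisher, k=128)
```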
- Estimating Average Treatment Effects with Support Vector Machines [77.34726150561087]
Support vector machine (SVM) is one of the most popular classification algorithms in the machine learning literature.
We adapt SVM as a kernel-based weighting procedure that minimizes the maximum mean discrepancy between the treatment and control groups.
We characterize the bias of causal effect estimation arising from this trade-off, connecting the proposed SVM procedure to existing kernel balancing methods. (A minimal MMD sketch follows this entry.)
arXiv Detail & Related papers (2021-02-23T20:22:56Z)
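The SVM entry above hinges on minimizing the maximum mean discrepancy (MMD) between treatment and control groups. A minimal sketch of the squared-MMD quantity itself, with an RBF kernel and illustrative hyperparameters:

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    """Gaussian kernel matrix: k(x, y) = exp(-gamma * ||x - y||^2)."""
    sq_dists = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)

def mmd2(X_treat, X_ctrl, gamma=1.0):
    """Squared MMD between treated and control covariate samples."""
    return (rbf_kernel(X_treat, X_treat, gamma).mean()
            + rbf_kernel(X_ctrl, X_ctrl, gamma).mean()
            - 2.0 * rbf_kernel(X_treat, X_ctrl, gamma).mean())

# Well-balanced groups give a small MMD; a covariate shift inflates it.
rng = np.random.default_rng(0)
print(mmd2(rng.normal(0, 1, (100, 5)), rng.normal(0, 1, (100, 5))))
print(mmd2(rng.normal(0, 1, (100, 5)), rng.normal(1, 1, (100, 5))))
```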