AVSS: Layer Importance Evaluation in Large Language Models via Activation Variance-Sparsity Analysis
- URL: http://arxiv.org/abs/2411.02117v1
- Date: Mon, 04 Nov 2024 14:29:49 GMT
- Title: AVSS: Layer Importance Evaluation in Large Language Models via Activation Variance-Sparsity Analysis
- Authors: Zichen Song, Yuxin Wu, Sitan Huang, Zhongfeng Kang
- Abstract summary: We propose a novel metric combining normalized activation variance and sparsity to assess each layer's contribution to model performance.
By identifying and removing approximately the lowest 25% of layers based on AVSS, we achieve over 90% of original model performance.
- Score: 5.854247492297834
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The evaluation of layer importance in deep learning has been an active area of research, with significant implications for model optimization and interpretability. Recently, large language models (LLMs) have gained prominence across various domains, yet limited studies have explored the functional importance and performance contributions of individual layers within LLMs, especially from the perspective of activation distribution. In this work, we propose the Activation Variance-Sparsity Score (AVSS), a novel metric combining normalized activation variance and sparsity to assess each layer's contribution to model performance. By identifying and removing approximately the lowest 25% of layers based on AVSS, we achieve over 90% of original model performance across tasks such as question answering, language modeling, and sentiment classification, indicating that these layers may be non-essential. Our approach provides a systematic method for identifying less critical layers, contributing to efficient large language model architectures.
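As a rough illustration of the idea, here is a minimal NumPy sketch of how a variance-sparsity score of this kind might be computed. The abstract does not give the exact normalization or how the two terms are combined, so the formula below (layer-normalized variance downweighted by a near-zero-activation sparsity fraction) is an assumption, and all function names are illustrative.

```python
import numpy as np

def avss_scores(layer_activations, eps=1e-8):
    """Score each layer from its captured activations (one array per layer).

    Illustrative reading of AVSS: layers with high activation variance and
    low sparsity are treated as more important. The paper's exact
    normalization and combination may differ; this is a sketch.
    """
    variances, sparsities = [], []
    for acts in layer_activations:          # acts: (num_tokens, hidden_dim)
        variances.append(acts.var())
        # Fraction of near-zero activations as a simple sparsity proxy.
        sparsities.append((np.abs(acts) < eps).mean())
    v = np.asarray(variances)
    s = np.asarray(sparsities)
    v = v / (v.sum() + eps)                 # normalize variance across layers
    return v * (1.0 - s)                    # downweight highly sparse layers

def lowest_quartile(scores):
    """Indices of the ~25% lowest-scoring layers, candidates for removal."""
    k = max(1, len(scores) // 4)
    return np.argsort(scores)[:k].tolist()

# Toy usage with random "activations" for a 12-layer model:
rng = np.random.default_rng(0)
acts = [rng.normal(scale=i + 1, size=(128, 64)) for i in range(12)]
print(lowest_quartile(avss_scores(acts)))
```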
Related papers
- Ensemble Bayesian Inference: Leveraging Small Language Models to Achieve LLM-level Accuracy in Profile Matching Tasks [0.0]
This study explores the potential of small language model (SLM) ensembles to achieve accuracy comparable to proprietary large language models (LLMs).
We propose Ensemble Bayesian Inference (EBI), a novel approach that applies Bayesian estimation to combine judgments from multiple SLMs.
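The abstract does not spell out the estimation procedure; one standard way to realize "Bayesian combination of judgments" is naive-Bayes fusion of each model's vote using per-model confusion matrices estimated on a dev set. The sketch below assumes that reading and is not necessarily EBI's exact formulation.

```python
import numpy as np

def bayes_fuse(prior, confusions, judgments):
    """Naive-Bayes fusion of categorical judgments from several models.

    prior:      (C,) class prior
    confusions: list of (C, C) matrices; confusions[m][true, pred] is
                model m's estimated P(pred | true), e.g. from a dev set
    judgments:  list of predicted class indices, one per model
    """
    log_post = np.log(prior)
    for conf, pred in zip(confusions, judgments):
        log_post += np.log(conf[:, pred] + 1e-12)  # likelihood of this vote
    log_post -= log_post.max()                     # stabilize before exp
    post = np.exp(log_post)
    return post / post.sum()

# Toy usage: two binary classifiers, one more reliable than the other.
prior = np.array([0.5, 0.5])
conf_a = np.array([[0.9, 0.1], [0.2, 0.8]])   # fairly accurate model
conf_b = np.array([[0.6, 0.4], [0.4, 0.6]])   # weaker model
print(bayes_fuse(prior, [conf_a, conf_b], judgments=[1, 0]))
```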
arXiv Detail & Related papers (2025-04-24T15:55:10Z) - How do Large Language Models Understand Relevance? A Mechanistic Interpretability Perspective [64.00022624183781]
Large language models (LLMs) can assess relevance and support information retrieval (IR) tasks.
We investigate how different LLM modules contribute to relevance judgment through the lens of mechanistic interpretability.
arXiv Detail & Related papers (2025-04-10T16:14:55Z) - Layer-Aware Embedding Fusion for LLMs in Text Classifications [1.4250487522292254]
We propose a layer-aware embedding selection method and investigate how to quantitatively evaluate different layers to identify the most important ones for downstream NLP tasks.
Experiments on four English text classification datasets demonstrate that different layers in LLMs exhibit varying degrees of representational strength for classification.
We also explore how combining embeddings from multiple LLMs, without requiring model fine-tuning, can improve performance.
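One generic way to realize layer scoring and multi-model fusion is a cross-validated probe per layer plus concatenation of the selected embeddings. The sketch below assumes that recipe; the paper's exact selection criterion may differ.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def score_layers(layer_embeddings, labels):
    """Rate each layer's embeddings by cross-validated probe accuracy.

    layer_embeddings: list of (n_examples, dim) arrays, one per layer.
    A generic probing recipe, not necessarily the paper's criterion.
    """
    return [cross_val_score(LogisticRegression(max_iter=1000), X, labels,
                            cv=5).mean() for X in layer_embeddings]

def fuse(best_layers_per_model):
    """Concatenate selected-layer embeddings from multiple LLMs -- one
    simple fusion scheme that requires no fine-tuning."""
    return np.concatenate(best_layers_per_model, axis=1)
```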
arXiv Detail & Related papers (2025-04-08T07:45:50Z) - Integration of Explainable AI Techniques with Large Language Models for Enhanced Interpretability for Sentiment Analysis [0.5120567378386615]
Interpretability remains a key difficulty in sentiment analysis with Large Language Models (LLMs).
This research introduces a technique that applies SHAP (Shapley Additive Explanations) by breaking down LLMs into components such as the embedding, encoder, decoder, and attention layers.
The method is evaluated using the Stanford Sentiment Treebank (SST-2) dataset, which shows how different sentences affect different layers.
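For orientation, here is the shap library's documented transformers-pipeline pattern on SST-2-style sentences. Note this off-the-shelf usage attributes at the input-token level with a stock sentiment pipeline, whereas the paper decomposes the model into components; treat it as adjacent usage, not the paper's method.

```python
# pip install shap transformers torch
import shap
import transformers

# A stock sentiment pipeline stands in for the paper's LLM.
classifier = transformers.pipeline(
    "sentiment-analysis", return_all_scores=True
)
explainer = shap.Explainer(classifier)

# Two SST-2-style sentences; shap_values holds per-token attributions.
shap_values = explainer([
    "A gorgeous, witty, seductive movie.",
    "The plot is paper-thin and the pacing drags.",
])
shap.plots.text(shap_values[0])   # visualize token contributions
```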
arXiv Detail & Related papers (2025-03-15T01:37:54Z) - Adaptive Pruning for Large Language Models with Structural Importance Awareness [66.2690963378878]
Large language models (LLMs) have significantly improved language understanding and generation capabilities.
LLMs are difficult to deploy on resource-constrained edge devices due to their high computational and storage resource demands.
We propose structurally-aware adaptive pruning (SAAP) to significantly reduce the computational and memory costs while maintaining model performance.
arXiv Detail & Related papers (2024-12-19T18:08:04Z) - MME-Survey: A Comprehensive Survey on Evaluation of Multimodal LLMs [97.94579295913606]
Multimodal Large Language Models (MLLMs) have garnered increased attention from both industry and academia.
In the development process, evaluation is critical since it provides intuitive feedback and guidance on improving models.
This work aims to offer researchers an easy grasp of how to effectively evaluate MLLMs according to different needs and to inspire better evaluation methods.
arXiv Detail & Related papers (2024-11-22T18:59:54Z) - Learn from Downstream and Be Yourself in Multimodal Large Language Model Fine-Tuning [104.27224674122313]
Fine-tuning MLLM has become a common practice to improve performance on specific downstream tasks.
To balance the trade-off between generalization and specialization, we propose measuring the parameter importance for both pre-trained and fine-tuning distributions.
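A common realization of "parameter importance" is the squared-gradient (empirical Fisher) statistic; running it over both a pre-training-like distribution and the fine-tuning data yields two importance maps to balance. The sketch below assumes something of that shape underlies the paper's measure.

```python
import torch

def parameter_importance(model, data_loader, loss_fn, device="cpu"):
    """Squared-gradient (empirical Fisher) importance per parameter,
    averaged over one data distribution. An assumed stand-in for the
    paper's measure, not its exact definition."""
    importance = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    n_batches = 0
    for x, y in data_loader:                 # assumes a labeled loader
        model.zero_grad()
        loss_fn(model(x.to(device)), y.to(device)).backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                importance[n] += p.grad.detach() ** 2
        n_batches += 1
    return {n: v / max(n_batches, 1) for n, v in importance.items()}
```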
arXiv Detail & Related papers (2024-11-17T01:16:37Z) - Layer Importance and Hallucination Analysis in Large Language Models via Enhanced Activation Variance-Sparsity [5.854247492297834]
This paper first explores layer importance using the Activation Variance-Sparsity Score (AVSS).
Building on AVSS, we propose an enhanced version tailored to assess hallucination propensity across layers (EAVSS).
This improved approach introduces Hallucination-Specific Activation Variance (HSAV) and Hallucination-Specific Sparsity (HSS) metrics, allowing precise identification of hallucination-prone layers.
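The abstract does not define HSAV or HSS; a speculative sketch of metrics in this spirit contrasts per-layer variance and sparsity between hallucinated and faithful generations. The actual definitions in the paper may differ.

```python
import numpy as np

def hallucination_layer_stats(acts_hallu, acts_faithful, eps=1e-8):
    """Contrast per-layer activation statistics between hallucinated and
    faithful generations -- one plausible reading of HSAV/HSS.

    acts_hallu, acts_faithful: lists of (tokens, hidden) arrays per layer.
    Returns (hsav, hss): per-layer variance gaps and sparsity gaps; layers
    with large gaps would be flagged as hallucination-prone.
    """
    hsav, hss = [], []
    for h, f in zip(acts_hallu, acts_faithful):
        hsav.append(h.var() - f.var())
        hss.append((np.abs(h) < eps).mean() - (np.abs(f) < eps).mean())
    return np.asarray(hsav), np.asarray(hss)
```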
arXiv Detail & Related papers (2024-11-15T09:33:47Z) - Evaluating Linguistic Capabilities of Multimodal LLMs in the Lens of Few-Shot Learning [15.919493497867567]
This study aims to evaluate the performance of Multimodal Large Language Models (MLLMs) on the VALSE benchmark.
We conducted a comprehensive assessment of state-of-the-art MLLMs, varying in model size and pretraining datasets.
arXiv Detail & Related papers (2024-07-17T11:26:47Z) - A Survey on Efficient Inference for Large Language Models [25.572035747669275]
Large Language Models (LLMs) have attracted extensive attention due to their remarkable performance across various tasks.
The substantial computational and memory requirements of LLM inference pose challenges for deployment in resource-constrained scenarios.
This paper presents a comprehensive survey of the existing literature on efficient LLM inference.
arXiv Detail & Related papers (2024-04-22T15:53:08Z) - PoLLMgraph: Unraveling Hallucinations in Large Language Models via State Transition Dynamics [51.17512229589]
PoLLMgraph is a model-based white-box detection and forecasting approach for large language models.
We show that hallucination can be effectively detected by analyzing the LLM's internal state transition dynamics.
Our work paves a new way for model-based white-box analysis of LLMs, motivating the research community to further explore, understand, and refine the intricate dynamics of LLM behaviors.
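As a loose illustration of state-transition analysis, the sketch below discretizes hidden states with k-means, fits a first-order transition matrix on trusted generations, and flags generations whose transitions are unusually improbable. PoLLMgraph's actual abstraction and forecasting machinery is more involved; all names here are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

def transition_likelihood(hidden_states_train, hidden_states_test, k=32):
    """Score test generations by average log transition probability under a
    first-order model fit on trusted generations; low scores flag atypical
    internal dynamics. hidden_states_*: lists of (tokens, hidden) arrays."""
    km = KMeans(n_clusters=k, n_init=10).fit(np.vstack(hidden_states_train))
    trans = np.full((k, k), 1e-6)                 # smoothed counts
    for seq in hidden_states_train:
        labels = km.predict(seq)
        for a, b in zip(labels[:-1], labels[1:]):
            trans[a, b] += 1
    trans /= trans.sum(axis=1, keepdims=True)     # row-normalize

    scores = []
    for seq in hidden_states_test:
        labels = km.predict(seq)
        logp = [np.log(trans[a, b]) for a, b in zip(labels[:-1], labels[1:])]
        scores.append(float(np.mean(logp)))
    return scores
```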
arXiv Detail & Related papers (2024-04-06T20:02:20Z) - Comprehensive Reassessment of Large-Scale Evaluation Outcomes in LLMs: A Multifaceted Statistical Approach [64.42462708687921]
Evaluations have revealed that factors such as scaling, training types, and architectures profoundly impact the performance of LLMs.
Our study embarks on a thorough re-examination of these LLMs, targeting the inadequacies in current evaluation methods.
This includes the application of ANOVA, Tukey HSD tests, GAMM, and clustering techniques.
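These are standard statistical tools; a minimal sketch with SciPy and statsmodels follows, on hypothetical benchmark scores (the numbers below are made up for illustration, not from the paper).

```python
# pip install scipy statsmodels
import numpy as np
from scipy.stats import f_oneway
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Hypothetical benchmark scores for three model-size groups.
scores = {
    "7B":  np.array([61.2, 59.8, 62.5, 60.1]),
    "13B": np.array([66.0, 64.7, 67.3, 65.9]),
    "70B": np.array([71.5, 70.2, 72.8, 71.0]),
}

# One-way ANOVA: do the group means differ at all?
f_stat, p_value = f_oneway(*scores.values())
print(f"ANOVA: F={f_stat:.2f}, p={p_value:.4f}")

# Tukey HSD: which specific pairs of groups differ?
values = np.concatenate(list(scores.values()))
groups = np.repeat(list(scores.keys()), [len(v) for v in scores.values()])
print(pairwise_tukeyhsd(values, groups, alpha=0.05))
```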
arXiv Detail & Related papers (2024-03-22T14:47:35Z) - Unveiling the Generalization Power of Fine-Tuned Large Language Models [81.70754292058258]
We investigate whether fine-tuning affects the generalization ability intrinsic to Large Language Models (LLMs).
Our main findings reveal that models fine-tuned on generation and classification tasks exhibit dissimilar behaviors in generalizing to different domains and tasks.
We observe that integrating the in-context learning strategy during fine-tuning on generation tasks can enhance the model's generalization ability.
arXiv Detail & Related papers (2024-03-14T08:18:59Z) - F-Eval: Assessing Fundamental Abilities with Refined Evaluation Methods [102.98899881389211]
We propose F-Eval, a bilingual evaluation benchmark to evaluate the fundamental abilities, including expression, commonsense and logic.
For reference-free subjective tasks, we devise new evaluation methods, serving as alternatives to scoring by API models.
arXiv Detail & Related papers (2024-01-26T13:55:32Z) - Comparative layer-wise analysis of self-supervised speech models [29.258085176788097]
We measure acoustic, phonetic, and word-level properties encoded in individual layers, using a lightweight analysis tool based on canonical correlation analysis (CCA).
We find that these properties evolve across layers differently depending on the model, and the variations relate to the choice of pre-training objective.
We discover that CCA trends provide reliable guidance to choose layers of interest for downstream tasks and that single-layer performance often matches or improves upon using all layers, suggesting implications for more efficient use of pre-trained models.
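A simplified sketch of CCA-based layer scoring with scikit-learn is below; the paper's tool is related to projection-weighted CCA variants, so the plain averaging of canonical correlations here is an approximation.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

def cca_similarity(layer_reprs, targets, n_components=4):
    """Mean canonical correlation between each layer's representations and
    a target feature matrix (e.g. one-hot phone or word labels).

    layer_reprs: list of (frames, hidden_dim) arrays, one per layer.
    targets:     (frames, n_features) array with n_features >= n_components.
    """
    sims = []
    for X in layer_reprs:
        cca = CCA(n_components=n_components, max_iter=1000).fit(X, targets)
        Xc, Yc = cca.transform(X, targets)
        corrs = [np.corrcoef(Xc[:, i], Yc[:, i])[0, 1]
                 for i in range(n_components)]
        sims.append(float(np.mean(corrs)))
    return sims        # layers where this peaks suit the probed property
```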
arXiv Detail & Related papers (2022-11-08T00:59:05Z) - Boosting the Generalization Capability in Cross-Domain Few-shot Learning via Noise-enhanced Supervised Autoencoder [23.860842627883187]
We teach the model to capture broader variations of the feature distributions with a novel noise-enhanced supervised autoencoder (NSAE).
NSAE trains the model by jointly reconstructing inputs and predicting the labels of inputs as well as their reconstructed pairs.
We also take advantage of NSAE structure and propose a two-step fine-tuning procedure that achieves better adaption and improves classification performance in the target domain.
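A minimal PyTorch sketch of that joint objective follows: reconstruct the noisy input and classify both the input and its reconstruction. Layer sizes and loss weights are illustrative, not the paper's.

```python
import torch
import torch.nn as nn

class NSAE(nn.Module):
    """Noise-enhanced supervised autoencoder, sketched."""

    def __init__(self, dim=512, hidden=128, n_classes=5):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU())
        self.decoder = nn.Linear(hidden, dim)
        self.classifier = nn.Linear(hidden, n_classes)

    def forward(self, x, noise_std=0.1):
        z = self.encoder(x + noise_std * torch.randn_like(x))  # noisy input
        recon = self.decoder(z)
        # Classify the input's code and the reconstruction's code.
        return recon, self.classifier(z), self.classifier(self.encoder(recon))

def nsae_loss(model, x, y, alpha=1.0, beta=0.5):
    """Joint objective: reconstruction plus labels for inputs and their
    reconstructed pairs (weights alpha/beta are illustrative)."""
    recon, logits_in, logits_rec = model(x)
    ce = nn.functional.cross_entropy
    return (nn.functional.mse_loss(recon, x)
            + alpha * ce(logits_in, y)
            + beta * ce(logits_rec, y))
```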
arXiv Detail & Related papers (2021-08-11T04:45:56Z)