Layer-Aware Embedding Fusion for LLMs in Text Classifications
- URL: http://arxiv.org/abs/2504.05764v1
- Date: Tue, 08 Apr 2025 07:45:50 GMT
- Title: Layer-Aware Embedding Fusion for LLMs in Text Classifications
- Authors: Jiho Gwak, Yuchul Jung
- Abstract summary: We propose a layer-aware embedding selection method and investigate how to quantitatively evaluate different layers to identify the most important ones for downstream NLP tasks. Experiments on four English text classification datasets demonstrate that different layers in LLMs exhibit varying degrees of representational strength for classification. We also explore how combining embeddings from multiple LLMs, without requiring model fine-tuning, can improve performance.
- Score: 1.4250487522292254
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Embedding fusion has emerged as an effective approach for enhancing performance across various NLP tasks. However, systematic guidelines for selecting optimal layers and developing effective fusion strategies for the integration of LLMs remain underexplored. In this study, we propose a layer-aware embedding selection method and investigate how to quantitatively evaluate different layers to identify the most important ones for downstream NLP tasks, showing that the critical layers vary depending on the dataset. We also explore how combining embeddings from multiple LLMs, without requiring model fine-tuning, can improve performance. Experiments on four English text classification datasets (SST-2, MR, R8, and R52) demonstrate that different layers in LLMs exhibit varying degrees of representational strength for classification, and that combining embeddings from different models can enhance performance if the models exhibit complementary characteristics. Additionally, we discuss resource overhead (memory and inference time) to provide a balanced perspective on the real-world feasibility of embedding fusion. Future work will explore multilingual and domain-specific datasets, as well as techniques for automating layer selection, to improve both performance and scalability.
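As a rough illustration of the fusion-without-fine-tuning setup above, the sketch below mean-pools hidden states from one chosen layer of each of two LLMs, concatenates the two embeddings, and fits a lightweight classifier. The model names, layer indices, mean pooling, and concatenation fusion are illustrative assumptions, not the paper's exact configuration.

```python
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.linear_model import LogisticRegression

def layer_embedding(model_name: str, texts: list[str], layer: int) -> torch.Tensor:
    """Mean-pool hidden states from one chosen layer (no fine-tuning)."""
    tok = AutoTokenizer.from_pretrained(model_name)
    if tok.pad_token is None:                       # e.g. GPT-2 has no pad token
        tok.pad_token = tok.eos_token
    model = AutoModel.from_pretrained(model_name, output_hidden_states=True).eval()
    batch = tok(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).hidden_states[layer]   # (batch, seq, dim)
    mask = batch["attention_mask"].unsqueeze(-1)       # zero out padding tokens
    return (hidden * mask).sum(1) / mask.sum(1)

# Fuse two models' layer embeddings by concatenation, then train a classifier.
texts, labels = ["a gripping, heartfelt film", "dull and lifeless"], [1, 0]
fused = torch.cat([
    layer_embedding("gpt2", texts, layer=9),        # hypothetical layer choice
    layer_embedding("distilgpt2", texts, layer=4),  # hypothetical layer choice
], dim=-1)
clf = LogisticRegression(max_iter=1000).fit(fused.numpy(), labels)
```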
Related papers
- Rethinking Visual Layer Selection in Multimodal LLMs [46.091556112958884]
This work proposes a Layer-wise Similarity approach to group CLIP-ViT layers with similar behaviors into shallow, middle, and deep categories.
We revisit the visual layer selection problem in MLLMs at scale, training LLaVA-style models ranging from 1.4B to 7B parameters.
We find that: (1) deep layers are essential for OCR tasks; (2) shallow and middle layers substantially outperform deep layers on reasoning tasks involving counting, positioning, and object localization; and (3) a lightweight fusion of features across shallow, middle, and deep layers consistently outperforms specialized fusion baselines and single-layer baselines.
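A minimal sketch of the grouping step, assuming cosine similarity between mean-pooled per-layer features as a stand-in for the paper's layer-wise similarity measure; layers are cut into contiguous shallow/middle/deep bands at the weakest neighbor similarities.

```python
import numpy as np

def layer_similarity_groups(layer_feats, n_groups=3):
    """layer_feats: one (num_samples, dim) feature matrix per encoder layer."""
    pooled = [f.mean(axis=0) for f in layer_feats]             # one vector per layer
    pooled = [v / (np.linalg.norm(v) + 1e-8) for v in pooled]
    # Cosine similarity between each pair of neighboring layers
    adjacent = np.array([u @ v for u, v in zip(pooled, pooled[1:])])
    # Cut at the n_groups-1 weakest neighbor similarities -> contiguous bands
    cuts = sorted(np.argsort(adjacent)[: n_groups - 1] + 1)
    return np.split(np.arange(len(layer_feats)), cuts)

# 12 layers of toy features -> shallow / middle / deep index groups
groups = layer_similarity_groups([np.random.randn(32, 768) for _ in range(12)])
```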
arXiv Detail & Related papers (2025-04-30T09:07:10Z)
- Distilling Transitional Pattern to Large Language Models for Multimodal Session-based Recommendation
Session-based recommendation (SBR) predicts the next item based on anonymous sessions.
Recent multimodal SBR (MSBR) methods use simplistic pre-trained models for modality learning, which limits semantic richness.
We propose a multimodal LLM-enhanced framework, TPAD, which extends a distillation paradigm to decouple and align transitional patterns for promoting MSBR.
arXiv Detail & Related papers (2025-04-13T07:49:08Z)
- Instruction-Guided Fusion of Multi-Layer Visual Features in Large Vision-Language Models [50.98559225639266]
We investigate the contributions of visual features from different encoder layers using 18 benchmarks spanning 6 task categories. Our findings reveal that multilayer features provide complementary strengths with varying task dependencies, and uniform fusion leads to suboptimal performance. We propose the instruction-guided vision aggregator, a module that dynamically integrates multi-layer visual features based on textual instructions.
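A minimal sketch of an instruction-conditioned aggregator: the instruction embedding is projected into a query that attends over per-layer visual features. This stand-in module reflects the stated idea, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class InstructionGuidedAggregator(nn.Module):
    def __init__(self, vis_dim: int, txt_dim: int):
        super().__init__()
        self.query = nn.Linear(txt_dim, vis_dim)  # instruction -> layer query

    def forward(self, layer_feats: torch.Tensor, instr_emb: torch.Tensor):
        """layer_feats: (batch, num_layers, vis_dim); instr_emb: (batch, txt_dim)."""
        q = self.query(instr_emb).unsqueeze(-1)    # (batch, vis_dim, 1)
        scores = layer_feats @ q                   # (batch, num_layers, 1)
        weights = scores.softmax(dim=1)            # attention over layers
        return (weights * layer_feats).sum(dim=1)  # fused (batch, vis_dim)

agg = InstructionGuidedAggregator(vis_dim=1024, txt_dim=4096)
fused = agg(torch.randn(2, 24, 1024), torch.randn(2, 4096))
```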
arXiv Detail & Related papers (2024-12-26T05:41:31Z)
- FedMLLM: Federated Fine-tuning MLLM on Multimodal Heterogeneity Data [56.08867996209236]
Fine-tuning Multimodal Large Language Models (MLLMs) with Federated Learning (FL) allows for expanding the training data scope by including private data sources. We introduce a benchmark to evaluate the performance of federated fine-tuning of MLLMs across various multimodal heterogeneous scenarios. We develop a general FedMLLM framework that integrates classic FL methods alongside two modality-agnostic strategies.
arXiv Detail & Related papers (2024-11-22T04:09:23Z)
- Star-Agents: Automatic Data Optimization with LLM Agents for Instruction Tuning [71.2981957820888]
We propose a novel Star-Agents framework, which automates the enhancement of data quality across datasets.
The framework initially generates diverse instruction data with multiple LLM agents through a bespoke sampling method.
The generated data undergo a rigorous evaluation using a dual-model method that assesses both difficulty and quality.
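A minimal sketch of such dual-model filtering, with stub judges standing in for LLM scorers; the thresholds and scoring interface are assumptions, not the Star-Agents framework's actual criteria.

```python
import random

class StubJudge:
    """Stand-in for an LLM judge; a real pipeline would prompt a model."""
    def score(self, sample: str) -> float:
        return random.random()

def dual_model_filter(samples, difficulty_judge, quality_judge,
                      min_difficulty=0.3, min_quality=0.7):
    """Keep only samples that both judges rate above their thresholds."""
    return [s for s in samples
            if difficulty_judge.score(s) >= min_difficulty
            and quality_judge.score(s) >= min_quality]

kept = dual_model_filter(["rewrite this proof", "say hi"], StubJudge(), StubJudge())
```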
arXiv Detail & Related papers (2024-11-21T02:30:53Z)
- F$^3$OCUS -- Federated Finetuning of Vision-Language Foundation Models with Optimal Client Layer Updating Strategy via Multi-objective Meta-Heuristics [8.577808901433]
We show the impact of two factors, viz., a client-specific layer importance score that selects the most important VLM layers for fine-tuning, and a layer diversity score that encourages varied layer selection across clients. We propose a novel layer updating strategy dubbed F$^3$OCUS that jointly optimizes the layer importance and diversity factors.
arXiv Detail & Related papers (2024-11-17T21:54:57Z)
- AVSS: Layer Importance Evaluation in Large Language Models via Activation Variance-Sparsity Analysis [5.854247492297834]
We propose a novel metric combining normalized activation variance and sparsity to assess each layer's contribution to model performance.
By identifying and removing the roughly 25% of layers with the lowest AVSS, we retain over 90% of the original model's performance.
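A minimal sketch of an AVSS-style score, assuming normalized activation variance divided by normalized sparsity; the paper's exact normalization and combination rule may differ.

```python
import numpy as np

def avss_scores(acts_per_layer, thresh=1e-3, eps=1e-8):
    """Score each layer: high activation variance and low sparsity -> important."""
    var = np.array([a.var() for a in acts_per_layer])
    sparsity = np.array([(np.abs(a) < thresh).mean() for a in acts_per_layer])
    var_n = var / (var.sum() + eps)            # normalized variance per layer
    spar_n = sparsity / (sparsity.sum() + eps) # normalized sparsity per layer
    return var_n / (spar_n + eps)

acts = [np.random.randn(64, 4096) * (i + 1) for i in range(12)]  # toy activations
scores = avss_scores(acts)
prune = np.argsort(scores)[: len(acts) // 4]  # candidates: lowest ~25% of layers
```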
arXiv Detail & Related papers (2024-11-04T14:29:49Z)
- Exploring Selective Layer Fine-Tuning in Federated Learning [48.470385357429215]
Federated learning (FL) has emerged as a promising paradigm for fine-tuning foundation models using distributed data.
We study selective layer fine-tuning in FL, emphasizing a flexible approach that allows the clients to adjust their selected layers according to their local data and resources.
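A minimal sketch of the client-side mechanics: freeze all parameters, then unfreeze only a client's selected layers. The toy model and layer choice are illustrative, not a selection strategy from the paper.

```python
import torch.nn as nn

class ToyEncoder(nn.Module):
    """Stand-in for a foundation model with an indexable layer stack."""
    def __init__(self, n_layers: int = 6, dim: int = 32):
        super().__init__()
        self.layers = nn.ModuleList(nn.Linear(dim, dim) for _ in range(n_layers))

def select_layers_for_training(model: ToyEncoder, selected: set[int]) -> None:
    """Freeze everything, then unfreeze only the client's selected layers."""
    for p in model.parameters():
        p.requires_grad = False
    for i, layer in enumerate(model.layers):
        if i in selected:
            for p in layer.parameters():
                p.requires_grad = True

model = ToyEncoder()
select_layers_for_training(model, {4, 5})  # a low-resource client trains only top layers
```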
arXiv Detail & Related papers (2024-08-28T07:48:39Z)
- Entropy Guided Extrapolative Decoding to Improve Factuality in Large Language Models [55.45444773200529]
Large language models (LLMs) exhibit impressive natural language capabilities but suffer from hallucination.
Recent work has focused on decoding techniques to improve factuality during inference.
arXiv Detail & Related papers (2024-04-14T19:45:35Z)
- Learning the Right Layers: a Data-Driven Layer-Aggregation Strategy for Semi-Supervised Learning on Multilayer Graphs [2.752817022620644]
Clustering (or community detection) on multilayer graphs poses several additional complications compared with standard single-layer graphs.
One of the major challenges is to establish the extent to which each layer contributes to the cluster assignment.
We propose a parameter-free Laplacian-regularized model that learns an optimal nonlinear combination of the different layers from the available input labels.
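A minimal sketch of the idea: alternate a closed-form label solve under a weighted sum of per-layer Laplacians with a projected-gradient update of the layer weights. The alternating scheme, step size, and toy graphs are illustrative assumptions, not the paper's parameter-free algorithm.

```python
import numpy as np

def fit(laplacians, y, mask, lam=1.0, steps=50, lr=0.1):
    """laplacians: list of (n, n) layer Laplacians; y: labels; mask: labeled nodes."""
    theta = np.full(len(laplacians), 1.0 / len(laplacians))  # simplex weights
    for _ in range(steps):
        L = sum(t * Lk for t, Lk in zip(theta, laplacians))
        # labels f minimize sum_labeled (f_i - y_i)^2 + lam * f^T L f (closed form)
        f = np.linalg.solve(np.diag(mask.astype(float)) + lam * L, mask * y)
        grad = lam * np.array([f @ Lk @ f for Lk in laplacians])
        theta = np.clip(theta - lr * grad, 1e-6, None)  # descend, stay positive
        theta /= theta.sum()                            # project back to the simplex
    return f, theta

# Toy multilayer graph: a path layer and a random layer on 6 nodes, 2 labeled nodes
n, rng = 6, np.random.default_rng(0)
A1 = np.eye(n, k=1) + np.eye(n, k=-1)
A2 = np.triu((rng.random((n, n)) < 0.4).astype(float), 1); A2 += A2.T
L1, L2 = np.diag(A1.sum(1)) - A1, np.diag(A2.sum(1)) - A2
y = np.array([1., 0, 0, 0, 0, -1.]); mask = np.array([1, 0, 0, 0, 0, 1], bool)
f, theta = fit([L1, L2], y, mask)
```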
arXiv Detail & Related papers (2023-05-31T19:50:11Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.