Related papers: Evaluating Embedding Generalization: How LLMs, LoRA, and SLERP Shape Representational Geometry

URL: http://arxiv.org/abs/2511.21703v1
Date: Sun, 16 Nov 2025 17:28:06 GMT
Title: Evaluating Embedding Generalization: How LLMs, LoRA, and SLERP Shape Representational Geometry
Authors: Siyaxolisa Kabane
Abstract summary: We study the extent to which spherical linear interpolation (SLERP) model merging mitigates over-specialization introduced by task-specific adaptation. We compare four families of models: non-LLM encoders trained from scratch or fine-tuned for embeddings, LLM-based encoders adapted with parameter-efficient methods (LoRA), LLM-based encoders with LoRA whose adapters are then merged into the base weights via model souping, and the same LoRA-adapted LLMs merged using SLERP across checkpoints or stages.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We investigate the generalization properties of dense text embeddings when the embedding backbone is a large language model (LLM) versus a non-LLM encoder, and we study the extent to which spherical linear interpolation (SLERP) model merging mitigates over-specialization introduced by task-specific adaptation (e.g., LoRA). To make the comparison concrete and domain-agnostic, we design a controlled suite of experiments in which models embed short numerical sequences and are evaluated on their ability to cluster and classify those sequences according to well-defined number-theoretic properties. Our experimental protocol compares four families of models: (1) non-LLM encoders trained from scratch or fine-tuned for embeddings, (2) LLM-based encoders adapted with parameter-efficient methods (LoRA), (3) LLM-based encoders with LoRA whose adapters are then merged into the base weights via model souping, and (4) the same LoRA-adapted LLMs merged using SLERP across checkpoints or stages. We evaluate representational quality with clustering indices (Silhouette and Davies-Bouldin). We additionally analyze k-means cluster labels to check whether the embeddings encode information beyond the property being tested. Empirically, we find that LLM-based backbones produce embeddings that better capture higher-order, compositional numeric patterns, but are prone to adapter dominance that degrades balanced generalization; SLERP merging consistently recovers base-model structure while retaining most task gains, yielding superior trade-offs in clustering separability and robustness compared to model souping or unmerged models.
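The two clustering indices named in the abstract can be computed directly from embeddings and a label per embedding. Below is a minimal pure-Python sketch of their standard definitions, assuming Euclidean distance and small inputs; the function names are hypothetical, and the paper's actual implementation (for example, a library routine) is not stated. The `labels` argument can hold either the ground-truth number-theoretic property or k-means assignments, matching the two analyses described above.

```python
import math
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence

Point = Sequence[float]


def _group(points: Sequence[Point], labels: Sequence[Hashable]) -> Dict[Hashable, List[Point]]:
    """Bucket points by their cluster label."""
    clusters: Dict[Hashable, List[Point]] = defaultdict(list)
    for p, label in zip(points, labels):
        clusters[label].append(p)
    if len(clusters) < 2:
        raise ValueError("both indices need at least two clusters")
    return clusters


def _centroid(pts: List[Point]) -> List[float]:
    return [sum(coords) / len(pts) for coords in zip(*pts)]


def davies_bouldin(points: Sequence[Point], labels: Sequence[Hashable]) -> float:
    """Davies-Bouldin index: lower means more compact, better-separated clusters.

    Raises ZeroDivisionError if two cluster centroids coincide.
    """
    clusters = _group(points, labels)
    cents = {k: _centroid(v) for k, v in clusters.items()}
    # Scatter S_k: mean distance of a cluster's points to its centroid.
    scatter = {k: sum(math.dist(p, cents[k]) for p in v) / len(v) for k, v in clusters.items()}
    total = 0.0
    for i in clusters:
        # Worst-case ratio R_ij = (S_i + S_j) / d(c_i, c_j) over the other clusters.
        total += max(
            (scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
            for j in clusters
            if j != i
        )
    return total / len(clusters)


def silhouette(points: Sequence[Point], labels: Sequence[Hashable]) -> float:
    """Mean silhouette coefficient in [-1, 1]: higher means better separation."""
    clusters = _group(points, labels)
    scores = []
    for p, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        # a: mean distance to the other members of p's cluster (p itself adds 0).
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(p, q) for q in v) / len(v)
            for k, v in clusters.items()
            if k != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


toy = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
print(silhouette(toy, [0, 0, 1, 1]), davies_bouldin(toy, [0, 0, 1, 1]))
print(silhouette(toy, [0, 1, 0, 1]), davies_bouldin(toy, [0, 1, 0, 1]))
```

On the toy points, the labeling that follows the two well-separated groups gets a high Silhouette and a low Davies-Bouldin score, and the crossed labeling gets a negative Silhouette and a much larger Davies-Bouldin score. This is the kind of contrast the indices are used to detect between embedding models.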

Related papers

Large Multimodal Models as General In-Context Classifiers [73.11242790834383]
In this work, we argue that this answer overlooks an important capability of LMMs: in-context learning. We benchmark state-of-the-art LMMs on diverse datasets for closed-world classification and find that, although their zero-shot performance is lower than CLIP's, LMMs with a few in-context examples can match or even surpass contrastive VLMs with cache-based adapters. We extend this analysis to the open-world setting, where the generative nature of LMMs makes them more suitable for the task.
Date: 2026-02-26T17:08:18Z
Fine-Tuning Causal LLMs for Text Classification: Embedding-Based vs. Instruction-Based Approaches [0.0]
We explore strategies to fine-tune decoder-only Large Language Models (LLMs) for downstream text classification under resource constraints. Two approaches are investigated: (1) attaching a classification head to a pre-trained causal LLM and fine-tuning on the task, and (2) instruction-tuning the LLM in a prompt->response format for classification.
Date: 2025-12-14T13:02:06Z
Improving LLM-based Ontology Matching with fine-tuning on synthetic data [0.0]
Large Language Models (LLMs) are increasingly being integrated into various components of Ontology Matching pipelines. This paper investigates the capability of LLMs to perform ontology matching directly on ontology modules and generate the corresponding alignments. A dedicated fine-tuning strategy can enhance the model's matching performance in a zero-shot setting.
Date: 2025-11-27T16:46:45Z
SparseRM: A Lightweight Preference Modeling with Sparse Autoencoder [54.31950189922548]
Reward models (RMs) serve as proxies for human preference evaluation and guide model alignment. We propose SparseRM, which leverages a Sparse Autoencoder (SAE) to extract preference-relevant information encoded in model representations. SparseRM achieves superior performance over most mainstream RMs while using less than 1% of trainable parameters.
Date: 2025-11-11T06:51:56Z
Scaling Sparse and Dense Retrieval in Decoder-Only LLMs [20.173669986209024]
Scaling large language models (LLMs) has shown great potential for improving retrieval model performance. Previous studies have mainly focused on dense retrieval trained with contrastive loss (CL). Sparse retrieval models consistently outperform dense retrieval across both in-domain (MSMARCO, TREC DL) and out-of-domain (BEIR) benchmarks.
Date: 2025-02-21T15:28:26Z
Idiosyncrasies in Large Language Models [54.26923012617675]
We unveil and study idiosyncrasies in Large Language Models (LLMs). We find that fine-tuning text embedding models on LLM-generated texts yields excellent classification accuracy. We leverage LLMs as judges to generate detailed, open-ended descriptions of each model's idiosyncrasies.
Date: 2025-02-17T18:59:02Z
LLM-Lasso: A Robust Framework for Domain-Informed Feature Selection and Regularization [59.75242204923353]
We introduce LLM-Lasso, a framework that leverages large language models (LLMs) to guide feature selection in Lasso regression. LLMs generate penalty factors for each feature, which are converted into weights for the Lasso penalty using a simple, tunable model. Features identified as more relevant by the LLM receive lower penalties, increasing their likelihood of being retained in the final model.
Date: 2025-02-15T02:55:22Z
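The per-feature penalty idea in the LLM-Lasso summary corresponds to a weighted Lasso, which minimizes (1/(2n))||y - Xb||^2 + lam * sum_j w_j |b_j| with a separate penalty factor w_j per feature. The following is a minimal cyclic coordinate-descent sketch, assuming the penalty factors are already given. How LLM-Lasso obtains them from an LLM, and its "simple, tunable model" for converting them, are not reproduced here; `weighted_lasso` and `soft_threshold` are hypothetical names.

```python
from typing import List, Sequence


def soft_threshold(a: float, t: float) -> float:
    """Soft-thresholding operator S(a, t) = sign(a) * max(|a| - t, 0)."""
    if a > t:
        return a - t
    if a < -t:
        return a + t
    return 0.0


def weighted_lasso(
    X: Sequence[Sequence[float]],
    y: Sequence[float],
    lam: float,
    penalty: Sequence[float],
    n_iter: int = 200,
    tol: float = 1e-10,
) -> List[float]:
    """Minimize (1/(2n))||y - Xb||^2 + lam * sum_j penalty[j] * |b_j|.

    Uses cyclic coordinate descent; no intercept term is fitted.
    """
    n, p = len(y), len(X[0])
    if len(penalty) != p:
        raise ValueError("need one penalty factor per feature")
    beta = [0.0] * p
    resid = list(y)  # y - X @ beta, with beta = 0 initially
    # Per-feature curvature z_j = (1/n) * sum_i x_ij^2.
    z = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if z[j] == 0.0:
                continue  # all-zero column: coefficient stays 0
            # Correlation of feature j with the partial residual
            # (feature j's own contribution added back in).
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            # A larger penalty factor raises the threshold, so the
            # coefficient is more easily driven to exactly zero.
            new = soft_threshold(rho, lam * penalty[j]) / z[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta


# Two perfectly correlated features; only the penalty factors differ.
X = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]]
y = [1.0, 2.0, 3.0, 4.0]
print(weighted_lasso(X, y, lam=0.1, penalty=[1.0, 10.0]))
print(weighted_lasso(X, y, lam=0.1, penalty=[10.0, 1.0]))
```

With two perfectly correlated features, the feature with the smaller penalty factor absorbs the coefficient and the other is set to exactly zero, which mirrors the summary's point that features the LLM deems more relevant are more likely to be retained.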
Preference Leakage: A Contamination Problem in LLM-as-a-judge [69.96778498636071]
Large Language Models (LLMs) as judges and LLM-based data synthesis have emerged as two fundamental LLM-driven data annotation methods. In this work, we expose preference leakage, a contamination problem in LLM-as-a-judge caused by the relatedness between the synthetic data generators and LLM-based evaluators.
Date: 2025-02-03T17:13:03Z
Rational Tuning of LLM Cascades via Probabilistic Modeling [0.9208007322096532]
We present a probabilistic model for the joint performance distribution of a sequence of large language models (LLMs). Compared to selecting confidence thresholds using Bayesian optimization, our Markov parametric-copula model yields more favorable error-cost trade-offs. Our framework's inductive assumptions about the interactions between the error rates of different LLMs enhance sample efficiency.
Date: 2025-01-16T07:58:33Z
Model-GLUE: Democratized LLM Scaling for A Large Model Zoo in the Wild [84.57103623507082]
This paper introduces Model-GLUE, a holistic guideline for scaling Large Language Models. We benchmark existing scaling techniques, especially selective merging, and variants of mixture. We then formulate an optimal strategy for the selection and aggregation of a heterogeneous model zoo. Our methodology involves clustering mergeable models, selecting an optimal merging strategy, and integrating the clusters.
Date: 2024-10-07T15:55:55Z
Latent Space Perspicacity and Interpretation Enhancement (LS-PIE) Framework [0.0]
This paper proposes a general framework to enhance latent space representations for improving interpretability of linear latent spaces. Although the concepts in this paper are language agnostic, the framework is written in Python. Several innovative enhancements are incorporated, including latent ranking (LR), latent scaling (LS), latent clustering (LC), and latent condensing (LCON).
Date: 2023-07-11T03:56:04Z

This list is automatically generated from the titles and abstracts of the papers on this site.