LOCUS: Low-Dimensional Model Embeddings for Efficient Model Exploration, Comparison, and Selection
- URL: http://arxiv.org/abs/2601.21082v1
- Date: Wed, 28 Jan 2026 22:09:42 GMT
- Title: LOCUS: Low-Dimensional Model Embeddings for Efficient Model Exploration, Comparison, and Selection
- Authors: Shivam Patel, William Cocke, Gauri Joshi,
- Abstract summary: We propose LOCUS, a method that produces low-dimensional vector embeddings that compactly represent a language model's capabilities across queries.<n>LOCUS is an attention-based approach that generates embeddings by a deterministic forward pass over query encodings and evaluation scores via an encoder model.<n>We train a correctness predictor that uses model embeddings and query encodings to achieve state-of-the-art routing accuracy on unseen queries.
- Score: 15.182368486530128
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The rapidly growing ecosystem of Large Language Models (LLMs) makes it increasingly challenging to manage and utilize the vast and dynamic pool of models effectively. We propose LOCUS, a method that produces low-dimensional vector embeddings that compactly represent a language model's capabilities across queries. LOCUS is an attention-based approach that generates embeddings by a deterministic forward pass over query encodings and evaluation scores via an encoder model, enabling seamless incorporation of new models to the pool and refinement of existing model embeddings without having to perform any retraining. We additionally train a correctness predictor that uses model embeddings and query encodings to achieve state-of-the-art routing accuracy on unseen queries. Experiments show that LOCUS needs up to 4.8x fewer query evaluation samples than baselines to produce informative and robust embeddings. Moreover, the learned embedding space is geometrically meaningful: proximity reflects model similarity, enabling a range of downstream applications including model comparison and clustering, model portfolio selection, and resilient proxies of unavailable models.
Related papers
- HuggingR$^{4}$: A Progressive Reasoning Framework for Discovering Optimal Model Companions [50.61510609116118]
HuggingR$4$ is a novel framework that combines Reasoning, Retrieval, Refinement, and Reflection to efficiently select models.<n>It attains a workability rate of 92.03% and a reasonability rate of 82.46%, surpassing existing method by 26.51% and 33.25% respectively.
arXiv Detail & Related papers (2025-11-24T03:13:45Z) - Every Step Counts: Decoding Trajectories as Authorship Fingerprints of dLLMs [63.82840470917859]
We show that the decoding mechanism of dLLMs can be used as a powerful tool for model attribution.<n>We propose a novel information extraction scheme called the Directed Decoding Map (DDM), which captures structural relationships between decoding steps and better reveals model-specific behaviors.
arXiv Detail & Related papers (2025-10-02T06:25:10Z) - Model Correlation Detection via Random Selection Probing [62.093777777813756]
Existing similarity-based methods require access to model parameters or produce scores without thresholds.<n>We introduce Random Selection Probing (RSP), a hypothesis-testing framework that formulates model correlation detection as a statistical test.<n>RSP produces rigorous p-values that quantify evidence of correlation.
arXiv Detail & Related papers (2025-09-29T01:40:26Z) - Black-box Model Merging for Language-Model-as-a-Service with Massive Model Repositories [21.899117703417517]
We propose a derivative-free optimization framework based on the evolutionary algorithm (Evo-Merging)<n>Our method consists of two key components: (1) sparsity-based denoising, designed to identify and filter out irrelevant or redundant information across models, and (2) sign-aware scaling, which dynamically computes optimal combination weights for the relevant models based on their performance.<n>Our approach achieves state-of-the-art results on a range of tasks, significantly outperforming existing strong baselines.
arXiv Detail & Related papers (2025-09-16T10:55:50Z) - CP-Bench: Evaluating Large Language Models for Constraint Modelling [6.250460397062786]
Constraint programming (CP) is widely used to solve problems, but its core process, namely constraint modelling, requires significant expertise and is considered to be a bottleneck for wider adoption.<n>Recent studies have explored using Large Language Models (LLMs) to transform problem descriptions into executable constraint models.<n>Existing evaluation datasets for constraint modelling are often limited to small, homogeneous, or domain-specific instances, which do not capture the diversity of real-world scenarios.<n>This work addresses this gap by introducing CP-Bench, a novel benchmark that includes a diverse set of well-known problems sourced from the CP community, structured
arXiv Detail & Related papers (2025-06-06T12:56:02Z) - OptMerge: Unifying Multimodal LLM Capabilities and Modalities via Model Merging [124.91183814854126]
Model merging seeks to combine multiple expert models into a single model.<n>We introduce a benchmark for model merging research that clearly divides the tasks for MLLM training and evaluation.<n>We find that model merging offers a promising way for building improved MLLMs without requiring training data.
arXiv Detail & Related papers (2025-05-26T12:23:14Z) - Model Utility Law: Evaluating LLMs beyond Performance through Mechanism Interpretable Metric [99.56567010306807]
Large Language Models (LLMs) have become indispensable across academia, industry, and daily applications.<n>One core challenge of evaluation in the large language model (LLM) era is the generalization issue.<n>We propose Model Utilization Index (MUI), a mechanism interpretability enhanced metric that complements traditional performance scores.
arXiv Detail & Related papers (2025-04-10T04:09:47Z) - Exploring Query Efficient Data Generation towards Data-free Model Stealing in Hard Label Setting [38.755154033324374]
Data-free model stealing involves replicating the functionality of a target model into a substitute model without accessing the target model's structure, parameters, or training data.<n>This paper presents a new data-free model stealing approach called Query Efficient Data Generation (textbfQEDG)<n>We introduce two distinct loss functions to ensure the generation of sufficient samples that closely and uniformly align with the target model's decision boundary.
arXiv Detail & Related papers (2024-12-18T03:03:15Z) - Exploring Model Kinship for Merging Large Language Models [73.98345036483299]
We study model evolution through iterative merging, drawing an analogy to biological evolution.<n>We show that model kinship is closely linked to the performance improvements achieved by merging.<n>We propose a new model merging strategy: Top-k Greedy Merging with Model Kinship.
arXiv Detail & Related papers (2024-10-16T14:29:29Z) - EmbedLLM: Learning Compact Representations of Large Language Models [28.49433308281983]
We propose EmbedLLM, a framework designed to learn compact vector representations of Large Language Models.
We introduce an encoder-decoder approach for learning such embeddings, along with a systematic framework to evaluate their effectiveness.
Empirical results show that EmbedLLM outperforms prior methods in model routing both in accuracy and latency.
arXiv Detail & Related papers (2024-10-03T05:43:24Z) - Earning Extra Performance from Restrictive Feedbacks [41.05874087063763]
We set up a challenge named emphEarning eXtra PerformancE from restriCTive feEDdbacks (EXPECTED) to describe this form of model tuning problems.
The goal of the model provider is to eventually deliver a satisfactory model to the local user(s) by utilizing the feedbacks.
We propose to characterize the geometry of the model performance with regard to model parameters through exploring the parameters' distribution.
arXiv Detail & Related papers (2023-04-28T13:16:54Z) - Model Reuse with Reduced Kernel Mean Embedding Specification [70.044322798187]
We present a two-phase framework for finding helpful models for a current application.
In the upload phase, when a model is uploading into the pool, we construct a reduced kernel mean embedding (RKME) as a specification for the model.
Then in the deployment phase, the relatedness of the current task and pre-trained models will be measured based on the value of the RKME specification.
arXiv Detail & Related papers (2020-01-20T15:15:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.