Relative Overfitting and Accept-Reject Framework
- URL: http://arxiv.org/abs/2505.07783v4
- Date: Mon, 07 Jul 2025 13:56:43 GMT
- Title: Relative Overfitting and Accept-Reject Framework
- Authors: Yanxin Liu, Yunqi Zhang,
- Abstract summary: We propose an ensemble framework that governs how models are segmented to ensure performance improvement.<n>We detail the patterns of this framework within the domain of NLP and briefly describe its to other fields, such as computer vision (CV) and AI for science.
- Score: 5.465098504510676
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The scaling of Large Language Models (LLMs) currently faces significant challenges. Model assembly is widely considered a promising solution to break through these performance bottlenecks. However, current ensembling methods are primarily guided by the statistical expectation that combining multiple models over large samples will lead to performance gains. We propose an ensemble framework that transitions from such stochastic, sample-dependent methods to a regular, controllable approach based on fine-grained model segmentation. This regularity governs how models are segmented to ensure performance improvement, how the magnitude of this improvement varies with model selection, and what factors determine its theoretical maximum. To formalize this pattern, we introduce the concept of'relative overfitting,' which is derived from the performance discrepancies between constituent models and builds a bridge between ensemble outcomes and the inherent attributes of these models. We detail the patterns of this framework within the domain of NLP and briefly describe its extensibility to other fields, such as computer vision (CV) and AI for science. Our approach was validated using both custom-built and pre-trained mainstream models across diverse benchmarks, including language modeling, long-context tasks, and question-answering (QA). The results indicate that the ensemble rules we proposed are generally effective and that we provide a rigorous proof of these rules in certain experimental scenarios. The proposed framework offers a new perspective for understanding ensemble theory and provides a systematic approach to addressing the performance bottlenecks of LLMs.
Related papers
- Leveraging Importance Sampling to Detach Alignment Modules from Large Language Models [50.19188692497892]
Traditional alignment methods often require retraining large pretrained models.<n>We propose a novel textitResidual Alignment Model (textitRAM) that formalizes the alignment process as a type of importance sampling.<n>We develop a resampling algorithm with iterative token-level decoding to address the common first-token latency issue in comparable methods.
arXiv Detail & Related papers (2025-05-26T08:53:02Z) - Adaptive Pruning for Large Language Models with Structural Importance Awareness [66.2690963378878]
Large language models (LLMs) have significantly improved language understanding and generation capabilities.<n>LLMs are difficult to deploy on resource-constrained edge devices due to their high computational and storage resource demands.<n>We propose structurally-aware adaptive pruning (SAAP) to significantly reduce the computational and memory costs while maintaining model performance.
arXiv Detail & Related papers (2024-12-19T18:08:04Z) - Performance evaluation of SLAM-ASR: The Good, the Bad, the Ugly, and the Way Forward [10.914414815406275]
Recent research has demonstrated that training a linear connector between speech foundation encoders and large language models (LLMs) enables this architecture to achieve strong ASR capabilities.<n>Despite the impressive results, it remains unclear whether these simple approaches are robust enough across different scenarios and speech conditions.<n>We present novel empirical findings that offer insights on how to effectively utilize the SLAM-ASR architecture.
arXiv Detail & Related papers (2024-11-06T12:22:04Z) - Determine-Then-Ensemble: Necessity of Top-k Union for Large Language Model Ensembling [23.447466392929712]
Large language models (LLMs) exhibit varying strengths and weaknesses across different tasks.<n>Existing LLM ensembling methods often overlook model compatibility and struggle with inefficient alignment of probabilities.<n>We introduce the textscUnion textscTop-$k$ textscEnsembling (textscUniTE), a novel approach that efficiently combines models by focusing on the union of the top-k tokens from each model.
arXiv Detail & Related papers (2024-10-03T08:42:38Z) - Self-Exploring Language Models: Active Preference Elicitation for Online Alignment [88.56809269990625]
We propose a bilevel objective optimistically biased towards potentially high-reward responses to actively explore out-of-distribution regions.
Our experimental results demonstrate that when fine-tuned on Zephyr-7B-SFT and Llama-3-8B-Instruct models, Self-Exploring Language Models (SELM) significantly boosts the performance on instruction-following benchmarks.
arXiv Detail & Related papers (2024-05-29T17:59:07Z) - Can formal argumentative reasoning enhance LLMs performances? [0.3659498819753633]
We present a pipeline (MQArgEng) to evaluate the effect of introducing computational argumentation semantics on the performance of Large Language Models (LLMs)
Exploratory results indicate that MQArgEng provides a moderate performance gain in most of the examined topical categories and, as such, show promise and warrant further research.
arXiv Detail & Related papers (2024-05-16T22:09:31Z) - Advancing the Robustness of Large Language Models through Self-Denoised Smoothing [50.54276872204319]
Large language models (LLMs) have achieved significant success, but their vulnerability to adversarial perturbations has raised considerable concerns.
We propose to leverage the multitasking nature of LLMs to first denoise the noisy inputs and then to make predictions based on these denoised versions.
Unlike previous denoised smoothing techniques in computer vision, which require training a separate model to enhance the robustness of LLMs, our method offers significantly better efficiency and flexibility.
arXiv Detail & Related papers (2024-04-18T15:47:00Z) - Out-of-Distribution Detection via Deep Multi-Comprehension Ensemble [11.542472900306745]
Multi-Comprehension (MC) Ensemble is proposed as a strategy to augment the Out-of-Distribution (OOD) feature representation field.
Our experimental results demonstrate the superior performance of the MC Ensemble strategy in OOD detection.
This underscores the effectiveness of our proposed approach in enhancing the model's capability to detect instances outside its training distribution.
arXiv Detail & Related papers (2024-03-24T18:43:04Z) - CogBench: a large language model walks into a psychology lab [12.981407327149679]
This paper introduces CogBench, a benchmark that includes ten behavioral metrics derived from seven cognitive psychology experiments.
We apply CogBench to 35 large language models (LLMs) and analyze this data using statistical multilevel modeling techniques.
We find that open-source models are less risk-prone than proprietary models and that fine-tuning on code does not necessarily enhance LLMs' behavior.
arXiv Detail & Related papers (2024-02-28T10:43:54Z) - Revisiting Demonstration Selection Strategies in In-Context Learning [66.11652803887284]
Large language models (LLMs) have shown an impressive ability to perform a wide range of tasks using in-context learning (ICL)
In this work, we first revisit the factors contributing to this variance from both data and model aspects, and find that the choice of demonstration is both data- and model-dependent.
We propose a data- and model-dependent demonstration selection method, textbfTopK + ConE, based on the assumption that textitthe performance of a demonstration positively correlates with its contribution to the model's understanding of the test samples.
arXiv Detail & Related papers (2024-01-22T16:25:27Z) - CRaSh: Clustering, Removing, and Sharing Enhance Fine-tuning without
Full Large Language Model [22.870512676002463]
This paper focuses on Offsite-Tuning (OFT), a representative technique that transfers transformer blocks between centralized LLMs and downstream emulators.
Inspired by these observations, we propose CRaSh, involving Clustering, Removing, and Sharing, a training-free strategy to derive improved emulators from LLMs.
Our findings demonstrate a linear connectivity among these optima falling over the same basin, thereby highlighting the effectiveness of CRaSh and OFT.
arXiv Detail & Related papers (2023-10-24T03:08:58Z) - Beyond Task Performance: Evaluating and Reducing the Flaws of Large
Multimodal Models with In-Context Learning [105.77733287326308]
We evaluate 10 recent open-source LMMs from 3B up to 80B parameter scale, on 5 different axes; hallucinations, abstention, compositionality, explainability and instruction following.
We explore the training-free in-context learning (ICL) as a solution, and study how it affects these limitations.
Based on our ICL study, (3) we push ICL further and propose new multimodal ICL variants such as; Multitask-ICL, Chain-of-Hindsight-ICL, and Self-Correcting-ICL.
arXiv Detail & Related papers (2023-10-01T12:02:59Z) - Variable Importance Matching for Causal Inference [73.25504313552516]
We describe a general framework called Model-to-Match that achieves these goals.
Model-to-Match uses variable importance measurements to construct a distance metric.
We operationalize the Model-to-Match framework with LASSO.
arXiv Detail & Related papers (2023-02-23T00:43:03Z) - When to Update Your Model: Constrained Model-based Reinforcement
Learning [50.74369835934703]
We propose a novel and general theoretical scheme for a non-decreasing performance guarantee of model-based RL (MBRL)
Our follow-up derived bounds reveal the relationship between model shifts and performance improvement.
A further example demonstrates that learning models from a dynamically-varying number of explorations benefit the eventual returns.
arXiv Detail & Related papers (2022-10-15T17:57:43Z) - Distributional Depth-Based Estimation of Object Articulation Models [21.046351215949525]
We propose a method that efficiently learns distributions over articulation model parameters directly from depth images.
Our core contributions include a novel representation for distributions over rigid body transformations.
We introduce a novel deep learning based approach, DUST-net, that performs category-independent articulation model estimation.
arXiv Detail & Related papers (2021-08-12T17:44:51Z) - Attentional Prototype Inference for Few-Shot Segmentation [128.45753577331422]
We propose attentional prototype inference (API), a probabilistic latent variable framework for few-shot segmentation.
We define a global latent variable to represent the prototype of each object category, which we model as a probabilistic distribution.
We conduct extensive experiments on four benchmarks, where our proposal obtains at least competitive and often better performance than state-of-the-art prototype-based methods.
arXiv Detail & Related papers (2021-05-14T06:58:44Z) - Control as Hybrid Inference [62.997667081978825]
We present an implementation of CHI which naturally mediates the balance between iterative and amortised inference.
We verify the scalability of our algorithm on a continuous control benchmark, demonstrating that it outperforms strong model-free and model-based baselines.
arXiv Detail & Related papers (2020-07-11T19:44:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.