"A 6 or a 9?": Ensemble Learning Through the Multiplicity of Performant Models and Explanations
- URL: http://arxiv.org/abs/2509.09073v2
- Date: Sun, 12 Oct 2025 05:29:28 GMT
- Title: "A 6 or a 9?": Ensemble Learning Through the Multiplicity of Performant Models and Explanations
- Authors: Gianlucca Zuin, Adriano Veloso
- Abstract summary: Rashomon Effect refers to cases where multiple models perform similarly well for a given learning problem. We propose the Rashomon Ensemble, a method that strategically selects models from these diverse high-performing solutions to improve generalization. We validate our approach on both open and proprietary collaborative real-world datasets, demonstrating up to 0.20+ AUROC improvements in scenarios where the Rashomon ratio is large.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Creating models from past observations and ensuring their effectiveness on new data is the essence of machine learning. However, selecting models that generalize well remains a challenging task. Related to this topic, the Rashomon Effect refers to cases where multiple models perform similarly well for a given learning problem. This often occurs in real-world scenarios, like the manufacturing process or medical diagnosis, where diverse patterns in data lead to multiple high-performing solutions. We propose the Rashomon Ensemble, a method that strategically selects models from these diverse high-performing solutions to improve generalization. By grouping models based on both their performance and explanations, we construct ensembles that maximize diversity while maintaining predictive accuracy. This selection ensures that each model covers a distinct region of the solution space, making the ensemble more robust to distribution shifts and variations in unseen data. We validate our approach on both open and proprietary collaborative real-world datasets, demonstrating up to 0.20+ AUROC improvements in scenarios where the Rashomon ratio is large. Additionally, we demonstrate tangible benefits for businesses in various real-world applications, highlighting the robustness, practicality, and effectiveness of our approach.
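The abstract describes a two-step selection: keep only near-optimal models (the Rashomon set), then pick models whose explanations cover distinct regions of the solution space. A minimal sketch of that idea is below; the function name, the tolerance and similarity parameters, and the use of cosine similarity over feature-importance vectors are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def rashomon_ensemble_select(importances, aucs, auc_tol=0.02, sim_threshold=0.9):
    """Illustrative Rashomon-style ensemble selection (not the paper's exact method).

    importances: (n_models, n_features) explanation vectors, e.g. feature importances
    aucs:        (n_models,) validation AUROC per model
    Returns indices of models that (a) fall within `auc_tol` of the best AUROC
    (the Rashomon set) and (b) have explanations dissimilar to those already
    chosen (cosine similarity below `sim_threshold`).
    """
    importances = np.asarray(importances, dtype=float)
    aucs = np.asarray(aucs, dtype=float)

    # Step 1: restrict to the Rashomon set of near-optimal models.
    candidates = np.flatnonzero(aucs >= aucs.max() - auc_tol)
    # Consider stronger models first so each explanation cluster keeps its best member.
    candidates = candidates[np.argsort(-aucs[candidates])]

    # Step 2: greedily keep models whose explanations differ from the selected ones.
    norms = importances / np.linalg.norm(importances, axis=1, keepdims=True)
    selected = []
    for i in candidates:
        if all(norms[i] @ norms[j] < sim_threshold for j in selected):
            selected.append(int(i))
    return selected
```

The selected models would then be combined (e.g. by averaging predicted probabilities), so each member of the ensemble contributes a distinct explanation of the data rather than a redundant copy of the best model.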
Related papers
- Improved visual-information-driven model for crowd simulation and its modular application [10.677565659375832]
Data-driven crowd simulation models offer advantages in enhancing the accuracy and realism of simulations. It is still an open question to develop data-driven crowd simulation models with strong generalizability. This paper proposes a data-driven model incorporating a refined visual information extraction method and exit cues to enhance generalizability.
arXiv Detail & Related papers (2025-04-02T07:53:33Z) - Diversity as a Reward: Fine-Tuning LLMs on a Mixture of Domain-Undetermined Data [54.3895971080712]
Fine-tuning large language models (LLMs) using diverse datasets is crucial for enhancing their overall performance across various domains. We propose a new method that gives the LLM a dual identity: an output model to cognitively probe and select data based on diversity reward, as well as an input model to be tuned with the selected data.
arXiv Detail & Related papers (2025-02-05T17:21:01Z) - Multiagent Finetuning: Self Improvement with Diverse Reasoning Chains [114.76612918465948]
Large language models (LLMs) have achieved remarkable performance in recent years but are fundamentally limited by the underlying training data. We propose a complementary approach towards self-improvement where finetuning is applied to a multiagent society of language models.
arXiv Detail & Related papers (2025-01-10T04:35:46Z) - A Collaborative Ensemble Framework for CTR Prediction [73.59868761656317]
We propose a novel framework, Collaborative Ensemble Training Network (CETNet), to leverage multiple distinct models.
Unlike naive model scaling, our approach emphasizes diversity and collaboration through collaborative learning.
We validate our framework on three public datasets and a large-scale industrial dataset from Meta.
arXiv Detail & Related papers (2024-11-20T20:38:56Z) - MITA: Bridging the Gap between Model and Data for Test-time Adaptation [68.62509948690698]
Test-Time Adaptation (TTA) has emerged as a promising paradigm for enhancing the generalizability of models.
We propose Meet-In-The-Middle based MITA, which introduces energy-based optimization to encourage mutual adaptation of the model and data from opposing directions.
arXiv Detail & Related papers (2024-10-12T07:02:33Z) - $\textbf{Only-IF}$: Revealing the Decisive Effect of Instruction Diversity on Generalization [1.6958018695660049]
We show that generalization only emerges when training data is diversified enough across semantic domains. We extend our analysis to real-world scenarios, including fine-tuning of specialist and generalist models.
arXiv Detail & Related papers (2024-10-07T03:15:11Z) - Mixture-of-Skills: Learning to Optimize Data Usage for Fine-Tuning Large Language Models [45.51085356985464]
Large language models (LLMs) are typically fine-tuned on diverse and extensive datasets sourced from various origins.
MoS learns to optimize data usage automatically during the fine-tuning process.
MoSpec harnesses the utilities of various datasets for a specific purpose.
arXiv Detail & Related papers (2024-06-13T05:01:28Z) - Data-Free Diversity-Based Ensemble Selection For One-Shot Federated Learning in Machine Learning Model Market [2.9046424358155236]
We present a novel Data-Free Diversity-Based method called DeDES to address the ensemble selection problem for models generated by one-shot federated learning.
Our method can achieve both better performance and higher efficiency over 5 datasets and 4 different model structures.
arXiv Detail & Related papers (2023-02-23T02:36:27Z) - Dataless Knowledge Fusion by Merging Weights of Language Models [47.432215933099016]
Fine-tuning pre-trained language models has become the prevalent paradigm for building downstream NLP models. This creates a barrier to fusing knowledge across individual models to yield a better single model. We propose a dataless knowledge fusion method that merges models in their parameter space.
arXiv Detail & Related papers (2022-12-19T20:46:43Z) - Partial Order in Chaos: Consensus on Feature Attributions in the Rashomon Set [50.67431815647126]
Post-hoc global/local feature attribution methods are being progressively employed to understand machine learning models.
We show that partial orders of local/global feature importance arise from this methodology.
We show that every relation among features present in these partial orders also holds in the rankings provided by existing approaches.
arXiv Detail & Related papers (2021-10-26T02:53:14Z) - Characterizing Fairness Over the Set of Good Models Under Selective Labels [69.64662540443162]
We develop a framework for characterizing predictive fairness properties over the set of models that deliver similar overall performance.
We provide tractable algorithms to compute the range of attainable group-level predictive disparities.
We extend our framework to address the empirically relevant challenge of selectively labelled data.
arXiv Detail & Related papers (2021-01-02T02:11:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.