Annealed Multiple Choice Learning: Overcoming limitations of Winner-takes-all with annealing
- URL: http://arxiv.org/abs/2407.15580v2
- Date: Thu, 31 Oct 2024 12:59:49 GMT
- Title: Annealed Multiple Choice Learning: Overcoming limitations of Winner-takes-all with annealing
- Authors: David Perera, Victor Letzelter, Théo Mariotte, Adrien Cortés, Mickael Chen, Slim Essid, Gaël Richard
- Abstract summary: We introduce Annealed Multiple Choice Learning (aMCL), which combines simulated annealing with MCL.
MCL is a learning framework handling ambiguous tasks by predicting a small set of plausible hypotheses.
We validate our algorithm by extensive experiments on synthetic datasets, on the standard UCI benchmark, and on speech separation.
- Score: 13.307920993909724
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: We introduce Annealed Multiple Choice Learning (aMCL), which combines simulated annealing with MCL. MCL is a learning framework handling ambiguous tasks by predicting a small set of plausible hypotheses. These hypotheses are trained using the Winner-takes-all (WTA) scheme, which promotes the diversity of the predictions. However, this scheme may converge toward an arbitrarily suboptimal local minimum, due to the greedy nature of WTA. We overcome this limitation using annealing, which enhances the exploration of the hypothesis space during training. We leverage insights from statistical physics and information theory to provide a detailed description of the model training trajectory. Additionally, we validate our algorithm by extensive experiments on synthetic datasets, on the standard UCI benchmark, and on speech separation.
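To make the annealing idea concrete, below is a minimal sketch of how a Winner-takes-all loss can be softened with a temperature that is cooled over training. It is a hedged illustration, not the authors' exact formulation: the squared-error distortion, the Boltzmann-style soft assignment, the exponential cooling schedule, and the function names `annealed_wta_loss` and `temperature_schedule` are all assumptions made for this example.

```python
# Minimal sketch of an annealed Winner-takes-all (WTA) loss in PyTorch.
# Assumptions (not from the paper): squared-error distortion, softmax-based
# soft assignment, detached assignment weights, exponential cooling.
import torch

def annealed_wta_loss(hypotheses: torch.Tensor,  # (batch, n_hypotheses, dim)
                      target: torch.Tensor,      # (batch, dim)
                      temperature: float) -> torch.Tensor:
    # Distortion between each hypothesis and the target.
    dists = ((hypotheses - target.unsqueeze(1)) ** 2).sum(dim=-1)  # (batch, n_hypotheses)
    if temperature <= 0:
        # Plain WTA: only the closest hypothesis receives gradient.
        return dists.min(dim=1).values.mean()
    # Annealed variant: Boltzmann-like soft assignment over hypotheses,
    # so every hypothesis receives some gradient at high temperature.
    weights = torch.softmax(-dists / temperature, dim=1).detach()
    return (weights * dists).sum(dim=1).mean()

def temperature_schedule(step: int, t0: float = 1.0, decay: float = 0.999) -> float:
    # Hypothetical exponential cooling; as the temperature approaches zero,
    # the soft assignment collapses back to the greedy WTA rule.
    return t0 * (decay ** step)
```

At high temperature every hypothesis is pulled toward every target, which encourages exploration of the hypothesis space; cooling the temperature gradually recovers the standard WTA behavior.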
Related papers
- Bayesian scaling laws for in-context learning [72.17734205418502]
In-context learning (ICL) is a powerful technique for getting language models to perform complex tasks with no training updates.
We show that ICL approximates a Bayesian learner and develop a family of novel Bayesian scaling laws for ICL.
arXiv Detail & Related papers (2024-10-21T21:45:22Z)
- Annealed Winner-Takes-All for Motion Forecasting [48.200282332176094]
We show how an aWTA loss can be integrated with state-of-the-art motion forecasting models to enhance their performance.
Our approach can be easily incorporated into any trajectory prediction model normally trained using WTA.
arXiv Detail & Related papers (2024-09-17T13:26:17Z)
- Querying Easily Flip-flopped Samples for Deep Active Learning [63.62397322172216]
Active learning is a machine learning paradigm that aims to improve the performance of a model by strategically selecting and querying unlabeled data.
One effective selection strategy is to base it on the model's predictive uncertainty, which can be interpreted as a measure of how informative a sample is.
This paper proposes the least disagree metric (LDM), defined as the smallest probability of disagreement of the predicted label.
arXiv Detail & Related papers (2024-01-18T08:12:23Z)
- Latent Alignment with Deep Set EEG Decoders [44.128689862889715]
We introduce the Latent Alignment method that won the Benchmarks for EEG Transfer Learning competition.
We present its formulation as a deep set applied on the set of trials from a given subject.
Our experimental results show that performing statistical distribution alignment at later stages in a deep learning model is beneficial to the classification accuracy.
arXiv Detail & Related papers (2023-11-29T12:40:45Z)
- Resilient Multiple Choice Learning: A learned scoring scheme with application to audio scene analysis [8.896068269039452]
We introduce Resilient Multiple Choice Learning (rMCL) for conditional distribution estimation in regression settings.
rMCL is a simple framework to tackle multimodal density estimation, using the Winner-Takes-All (WTA) loss for a set of hypotheses.
arXiv Detail & Related papers (2023-11-02T07:54:03Z)
- CLIPood: Generalizing CLIP to Out-of-Distributions [73.86353105017076]
Contrastive language-image pre-training (CLIP) models have shown impressive zero-shot ability, but further adaptation of CLIP on downstream tasks undesirably degrades out-of-distribution (OOD) performance.
We propose CLIPood, a fine-tuning method that can adapt CLIP models to OOD situations where both domain shifts and open classes may occur on unseen test data.
Experiments on diverse datasets with different OOD scenarios show that CLIPood consistently outperforms existing generalization techniques.
arXiv Detail & Related papers (2023-02-02T04:27:54Z)
- Task-Free Continual Learning via Online Discrepancy Distance Learning [11.540150938141034]
This paper develops a new theoretical analysis framework which provides generalization bounds based on the discrepancy distance between the visited samples and the entire information made available for training the model.
Inspired by this theoretical model, we propose a new approach enabled by a dynamic component expansion mechanism for a mixture model, namely Online Discrepancy Distance Learning (ODDL).
arXiv Detail & Related papers (2022-10-12T20:44:09Z)
- Tight Mutual Information Estimation With Contrastive Fenchel-Legendre Optimization [69.07420650261649]
We introduce a novel, simple, and powerful contrastive MI estimator named FLO.
Empirically, our FLO estimator overcomes the limitations of its predecessors and learns more efficiently.
The utility of FLO is verified using an extensive set of benchmarks, which also reveals the trade-offs in practical MI estimation.
arXiv Detail & Related papers (2021-07-02T15:20:41Z)
- Counterfactual Maximum Likelihood Estimation for Training Deep Networks [83.44219640437657]
Deep learning models are prone to learning spurious correlations that should not be learned as predictive clues.
We propose a causality-based training framework to reduce the spurious correlations caused by observable confounders.
We conduct experiments on two real-world tasks: Natural Language Inference (NLI) and Image Captioning.
arXiv Detail & Related papers (2021-06-07T17:47:16Z)
- Implicit MLE: Backpropagating Through Discrete Exponential Family Distributions [24.389388509299543]
Implicit Maximum Likelihood Estimation (I-MLE) is a framework for end-to-end learning of models combining discrete exponential family distributions and differentiable neural components.
We show that I-MLE is competitive with and often outperforms existing approaches that rely on problem-specific relaxations.
arXiv Detail & Related papers (2021-06-03T12:42:21Z)