Hyperparameter Trajectory Inference with Conditional Lagrangian Optimal Transport
- URL: http://arxiv.org/abs/2603.01771v2
- Date: Tue, 03 Mar 2026 12:35:37 GMT
- Title: Hyperparameter Trajectory Inference with Conditional Lagrangian Optimal Transport
- Authors: Harry Amad, Mihaela van der Schaar
- Abstract summary: Post-deployment, user preferences can evolve, making initial settings undesirable. We learn, from observed data, how an NN's conditional output distribution changes with its hyperparameters, and construct a surrogate model that approximates the NN at unobserved hyperparameters.
- Score: 51.56484100374058
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Neural networks (NNs) often have critical behavioural trade-offs that are set at design time with hyperparameters, such as reward weights in reinforcement learning or quantile targets in regression. Post-deployment, however, user preferences can evolve, making the initial settings undesirable and necessitating potentially expensive retraining. To circumvent this, we introduce the task of Hyperparameter Trajectory Inference (HTI): to learn, from observed data, how an NN's conditional output distribution changes with its hyperparameters, and to construct a surrogate model that approximates the NN at unobserved hyperparameter settings. HTI requires extending existing trajectory inference approaches to incorporate conditions, exacerbating the challenge of ensuring inferred paths are feasible. We propose an approach based on conditional Lagrangian optimal transport, jointly learning the Lagrangian function governing hyperparameter-induced dynamics along with the associated optimal transport maps and geodesics between observed marginals, which together form the surrogate model. We incorporate inductive biases based on the manifold hypothesis and least-action principles into the learned Lagrangian, improving surrogate model feasibility. We empirically demonstrate that our approach reconstructs NN outputs across various hyperparameter spectra better than alternative approaches.
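To make the HTI setup concrete, below is a minimal, hypothetical sketch of a hyperparameter-conditioned velocity field trained between output marginals observed at adjacent hyperparameter values, with a kinetic-energy penalty standing in for the learned Lagrangian's least-action bias. The network architecture, the straight-line pairing of samples, and the loss weight are illustrative assumptions, not the authors' implementation (which learns the coupling via optimal transport).

```python
# Hypothetical, heavily simplified sketch of the HTI idea: a velocity
# field v(x, h) conditioned on the hyperparameter h, trained so its
# integral curves carry output samples from the marginal at h0 to the
# marginal at h1, with a kinetic-energy (least-action) penalty.
import torch
import torch.nn as nn

class CondVelocityField(nn.Module):
    """v(x, h): velocity of an output sample x as hyperparameter h varies."""
    def __init__(self, dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim + 1, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, dim),
        )
    def forward(self, x, h):
        return self.net(torch.cat([x, h], dim=-1))

def training_step(v, x0, x1, h0, h1, lam=0.1):
    # x0 ~ NN outputs at hyperparameter h0, x1 ~ outputs at h1
    # (naively paired here; the paper learns OT couplings instead).
    t = torch.rand(x0.shape[0], 1)
    h = h0 + t * (h1 - h0)               # interpolated hyperparameter
    xt = (1 - t) * x0 + t * x1           # straight-line interpolant
    target = (x1 - x0) / (h1 - h0)       # implied velocity along the path
    pred = v(xt, h)
    fit = ((pred - target) ** 2).mean()
    action = (pred ** 2).mean()          # least-action inductive bias
    return fit + lam * action
```

At an unobserved hyperparameter value, the surrogate would integrate v from the nearest observed marginal (e.g., with a few Euler steps) to approximate the NN's outputs there.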
Related papers
- High-Rank Structured Modulation for Parameter-Efficient Fine-Tuning [57.85676271833619]
Low-rank Adaptation (LoRA) uses a low-rank update method to simulate full-parameter fine-tuning. We present SMoA, a high-rank Structured MOdulation Adapter that uses fewer trainable parameters while maintaining a higher rank.
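For context on the baseline this entry contrasts against, here is the standard LoRA-style low-rank update (a frozen base layer plus a scaled B·A correction); SMoA's higher-rank structured modulation itself is not reproduced here.

```python
# Minimal sketch of the low-rank update that LoRA uses, as referenced
# in the summary above. W_eff = W + (alpha / r) * B @ A.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r=8, alpha=16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False      # frozen pretrained weight
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init
        self.scale = alpha / r
    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.t() @ self.B.t())
```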
arXiv Detail & Related papers (2026-01-12T13:06:17Z) - How to Set the Learning Rate for Large-Scale Pre-training? [73.03133634525635]
We formalize this investigation into two distinct research paradigms: Fitting and Transfer. Within the Fitting paradigm, we introduce a scaling law for the search factor, effectively reducing the search complexity from O(n^3) to O(n*C_D*C_) via predictive modeling. We extend the principles of μTransfer to the Mixture of Experts (MoE) architecture, broadening its applicability to encompass model depth, weight decay, and token horizons.
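As an illustration of the predictive-modeling idea (not the paper's actual scaling law), one can fit a power law for the best learning rate from cheap small-scale runs and extrapolate to a larger model; the functional form and numbers below are made up.

```python
# Hypothetical scaling-law fit: lr_opt(N) = a * N^b, estimated from a
# few small-scale grid searches, then extrapolated. Illustrative only.
import numpy as np

# (model size, best LR found by a small grid search) -- fabricated data
sizes = np.array([1e7, 3e7, 1e8, 3e8])
best_lr = np.array([6e-3, 4e-3, 2.5e-3, 1.6e-3])

# Fit log(lr) = log(a) + b * log(N) by least squares
b, log_a = np.polyfit(np.log(sizes), np.log(best_lr), 1)
predict_lr = lambda n: np.exp(log_a) * n ** b

print(f"predicted LR at 7e9 params: {predict_lr(7e9):.2e}")
```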
arXiv Detail & Related papers (2026-01-08T15:55:13Z) - Generative Bayesian Hyperparameter Tuning [0.0]
Cross-validation is often computationally prohibitive at scale, while fully Bayesian hyperparameter learning can be difficult due to the cost of posterior sampling. We develop a generative perspective that combines two ideas: (i) optimization-based approximations to Bayesian posteriors via randomized, weighted objectives (weighted Bayesian bootstrap) and (ii) repeated optimization across many hyperparameter settings.
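A minimal sketch of the weighted-Bayesian-bootstrap ingredient, using ridge regression as a stand-in model: each posterior draw re-weights the per-example losses with random exponential weights and re-solves the optimization, repeated across hyperparameter settings. The model and penalties are assumptions for illustration.

```python
# Weighted Bayesian bootstrap sketch: randomized, weighted objectives
# re-optimized many times, across several hyperparameter settings.
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.5 * rng.normal(size=n)

def wbb_draw(lam):
    w = rng.exponential(size=n)          # random observation weights
    XtW = X.T * w                        # X' W  (broadcast over columns)
    # Weighted ridge solution: (X' W X + lam I)^-1 X' W y
    return np.linalg.solve(XtW @ X + lam * np.eye(d), XtW @ y)

# Repeat across hyperparameter settings (here the ridge penalty lam)
posterior_draws = {lam: [wbb_draw(lam) for _ in range(100)]
                   for lam in (0.1, 1.0, 10.0)}
```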
arXiv Detail & Related papers (2025-12-23T05:00:52Z) - Physics-Constrained Fine-Tuning of Flow-Matching Models for Generation and Inverse Problems [3.3811247908085855]
We present a framework for fine-tuning flow-matching generative models to enforce physical constraints and solve inverse problems in scientific systems. Our approach bridges generative modelling and scientific inference, opening new avenues for simulation-augmented discovery and data-efficient modelling of physical systems.
arXiv Detail & Related papers (2025-08-05T09:32:04Z) - Implicit Reward as the Bridge: A Unified View of SFT and DPO Connections [65.36449542323277]
We present a unified theoretical framework bridging Supervised Fine-Tuning (SFT) and preference learning in Large Language Model (LLM) post-training. We propose a simple yet effective learning rate reduction approach that yields significant performance improvements.
arXiv Detail & Related papers (2025-06-15T05:42:29Z) - Mitigating Barren Plateaus in Quantum Neural Networks via an AI-Driven Submartingale-Based Framework [3.0617189749929348]
We propose AdaInit to mitigate barren plateaus (BPs) in quantum neural networks (QNNs). AdaInit iteratively synthesizes initial parameters for QNNs that yield non-negligible gradient variance, thereby mitigating BPs. We provide rigorous theoretical analyses of the submartingale-based process and empirically validate that AdaInit consistently outperforms existing methods in maintaining higher gradient variance across various QNN scales.
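The core loop described above can be sketched as rejection sampling over initializations: keep drawing initial parameters until the empirical gradient variance clears a threshold. A small classical network stands in for the QNN here, and the threshold and batch counts are arbitrary illustrative choices.

```python
# Illustrative AdaInit-style loop: resample inits until gradient
# variance over random inputs is non-negligible (barren-plateau proxy).
import torch
import torch.nn as nn

def grad_variance(model, n_batches=16, batch=32, dim=8):
    grads = []
    for _ in range(n_batches):
        model.zero_grad()
        x = torch.randn(batch, dim)
        model(x).mean().backward()
        g = torch.cat([p.grad.flatten() for p in model.parameters()])
        grads.append(g)
    return torch.stack(grads).var(dim=0).mean().item()

def ada_init(make_model, threshold=1e-4, max_tries=50):
    for _ in range(max_tries):
        model = make_model()
        if grad_variance(model) > threshold:
            return model                 # usable initialization found
    raise RuntimeError("no init passed the variance threshold")

model = ada_init(lambda: nn.Sequential(nn.Linear(8, 32), nn.Tanh(),
                                       nn.Linear(32, 1)))
```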
arXiv Detail & Related papers (2025-02-17T05:57:15Z) - SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models [85.67096251281191]
We present an innovative approach to model fusion called zero-shot Sparse MIxture of Low-rank Experts (SMILE) construction.
SMILE allows for the upscaling of source models into an MoE model without extra data or further training.
We conduct extensive experiments across diverse scenarios, such as image classification and text generation tasks, using full fine-tuning and LoRA fine-tuning.
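A hypothetical construction in the spirit of this summary, not SMILE's actual algorithm: turn each fine-tuned model's weight delta into a low-rank expert via truncated SVD and mix experts with a simple gate. The gating rule below (dense softmax over input-subspace alignment) is an assumption; SMILE's routing is sparse and defined differently.

```python
# Zero-shot sketch: low-rank experts from fine-tuned weight deltas,
# no extra data or training. W_base, finetuned_Ws: (out, in) tensors.
import torch
import torch.nn as nn

class SparseLowRankMoE(nn.Module):
    def __init__(self, W_base, finetuned_Ws, rank=8):
        super().__init__()
        self.W = nn.Parameter(W_base, requires_grad=False)
        Us, Vs = [], []
        for Wf in finetuned_Ws:
            U, S, Vh = torch.linalg.svd(Wf - W_base, full_matrices=False)
            Us.append(U[:, :rank] * S[:rank])   # scaled left factors
            Vs.append(Vh[:rank, :])
        self.U = nn.ParameterList([nn.Parameter(u, requires_grad=False) for u in Us])
        self.V = nn.ParameterList([nn.Parameter(v, requires_grad=False) for v in Vs])
    def forward(self, x):
        # Gate on how strongly x aligns with each expert's input subspace
        # (dense softmax for simplicity; SMILE's gate is sparse).
        scores = torch.stack([(x @ v.t()).norm(dim=-1) for v in self.V], -1)
        g = scores.softmax(dim=-1)
        out = x @ self.W.t()
        for i, (u, v) in enumerate(zip(self.U, self.V)):
            out = out + g[..., i:i+1] * ((x @ v.t()) @ u.t())
        return out
```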
arXiv Detail & Related papers (2024-08-19T17:32:15Z) - Hallmarks of Optimization Trajectories in Neural Networks: Directional Exploration and Redundancy [75.15685966213832]
We analyze the rich directional structure of optimization trajectories represented by their pointwise parameters.
We show that training only the scalar batch-norm parameters, from some point partway into training, matches the performance of training the entire network.
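The experiment described above can be set up in a few lines of PyTorch: freeze every parameter, then re-enable gradients only for the per-channel batch-norm scale and shift. The model and the point in training at which this is applied are left abstract.

```python
# Freeze everything except scalar batch-norm affine parameters.
import torch.nn as nn

def freeze_all_but_batchnorm(model: nn.Module):
    for p in model.parameters():
        p.requires_grad = False
    for m in model.modules():
        if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)):
            if m.weight is not None:
                m.weight.requires_grad = True   # per-channel scale
            if m.bias is not None:
                m.bias.requires_grad = True     # per-channel shift
```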
arXiv Detail & Related papers (2024-03-12T07:32:47Z) - Ensemble Kalman Filtering Meets Gaussian Process SSM for Non-Mean-Field and Online Inference [47.460898983429374]
We introduce an ensemble Kalman filter (EnKF) into the non-mean-field (NMF) variational inference framework to approximate the posterior distribution of the latent states.
This novel marriage between EnKF and GPSSM not only eliminates the need for extensive parameterization in learning variational distributions, but also enables an interpretable, closed-form approximation of the evidence lower bound (ELBO).
We demonstrate that the resulting EnKF-aided online algorithm embodies a principled objective function by ensuring data-fitting accuracy while incorporating model regularizations to mitigate overfitting.
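For reference, here is the standard (stochastic) EnKF analysis step that serves as the building block this paper embeds into its variational framework; the GPSSM-specific machinery and the ELBO approximation are omitted.

```python
# Stochastic ensemble Kalman filter analysis step.
import numpy as np

def enkf_update(ensemble, H, y, R, rng):
    """ensemble: (N, d) state ensemble; H: (m, d) observation operator;
    y: (m,) observation; R: (m, m) observation noise covariance."""
    N = ensemble.shape[0]
    X = ensemble - ensemble.mean(axis=0)            # state anomalies
    Y = X @ H.T                                     # observed anomalies
    C_yy = Y.T @ Y / (N - 1) + R                    # innovation covariance
    C_xy = X.T @ Y / (N - 1)                        # cross-covariance
    K = C_xy @ np.linalg.inv(C_yy)                  # Kalman gain
    # Perturbed observations give the posterior the correct spread
    y_pert = y + rng.multivariate_normal(np.zeros(len(y)), R, size=N)
    innov = y_pert - ensemble @ H.T
    return ensemble + innov @ K.T
```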
arXiv Detail & Related papers (2023-12-10T15:22:30Z) - Optimizing Training Trajectories in Variational Autoencoders via Latent Bayesian Optimization Approach [0.0]
Unsupervised and semi-supervised ML methods have become widely adopted across multiple areas of physics, chemistry, and materials sciences.
We propose a latent Bayesian optimization (zBO) approach for hyperparameter trajectory optimization in unsupervised and semi-supervised ML.
We demonstrate an application of this method for finding joint discrete and continuous rotationally invariant representations for MNIST and experimental data of a plasmonic nanoparticles material system.
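A generic Bayesian-optimization step of the kind zBO builds on: a GP surrogate over a (here one-dimensional) latent coordinate plus an upper-confidence-bound acquisition. The latent encoding of full hyperparameter trajectories, which is the paper's contribution, is omitted.

```python
# One BO iteration: fit GP surrogate, maximize UCB acquisition on a grid.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def propose_next(z_obs, f_obs, kappa=2.0):
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.5),
                                  normalize_y=True).fit(z_obs, f_obs)
    z_grid = np.linspace(-3, 3, 400).reshape(-1, 1)
    mu, sd = gp.predict(z_grid, return_std=True)
    ucb = mu + kappa * sd                # acquisition: exploit + explore
    return z_grid[np.argmax(ucb)]

# z_obs: latent points already evaluated; f_obs: their objective values
z_next = propose_next(np.array([[-1.0], [0.3], [2.0]]),
                      np.array([0.1, 0.7, 0.4]))
```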
arXiv Detail & Related papers (2022-06-30T23:41:47Z) - On the Existence of Optimal Transport Gradient for Learning Generative Models [8.602553195689513]
Training of Wasserstein Generative Adversarial Networks (WGAN) relies on the calculation of the gradient of the optimal transport cost.
We first demonstrate that such a gradient may not be defined, which can result in numerical instabilities during gradient-based optimization.
By exploiting the discrete nature of empirical data, we formulate the gradient in a semi-discrete setting and propose an algorithm for the optimization of the generative model parameters.
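A discrete-sample stand-in for the gradient computation this entry describes: solve the discrete OT plan between generated and data samples (POT's ot.emd), freeze the plan, and backpropagate through the cost matrix only, an envelope-theorem argument. This mirrors, but does not reproduce, the paper's semi-discrete formulation.

```python
# OT loss whose gradient flows only through the cost matrix.
import numpy as np
import torch
import ot  # Python Optimal Transport (POT)

def ot_loss(gen_samples: torch.Tensor, data: torch.Tensor):
    n, m = len(gen_samples), len(data)
    # Squared-Euclidean cost matrix, kept in the autograd graph
    M = torch.cdist(gen_samples, data) ** 2
    plan = ot.emd(np.full(n, 1 / n), np.full(m, 1 / m),
                  M.detach().double().numpy())   # optimal plan, no grad
    return (torch.from_numpy(plan).to(M.dtype) * M).sum()
```

Calling `ot_loss(g_theta(z), data).backward()` with `gen_samples` produced by a generator then yields gradients for the generator parameters under the fixed plan.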
arXiv Detail & Related papers (2021-02-10T16:28:20Z)