Residual Pathway Priors for Soft Equivariance Constraints
- URL: http://arxiv.org/abs/2112.01388v1
- Date: Thu, 2 Dec 2021 16:18:17 GMT
- Title: Residual Pathway Priors for Soft Equivariance Constraints
- Authors: Marc Finzi, Gregory Benton, Andrew Gordon Wilson
- Abstract summary: We introduce Residual Pathway Priors (RPPs) as a method for converting hard architectural constraints into soft priors.
RPPs are resilient to approximate or misspecified symmetries, and are as effective as fully constrained models even when symmetries are exact.
- Score: 44.19582621065543
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: There is often a trade-off between building deep learning systems that are
expressive enough to capture the nuances of reality, and having the right
inductive biases for efficient learning. We introduce Residual Pathway Priors
(RPPs) as a method for converting hard architectural constraints into soft
priors, guiding models towards structured solutions, while retaining the
ability to capture additional complexity. Using RPPs, we construct neural
network priors with inductive biases for equivariances, but without limiting
flexibility. We show that RPPs are resilient to approximate or misspecified
symmetries, and are as effective as fully constrained models even when
symmetries are exact. We showcase the broad applicability of RPPs with
dynamical systems, tabular data, and reinforcement learning. In MuJoCo
locomotion tasks, where contact forces and directional rewards violate strict
equivariance assumptions, the RPP outperforms baseline model-free RL agents,
and also improves the learned transition models for model-based RL.
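To make the construction concrete, below is a minimal PyTorch-style sketch of the idea: a layer whose output is the sum of a structured pathway and an unconstrained residual pathway, with a tighter Gaussian prior (equivalently, a stronger L2 penalty at MAP estimation) on the residual path. The class and parameter names are illustrative, and a plain linear layer stands in for a true equivariant layer (the paper pairs RPPs with equivariant architectures such as EMLP).

```python
import torch
import torch.nn as nn

class RPPLinear(nn.Module):
    """Illustrative Residual Pathway Prior layer (names are hypothetical).

    Output = structured pathway + flexible residual pathway. The soft
    equivariance constraint comes from the priors: the residual path gets
    a smaller prior variance, so deviations from the structured solution
    are discouraged but not forbidden."""

    def __init__(self, dim, var_equiv=1.0, var_resid=0.1):
        super().__init__()
        # Placeholder for an equivariant layer (e.g., an EMLP layer in the paper).
        self.equiv = nn.Linear(dim, dim)
        # Fully flexible residual pathway.
        self.resid = nn.Linear(dim, dim)
        self.var_equiv, self.var_resid = var_equiv, var_resid

    def forward(self, x):
        return self.equiv(x) + self.resid(x)

    def prior_penalty(self):
        # Gaussian priors N(0, var) on each pathway become weighted L2
        # penalties at MAP estimation time.
        eq = sum(p.pow(2).sum() for p in self.equiv.parameters()) / (2 * self.var_equiv)
        re = sum(p.pow(2).sum() for p in self.resid.parameters()) / (2 * self.var_resid)
        return eq + re
```

Training would minimize the task loss plus `prior_penalty()`; with `var_resid` much smaller than `var_equiv`, the model behaves like the constrained architecture unless the data demand extra flexibility.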
Related papers
- Meta-Learning Adaptable Foundation Models [37.458141335750696]
We introduce a meta-learning framework infused with PEFT in an intermediate retraining stage to learn a model that can be easily adapted to unseen tasks.
In this setting, we demonstrate the suboptimality of standard retraining for finding an adaptable set of parameters.
We then apply these theoretical insights to retraining the RoBERTa model to predict the continuation of conversations within the ConvAI2 dataset.
arXiv Detail & Related papers (2024-10-29T17:24:18Z)
- Reference Trustable Decoding: A Training-Free Augmentation Paradigm for Large Language Models [79.41139393080736]
Large language models (LLMs) have rapidly advanced and demonstrated impressive capabilities.
We propose Reference Trustable Decoding (RTD), a paradigm that allows models to quickly adapt to new tasks without fine-tuning.
arXiv Detail & Related papers (2024-09-30T10:48:20Z)
- Boosting Inference Efficiency: Unleashing the Power of Parameter-Shared Pre-trained Language Models [109.06052781040916]
We introduce a technique to enhance the inference efficiency of parameter-shared language models.
We also propose a simple pre-training technique that leads to fully or partially shared models.
Results demonstrate the effectiveness of our methods on both autoregressive and autoencoding PLMs.
arXiv Detail & Related papers (2023-10-19T15:13:58Z)
- Towards a Better Theoretical Understanding of Independent Subnetwork Training [56.24689348875711]
We take a closer theoretical look at Independent Subnetwork Training (IST), a recently proposed and highly effective technique for distributed training of large models.
We identify fundamental differences between IST and alternative approaches, such as distributed methods with compressed communication.
arXiv Detail & Related papers (2023-06-28T18:14:22Z)
- Learning a model is paramount for sample efficiency in reinforcement learning control of PDEs [5.488334211013093]
We show that learning an actuated model in parallel to training the RL agent significantly reduces the total amount of data that must be sampled from the real system.
We also show that iteratively updating the model is of major importance for avoiding biases in the RL training; a schematic training loop in this spirit is sketched after this list.
arXiv Detail & Related papers (2023-02-14T16:14:39Z)
- Learning Optimal Features via Partial Invariance [18.552839725370383]
Invariant Risk Minimization (IRM) is a popular framework that aims to learn robust models from multiple environments.
We show that IRM can over-constrain the predictor; to remedy this, we propose a relaxation via partial invariance (the IRM penalty being relaxed is sketched after this list).
Experiments in both linear settings and with deep neural networks, on language and image tasks, verify our conclusions.
arXiv Detail & Related papers (2023-01-28T02:48:14Z)
- Revisiting Design Choices in Model-Based Offline Reinforcement Learning [39.01805509055988]
Offline reinforcement learning enables agents to leverage large pre-collected datasets of environment transitions to learn control policies.
This paper compares existing design choices and devises novel protocols to investigate their interaction with other hyperparameters, such as the number of models or the imagined rollout horizon.
arXiv Detail & Related papers (2021-10-08T13:51:34Z)
- Re-parameterizing VAEs for stability [1.90365714903665]
We propose a theoretical approach to the numerical stability of training Variational Autoencoders (VAEs).
Our work is motivated by recent studies empowering VAEs to reach state-of-the-art generative results on complex image datasets.
We show that small changes to the way the Normal distributions they rely on are parameterized allow VAEs to be trained stably; an illustrative re-parameterization is sketched after this list.
arXiv Detail & Related papers (2021-06-25T16:19:09Z)
- Adaptive Subcarrier, Parameter, and Power Allocation for Partitioned Edge Learning Over Broadband Channels [69.18343801164741]
Partitioned edge learning (PARTEL) implements parameter-server training, a well-known distributed learning method, in a wireless network.
We consider the case of deep neural network (DNN) models which can be trained using PARTEL by introducing some auxiliary variables.
arXiv Detail & Related papers (2020-10-08T15:27:50Z)
- Optimization-driven Machine Learning for Intelligent Reflecting Surfaces Assisted Wireless Networks [82.33619654835348]
Intelligent reflecting surfaces (IRS) have been employed to reshape wireless channels by controlling the phase shifts of individual scattering elements.
Due to the large number of scattering elements, passive beamforming is typically challenged by high computational complexity.
In this article, we focus on machine learning (ML) approaches for improving performance in IRS-assisted wireless networks.
arXiv Detail & Related papers (2020-08-29T08:39:43Z)
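For the PDE-control entry above ("Learning a model is paramount..."), here is a schematic of the learn-the-model-in-parallel loop. All interfaces (`env`, `agent`, `model`) are hypothetical stand-ins, not a real API; the point is only the interleaving of real data collection, iterative model refits, and training on model-generated rollouts.

```python
# Hypothetical interfaces: `env` (reset/step), `agent` (act/update), and
# `model` (fit/rollout) are illustrative stand-ins, not a real API.
def train_with_parallel_model(env, agent, model,
                              n_iters=100, real_steps=1000, model_steps=10000):
    replay = []  # real transitions only; the model is always fit on real data
    for _ in range(n_iters):
        # 1) Collect a modest batch of real transitions with the current policy.
        obs = env.reset()
        for _ in range(real_steps):
            act = agent.act(obs)
            next_obs, rew, done = env.step(act)
            replay.append((obs, act, rew, next_obs))
            obs = env.reset() if done else next_obs
        # 2) Iteratively refit the actuated dynamics model on all real data;
        #    skipping these refits lets model bias leak into RL training.
        model.fit(replay)
        # 3) Train the agent on cheap model-generated rollouts plus real data,
        #    reducing the samples needed from the real (PDE) system.
        synthetic = model.rollout(agent, n_steps=model_steps)
        agent.update(replay + synthetic)
    return agent
```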
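For the partial-invariance entry above, a minimal PyTorch sketch of the IRMv1 penalty that partial invariance relaxes: the squared gradient of each environment's risk with respect to a fixed dummy scale on the classifier output (Arjovsky et al.). The `irm_objective` wrapper and `lam` trade-off are illustrative; as we read the summary above, partial invariance would apply the penalty within partitions of the environments rather than globally.

```python
import torch
import torch.nn.functional as F

def irmv1_penalty(logits, y):
    # Squared norm of the gradient of the environment risk with respect to
    # a dummy scale w = 1.0 multiplying the classifier output (IRMv1).
    scale = torch.ones(1, device=logits.device, requires_grad=True)
    risk = F.cross_entropy(logits * scale, y)
    (grad,) = torch.autograd.grad(risk, [scale], create_graph=True)
    return grad.pow(2).sum()

def irm_objective(model, env_batches, lam=1.0):
    # `model`, `env_batches` (a list of (x, y) batches, one per environment),
    # and `lam` are illustrative. Partial invariance would enforce the
    # penalty within learned groups of environments instead of across all.
    risks, penalties = [], []
    for x, y in env_batches:
        logits = model(x)
        risks.append(F.cross_entropy(logits, y))
        penalties.append(irmv1_penalty(logits, y))
    return torch.stack(risks).mean() + lam * torch.stack(penalties).mean()
```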
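For the VAE-stability entry above, one common fix in the same spirit (not necessarily the paper's exact re-parameterization): parameterize the standard deviation with a softplus plus a small floor instead of exponentiating an unbounded log-variance, which keeps both the reparameterized sample and the KL term well conditioned.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class StableGaussianHead(nn.Module):
    # Encoder head producing q(z|x) = N(mu, sigma^2) with sigma bounded
    # away from zero via softplus + floor, instead of exp(logvar).
    def __init__(self, hidden_dim, latent_dim, sigma_min=1e-4):
        super().__init__()
        self.mu = nn.Linear(hidden_dim, latent_dim)
        self.pre_sigma = nn.Linear(hidden_dim, latent_dim)
        self.sigma_min = sigma_min

    def forward(self, h):
        mu = self.mu(h)
        sigma = F.softplus(self.pre_sigma(h)) + self.sigma_min
        z = mu + sigma * torch.randn_like(sigma)  # reparameterization trick
        # KL(N(mu, sigma^2) || N(0, 1)) per example, summed over latent dims.
        kl = 0.5 * (mu.pow(2) + sigma.pow(2) - 2 * torch.log(sigma) - 1).sum(-1)
        return z, kl
```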
This list is automatically generated from the titles and abstracts of the papers on this site.