Induced Model Matching: How Restricted Models Can Help Larger Ones
- URL: http://arxiv.org/abs/2402.12513v1
- Date: Mon, 19 Feb 2024 20:21:09 GMT
- Title: Induced Model Matching: How Restricted Models Can Help Larger Ones
- Authors: Usama Muneeb and Mesrob I. Ohannessian
- Abstract summary: We consider scenarios where a very accurate predictive model using restricted features is available at the time of training of a larger, full-featured, model.
How can the restricted model be useful to the full model?
We propose an approach for transferring the knowledge of the restricted model to the full model, by aligning the full model's context-restricted performance with that of the restricted model.
- Score: 1.7676816383911753
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider scenarios where a very accurate predictive model using restricted
features is available at the time of training of a larger, full-featured,
model. This restricted model may be thought of as "side-information", derived
either from an auxiliary exhaustive dataset or on the same dataset, by forcing
the restriction. How can the restricted model be useful to the full model? We
propose an approach for transferring the knowledge of the restricted model to
the full model, by aligning the full model's context-restricted performance
with that of the restricted model. We call this methodology Induced Model
Matching (IMM) and first illustrate its general applicability by using logistic
regression as a toy example. We then explore IMM's use in language modeling,
the application that initially inspired it, and where it offers an explicit
foundation in contrast to the implicit use of restricted models in techniques
such as noising. We demonstrate the methodology on both LSTM and transformer
full models, using $N$-grams as restricted models. To further illustrate the
potential of the principle whenever it is much cheaper to collect restricted
rather than full information, we conclude with a simple RL example where POMDP
policies can improve learned MDP policies via IMM.
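
To make the matching step concrete, below is a minimal sketch of the IMM idea on the logistic-regression toy setting mentioned in the abstract. The synthetic data, the binary features, the value of `lam`, and the approximation of the induced restricted model by averaging the full model's predictions over examples that share the same restricted context are illustrative assumptions, not the paper's exact formulation.

```python
# Minimal sketch of Induced Model Matching (IMM) on a logistic-regression toy problem.
# Hypothetical details: the synthetic data, the binary features, the value of `lam`,
# and the within-group averaging used to approximate the induced restricted model
# are illustrative assumptions, not the paper's exact algorithm.
import torch

torch.manual_seed(0)

def sample(n):
    """Two binary features; the label depends on both."""
    x = (torch.rand(n, 2) < 0.5).float()
    logits = 2.0 * x[:, 0] - 1.5 * x[:, 1] + 0.3
    y = (torch.rand(n) < torch.sigmoid(logits)).float()
    return x, y

x_aux, y_aux = sample(50_000)    # large auxiliary set where only feature 0 is usable
x_small, y_small = sample(200)   # small full-featured training set

# Restricted model: logistic regression on feature 0 only, fit on the auxiliary data.
w_r = torch.zeros(2, requires_grad=True)          # [weight, bias]
opt_r = torch.optim.SGD([w_r], lr=0.5)
for _ in range(200):
    p = torch.sigmoid(w_r[0] * x_aux[:, 0] + w_r[1])
    loss = torch.nn.functional.binary_cross_entropy(p, y_aux)
    opt_r.zero_grad(); loss.backward(); opt_r.step()
w_r = w_r.detach()

def restricted_prob(x0):
    return torch.sigmoid(w_r[0] * x0 + w_r[1])

# Full model: logistic regression on both features, trained with an IMM penalty.
w_f = torch.zeros(3, requires_grad=True)          # [weight0, weight1, bias]
opt_f = torch.optim.SGD([w_f], lr=0.5)
lam = 1.0                                         # IMM strength (hypothetical value)

for _ in range(500):
    p_full = torch.sigmoid(x_small @ w_f[:2] + w_f[2])
    data_loss = torch.nn.functional.binary_cross_entropy(p_full, y_small)

    # Induced restricted model: average the full model's predictions over all
    # examples that share the same restricted context (here, the value of feature 0),
    # then match it to the accurate restricted model with a Bernoulli KL penalty.
    imm_loss = 0.0
    for v in (0.0, 1.0):
        group = x_small[:, 0] == v
        if group.any():
            induced = p_full[group].mean()
            target = restricted_prob(torch.tensor(v))
            imm_loss = imm_loss + target * torch.log(target / induced) \
                + (1 - target) * torch.log((1 - target) / (1 - induced))

    loss = data_loss + lam * imm_loss
    opt_f.zero_grad(); loss.backward(); opt_f.step()

print("full-model parameters trained with IMM:", w_f.detach().tolist())
```

The essential point of the sketch is that the penalty acts on the induced restricted prediction (the full model averaged over the hidden part of the context) rather than on per-example outputs, which is the alignment described in the abstract.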
Related papers
- Offline Model-Based Reinforcement Learning with Anti-Exploration [0.0]
We present Morse Model-based offline RL (MoMo), which extends the anti-exploration paradigm found in offline model-free RL.
MoMo performs offline reinforcement learning using an anti-exploration bonus to counteract value overestimation.
The latter outperforms prior model-based and model-free baselines on the majority of D4RL datasets tested.
arXiv Detail & Related papers (2024-08-20T10:29:21Z)
- EMR-Merging: Tuning-Free High-Performance Model Merging [55.03509900949149]
We show that Elect, Mask & Rescale-Merging (EMR-Merging) delivers outstanding performance compared to existing merging methods.
EMR-Merging is tuning-free, thus requiring no data availability or any additional training while showing impressive performance.
arXiv Detail & Related papers (2024-05-23T05:25:45Z)
- Adapting Large Language Models for Content Moderation: Pitfalls in Data Engineering and Supervised Fine-tuning [79.53130089003986]
Large Language Models (LLMs) have become a feasible solution for handling tasks in various domains.
In this paper, we introduce how to fine-tune an LLM that can be privately deployed for content moderation.
arXiv Detail & Related papers (2023-10-05T09:09:44Z)
- Predictable MDP Abstraction for Unsupervised Model-Based RL [93.91375268580806]
We propose predictable MDP abstraction (PMA).
Instead of training a predictive model on the original MDP, we train a model on a transformed MDP with a learned action space.
We theoretically analyze PMA and empirically demonstrate that PMA leads to significant improvements over prior unsupervised model-based RL approaches.
arXiv Detail & Related papers (2023-02-08T07:37:51Z)
- Dataless Knowledge Fusion by Merging Weights of Language Models [51.8162883997512]
Fine-tuning pre-trained language models has become the prevalent paradigm for building downstream NLP models.
This creates a barrier to fusing knowledge across individual models to yield a better single model.
We propose a dataless knowledge fusion method that merges models in their parameter space (a simple averaging sketch appears after this list).
arXiv Detail & Related papers (2022-12-19T20:46:43Z)
- Learning from aggregated data with a maximum entropy model [73.63512438583375]
We show how a new model, similar to a logistic regression, may be learned from aggregated data only by approximating the unobserved feature distribution with a maximum entropy hypothesis.
We present empirical evidence on several public datasets that the model learned this way can achieve performances comparable to those of a logistic model trained with the full unaggregated data.
arXiv Detail & Related papers (2022-10-05T09:17:27Z)
- Mismatched No More: Joint Model-Policy Optimization for Model-Based RL [172.37829823752364]
We propose a single objective for jointly training the model and the policy, such that updates to either component increase a lower bound on expected return.
Our objective is a global lower bound on expected return, and this bound becomes tight under certain assumptions.
The resulting algorithm (MnM) is conceptually similar to a GAN.
arXiv Detail & Related papers (2021-10-06T13:43:27Z)
- Trust the Model When It Is Confident: Masked Model-based Actor-Critic [11.675078067322897]
Masked Model-based Actor-Critic (M2AC) is a novel policy optimization algorithm.
M2AC implements a masking mechanism based on the model's uncertainty to decide whether its prediction should be used or not.
arXiv Detail & Related papers (2020-10-10T03:39:56Z)
- Variational Model-based Policy Optimization [34.80171122943031]
Model-based reinforcement learning (RL) algorithms allow us to combine model-generated data with those collected from interaction with the real system in order to alleviate the data efficiency problem in RL.
We propose an objective function as a variational lower bound of the log-likelihood to jointly learn and improve model and policy.
Our experiments on a number of continuous control tasks show that despite being more complex, our model-based (E-step) algorithm, called variational model-based policy optimization (VMBPO), is more sample-efficient.
arXiv Detail & Related papers (2020-06-09T18:30:15Z)
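
One aside on the dataless knowledge fusion entry above: its central idea of combining models directly in their parameter space can be illustrated with a deliberately simple uniform-averaging sketch. The averaging rule and the `merge_state_dicts` helper below are illustrative assumptions; the paper's actual merging method is more involved.

```python
# Generic sketch of merging models in parameter space by uniform averaging.
# This only illustrates the parameter-space idea; the paper's actual fusion rule is
# more sophisticated, and `merge_state_dicts` is a hypothetical helper, not its API.
import torch

def merge_state_dicts(state_dicts):
    """Average a list of state dicts that share identical keys and tensor shapes."""
    return {
        key: torch.stack([sd[key].float() for sd in state_dicts]).mean(dim=0)
        for key in state_dicts[0]
    }

# Usage: fuse two fine-tuned copies of the same architecture without any data.
model_a = torch.nn.Linear(4, 2)
model_b = torch.nn.Linear(4, 2)
fused = torch.nn.Linear(4, 2)
fused.load_state_dict(merge_state_dicts([model_a.state_dict(), model_b.state_dict()]))
```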
This list is automatically generated from the titles and abstracts of the papers on this site.