Exploring Domain Robust Lightweight Reward Models based on Router Mechanism
- URL: http://arxiv.org/abs/2407.17546v1
- Date: Wed, 24 Jul 2024 17:25:12 GMT
- Title: Exploring Domain Robust Lightweight Reward Models based on Router Mechanism
- Authors: Hyuk Namgoong, Jeesu Jung, Sangkeun Jung, Yoonhyung Roh
- Abstract summary: We explore the use of small language models operating in a domain-specific manner based on router mechanisms.
Our three approaches are: 1) using a mixture of experts to form a single reward model by modularizing an internal router and experts, 2) employing an external router to select the appropriate reward model from multiple domain-specific models, and 3) loading reward models and router adapters onto a single small language model via adapters to reduce the total parameter size.
- Score: 1.3624495460189863
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advancements in large language models have relied heavily on large reward models from reinforcement learning from human feedback for fine-tuning. However, the use of a single reward model across various domains may not always be optimal, often requiring retraining from scratch when new domain data is introduced. To address these challenges, we explore the use of small language models operating in a domain-specific manner based on router mechanisms. Our three approaches are: 1) using a mixture of experts to form a single reward model by modularizing an internal router and experts, 2) employing an external router to select the appropriate reward model from multiple domain-specific models, and 3) loading reward models and router adapters onto a single small language model via adapters to reduce the total parameter size. Experimental validation underscores the effectiveness of our approach, demonstrating performance comparable to baseline methods while also reducing the total parameter size.
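The second approach in the abstract (an external router dispatching each prompt to one domain-specific reward model) can be sketched roughly as below. This is an illustrative toy, not the authors' implementation: the keyword-based router, the lambda reward models, and all names (`route_and_score`, `REWARD_MODELS`, the domain list) are hypothetical stand-ins for learned components.

```python
import math

# Toy sketch of external routing: score a prompt against each domain,
# then apply only that domain's lightweight reward model, so per-query
# compute stays that of a single small model.

DOMAINS = ["chat", "code", "safety"]

def router_logits(prompt: str) -> list[float]:
    # Stand-in for a learned router: keyword counts act as domain logits.
    keywords = {"chat": ["hello", "chat"],
                "code": ["def", "bug"],
                "safety": ["harm", "risk"]}
    return [sum(prompt.lower().count(k) for k in keywords[d]) for d in DOMAINS]

def softmax(xs: list[float]) -> list[float]:
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

# One placeholder reward model per domain.
REWARD_MODELS = {
    "chat": lambda prompt, response: 0.1 * len(response),
    "code": lambda prompt, response: 1.0 if "return" in response else 0.0,
    "safety": lambda prompt, response: -1.0 if "harm" in response else 0.5,
}

def route_and_score(prompt: str, response: str) -> float:
    # Hard routing: pick the single most probable domain expert.
    probs = softmax(router_logits(prompt))
    best_domain = DOMAINS[probs.index(max(probs))]
    return REWARD_MODELS[best_domain](prompt, response)
```

The paper's third approach would swap the separate reward models for adapters sharing one small base model, keeping the same dispatch logic while shrinking the parameter footprint.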
Related papers
- RedTest: Towards Measuring Redundancy in Deep Neural Networks Effectively [10.812755570974929]
We use the Model Structural Redundancy Score (MSRS) to measure the degree of redundancy in a deep learning model structure.
MSRS is effective in both revealing and assessing the redundancy issues in many state-of-the-art models.
We design a novel redundancy-aware algorithm to guide the search for the optimal model structure.
arXiv Detail & Related papers (2024-11-15T14:36:07Z)
- MoDEM: Mixture of Domain Expert Models [23.846823652305027]
We propose a novel approach to enhancing the performance and efficiency of large language models (LLMs).
We introduce a system that utilizes a BERT-based router to direct incoming prompts to the most appropriate domain expert model.
Our research demonstrates that this approach can significantly outperform general-purpose models of comparable size.
arXiv Detail & Related papers (2024-10-09T23:52:54Z)
- RouterRetriever: Exploring the Benefits of Routing over Multiple Expert Embedding Models [58.987116118425995]
We introduce RouterRetriever, a retrieval model that leverages multiple domain-specific experts.
It is lightweight and allows easy addition or removal of experts without additional training.
It is the first work to demonstrate the advantages of using multiple domain-specific expert embedding models.
arXiv Detail & Related papers (2024-09-04T13:16:55Z)
- Revisiting SMoE Language Models by Evaluating Inefficiencies with Task Specific Expert Pruning [78.72226641279863]
Sparse Mixture of Expert (SMoE) models have emerged as a scalable alternative to dense models in language modeling.
Our research explores task-specific model pruning to inform decisions about designing SMoE architectures.
We introduce UNCURL, an adaptive task-aware pruning technique that reduces the number of experts per MoE layer in an offline manner post-training.
arXiv Detail & Related papers (2024-09-02T22:35:03Z)
- RewardBench: Evaluating Reward Models for Language Modeling [100.28366840977966]
We present RewardBench, a benchmark dataset and code-base for evaluation of reward models.
The dataset is a collection of prompt-chosen-rejected trios spanning chat, reasoning, and safety.
On the RewardBench leaderboard, we evaluate reward models trained with a variety of methods.
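Evaluation over prompt-chosen-rejected trios, as described above, amounts to checking how often a reward model scores the chosen response above the rejected one. A minimal sketch, with toy trios and a deliberately naive stand-in reward function (none of this is RewardBench's actual code or data):

```python
# Fraction of trios where the chosen response outscores the rejected one.
def trio_accuracy(reward_fn, trios):
    wins = sum(
        1 for prompt, chosen, rejected in trios
        if reward_fn(prompt, chosen) > reward_fn(prompt, rejected)
    )
    return wins / len(trios)

# Toy reward: prefer longer responses (a naive length baseline).
length_reward = lambda prompt, response: len(response)

toy_trios = [
    ("What is 2+2?", "2+2 equals 4.", "4"),
    ("Say hi.", "Hello! How can I help you today?", "hi"),
]
```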
arXiv Detail & Related papers (2024-03-20T17:49:54Z)
- SepRep-Net: Multi-source Free Domain Adaptation via Model Separation And Reparameterization [75.74369886582394]
We propose a novel framework called SepRep-Net to tackle multi-source free domain adaptation.
SepRep-Net reassembles multiple existing models into a unified network while maintaining separate pathways (Separation).
SepRep-Net is characterized by 1) effectiveness: competitive performance on the target domain, 2) efficiency: low computational costs, and 3) generalizability: maintaining more source knowledge than existing solutions.
arXiv Detail & Related papers (2024-02-13T06:35:00Z)
- Secrets of RLHF in Large Language Models Part II: Reward Modeling [134.97964938009588]
We introduce a series of novel methods to mitigate the influence of incorrect and ambiguous preferences in the dataset.
We also introduce contrastive learning to enhance the ability of reward models to distinguish between chosen and rejected responses.
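Distinguishing chosen from rejected responses is typically trained with the standard pairwise (Bradley-Terry) preference loss, which contrastive-style objectives build on. A sketch with scalar scores standing in for reward-model outputs (this is the generic loss, not that paper's specific contrastive objective):

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

# Pairwise preference loss: -log sigmoid(r_chosen - r_rejected).
# The loss shrinks as the margin between chosen and rejected scores grows.
def pairwise_loss(r_chosen: float, r_rejected: float) -> float:
    return -math.log(sigmoid(r_chosen - r_rejected))
```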
arXiv Detail & Related papers (2024-01-11T17:56:59Z)
- Mixture Manifold Networks: A Computationally Efficient Baseline for Inverse Modeling [7.891408798179181]
We propose and show the efficacy of a new method to address generic inverse problems.
Recent work has shown impressive results using deep learning, but we note that there is a trade-off between model performance and computational time.
arXiv Detail & Related papers (2022-11-25T20:18:07Z)
- Multi-path Neural Networks for On-device Multi-domain Visual Classification [55.281139434736254]
This paper proposes a novel approach to automatically learn a multi-path network for multi-domain visual classification on mobile devices.
The proposed multi-path network is learned from neural architecture search by applying one reinforcement learning controller for each domain to select the best path in the super-network created from a MobileNetV3-like search space.
The determined multi-path model selectively shares parameters across domains in shared nodes while keeping domain-specific parameters within non-shared nodes in individual domain paths.
arXiv Detail & Related papers (2020-10-10T05:13:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.