An Efficient Combinatorial Optimization Model Using Learning-to-Rank
Distillation
- URL: http://arxiv.org/abs/2201.00695v1
- Date: Fri, 24 Dec 2021 10:52:47 GMT
- Title: An Efficient Combinatorial Optimization Model Using Learning-to-Rank
Distillation
- Authors: Honguk Woo, Hyunsung Lee, Sangwoo Cho
- Abstract summary: We present the learning-to-rank distillation-based COP framework, where a high-performance ranking policy can be distilled into a non-iterative, simple model.
Specifically, we employ the approximated ranking distillation to render a score-based ranking model learnable via gradient descent.
We demonstrate that a distilled model not only achieves performance comparable to its respective high-performance RL policy, but also provides several times faster inference.
- Score: 2.0137632982900207
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, deep reinforcement learning (RL) has proven its feasibility in solving combinatorial optimization problems (COPs). Learning-to-rank techniques have been studied in the field of information retrieval. While several COPs can be formulated as the prioritization of input items, as is common in information retrieval, it has not been fully explored how learning-to-rank techniques can be incorporated into deep RL for COPs. In this paper, we present a learning-to-rank distillation-based COP framework, in which a high-performance ranking policy obtained by RL for a COP is distilled into a non-iterative, simple model, thereby achieving a low-latency COP solver. Specifically, we employ approximated ranking distillation to render a score-based ranking model learnable via gradient descent. Furthermore, we use efficient sequence sampling to improve inference performance with limited delay. With the framework, we demonstrate that a distilled model not only achieves performance comparable to its respective high-performance RL policy, but also provides several times faster inference. We evaluate the framework on several COPs, such as priority-based task scheduling and the multidimensional knapsack problem, demonstrating the benefits of the framework in terms of inference latency and performance.
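The abstract does not include reference code, but the core idea, training a one-pass score model to reproduce a teacher's item ordering, can be sketched with a Plackett-Luce (ListMLE-style) loss. This is only one common way to make a score-based ranking model learnable by gradient descent; the tensor names, architecture, and exact distillation loss below are illustrative assumptions rather than the authors' implementation.

```python
# Hedged sketch: distill a teacher's item ordering into a simple score model
# with a ListMLE-style (Plackett-Luce) loss. Architecture and loss are
# illustrative; the paper's approximated ranking distillation may differ.
import torch
import torch.nn as nn

class ScoreModel(nn.Module):
    """Non-iterative student: one forward pass yields a score per item."""
    def __init__(self, feat_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, items: torch.Tensor) -> torch.Tensor:
        # items: (batch, n_items, feat_dim) -> scores: (batch, n_items)
        return self.net(items).squeeze(-1)

def listmle_loss(scores: torch.Tensor, teacher_order: torch.Tensor) -> torch.Tensor:
    """Negative log-likelihood of the teacher's permutation under a
    Plackett-Luce model parameterized by the student's scores."""
    ordered = torch.gather(scores, 1, teacher_order)  # scores in teacher rank order
    # log P(permutation) = sum_i [ s_i - logsumexp(s_i, ..., s_n) ]
    tail_lse = torch.flip(torch.logcumsumexp(torch.flip(ordered, [1]), dim=1), [1])
    return (tail_lse - ordered).sum(dim=1).mean()

# Toy usage: 4 items with 8 features each; the teacher ranks them 2 > 0 > 3 > 1.
model = ScoreModel(feat_dim=8)
items = torch.randn(1, 4, 8)
teacher_order = torch.tensor([[2, 0, 3, 1]])
loss = listmle_loss(model(items), teacher_order)
loss.backward()
```

At inference time the student simply sorts items by score (e.g., tasks by priority or knapsack items by desirability), which is what makes the distilled solver non-iterative and low-latency.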
Related papers
- Distilling Vision-Language Pretraining for Efficient Cross-Modal Retrieval [44.61221990245263]
Learning to hash is a practical solution for efficient retrieval, offering fast search speed and low storage cost.
We explore the potential of enhancing the performance of learning to hash with the proliferation of powerful pre-trained models.
We introduce a novel method named Distillation for Cross-Modal Quantization (DCMQ) to improve hash representation learning.
arXiv Detail & Related papers (2024-05-23T15:54:59Z)
- Monte Carlo Tree Search Boosts Reasoning via Iterative Preference Learning [55.96599486604344]
We introduce an approach aimed at enhancing the reasoning capabilities of Large Language Models (LLMs) through an iterative preference learning process.
We use Monte Carlo Tree Search (MCTS) to iteratively collect preference data, utilizing its look-ahead ability to break down instance-level rewards into more granular step-level signals.
The proposed algorithm employs Direct Preference Optimization (DPO) to update the LLM policy using this newly generated step-level preference data.
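The DPO update mentioned above can be sketched on toy numbers; the sequence log-probabilities, the beta value, and the preference pairs below are placeholders, and the MCTS-based collection of step-level preferences is omitted.

```python
# Hedged sketch of a DPO-style objective on preference pairs. In practice the
# log-probabilities come from the current policy and a frozen reference model.
import torch
import torch.nn.functional as F

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """-log sigmoid(beta * [(logp_w - ref_logp_w) - (logp_l - ref_logp_l)])."""
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    return -F.logsigmoid(beta * margin).mean()

# Toy usage with made-up sequence log-probabilities for two preference pairs.
loss = dpo_loss(
    logp_chosen=torch.tensor([-12.3, -8.1], requires_grad=True),
    logp_rejected=torch.tensor([-11.9, -9.4], requires_grad=True),
    ref_logp_chosen=torch.tensor([-12.0, -8.5]),
    ref_logp_rejected=torch.tensor([-11.5, -9.0]),
)
loss.backward()
```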
arXiv Detail & Related papers (2024-05-01T11:10:24Z)
- Provable Reward-Agnostic Preference-Based Reinforcement Learning [61.39541986848391]
Preference-based Reinforcement Learning (PbRL) is a paradigm in which an RL agent learns to optimize a task using pair-wise preference-based feedback over trajectories.
We propose a theoretical reward-agnostic PbRL framework where exploratory trajectories that enable accurate learning of hidden reward functions are acquired.
arXiv Detail & Related papers (2023-05-29T15:00:09Z)
- Don't Be So Sure! Boosting ASR Decoding via Confidence Relaxation [7.056222499095849]
Beam search seeks the transcript with the greatest likelihood computed from the predicted distribution.
We show that recently proposed Self-Supervised Learning (SSL)-based ASR models tend to yield exceptionally confident predictions.
We propose a decoding procedure that improves the performance of fine-tuned ASR models.
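For context, plain beam search over the model's per-step distributions looks like the sketch below (toy numbers; this is the baseline decoder the summary refers to, not the paper's confidence-relaxed procedure).

```python
# Hedged sketch of vanilla beam search over per-step token log-probabilities.
import numpy as np

def beam_search(step_log_probs, beam_width=2):
    """step_log_probs: (T, V) array; returns (best token sequence, log-prob)."""
    beams = [((), 0.0)]                               # (tokens, cumulative log-prob)
    for log_p in step_log_probs:
        candidates = [(seq + (tok,), score + log_p[tok])
                      for seq, score in beams
                      for tok in range(len(log_p))]
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return beams[0]

# Toy example: 3 decoding steps over a vocabulary of 3 tokens.
log_probs = np.log(np.array([[0.6, 0.3, 0.1],
                             [0.2, 0.5, 0.3],
                             [0.1, 0.1, 0.8]]))
print(beam_search(log_probs))  # -> ((0, 1, 2), log-likelihood of that path)
```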
arXiv Detail & Related papers (2022-12-27T06:42:26Z)
- Contrastive UCB: Provably Efficient Contrastive Self-Supervised Learning in Online Reinforcement Learning [92.18524491615548]
Contrastive self-supervised learning has been successfully integrated into the practice of (deep) reinforcement learning (RL).
We study how RL can be empowered by contrastive learning in a class of Markov decision processes (MDPs) and Markov games (MGs) with low-rank transitions.
Under the online setting, we propose novel upper confidence bound (UCB)-type algorithms that incorporate such a contrastive loss with online RL algorithms for MDPs or MGs.
arXiv Detail & Related papers (2022-07-29T17:29:08Z)
- CCLF: A Contrastive-Curiosity-Driven Learning Framework for
Sample-Efficient Reinforcement Learning [56.20123080771364]
We develop a model-agnostic Contrastive-Curiosity-Driven Learning Framework (CCLF) for reinforcement learning.
CCLF fully exploits sample importance and improves learning efficiency in a self-supervised manner.
We evaluate this approach on the DeepMind Control Suite, Atari, and MiniGrid benchmarks.
arXiv Detail & Related papers (2022-05-02T14:42:05Z)
- A Reinforcement Learning Environment For Job-Shop Scheduling [2.036811219647753]
This paper presents an efficient Deep Reinforcement Learning environment for Job-Shop Scheduling.
We design a meaningful and compact state representation as well as a novel, simple dense reward function.
We demonstrate that our approach significantly outperforms existing DRL methods on classic benchmark instances.
arXiv Detail & Related papers (2021-04-08T13:26:30Z)
- DEALIO: Data-Efficient Adversarial Learning for Imitation from
Observation [57.358212277226315]
In imitation learning from observation (IfO), a learning agent seeks to imitate a demonstrating agent using only observations of the demonstrated behavior, without access to the control signals generated by the demonstrator.
Recent methods based on adversarial imitation learning have led to state-of-the-art performance on IfO problems, but they typically suffer from high sample complexity due to a reliance on data-inefficient, model-free reinforcement learning algorithms.
This issue makes them impractical to deploy in real-world settings, where gathering samples can incur high costs in terms of time, energy, and risk.
We propose a more data-efficient IfO algorithm.
arXiv Detail & Related papers (2021-03-31T23:46:32Z)
- Monocular Depth Estimation via Listwise Ranking using the Plackett-Luce
Model [15.472533971305367]
In many real-world applications, the relative depth of objects in an image is crucial for scene understanding.
Recent approaches mainly tackle the problem of depth prediction in monocular images by treating the problem as a regression task.
Yet, ranking methods suggest themselves as a natural alternative to regression, and indeed, ranking approaches leveraging pairwise comparisons have shown promising performance on this problem.
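The Plackett-Luce model mentioned in the title assigns a probability to a full ordering by repeatedly normalizing positive item scores over the items not yet placed. A minimal sketch with made-up scores follows (the depth-estimation setup itself is not reproduced here).

```python
# Hedged sketch: Plackett-Luce probability of an ordering given positive
# per-item scores (hypothetical values, not the paper's model outputs).
import numpy as np

def plackett_luce_prob(scores: np.ndarray, order: list) -> float:
    """P(order) = prod_i scores[order[i]] / sum of scores of items not yet placed."""
    prob, remaining = 1.0, list(order)
    for item in order:
        prob *= scores[item] / scores[remaining].sum()
        remaining.remove(item)
    return prob

scores = np.array([3.0, 1.0, 2.0])            # hypothetical per-item scores
print(plackett_luce_prob(scores, [0, 2, 1]))  # most likely ordering: ~0.333
print(plackett_luce_prob(scores, [1, 2, 0]))  # far less likely: ~0.067
```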
arXiv Detail & Related papers (2020-10-25T13:40:10Z)
- SUNRISE: A Simple Unified Framework for Ensemble Learning in Deep
Reinforcement Learning [102.78958681141577]
We present SUNRISE, a simple unified ensemble method, which is compatible with various off-policy deep reinforcement learning algorithms.
SUNRISE integrates two key ingredients: (a) ensemble-based weighted Bellman backups, which re-weight target Q-values based on uncertainty estimates from a Q-ensemble, and (b) an inference method that selects actions using the highest upper-confidence bounds for efficient exploration.
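On toy numbers, the two ingredients can be sketched as below; the exact weighting function and coefficients (here a sigmoid of the ensemble's standard deviation and a fixed lambda) are assumptions in the spirit of the summary, not necessarily the paper's formulas.

```python
# Hedged sketch of SUNRISE-style ingredients: (a) a Bellman target whose loss
# weight shrinks where the Q-ensemble disagrees, and (b) UCB action selection.
import numpy as np

def weighted_target(q_ensemble_next, reward, gamma=0.99, temperature=10.0):
    """Return (Bellman target, loss weight in (0.5, 1]) for one transition."""
    q_mean = q_ensemble_next.mean(axis=0)      # per-action mean over members
    q_std = q_ensemble_next.std(axis=0)        # per-action disagreement
    a_star = int(np.argmax(q_mean))            # greedy next action
    target = reward + gamma * q_mean[a_star]
    weight = 1.0 / (1.0 + np.exp(temperature * q_std[a_star])) + 0.5  # sigmoid(-T*std)+0.5
    return target, weight

def ucb_action(q_ensemble, lam=1.0):
    """Pick the action with the highest mean + lam * std (optimism bonus)."""
    return int(np.argmax(q_ensemble.mean(axis=0) + lam * q_ensemble.std(axis=0)))

# Toy example: 3 ensemble members, 2 actions.
q_next = np.array([[1.0, 2.0], [1.2, 0.5], [0.9, 3.1]])
print(weighted_target(q_next, reward=0.1))
print(ucb_action(q_next))
```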
arXiv Detail & Related papers (2020-07-09T17:08:44Z)