DRILL-- Deep Reinforcement Learning for Refinement Operators in $\mathcal{ALC}$
- URL: http://arxiv.org/abs/2106.15373v1
- Date: Tue, 29 Jun 2021 12:57:45 GMT
- Title: DRILL-- Deep Reinforcement Learning for Refinement Operators in $\mathcal{ALC}$
- Authors: Caglar Demir and Axel-Cyrille Ngonga Ngomo
- Abstract summary: We propose DRILL -- a novel class expression learning approach that uses a convolutional deep Q-learning model to steer its search.
By virtue of its architecture, DRILL is able to compute the expected discounted cumulated future reward of more than $10^3$ class expressions in a second on standard hardware.
- Score: 1.9036571490366496
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Approaches based on refinement operators have been successfully applied to
class expression learning on RDF knowledge graphs. These approaches often need
to explore a large number of concepts to find adequate hypotheses. This need
arguably stems from current approaches relying on myopic heuristic functions to
guide their search through an infinite concept space. In turn, deep
reinforcement learning provides effective means to address myopia by estimating
how much discounted cumulated future reward states promise. In this work, we
leverage deep reinforcement learning to accelerate the learning of concepts in
$\mathcal{ALC}$ by proposing DRILL -- a novel class expression learning
approach that uses a convolutional deep Q-learning model to steer its search.
By virtue of its architecture, DRILL is able to compute the expected discounted
cumulated future reward of more than $10^3$ class expressions in a second on
standard hardware. We evaluate DRILL on four benchmark datasets against
state-of-the-art approaches. Our results suggest that DRILL converges to goal
states at least 2.7$\times$ faster than state-of-the-art models on all
benchmark datasets. We provide an open-source implementation of our approach,
including training and evaluation scripts as well as pre-trained models.
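As a rough illustration of the core idea (a hedged sketch, not the authors' released implementation), the snippet below scores a batch of candidate refinements with a small convolutional Q-network in a single forward pass, which is what makes ranking on the order of $10^3$ class expressions per second plausible. The embedding scheme, channel layout, and network sizes are illustrative assumptions.

```python
# Hedged sketch (not DRILL's actual code): a small convolutional Q-network that
# scores a batch of candidate class expressions in one forward pass.
# Embeddings, shapes, and layer sizes are illustrative assumptions.
import torch
import torch.nn as nn

class ExpressionQNet(nn.Module):
    def __init__(self, emb_dim: int = 32):
        super().__init__()
        # Stack (current expression, candidate refinement, positive-example summary,
        # negative-example summary) as 4 "channels" and convolve over the embedding axis.
        self.conv = nn.Sequential(
            nn.Conv1d(in_channels=4, out_channels=8, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Flatten(),
        )
        self.head = nn.Sequential(nn.Linear(8 * emb_dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, state, action, pos, neg):
        # Each input: (batch, emb_dim). Output: (batch,) estimated discounted return.
        x = torch.stack([state, action, pos, neg], dim=1)  # (batch, 4, emb_dim)
        return self.head(self.conv(x)).squeeze(-1)

if __name__ == "__main__":
    emb_dim, n_candidates = 32, 1024
    net = ExpressionQNet(emb_dim)
    state = torch.randn(1, emb_dim).expand(n_candidates, emb_dim)  # current expression
    cand = torch.randn(n_candidates, emb_dim)                      # candidate refinements
    pos = torch.randn(1, emb_dim).expand(n_candidates, emb_dim)    # positive examples
    neg = torch.randn(1, emb_dim).expand(n_candidates, emb_dim)    # negative examples
    q = net(state, cand, pos, neg)   # one Q-value per candidate refinement
    best = q.argmax().item()         # greedily pick the most promising refinement
    print(q.shape, best)
```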
Related papers
- Efficient Exploration in Deep Reinforcement Learning: A Novel Bayesian Actor-Critic Algorithm [0.195804735329484]
Reinforcement learning (RL) and Deep Reinforcement Learning (DRL) have the potential to disrupt and are already changing the way we interact with the world.
One of the key indicators of their applicability is their ability to scale and work in real-world scenarios.
arXiv Detail & Related papers (2024-08-19T14:50:48Z)
- Exploiting Pre-trained Models for Drug Target Affinity Prediction with Nearest Neighbors [58.661454334877256]
Drug-Target binding Affinity (DTA) prediction is essential for drug discovery.
Despite the application of deep learning methods to DTA prediction, the achieved accuracy remains suboptimal.
We propose $k$NN-DTA, a non-representation embedding-based retrieval method adopted on a pre-trained DTA prediction model.
arXiv Detail & Related papers (2024-07-21T15:49:05Z)
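A hedged sketch of the retrieval idea summarized above, assuming cached embeddings and affinity labels from the pre-trained model's training set; the distance weighting and interpolation coefficient are illustrative, not the kNN-DTA paper's exact formulation.

```python
# Hedged sketch of a kNN-augmented affinity predictor; the embedding source,
# distance metric, and interpolation weight are illustrative assumptions.
import numpy as np

def knn_affinity(query_emb, store_embs, store_labels, model_pred, k=16, lam=0.5, temp=1.0):
    """Interpolate a pre-trained model's prediction with a kNN estimate.

    query_emb:    (d,) embedding of the drug-target pair under test
    store_embs:   (n, d) cached embeddings of training pairs
    store_labels: (n,) their measured binding affinities
    model_pred:   scalar prediction of the pre-trained DTA model
    """
    dists = np.linalg.norm(store_embs - query_emb, axis=1)   # (n,)
    idx = np.argsort(dists)[:k]                               # k nearest training pairs
    weights = np.exp(-dists[idx] / temp)
    weights /= weights.sum()                                  # normalized similarity weights
    knn_pred = float(weights @ store_labels[idx])             # weighted label average
    return lam * knn_pred + (1.0 - lam) * model_pred          # interpolate with the model

# Toy usage with random data.
rng = np.random.default_rng(0)
store = rng.normal(size=(1000, 64))
labels = rng.normal(loc=6.0, size=1000)   # e.g. pKd-like affinities
print(knn_affinity(rng.normal(size=64), store, labels, model_pred=6.2))
```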
- Monte Carlo Tree Search Boosts Reasoning via Iterative Preference Learning [55.96599486604344]
We introduce an approach aimed at enhancing the reasoning capabilities of Large Language Models (LLMs) through an iterative preference learning process.
We use Monte Carlo Tree Search (MCTS) to iteratively collect preference data, utilizing its look-ahead ability to break down instance-level rewards into more granular step-level signals.
The proposed algorithm employs Direct Preference Optimization (DPO) to update the LLM policy using this newly generated step-level preference data.
arXiv Detail & Related papers (2024-05-01T11:10:24Z)
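The policy-update step described above can be illustrated with the standard DPO loss on (chosen, rejected) pairs; the MCTS data collection is omitted, and the step-level log-probability inputs are assumptions rather than the paper's implementation.

```python
# Hedged sketch: the DPO objective on step-level preference pairs.
# logp_* are step (or sequence) log-probabilities under the trainable policy
# and a frozen reference policy; collecting them via MCTS is omitted here.
import torch
import torch.nn.functional as F

def dpo_loss(policy_logp_chosen, policy_logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Direct Preference Optimization loss for a batch of preference pairs."""
    chosen_ratio = policy_logp_chosen - ref_logp_chosen        # log pi/pi_ref on preferred step
    rejected_ratio = policy_logp_rejected - ref_logp_rejected  # log pi/pi_ref on dispreferred step
    margin = beta * (chosen_ratio - rejected_ratio)
    return -F.logsigmoid(margin).mean()

# Toy usage: pretend log-probs for 4 preference pairs.
lp_c, lp_r = torch.tensor([-1.0, -0.8, -1.2, -0.5]), torch.tensor([-1.5, -1.1, -1.0, -0.9])
ref_c, ref_r = torch.tensor([-1.1, -0.9, -1.1, -0.7]), torch.tensor([-1.4, -1.0, -1.1, -0.8])
print(dpo_loss(lp_c, lp_r, ref_c, ref_r))
```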
- RewardBench: Evaluating Reward Models for Language Modeling [100.28366840977966]
We present RewardBench, a benchmark dataset and code-base for evaluation of reward models.
The dataset is a collection of prompt-chosen-rejected trios spanning chat, reasoning, and safety.
On the RewardBench leaderboard, we evaluate reward models trained with a variety of methods.
arXiv Detail & Related papers (2024-03-20T17:49:54Z)
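A minimal sketch of the evaluation protocol implied by prompt-chosen-rejected trios: a reward model is counted correct when it scores the chosen response above the rejected one, with accuracy aggregated per category. The data layout and the toy scoring function are assumptions, not the RewardBench code-base.

```python
# Hedged sketch of chosen-vs-rejected accuracy on prompt/chosen/rejected trios.
# `score` stands in for any reward model; the trio format is an assumption.
from collections import defaultdict

def pairwise_accuracy(trios, score):
    """trios: iterable of dicts with 'prompt', 'chosen', 'rejected', 'category'."""
    correct, total = defaultdict(int), defaultdict(int)
    for t in trios:
        win = score(t["prompt"], t["chosen"]) > score(t["prompt"], t["rejected"])
        correct[t["category"]] += int(win)
        total[t["category"]] += 1
    return {cat: correct[cat] / total[cat] for cat in total}

# Toy usage with a trivial length-based "reward model".
trios = [
    {"prompt": "p1", "chosen": "a longer helpful answer", "rejected": "meh", "category": "chat"},
    {"prompt": "p2", "chosen": "step-by-step proof", "rejected": "wrong", "category": "reasoning"},
]
print(pairwise_accuracy(trios, score=lambda p, r: len(r)))
```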
- Improved Regret for Efficient Online Reinforcement Learning with Linear Function Approximation [69.0695698566235]
We study reinforcement learning with linear function approximation and adversarially changing cost functions.
We present a computationally efficient policy optimization algorithm for the challenging general setting of unknown dynamics and bandit feedback.
arXiv Detail & Related papers (2023-01-30T17:26:39Z)
- Value-Consistent Representation Learning for Data-Efficient Reinforcement Learning [105.70602423944148]
We propose a novel method, called value-consistent representation learning (VCR), to learn representations that are directly related to decision-making.
Instead of aligning an imagined (model-predicted) state with the real state returned by the environment, VCR applies a $Q$-value head to both states and obtains two distributions of action values.
The method is shown to achieve new state-of-the-art performance among search-free RL algorithms.
arXiv Detail & Related papers (2022-06-25T03:02:25Z)
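A loosely hedged sketch of the idea summarized above: the same $Q$-value head is applied to an imagined latent state and to the encoding of the real next observation, and disagreement between the two action-value distributions is penalized. The KL-based consistency term and all shapes are illustrative guesses rather than the paper's exact objective.

```python
# Hedged sketch: value-consistency between an imagined and a real latent state.
# The softmax/KL alignment below is an illustrative choice; the VCR paper's
# exact consistency objective and architecture may differ.
import torch
import torch.nn as nn
import torch.nn.functional as F

latent_dim, n_actions, batch = 64, 6, 32
q_head = nn.Linear(latent_dim, n_actions)      # shared Q-value head

z_imagined = torch.randn(batch, latent_dim)    # latent predicted by a learned model
z_real = torch.randn(batch, latent_dim)        # latent encoded from the real next observation

q_imagined = q_head(z_imagined)                # (batch, n_actions) action values
q_real = q_head(z_real).detach()               # treat the real branch as the target

# Align the two action-value distributions rather than the raw latents.
consistency = F.kl_div(
    F.log_softmax(q_imagined, dim=-1),
    F.softmax(q_real, dim=-1),
    reduction="batchmean",
)
print(consistency)
```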
- A Generalized Bootstrap Target for Value-Learning, Efficiently Combining Value and Feature Predictions [39.17511693008055]
Estimating value functions is a core component of reinforcement learning algorithms.
We focus on bootstrapping targets used when estimating value functions.
We propose a new backup target, the $\eta$-return mixture.
arXiv Detail & Related papers (2022-01-05T21:54:55Z)
- Value-driven Hindsight Modelling [68.658900923595]
Value estimation is a critical component of the reinforcement learning (RL) paradigm.
Model learning can make use of the rich transition structure present in sequences of observations, but this approach is usually not sensitive to the reward function.
We develop an approach for representation learning in RL that sits between these two extremes: estimating values directly from rewards and learning a full model of the environment.
This provides tractable prediction targets that are directly relevant for a task, and can thus accelerate learning the value function.
arXiv Detail & Related papers (2020-02-19T18:10:20Z)
- Adaptive Approximate Policy Iteration [22.915651391812187]
We present a learning scheme which enjoys a $\tilde{O}(T^{2/3})$ regret bound for undiscounted, continuing learning in uniformly ergodic MDPs.
This is an improvement over the best existing bound of $\tilde{O}(T^{3/4})$ for the average-reward case with function approximation.
arXiv Detail & Related papers (2020-02-08T02:27:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.