Pattern Transfer Learning for Reinforcement Learning in Order
Dispatching
- URL: http://arxiv.org/abs/2105.13218v1
- Date: Thu, 27 May 2021 15:08:34 GMT
- Title: Pattern Transfer Learning for Reinforcement Learning in Order
Dispatching
- Authors: Runzhe Wan, Sheng Zhang, Chengchun Shi, Shikai Luo and Rui Song
- Abstract summary: We propose a pattern transfer learning framework for value-based reinforcement learning in the order dispatch problem.
The superior performance of the proposed method is supported by experiments.
- Score: 12.747361275395011
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Order dispatch is one of the central problems for ride-sharing platforms.
Recently, value-based reinforcement learning algorithms have shown promising
performance on this problem. However, in real-world applications, the
non-stationarity of the demand-supply system poses challenges to re-utilizing
data generated in different time periods to learn the value function. In this
work, motivated by the fact that the relative relationship between the values
of some states is largely stable across various environments, we propose a
pattern transfer learning framework for value-based reinforcement learning in
the order dispatch problem. Our method efficiently captures the value patterns
by incorporating a concordance penalty. The superior performance of the
proposed method is supported by experiments.
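The abstract names a concordance penalty but does not specify its form here. As a rough illustration only, one plausible instantiation is a pairwise hinge penalty that charges new value estimates for reversing the ordering of state pairs relative to values learned in a source period; the function name and exact form below are assumptions, not the paper's definition:

```python
import numpy as np

def concordance_penalty(v_new, v_src):
    """Pairwise hinge penalty discouraging value estimates whose ordering
    disagrees with source-environment values.
    (Illustrative form; the paper's exact penalty may differ.)"""
    v_new = np.asarray(v_new, dtype=float)
    v_src = np.asarray(v_src, dtype=float)
    d_new = v_new[:, None] - v_new[None, :]   # pairwise differences, new values
    d_src = v_src[:, None] - v_src[None, :]   # pairwise differences, source values
    # penalize pairs whose sign disagrees with the source ordering
    disagree = np.maximum(0.0, -d_new * np.sign(d_src))
    return disagree.sum() / 2.0               # each unordered pair counted once

# Estimates perfectly concordant with the source pattern incur zero penalty:
print(concordance_penalty([1.0, 2.0, 3.0], [10.0, 20.0, 30.0]))  # 0.0
```

Such a term would be added, with a weight, to the usual value-learning loss, so data from a different time period shapes only the relative ordering of states rather than their absolute values.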
Related papers
- ReconBoost: Boosting Can Achieve Modality Reconcilement [89.4377895465204]
We study the modality-alternating learning paradigm to achieve reconcilement.
We propose a new method called ReconBoost to update a fixed modality each time.
We show that the proposed method resembles Friedman's Gradient-Boosting (GB) algorithm, where the updated learner can correct errors made by others.
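As a toy illustration of the alternating, gradient-boosting-style update the summary describes (all data, learners, and coefficients below are hypothetical, not from the paper), two single-modality learners are refit in turn, each on the residual left by the other:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
# two hypothetical "modalities" observed for the same samples
x_a, x_b = rng.normal(size=n), rng.normal(size=n)
y = 2.0 * x_a - 1.0 * x_b + 0.1 * rng.normal(size=n)

w_a, w_b = 0.0, 0.0  # one linear learner per modality

def fit_residual(x, r):
    # least-squares coefficient for a single-feature learner
    return float(x @ r / (x @ x))

# alternate: update one modality at a time on the residual
# left by the other, in gradient-boosting fashion
for _ in range(10):
    w_a = fit_residual(x_a, y - w_b * x_b)  # modality A corrects B's errors
    w_b = fit_residual(x_b, y - w_a * x_a)  # modality B corrects A's errors

# coefficients end up close to the true values 2.0 and -1.0
print(round(w_a, 1), round(w_b, 1))
```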
arXiv Detail & Related papers (2024-05-15T13:22:39Z) - Feature-based Federated Transfer Learning: Communication Efficiency, Robustness and Privacy [11.308544280789016]
We propose feature-based federated transfer learning as a novel approach to improve communication efficiency.
Specifically, in the proposed feature-based federated learning, we design the extracted features and outputs to be uploaded instead of parameter updates.
We evaluate the performance of the proposed learning scheme via experiments on an image classification task and a natural language processing task to demonstrate its effectiveness.
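A minimal sketch of the upload-features-instead-of-parameter-updates idea (the client extractors, data shapes, and linear server head are all invented for illustration, not the paper's design):

```python
import numpy as np

rng = np.random.default_rng(0)

def client_extract(x, w_local):
    # Each client keeps its feature extractor private and uploads only
    # extracted features and outputs (labels), never parameter updates.
    return np.tanh(x @ w_local)

# hypothetical clients: (private data, labels, private extractor weights)
clients = [(rng.normal(size=(20, 5)),
            rng.integers(0, 2, 20),
            rng.normal(size=(5, 3)))
           for _ in range(3)]

# server pools the uploaded (feature, label) pairs
feats = np.vstack([client_extract(x, w) for x, y, w in clients])
labels = np.concatenate([y for x, y, w in clients])

# server fits a simple shared linear head on the pooled features
w_head, *_ = np.linalg.lstsq(feats, labels, rcond=None)
preds = (feats @ w_head > 0.5).astype(int)
print(feats.shape, labels.shape)  # (60, 3) (60,)
```

The communication payload per client is then proportional to its feature matrix, not to the model's parameter count, which is where the claimed efficiency would come from.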
arXiv Detail & Related papers (2024-05-15T00:43:19Z) - Vlearn: Off-Policy Learning with Efficient State-Value Function Estimation [22.129001951441015]
Existing off-policy reinforcement learning algorithms often rely on an explicit state-action-value function representation.
This reliance results in data inefficiency as maintaining a state-action-value function in high-dimensional action spaces is challenging.
We present an efficient approach that utilizes only a state-value function as the critic for off-policy deep reinforcement learning.
arXiv Detail & Related papers (2024-03-07T12:45:51Z) - UNIDEAL: Curriculum Knowledge Distillation Federated Learning [17.817181326740698]
Federated Learning (FL) has emerged as a promising approach to enable collaborative learning among multiple clients.
In this paper, we present UNIDEAL, a novel FL algorithm specifically designed to tackle the challenges of cross-domain scenarios.
Our results demonstrate that UNIDEAL achieves superior performance in terms of both model accuracy and communication efficiency.
arXiv Detail & Related papers (2023-09-16T11:30:29Z) - Contrastive Example-Based Control [163.6482792040079]
We propose a method for offline, example-based control that learns an implicit model of multi-step transitions, rather than a reward function.
Across a range of state-based and image-based offline control tasks, our method outperforms baselines that use learned reward functions.
arXiv Detail & Related papers (2023-07-24T19:43:22Z) - Robust Deep Reinforcement Learning Scheduling via Weight Anchoring [7.570246812206769]
We use weight anchoring to cultivate and fix desired behavior in neural networks.
Weight anchoring can be used to find a solution to one learning problem that lies near the solution of another.
Results show that this method provides performance comparable to the state-of-the-art approach of augmenting a simulation environment.
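The "solution near another solution" idea can be sketched as an anchoring penalty added to a new task's loss; the quadratic task and penalty weight below are assumptions for illustration, not the paper's formulation:

```python
import numpy as np

rng = np.random.default_rng(3)

# hypothetical anchor: weights from a previously solved learning problem
w_anchor = rng.normal(size=4)

# new task: quadratic loss with minimizer w_star, plus an anchoring
# penalty lam * ||w - w_anchor||^2 keeping the solution near the anchor
w_star, lam = rng.normal(size=4), 0.5

def loss_grad(w):
    return (w - w_star) + lam * (w - w_anchor)

w = np.zeros(4)
for _ in range(500):
    w -= 0.1 * loss_grad(w)  # plain gradient descent

# closed-form minimizer of the anchored quadratic:
expected = (w_star + lam * w_anchor) / (1 + lam)
print(np.allclose(w, expected, atol=1e-3))  # True
```

The penalty weight trades off fitting the new task against staying close to the previously learned behavior.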
arXiv Detail & Related papers (2023-04-20T09:30:23Z) - On Modality Bias Recognition and Reduction [70.69194431713825]
We study the modality bias problem in the context of multi-modal classification.
We propose a plug-and-play loss function method, whereby the feature space for each label is adaptively learned.
Our method yields remarkable performance improvements compared with the baselines.
arXiv Detail & Related papers (2022-02-25T13:47:09Z) - DEALIO: Data-Efficient Adversarial Learning for Imitation from
Observation [57.358212277226315]
In imitation learning from observation (IfO), a learning agent seeks to imitate a demonstrating agent using only observations of the demonstrated behavior, without access to the control signals generated by the demonstrator.
Recent methods based on adversarial imitation learning have led to state-of-the-art performance on IfO problems, but they typically suffer from high sample complexity due to a reliance on data-inefficient, model-free reinforcement learning algorithms.
This issue makes them impractical to deploy in real-world settings, where gathering samples can incur high costs in terms of time, energy, and risk.
We propose a more data-efficient IfO algorithm.
arXiv Detail & Related papers (2021-03-31T23:46:32Z) - Learning Diverse Representations for Fast Adaptation to Distribution
Shift [78.83747601814669]
We present a method for learning multiple models, incorporating an objective that pressures each to learn a distinct way to solve the task.
We demonstrate our framework's ability to facilitate rapid adaptation to distribution shift.
arXiv Detail & Related papers (2020-06-12T12:23:50Z) - Dynamic Federated Learning [57.14673504239551]
Federated learning has emerged as an umbrella term for centralized coordination strategies in multi-agent environments.
We consider a federated learning model where at every iteration, a random subset of available agents perform local updates based on their data.
Under a non-stationary random walk model on the true minimizer for the aggregate optimization problem, we establish that the performance of the architecture is determined by three factors, namely, the data variability at each agent, the model variability across all agents, and a tracking term that is inversely proportional to the learning rate of the algorithm.
arXiv Detail & Related papers (2020-02-20T15:00:54Z)
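A toy version of the setup described above, where a random subset of agents takes a local step each iteration and the server averages the results (the agent objectives, participation probability, and step size are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
K, dim, mu = 10, 3, 0.1          # agents, model size, learning rate
w_global = np.zeros(dim)

# hypothetical per-agent quadratic objectives with distinct minimizers
minimizers = rng.normal(size=(K, dim))

for it in range(200):
    # a random subset of available agents participates this iteration
    active = rng.random(K) < 0.5
    updates = []
    for k in np.flatnonzero(active):
        grad = w_global - minimizers[k]       # gradient of 0.5*||w - m_k||^2
        updates.append(w_global - mu * grad)  # one local SGD step
    if updates:
        w_global = np.mean(updates, axis=0)   # server averages local models

# the global model hovers near the average of the agents' minimizers,
# with fluctuations driven by which agents participate (data and model
# variability) and damped by the learning rate
print(np.round(w_global, 2))
```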
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.