TimeRL: Efficient Deep Reinforcement Learning with Polyhedral Dependence Graphs
- URL: http://arxiv.org/abs/2501.05408v1
- Date: Thu, 09 Jan 2025 18:05:33 GMT
- Title: TimeRL: Efficient Deep Reinforcement Learning with Polyhedral Dependence Graphs
- Authors: Pedro F. Silvestre, Peter Pietzuch
- Abstract summary: TimeRL is a system for executing dynamic DRL programs that combines the dynamism of eager execution with the whole-program optimizations and scheduling of graph-based execution.
We show that TimeRL executes current DRL algorithms up to 47$\times$ faster than existing DRL systems, while using 16$\times$ less GPU peak memory.
- Score: 0.552480439325792
- Abstract: Modern deep learning (DL) workloads increasingly use complex deep reinforcement learning (DRL) algorithms that generate training data within the learning loop. This results in programs with several nested loops and dynamic data dependencies between tensors. While DL systems with eager execution support such dynamism, they lack the optimizations and smart scheduling of graph-based execution. Graph-based execution, however, cannot express dynamic tensor shapes, instead requiring the use of multiple static subgraphs. Either execution model for DRL thus leads to redundant computation, reduced parallelism, and less efficient memory management. We describe TimeRL, a system for executing dynamic DRL programs that combines the dynamism of eager execution with the whole-program optimizations and scheduling of graph-based execution. TimeRL achieves this by introducing the declarative programming model of recurrent tensors, which allows users to define dynamic dependencies as intuitive recurrence equations. TimeRL translates recurrent tensors into a polyhedral dependence graph (PDG) with dynamic dependencies as symbolic expressions. Through simple PDG transformations, TimeRL applies whole-program optimizations, such as automatic vectorization, incrementalization, and operator fusion. The PDG also allows for the computation of an efficient program-wide execution schedule, which decides on buffer deallocations, buffer donations, and GPU/CPU memory swapping. We show that TimeRL executes current DRL algorithms up to 47$\times$ faster than existing DRL systems, while using 16$\times$ less GPU peak memory.
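As a concrete illustration of the recurrence-equation style the abstract describes, consider discounted returns, a standard dynamic dependency in DRL training loops. The sketch below is a hypothetical Python rendering, not TimeRL's actual API: it shows the eager semantics a recurrent tensor G[t] = r[t] + gamma * G[t+1] would denote, from which a system like TimeRL could derive a reverse-time schedule and buffer lifetimes.

```python
# Hypothetical sketch of a recurrence-equation program; names are illustrative.
# Discounted returns G[t] = r[t] + gamma * G[t+1] are a classic dynamic
# dependency inside a DRL training loop.
import numpy as np

def discounted_returns(rewards: np.ndarray, gamma: float = 0.99) -> np.ndarray:
    """Eager semantics of the recurrence G[t] = r[t] + gamma * G[t+1]."""
    T = len(rewards)
    G = np.zeros(T)
    G[T - 1] = rewards[T - 1]            # base case at the final timestep
    for t in range(T - 2, -1, -1):       # each step depends on t + 1
        G[t] = rewards[t] + gamma * G[t + 1]
    return G

# A declarative system records the recurrence symbolically instead of running
# this loop eagerly, and derives the reverse-time schedule (and when each
# buffer dies) from the symbolic dependence t -> t + 1.
print(discounted_returns(np.array([1.0, 0.0, 2.0])))
```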
Related papers
- Automatic Task Parallelization of Dataflow Graphs in ML/DL models [0.0]
We present a Linear Clustering approach to exploit inherent parallel paths in ML dataflow graphs.
We generate readable and executable parallel PyTorch+Python code from input ML models in ONNX format.
Preliminary results on several ML graphs demonstrate up to 1.9$\times$ speedup over serial execution.
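As a rough illustration of running independent dataflow paths concurrently (a sketch, not the paper's Linear Clustering algorithm; the toy graph and helper names are invented):

```python
# Toy sketch: run all dataflow nodes whose producers have finished in parallel.
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

edges = [("a", "c"), ("b", "c"), ("c", "d"), ("b", "e")]  # producer -> consumer
deps, users, nodes = defaultdict(set), defaultdict(set), set()
for src, dst in edges:
    deps[dst].add(src)
    users[src].add(dst)
    nodes.update((src, dst))

def run(node):
    print("executing", node)             # stand-in for executing one operator

ready = [n for n in nodes if not deps[n]]
with ThreadPoolExecutor() as pool:
    while ready:
        list(pool.map(run, ready))       # ready nodes are mutually independent
        done, ready = ready, []
        for n in done:                   # release nodes unblocked by this wave
            for u in users[n]:
                deps[u].discard(n)
                if not deps[u]:
                    ready.append(u)
```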
arXiv Detail & Related papers (2023-08-22T04:54:30Z) - RESPECT: Reinforcement Learning based Edge Scheduling on Pipelined Coral Edge TPUs [12.952987240366781]
This work presents a reinforcement learning (RL) based scheduling framework, which learns the behaviors of optimal optimization algorithms.
RL generates near-optimal scheduling results with short solving runtime overhead.
Our framework has demonstrated up to $\sim$2.5$\times$ real-world on-chip runtime inference speedups over the commercial compiler.
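To make the RL-scheduling framing concrete, here is a toy sketch (invented cost model and names; not RESPECT's formulation) that learns an operator ordering with tabular Q-learning:

```python
# Toy RL scheduler: episodes pick an execution order for a fixed op set;
# tabular Q-learning rewards orders with a lower simulated cost.
import random

ops = {"conv": 4.0, "pool": 1.0, "fc": 2.0}   # op -> made-up cost
Q = {}                                        # (state, action) -> value
alpha, eps, episodes = 0.5, 0.2, 500

def simulate(order):
    # Made-up stand-in cost model: ops scheduled later are discounted.
    return sum(ops[o] / (i + 1) for i, o in enumerate(order))

for _ in range(episodes):
    state, order = frozenset(), []
    while len(order) < len(ops):
        remaining = [o for o in ops if o not in state]
        if random.random() < eps:             # epsilon-greedy exploration
            a = random.choice(remaining)
        else:
            a = max(remaining, key=lambda o: Q.get((state, o), 0.0))
        order.append(a)
        state = state | {a}
    r = -simulate(order)                      # reward: negative makespan
    for i, a in enumerate(order):             # Monte-Carlo style update
        s = frozenset(order[:i])
        Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (r - Q.get((s, a), 0.0))

state, order = frozenset(), []                # greedy rollout of learned policy
while len(order) < len(ops):
    a = max((o for o in ops if o not in state),
            key=lambda o: Q.get((state, o), 0.0))
    order.append(a)
    state = state | {a}
print("learned schedule:", order, "cost:", simulate(order))
```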
arXiv Detail & Related papers (2023-04-10T17:22:12Z) - PiPAD: Pipelined and Parallel Dynamic GNN Training on GPUs [3.3019914257038168]
Dynamic Graph Neural Networks (DGNNs) have been broadly applied in various real-life applications, such as link prediction and pandemic forecast.
DGNNs manifest substantial parallel computation and data reuse potentials, but suffer from severe memory access inefficiency and data transfer overhead.
We propose PiPAD, a $\underline{\textbf{Pi}}$pelined and $\underline{\textbf{PA}}$rallel $\underline{\textbf{D}}$GNN training framework for end-to-end performance optimization on GPUs.
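The core pipelining idea, overlapping data transfer with computation, can be sketched generically in PyTorch with CUDA streams (an illustration under assumed chunked inputs, not PiPAD's implementation):

```python
# Requires a CUDA-capable PyTorch build; chunk sizes and the compute step are
# placeholders.
import torch

assert torch.cuda.is_available()
device = torch.device("cuda")
copy_stream = torch.cuda.Stream()        # dedicated stream for H2D transfers

chunks = [torch.randn(1024, 1024).pin_memory() for _ in range(4)]

def compute(x):                          # stand-in for one training step
    return (x @ x).sum()

with torch.cuda.stream(copy_stream):
    nxt = chunks[0].to(device, non_blocking=True)

for i in range(len(chunks)):
    torch.cuda.current_stream().wait_stream(copy_stream)  # chunk i has arrived
    cur = nxt
    if i + 1 < len(chunks):              # prefetch chunk i+1 during compute
        with torch.cuda.stream(copy_stream):
            nxt = chunks[i + 1].to(device, non_blocking=True)
    out = compute(cur)
print(out.item())
```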
arXiv Detail & Related papers (2023-01-01T12:10:31Z) - Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders-of-magnitude improvements in energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
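snnTorch's documented Leaky neuron API makes the basic simulation loop short; the example below is a minimal CPU sketch and does not reflect the paper's IPU-specific optimizations:

```python
import torch
import snntorch as snn

lif = snn.Leaky(beta=0.9)                # beta: membrane decay per timestep
mem = lif.init_leaky()                   # initial membrane potential
spikes = []
for step in range(10):                   # unroll 10 simulation timesteps
    cur = torch.rand(1, 8)               # toy input current for 8 neurons
    spk, mem = lif(cur, mem)             # spike where membrane crosses threshold
    spikes.append(spk)
print(torch.stack(spikes).sum().item(), "spikes emitted")
```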
arXiv Detail & Related papers (2022-11-19T15:44:08Z) - MSRL: Distributed Reinforcement Learning with Dataflow Fragments [16.867322708270116]
Reinforcement learning (RL) trains many agents, which is resource-intensive and must scale to large GPU clusters.
We describe MindSpore Reinforcement Learning (MSRL), a distributed RL training system that supports distribution policies that govern how RL training is parallelised and distributed on cluster resources.
MSRL introduces the new abstraction of a fragmented dataflow graph, which maps functions from an RL algorithm's training loop to parallel computational fragments.
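A minimal sketch of the fragment idea (invented names, threads standing in for cluster workers; not MSRL's API): an actor fragment streams trajectories to a learner fragment over an explicit edge.

```python
import queue
import threading

traj_q = queue.Queue(maxsize=4)          # the dataflow edge between fragments

def actor_fragment(n_episodes):
    for _ in range(n_episodes):
        trajectory = [("obs", "act", 1.0)] * 8   # stand-in rollout
        traj_q.put(trajectory)
    traj_q.put(None)                     # end-of-stream marker

def learner_fragment():
    while (traj := traj_q.get()) is not None:
        total = sum(r for _, _, r in traj)       # stand-in policy update
        print("update from return", total)

# In a distributed system each fragment would be mapped to its own
# worker/GPU by a distribution policy; two threads stand in here.
t1 = threading.Thread(target=actor_fragment, args=(3,))
t2 = threading.Thread(target=learner_fragment)
t1.start(); t2.start(); t1.join(); t2.join()
```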
arXiv Detail & Related papers (2022-10-03T12:34:58Z) - NumS: Scalable Array Programming for the Cloud [82.827921577004]
We present NumS, an array programming library that optimizes NumPy-like expressions on task-based distributed systems.
This is achieved through a novel scheduler called Load Simulated Hierarchical Scheduling (LSHS).
We show that LSHS enhances performance on Ray by decreasing network load by a factor of 2x, requiring 4x less memory, and reducing execution time by 10x on the logistic regression problem.
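The load-simulation idea can be caricatured in a few lines (a hypothetical sketch with made-up costs, not NumS's LSHS implementation): before placing an operation, score each device by its simulated memory and network load after placement.

```python
# Made-up placement state: where blocks live and each device's current load.
blocks = {"x0": "dev0", "x1": "dev1"}
load = {"dev0": {"mem": 2.0, "net": 1.0},
        "dev1": {"mem": 1.0, "net": 2.0}}

def simulated_cost(dev, op_mem, inputs):
    # Simulate the load if the op ran on dev: added memory plus a transfer
    # for every input block that lives elsewhere.
    transfers = sum(1.0 for b in inputs if blocks[b] != dev)
    return load[dev]["mem"] + op_mem + load[dev]["net"] + transfers

def place(op_mem, inputs):
    dev = min(load, key=lambda d: simulated_cost(d, op_mem, inputs))
    load[dev]["mem"] += op_mem           # commit the simulated load
    load[dev]["net"] += sum(1.0 for b in inputs if blocks[b] != dev)
    return dev

print(place(0.5, ["x0", "x1"]))          # e.g. an elementwise add of two blocks
```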
arXiv Detail & Related papers (2022-06-28T20:13:40Z) - Efficient Dynamic Graph Representation Learning at Scale [66.62859857734104]
We propose Efficient Dynamic Graph lEarning (EDGE), which selectively expresses certain temporal dependencies via the training loss to improve parallelism in computation.
We show that EDGE can scale to dynamic graphs with millions of nodes and hundreds of millions of temporal events and achieve new state-of-the-art (SOTA) performance.
arXiv Detail & Related papers (2021-12-14T22:24:53Z) - High-performance symbolic-numerics via multiple dispatch [52.77024349608834]
Symbolics.jl is an extendable symbolic system which uses dynamic multiple dispatch to change behavior depending on the domain needs.
We show that by formalizing a generic API on actions independent of implementation, we can retroactively add optimized data structures to our system.
We demonstrate the ability to swap between classical term-rewriting simplifiers and e-graph-based term-rewriting simplifiers.
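Symbolics.jl builds on Julia's multiple dispatch; Python's functools.singledispatch offers a single-argument analogue of the same idea, shown below purely as an illustration (the simplifiers here are placeholders, not Symbolics.jl code):

```python
from functools import singledispatch

@singledispatch
def simplify(expr):
    raise NotImplementedError(type(expr))

@simplify.register
def _(expr: str):                        # naive string-rewriting placeholder
    return expr.replace("+ 0", "").strip()

@simplify.register
def _(expr: list):                       # a later, optimized representation
    return [term for term in expr if term != 0]

# The same generic operation dispatches on the operand type, so optimized
# data structures can be registered retroactively without touching callers.
print(simplify("x + 0"))                 # -> "x"
print(simplify(["x", 0, "y"]))           # -> ["x", "y"]
```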
arXiv Detail & Related papers (2021-05-09T14:22:43Z) - Accurate, Efficient and Scalable Training of Graph Neural Networks [9.569918335816963]
Graph Neural Networks (GNNs) are powerful deep learning models to generate node embeddings on graphs.
It is still challenging to perform training in an efficient and scalable way.
We propose a novel parallel training framework that reduces training workload by orders of magnitude compared with state-of-the-art minibatch methods.
arXiv Detail & Related papers (2020-10-05T22:06:23Z) - PolyDL: Polyhedral Optimizations for Creation of High Performance DL primitives [55.79741270235602]
We present compiler algorithms to automatically generate high performance implementations of Deep Learning primitives.
We develop novel data reuse analysis algorithms using the polyhedral model.
We also show that such a hybrid compiler plus a minimal library-use approach results in state-of-the-art performance.
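The data-reuse payoff of polyhedral loop transformations can be illustrated with classic loop tiling (a NumPy toy, not PolyDL's generated code):

```python
# Tiled matrix multiply: each (i0, j0) tile of C reuses tiles of A and B
# while they are hot in cache, instead of streaming whole matrices.
import numpy as np

def tiled_matmul(A, B, T=32):
    n, k = A.shape
    k2, m = B.shape
    assert k == k2
    C = np.zeros((n, m))
    for i0 in range(0, n, T):
        for j0 in range(0, m, T):
            for k0 in range(0, k, T):
                C[i0:i0+T, j0:j0+T] += A[i0:i0+T, k0:k0+T] @ B[k0:k0+T, j0:j0+T]
    return C

A, B = np.random.rand(64, 64), np.random.rand(64, 64)
assert np.allclose(tiled_matmul(A, B), A @ B)   # same result as untiled matmul
```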
arXiv Detail & Related papers (2020-06-02T06:44:09Z) - L$^2$-GCN: Layer-Wise and Learned Efficient Training of Graph Convolutional Networks [118.37805042816784]
Graph convolution networks (GCN) are increasingly popular in many applications, yet remain notoriously hard to train over large graph datasets.
We propose a novel efficient layer-wise training framework for GCN (L-GCN), that disentangles feature aggregation and feature transformation during training.
Experiments show that L-GCN is faster than state-of-the-art methods by at least an order of magnitude, with consistent memory usage that does not depend on dataset size.
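Layer-wise training in the spirit of L-GCN can be sketched as follows (simplified, with a dense toy graph and a throwaway per-layer classifier; not the authors' code):

```python
# Train one graph-conv layer at a time against the labels, freeze it, and
# feed its outputs to the next layer, so memory stays bounded per layer.
import torch
import torch.nn as nn

N, F, H, C = 100, 16, 32, 4
A = torch.eye(N) + (torch.rand(N, N) < 0.05).float()   # toy adjacency + self-loops
A = A / A.sum(1, keepdim=True)                          # row-normalize
X, y = torch.randn(N, F), torch.randint(0, C, (N,))

def train_layer(inputs, in_dim, out_dim, epochs=50):
    layer = nn.Linear(in_dim, out_dim)
    head = nn.Linear(out_dim, C)                        # throwaway classifier
    opt = torch.optim.Adam([*layer.parameters(), *head.parameters()], lr=0.01)
    for _ in range(epochs):
        h = torch.relu(layer(A @ inputs))               # one graph convolution
        loss = nn.functional.cross_entropy(head(h), y)
        opt.zero_grad(); loss.backward(); opt.step()
    return torch.relu(layer(A @ inputs)).detach()       # frozen embeddings

h1 = train_layer(X, F, H)        # layer 1 trained in isolation
h2 = train_layer(h1, H, H)       # layer 2 trained on frozen layer-1 output
```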
arXiv Detail & Related papers (2020-03-30T16:37:56Z)