Reinforcement Learning with Latent Flow
- URL: http://arxiv.org/abs/2101.01857v1
- Date: Wed, 6 Jan 2021 03:50:50 GMT
- Title: Reinforcement Learning with Latent Flow
- Authors: Wenling Shang, Xiaofei Wang, Aravind Srinivas, Aravind Rajeswaran,
Yang Gao, Pieter Abbeel, Michael Laskin
- Abstract summary: Flow of Latents for Reinforcement Learning (Flare) is a network architecture for RL that explicitly encodes temporal information through latent vector differences.
We show that Flare recovers optimal performance in state-based RL without explicit access to the state velocity.
We also show that Flare achieves state-of-the-art performance on challenging pixel-based continuous control tasks within the DeepMind control benchmark suite.
- Score: 78.74671595139613
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Temporal information is essential to learning effective policies with
Reinforcement Learning (RL). However, current state-of-the-art RL algorithms
either assume that such information is given as part of the state space or,
when learning from pixels, use the simple heuristic of frame-stacking to
implicitly capture temporal information present in the image observations. This
heuristic is in contrast to the current paradigm in video classification
architectures, which utilize explicit encodings of temporal information through
methods such as optical flow and two-stream architectures to achieve
state-of-the-art performance. Inspired by leading video classification
architectures, we introduce the Flow of Latents for Reinforcement Learning
(Flare), a network architecture for RL that explicitly encodes temporal
information through latent vector differences. We show that Flare (i) recovers
optimal performance in state-based RL without explicit access to the state
velocity, solely with positional state information, (ii) achieves
state-of-the-art performance on challenging pixel-based continuous control
tasks within the DeepMind control benchmark suite, namely quadruped walk,
hopper hop, finger turn hard, pendulum swing, and walker run, (iii) is the
most sample-efficient model-free pixel-based RL algorithm, outperforming the
prior model-free state of the art by 1.9X and 1.5X on the 500k and 1M step
benchmarks, respectively, and (iv), when augmented over Rainbow DQN,
outperforms this state-of-the-art baseline on 5 of 8 challenging Atari games
at the 100M time step benchmark.
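
Since the abstract pins down the core mechanism concretely, augmenting per-frame latents with their temporal differences, a minimal PyTorch sketch of that idea follows. This illustrates the latent-difference concept only and is not the authors' reference implementation; the module name LatentFlowEncoder, the fusion layer, and all shapes and hyperparameters are assumptions for the example.

```python
# Minimal sketch of the latent-difference ("latent flow") idea from the
# abstract. Hypothetical code, not the authors' reference implementation:
# module names, the fusion MLP, and all hyperparameters are assumptions.
import torch
import torch.nn as nn


class LatentFlowEncoder(nn.Module):
    """Encode each frame independently, then augment the per-frame latents
    with their consecutive differences before fusing them into one vector."""

    def __init__(self, frame_encoder: nn.Module, latent_dim: int, num_frames: int):
        super().__init__()
        self.frame_encoder = frame_encoder  # maps one frame -> (latent_dim,)
        # num_frames latents plus (num_frames - 1) difference vectors.
        fused_dim = latent_dim * (2 * num_frames - 1)
        self.fuse = nn.Sequential(nn.Linear(fused_dim, latent_dim), nn.ReLU())

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, num_frames, C, H, W)
        b, t = frames.shape[:2]
        z = self.frame_encoder(frames.flatten(0, 1)).view(b, t, -1)  # (b, t, d)
        flow = z[:, 1:] - z[:, :-1]  # temporal differences, (b, t-1, d)
        fused = torch.cat([z.flatten(1), flow.flatten(1)], dim=-1)
        return self.fuse(fused)  # (b, latent_dim), fed to the RL policy/critic


if __name__ == "__main__":
    # Tiny per-frame CNN, purely for demonstration.
    conv = nn.Sequential(
        nn.Conv2d(3, 16, kernel_size=3, stride=2), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 32),
    )
    enc = LatentFlowEncoder(conv, latent_dim=32, num_frames=3)
    obs = torch.randn(4, 3, 3, 84, 84)  # (batch, frames, C, H, W)
    print(enc(obs).shape)  # torch.Size([4, 32])
```

Unlike frame stacking, which leaves the network to infer motion from raw concatenated pixels, the differences between consecutive latents hand the temporal signal to the policy explicitly, analogous to optical flow in two-stream video models.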
Related papers
- Highway Graph to Accelerate Reinforcement Learning [18.849312069946993]
We propose a novel graph structure, named the highway graph, to model state transitions.
By integrating the highway graph into RL, training can be remarkably accelerated in the early stages.
A deep neural network-based agent is then trained using the highway graph, resulting in better generalization and lower storage costs.
arXiv Detail & Related papers (2024-05-20T02:09:07Z)
- Hierarchical Graph Pattern Understanding for Zero-Shot VOS [102.21052200245457]
This paper proposes Hierarchical Graph Pattern Understanding (HGPU), a new graph neural network (GNN) architecture for zero-shot video object segmentation (ZS-VOS).
Inspired by the strong ability of GNNs to capture structural relations, HGPU leverages motion cues (i.e., optical flow) to enhance the high-order representations from the neighbors of target frames.
arXiv Detail & Related papers (2023-12-15T04:13:21Z)
- Disentangling Spatial and Temporal Learning for Efficient Image-to-Video Transfer Learning [59.26623999209235]
We present DiST, which disentangles the learning of spatial and temporal aspects of videos.
The disentangled learning in DiST is highly efficient because it avoids the back-propagation of massive pre-trained parameters.
Extensive experiments on five benchmarks show that DiST outperforms existing state-of-the-art methods by convincing margins.
arXiv Detail & Related papers (2023-09-14T17:58:33Z)
- Ultra-High-Definition Low-Light Image Enhancement: A Benchmark and Transformer-Based Method [51.30748775681917]
We consider the task of low-light image enhancement (LLIE) and introduce a large-scale database consisting of images at 4K and 8K resolution.
We conduct systematic benchmarking studies and provide a comparison of current LLIE algorithms.
As a second contribution, we introduce LLFormer, a transformer-based low-light enhancement method.
arXiv Detail & Related papers (2022-12-22T09:05:07Z)
- Motion-aware Memory Network for Fast Video Salient Object Detection [15.967509480432266]
We design a space-time memory (STM)-based network that extracts useful temporal information about the current frame from adjacent frames, serving as the temporal branch for video salient object detection (VSOD).
In the encoding stage, we generate high-level temporal features from high-level features of the current frame and its adjacent frames.
In the decoding stage, we propose an effective fusion strategy for spatial and temporal branches.
The proposed model does not require optical flow or other preprocessing, and can reach a speed of nearly 100 FPS during inference.
arXiv Detail & Related papers (2022-08-01T15:56:19Z)
- KORSAL: Key-point Detection based Online Real-Time Spatio-Temporal Action Localization [0.9507070656654633]
Real-time and online action localization in a video is a critical yet highly challenging problem.
Recent attempts achieve this by using computationally intensive 3D CNN architectures or highly redundant two-stream architectures with optical flow.
We propose utilizing fast and efficient key-point based bounding box prediction to spatially localize actions.
Our model achieves a frame rate of 41.8 FPS, which is a 10.7% improvement over contemporary real-time methods.
arXiv Detail & Related papers (2021-11-05T08:39:36Z)
- Sequential Place Learning: Heuristic-Free High-Performance Long-Term Place Recognition [24.70946979449572]
We develop a learning-based CNN+LSTM architecture, trainable via backpropagation through time, for viewpoint- and appearance-invariant place recognition.
Our model outperforms 15 classical methods while setting new state-of-the-art performance standards.
In addition, we show that SPL (Sequential Place Learning) can be up to 70x faster to deploy than classical methods on a 729 km route.
arXiv Detail & Related papers (2021-03-02T22:57:43Z)
- Temporal Bilinear Encoding Network of Audio-Visual Features at Low Sampling Rates [7.1273332508471725]
This paper exploits audio-visual information for video classification at a sampling rate of 1 frame per second.
We propose the Temporal Bilinear Encoding Network (TBEN) to encode both audio and visual long-range temporal information.
arXiv Detail & Related papers (2020-12-18T14:59:34Z)
- A Plug-and-play Scheme to Adapt Image Saliency Deep Model for Video Data [54.198279280967185]
This paper proposes a novel plug-and-play scheme to weakly retrain a pretrained image saliency deep model for video data.
Our method is simple yet effective for adapting any off-the-shelf pre-trained image saliency deep model to obtain high-quality video saliency detection.
arXiv Detail & Related papers (2020-08-02T13:23:14Z)
- A Real-time Action Representation with Temporal Encoding and Deep Compression [115.3739774920845]
We propose a new real-time convolutional architecture, called Temporal Convolutional 3D Network (T-C3D), for action representation.
T-C3D learns video action representations in a hierarchical multi-granularity manner while obtaining a high process speed.
Our method achieves clear improvements over state-of-the-art real-time methods on the UCF101 action recognition benchmark: 5.4% higher accuracy and 2x faster inference, with a model smaller than 5 MB.
arXiv Detail & Related papers (2020-06-17T06:30:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.