HOPE: Hierarchical Spatial-temporal Network for Occupancy Flow
Prediction
- URL: http://arxiv.org/abs/2206.10118v1
- Date: Tue, 21 Jun 2022 05:25:58 GMT
- Title: HOPE: Hierarchical Spatial-temporal Network for Occupancy Flow
Prediction
- Authors: Yihan Hu, Wenxin Shao, Bo Jiang, Jiajie Chen, Siqi Chai, Zhening Yang,
Jingyu Qian, Helong Zhou, Qiang Liu
- Abstract summary: We introduce our solution to the Occupancy and Flow Prediction challenge in the Open Challenges at CVPR 2022.
We have developed a novel hierarchical spatial-temporal network featured with spatial-temporal encoders, a multi-scale aggregator enriched with latent variables, and a hierarchical 3D decoder.
Our method achieves a Flow-Grounded Occupancy AUC of 0.8389 and outperforms all the other teams on the leaderboard.
- Score: 10.02342218798102
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this report, we introduce our solution to the Occupancy and Flow
Prediction challenge in the Waymo Open Dataset Challenges at CVPR 2022, which
ranks 1st on the leaderboard. We have developed a novel hierarchical
spatial-temporal network featured with spatial-temporal encoders, a multi-scale
aggregator enriched with latent variables, and a recursive hierarchical 3D
decoder. We use multiple losses including focal loss and modified flow trace
loss to efficiently guide the training process. Our method achieves a
Flow-Grounded Occupancy AUC of 0.8389 and outperforms all the other teams on
the leaderboard.
Related papers
- FusionLLM: A Decentralized LLM Training System on Geo-distributed GPUs with Adaptive Compression [55.992528247880685]
Decentralized training faces significant challenges regarding system design and efficiency.
We present FusionLLM, a decentralized training system designed and implemented for training large deep neural networks (DNNs)
We show that our system and method can achieve 1.45 - 9.39x speedup compared to baseline methods while ensuring convergence.
arXiv Detail & Related papers (2024-10-16T16:13:19Z) - Neural Eulerian Scene Flow Fields [59.57980592109722]
EulerFlow works out-of-the-box without tuning across multiple domains.
It exhibits emergent 3D point tracking behavior by solving its estimated ODE over long-time horizons.
It outperforms all prior art on the Argoverse 2 2024 Scene Flow Challenge.
arXiv Detail & Related papers (2024-10-02T20:56:45Z) - A Lightweight Spatiotemporal Network for Online Eye Tracking with Event Camera [0.8576354642891824]
Event-based data are commonly encountered in edge computing environments where efficiency and low latency are critical.
To interface with such data and leverage their rich temporal temporal, we propose a causal convolutional network.
We apply our model on the AIS 2024 event-based eye tracking challenge, reaching a score of 0.9916 p10 accuracy on the Kaggle private testset.
arXiv Detail & Related papers (2024-04-13T00:13:20Z) - NeRF-Det++: Incorporating Semantic Cues and Perspective-aware Depth
Supervision for Indoor Multi-View 3D Detection [72.0098999512727]
NeRF-Det has achieved impressive performance in indoor multi-view 3D detection by utilizing NeRF to enhance representation learning.
We present three corresponding solutions, including semantic enhancement, perspective-aware sampling, and ordinal depth supervision.
The resulting algorithm, NeRF-Det++, has exhibited appealing performance in the ScanNetV2 and AR KITScenes datasets.
arXiv Detail & Related papers (2024-02-22T11:48:06Z) - Active search and coverage using point-cloud reinforcement learning [50.741409008225766]
This paper presents an end-to-end deep reinforcement learning solution for target search and coverage.
We show that deep hierarchical feature learning works for RL and that by using farthest point sampling (FPS) we can reduce the amount of points.
We also show that multi-head attention for point-clouds helps to learn the agent faster but converges to the same outcome.
arXiv Detail & Related papers (2023-12-18T18:16:30Z) - Spatio-Temporal Contrastive Self-Supervised Learning for POI-level Crowd
Flow Inference [23.8192952068949]
We present a novel Contrastive Self-learning framework for S-temporal data (CSST)
Our approach initiates with the construction of a spatial adjacency graph founded on the Points of Interest (POIs) and their respective distances.
We adopt a swapped prediction approach to anticipate the representation of the target subgraph from similar instances.
Our experiments, conducted on two real-world datasets, demonstrate that the CSST pre-trained on extensive noisy data consistently outperforms models trained from scratch.
arXiv Detail & Related papers (2023-09-06T02:51:24Z) - Long-Short Temporal Co-Teaching for Weakly Supervised Video Anomaly
Detection [14.721615285883423]
Weakly supervised anomaly detection (WS-VAD) is a challenging problem that aims to learn VAD models only with video-level annotations.
Our proposed method is able to better deal with anomalies with varying durations as well as subtle anomalies.
arXiv Detail & Related papers (2023-03-31T13:28:06Z) - Pyramid Correlation based Deep Hough Voting for Visual Object Tracking [16.080776515556686]
We introduce a voting-based classification-only tracking algorithm named Pyramid Correlation based Deep Hough Voting (short for PCDHV)
Specifically we innovatively construct a Pyramid Correlation module to equip the embedded feature with fine-grained local structures and global spatial contexts.
The elaborately designed Deep Hough Voting module further take over, integrating long-range dependencies of pixels to perceive corners.
arXiv Detail & Related papers (2021-10-15T10:37:00Z) - Hierarchical Attention Learning of Scene Flow in 3D Point Clouds [28.59260783047209]
This paper studies the problem of scene flow estimation from two consecutive 3D point clouds.
A novel hierarchical neural network with double attention is proposed for learning the correlation of point features in adjacent frames.
Experiments show that the proposed network outperforms the state-of-the-art performance of 3D scene flow estimation.
arXiv Detail & Related papers (2020-10-12T14:56:08Z) - 2nd Place Scheme on Action Recognition Track of ECCV 2020 VIPriors
Challenges: An Efficient Optical Flow Stream Guided Framework [57.847010327319964]
We propose a data-efficient framework that can train the model from scratch on small datasets.
Specifically, by introducing a 3D central difference convolution operation, we proposed a novel C3D neural network-based two-stream framework.
It is proved that our method can achieve a promising result even without a pre-trained model on large scale datasets.
arXiv Detail & Related papers (2020-08-10T09:50:28Z) - Learning to Hash with Graph Neural Networks for Recommender Systems [103.82479899868191]
Graph representation learning has attracted much attention in supporting high quality candidate search at scale.
Despite its effectiveness in learning embedding vectors for objects in the user-item interaction network, the computational costs to infer users' preferences in continuous embedding space are tremendous.
We propose a simple yet effective discrete representation learning framework to jointly learn continuous and discrete codes.
arXiv Detail & Related papers (2020-03-04T06:59:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.