HybridNets: End-to-End Perception Network
- URL: http://arxiv.org/abs/2203.09035v1
- Date: Thu, 17 Mar 2022 02:29:12 GMT
- Title: HybridNets: End-to-End Perception Network
- Authors: Dat Vu, Bao Ngo and Hung Phan
- Abstract summary: This paper systematically studies an end-to-end perception network for multi-tasking.
We have developed HybridNets, an end-to-end perception network that simultaneously performs traffic object detection, drivable area segmentation, and lane detection.
- Score: 1.4287758028119788
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: End-to-end networks have become increasingly important in multi-tasking. One
prominent example of this is the growing significance of a driving perception
system in autonomous driving. This paper systematically studies an end-to-end
perception network for multi-tasking and proposes several key optimizations to
improve accuracy. First, the paper proposes an efficient segmentation head and
box/class prediction networks based on a weighted bidirectional feature network.
Second, the paper proposes automatically customized anchors for each level in
the weighted bidirectional feature network. Third, the paper proposes an
efficient training loss function and training strategy to balance and optimize
the network. Based on these optimizations, we have developed an end-to-end
perception network to perform multi-tasking, including traffic object
detection, drivable area segmentation and lane detection simultaneously, called
HybridNets, which achieves better accuracy than prior art. In particular,
HybridNets achieves 77.3 mean Average Precision on the Berkeley DeepDrive Dataset
and outperforms prior work on lane detection with 31.6 mean Intersection over Union,
using 12.83 million parameters and 15.6 billion floating-point operations. In addition, it
can perform visual perception tasks in real-time and thus is a practical and
accurate solution to the multi-tasking problem. Code is available at
https://github.com/datvuthanh/HybridNets.
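The abstract reports lane detection quality as mean Intersection over Union (mIoU). As a rough illustration of that metric only, the sketch below computes mIoU over flattened per-pixel class labels. The function name `mean_iou` and the choice to skip classes absent from both prediction and ground truth are assumptions made here for illustration; the abstract does not say how the paper aggregates IoU across classes or images.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over classes present in either pred or target.

    pred, target: equal-length sequences of integer class labels
    (for example, a flattened segmentation mask).
    Assumption: classes absent from both sequences are skipped.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    ious = []
    for c in range(num_classes):
        # Pixels where both prediction and ground truth are class c.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels where either prediction or ground truth is class c.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example: 0 = background, 1 = lane.
pred = [0, 0, 1, 1, 1, 0]
target = [0, 1, 1, 1, 0, 0]
print(mean_iou(pred, target, num_classes=2))  # 0.5
```

In the toy example each class has 2 overlapping pixels out of a union of 4, so both per-class IoUs are 0.5 and so is their mean.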
Related papers
- CAR-BRAINet: Sub-6GHz Aided Spatial Adaptive Beam Prediction with Multi Head Attention for Heterogeneous Vehicular Networks [4.84929109771831]
Heterogeneous Vehicular Networks (HetVNets) play a key role by stacking different communication technologies such as sub-6GHz, mm-wave and DSRC to meet diverse connectivity needs of 5G/B5G vehicular networks.
HetVNet helps address the humongous user demands, but maintaining a steady connection in highly mobile, real-world conditions remains a challenge.
This paper introduces a lightweight deep learning-based solution termed "CAR-BRAINet", which consists of convolutional neural networks with a powerful multi-head attention mechanism.
arXiv Detail & Related papers (2025-09-02T05:17:23Z) - A Point-Based Approach to Efficient LiDAR Multi-Task Perception [49.91741677556553]
PAttFormer is an efficient multi-task architecture for joint semantic segmentation and object detection in point clouds.
Unlike other LiDAR-based multi-task architectures, our proposed PAttFormer does not require separate feature encoders for task-specific point cloud representations.
Our evaluations show substantial gains from multi-task learning, improving LiDAR semantic segmentation by +1.7% in mIoU and 3D object detection by +1.7% in mAP.
arXiv Detail & Related papers (2024-04-19T11:24:34Z) - Active search and coverage using point-cloud reinforcement learning [50.741409008225766]
This paper presents an end-to-end deep reinforcement learning solution for target search and coverage.
We show that deep hierarchical feature learning works for RL and that by using farthest point sampling (FPS) we can reduce the number of points.
We also show that multi-head attention for point clouds helps the agent learn faster but converges to the same outcome.
arXiv Detail & Related papers (2023-12-18T18:16:30Z) - Latency-aware Unified Dynamic Networks for Efficient Image Recognition [72.8951331472913]
LAUDNet is a framework to bridge the theoretical and practical efficiency gap in dynamic networks.
It integrates three primary dynamic paradigms: spatially adaptive computation, dynamic layer skipping, and dynamic channel skipping.
It can notably reduce the latency of models like ResNet by over 50% on platforms such as V100, 3090, and TX2 GPUs.
arXiv Detail & Related papers (2023-08-30T10:57:41Z) - HeteroEdge: Addressing Asymmetry in Heterogeneous Collaborative Autonomous Systems [1.274065448486689]
We propose a self-adaptive optimization framework for a testbed comprising two Unmanned Ground Vehicles (UGVs) and two NVIDIA Jetson devices.
This framework efficiently manages multiple tasks (storage, processing, computation, transmission, inference) on heterogeneous nodes concurrently.
It involves compressing and masking input image frames, identifying similar frames, and profiling devices to obtain boundary conditions for optimization.
arXiv Detail & Related papers (2023-05-05T02:43:16Z) - SVNet: Where SO(3) Equivariance Meets Binarization on Point Cloud Representation [65.4396959244269]
The paper tackles the challenge by designing a general framework to construct 3D learning architectures.
The proposed approach can be applied to general backbones like PointNet and DGCNN.
Experiments on ModelNet40, ShapeNet, and the real-world dataset ScanObjectNN demonstrate that the method achieves a great trade-off between efficiency, rotation, and accuracy.
arXiv Detail & Related papers (2022-09-13T12:12:19Z) - High Efficiency Pedestrian Crossing Prediction [0.0]
State-of-the-art methods in predicting pedestrian crossing intention often rely on multiple streams of information as inputs.
We introduce a network with only frames of pedestrians as the input.
Experiments validate that our model consistently delivers outstanding performance.
arXiv Detail & Related papers (2022-04-04T21:37:57Z) - Multi-Exit Semantic Segmentation Networks [78.44441236864057]
We propose a framework for converting state-of-the-art segmentation models to MESS networks: specially trained CNNs that employ parametrised early exits along their depth to save computation during inference on easier samples.
We co-optimise the number, placement and architecture of the attached segmentation heads, along with the exit policy, to adapt to the device capabilities and application-specific requirements.
arXiv Detail & Related papers (2021-06-07T11:37:03Z) - Fully Dynamic Inference with Deep Neural Networks [19.833242253397206]
Two compact networks, called Layer-Net (L-Net) and Channel-Net (C-Net), predict on a per-instance basis which layers or filters/channels are redundant and therefore should be skipped.
On the CIFAR-10 dataset, LC-Net results in up to 11.9$\times$ fewer floating-point operations (FLOPs) and up to 3.3% higher accuracy compared to other dynamic inference methods.
On the ImageNet dataset, LC-Net achieves up to 1.4$\times$ fewer FLOPs and up to 4.6% higher Top-1 accuracy than the other methods.
arXiv Detail & Related papers (2020-07-29T23:17:48Z) - FairMOT: On the Fairness of Detection and Re-Identification in Multiple Object Tracking [92.48078680697311]
Multi-object tracking (MOT) is an important problem in computer vision.
We present a simple yet effective approach termed FairMOT, based on the anchor-free object detection architecture CenterNet.
The approach achieves high accuracy for both detection and tracking.
arXiv Detail & Related papers (2020-04-04T08:18:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.