Q-YOLOP: Quantization-aware You Only Look Once for Panoptic Driving
Perception
- URL: http://arxiv.org/abs/2307.04537v1
- Date: Mon, 10 Jul 2023 13:02:46 GMT
- Title: Q-YOLOP: Quantization-aware You Only Look Once for Panoptic Driving
Perception
- Authors: Chi-Chih Chang, Wei-Cheng Lin, Pei-Shuo Wang, Sheng-Feng Yu, Yu-Chen
Lu, Kuan-Cheng Lin and Kai-Chiang Wu
- Abstract summary: We present an efficient and quantization-aware panoptic driving perception model (Q-YOLOP) for object detection, drivable area segmentation, and lane line segmentation.
The proposed model achieves state-of-the-art performance with an mAP@0.5 of 0.622 for object detection and an mIoU of 0.612 for segmentation.
- Score: 6.3709120604927945
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this work, we present an efficient and quantization-aware panoptic driving
perception model (Q-YOLOP) for object detection, drivable area segmentation,
and lane line segmentation, in the context of autonomous driving. Our model
employs the Efficient Layer Aggregation Network (ELAN) as its backbone and
task-specific heads for each task. We employ a four-stage training process that
includes pretraining on the BDD100K dataset, finetuning on both the BDD100K and
iVS datasets, and quantization-aware training (QAT) on BDD100K. During the
training process, we use powerful data augmentation techniques, such as random
perspective and mosaic, and train the model on a combination of the BDD100K and
iVS datasets. Both strategies enhance the model's generalization capabilities.
The proposed model achieves state-of-the-art performance with an mAP@0.5 of
0.622 for object detection and an mIoU of 0.612 for segmentation, while
maintaining low computational and memory requirements.
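To make the training recipe more concrete, here is a minimal PyTorch sketch of eager-mode quantization-aware training on a toy three-head network. It only illustrates the QAT mechanics; the layer shapes, head designs, and class counts are placeholders, not the paper's ELAN backbone or exact heads.

```python
import torch
import torch.nn as nn
import torch.ao.quantization as tq

class TinyPanopticNet(nn.Module):
    """Toy stand-in for a panoptic driving model: one shared backbone feeding
    a detection head, a drivable-area head, and a lane-line head."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.quant = tq.QuantStub()      # marks where tensors enter the quantized region
        self.dequant = tq.DeQuantStub()  # back to float for the losses
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.det_head = nn.Conv2d(64, num_classes + 5, 1)  # boxes + objectness + classes
        self.da_head = nn.Conv2d(64, 2, 1)                 # drivable / not drivable
        self.ll_head = nn.Conv2d(64, 2, 1)                 # lane line / background

    def forward(self, x):
        f = self.backbone(self.quant(x))
        return tuple(self.dequant(h(f)) for h in (self.det_head, self.da_head, self.ll_head))

model = TinyPanopticNet()
model.qconfig = tq.get_default_qat_qconfig("fbgemm")  # x86 backend; "qnnpack" for ARM
tq.prepare_qat(model.train(), inplace=True)           # inserts fake-quant observers
# ... run the usual multi-task training loop here so weights adapt to quantization noise ...
int8_model = tq.convert(model.eval())                 # materialize the int8 model
```

Because the fake-quant observers are active during finetuning, the weights learn to tolerate int8 rounding, which is what lets the converted model keep accuracy at low compute and memory cost.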
Related papers
- Efficiency for Free: Ideal Data Are Transportable Representations [12.358393766570732]
We investigate the efficiency properties of data from both optimization and generalization perspectives.
We propose the Representation Learning Accelerator, which promotes the formation and utilization of efficient data.
arXiv Detail & Related papers (2024-05-23T15:06:02Z)
- No "Zero-Shot" Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance [68.18779562801762]
Multimodal models require exponentially more data to achieve linear improvements in downstream "zero-shot" performance.
Our study reveals an exponential need for training data which implies that the key to "zero-shot" generalization capabilities under large-scale training paradigms remains to be found.
arXiv Detail & Related papers (2024-04-04T17:58:02Z)
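To see why that scaling is "exponential", suppose zero-shot accuracy grows roughly linearly in the log of a concept's pretraining frequency, as the paper's fits suggest. A tiny sketch with hypothetical coefficients (the values a and b below are illustrative, not the paper's):

```python
import math

# hypothetical log-linear fit: accuracy ~= a * log10(frequency) + b
a, b = 0.08, 0.10

for freq in (1e3, 1e4, 1e5, 1e6):
    print(f"{freq:>9,.0f} pretraining examples -> zero-shot accuracy {a * math.log10(freq) + b:.2f}")
# every fixed +0.08 accuracy gain requires 10x more data: linear gains, exponential cost
```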
- Augment Before Copy-Paste: Data and Memory Efficiency-Oriented Instance Segmentation Framework for Sport-scenes [7.765333471208582]
In the Visual Inductive Priors challenge (VIPriors2023), participants must train a model capable of precisely locating individuals on a basketball court.
We propose a memory-efficient instance segmentation framework based on visual inductive prior flow propagation.
Experiments demonstrate that our model achieves promising performance even under limited data and memory constraints.
arXiv Detail & Related papers (2024-03-18T08:44:40Z)
- Boosting Continual Learning of Vision-Language Models via Mixture-of-Experts Adapters [65.15700861265432]
We present a parameter-efficient continual learning framework to alleviate long-term forgetting in incremental learning with vision-language models.
Our approach involves the dynamic expansion of a pre-trained CLIP model, through the integration of Mixture-of-Experts (MoE) adapters.
To preserve the zero-shot recognition capability of vision-language models, we introduce a Distribution Discriminative Auto-Selector.
arXiv Detail & Related papers (2024-03-18T08:00:23Z)
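A generic sketch of the Mixture-of-Experts adapter idea (not the paper's exact design): a small router selects top-k bottleneck experts per sample, and their gated outputs are added residually to frozen backbone features.

```python
import torch
import torch.nn as nn

class MoEAdapter(nn.Module):
    """Router picks the top-k bottleneck experts per sample; their outputs are
    summed with gate weights and added residually to the (frozen) features."""
    def __init__(self, dim=512, num_experts=4, bottleneck=64, k=2):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, bottleneck), nn.GELU(), nn.Linear(bottleneck, dim))
            for _ in range(num_experts)
        )
        self.k = k

    def forward(self, x):                                 # x: (batch, dim)
        gates, idx = self.router(x).topk(self.k, dim=-1)  # per-sample expert choice
        gates = gates.softmax(dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += gates[mask, slot].unsqueeze(-1) * expert(x[mask])
        return x + out                                    # residual adapter update

features = torch.randn(8, 512)   # e.g., frozen CLIP image features
adapted = MoEAdapter()(features)
```

Because only the router and the small experts are trained, new tasks can expand the expert pool without touching the pre-trained CLIP weights.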
- SeiT++: Masked Token Modeling Improves Storage-efficient Training [36.95646819348317]
Recent advancements in Deep Neural Network (DNN) models have significantly improved performance across computer vision tasks.
However, achieving highly generalizable and high-performing vision models requires expansive datasets, resulting in significant storage requirements.
A recent breakthrough, SeiT, proposed the use of Vector-Quantized (VQ) feature vectors (i.e., tokens) as network inputs for vision classification.
In this paper, we extend SeiT by integrating Masked Token Modeling (MTM) for self-supervised pre-training.
arXiv Detail & Related papers (2023-12-15T04:11:34Z)
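A minimal sketch of masked token modeling over discrete VQ tokens; the exact masking scheme and prediction heads in SeiT++ may differ.

```python
import torch

def mask_tokens(tokens, mask_id, mask_ratio=0.5):
    """Replace a random fraction of discrete VQ token ids with a mask id;
    training then predicts the original ids at the masked positions."""
    mask = torch.rand(tokens.shape, device=tokens.device) < mask_ratio
    corrupted = tokens.clone()
    corrupted[mask] = mask_id
    return corrupted, mask

# toy usage: 2 images, each stored as 196 token ids from a 1024-entry codebook
tokens = torch.randint(0, 1024, (2, 196))
corrupted, mask = mask_tokens(tokens, mask_id=1024)  # reserve id 1024 as [MASK]
# loss = F.cross_entropy(logits[mask], tokens[mask]) with logits = model(corrupted)
```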
- Developing a Resource-Constraint EdgeAI model for Surface Defect Detection [1.338174941551702]
We propose a lightweight EdgeAI architecture, modified from Xception, for on-device training in a resource-constrained edge environment.
We evaluate our model on a PCB defect detection task and compare its performance against existing lightweight models.
Our method can be applied to other resource-constrained applications while maintaining strong performance.
arXiv Detail & Related papers (2023-12-04T15:28:31Z)
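For context, the Xception family that this architecture modifies is built from depthwise-separable convolutions; here is a sketch of that standard building block (not the paper's specific modification), which is what keeps the parameter count low enough for on-device training.

```python
import torch.nn as nn

class SeparableConvBlock(nn.Module):
    """Depthwise-separable convolution: a per-channel spatial filter followed
    by a 1x1 pointwise mix, far cheaper than a dense 3x3 convolution."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, 3, padding=1, groups=in_ch, bias=False)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.pointwise(self.depthwise(x))))
```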
- You Only Look at Once for Real-time and Generic Multi-Task [20.61477620156465]
A-YOLOM is an adaptive, real-time, and lightweight multi-task model.
We develop an end-to-end multi-task model with a unified and streamlined segmentation structure.
We achieve competitive results on the BDD100K dataset.
arXiv Detail & Related papers (2023-10-02T21:09:43Z)
- ALP: Action-Aware Embodied Learning for Perception [60.64801970249279]
We introduce Action-Aware Embodied Learning for Perception (ALP).
ALP incorporates action information into representation learning through a combination of optimizing a reinforcement learning policy and an inverse dynamics prediction objective.
We show that ALP outperforms existing baselines in several downstream perception tasks.
arXiv Detail & Related papers (2023-06-16T21:51:04Z)
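One half of the ALP objective, inverse dynamics prediction, can be sketched as a head that predicts the action taken between two consecutive observations from their encoder features; the dimensions and architecture below are generic placeholders, not the paper's exact setup.

```python
import torch
import torch.nn as nn

class InverseDynamicsHead(nn.Module):
    """Predicts the action taken between two consecutive observations from
    their encoder features; backpropagating its loss shapes the encoder."""
    def __init__(self, feat_dim, num_actions):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * feat_dim, 256), nn.ReLU(), nn.Linear(256, num_actions)
        )

    def forward(self, f_t, f_t1):
        return self.mlp(torch.cat([f_t, f_t1], dim=-1))

# usage: encoder features of obs_t and obs_{t+1} -> logits over discrete actions
head = InverseDynamicsHead(feat_dim=128, num_actions=6)
logits = head(torch.randn(4, 128), torch.randn(4, 128))
# loss = F.cross_entropy(logits, actions)
```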
- Delving Deeper into Data Scaling in Masked Image Modeling [145.36501330782357]
We conduct an empirical study on the scaling capability of masked image modeling (MIM) methods for visual recognition.
Specifically, we utilize the web-collected Coyo-700M dataset.
Our goal is to investigate how the performance changes on downstream tasks when scaling with different sizes of data and models.
arXiv Detail & Related papers (2023-05-24T15:33:46Z)
- Unifying Language Learning Paradigms [96.35981503087567]
We present a unified framework for pre-training models that are universally effective across datasets and setups.
We show how different pre-training objectives can be cast as one another and how interpolating between different objectives can be effective.
Our model also achieves strong results in in-context learning, outperforming 175B GPT-3 on zero-shot SuperGLUE and tripling the performance of T5-XXL on one-shot summarization.
arXiv Detail & Related papers (2022-05-10T19:32:20Z)
- MSeg: A Composite Dataset for Multi-domain Semantic Segmentation [100.17755160696939]
We present MSeg, a composite dataset that unifies semantic segmentation datasets from different domains.
We reconcile the taxonomies and bring the pixel-level annotations into alignment by relabeling more than 220,000 object masks in more than 80,000 images.
A model trained on MSeg ranks first on the WildDash-v1 leaderboard for robust semantic segmentation, with no exposure to WildDash data during training.
arXiv Detail & Related papers (2021-12-27T16:16:35Z)
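A schematic of that unification step: each source dataset's native labels are remapped into one shared taxonomy before training. The class names and ids below are hypothetical illustrations, not MSeg's actual taxonomy.

```python
import numpy as np

UNIFIED = {"road": 0, "sidewalk": 1, "person": 2, "vehicle": 3}  # hypothetical unified taxonomy
IGNORE = 255  # pixels whose native class has no unified counterpart

# illustrative mapping from one source dataset's class names to unified names
SOURCE_TO_UNIFIED = {"road": "road", "sidewalk": "sidewalk", "person": "person",
                     "car": "vehicle", "truck": "vehicle"}

def remap(mask, native_names):
    """Relabel an integer segmentation mask from a native taxonomy into the unified one."""
    out = np.full_like(mask, IGNORE)
    for native_id, name in enumerate(native_names):
        if name in SOURCE_TO_UNIFIED:
            out[mask == native_id] = UNIFIED[SOURCE_TO_UNIFIED[name]]
    return out

native = ["road", "sidewalk", "car", "sky"]  # source dataset's label order
mask = np.random.randint(0, len(native), (4, 4))
print(remap(mask, native))                   # "sky" pixels become IGNORE
```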