Scene Understanding for Autonomous Driving
- URL: http://arxiv.org/abs/2105.04905v1
- Date: Tue, 11 May 2021 09:50:05 GMT
- Title: Scene Understanding for Autonomous Driving
- Authors: Òscar Lorente, Ian Riera, Aditya Rana
- Abstract summary: We study the behaviour of different configurations of RetinaNet, Faster R-CNN and Mask R-CNN provided in Detectron2.
We observe a significant improvement in performance after fine-tuning these models on the datasets of interest.
We run inference in unusual situations using out-of-context datasets, and present interesting results.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Detecting and segmenting objects in images based on their content is one of the
most active topics in the field of computer vision. Nowadays, this problem can
be addressed using Deep Learning architectures such as Faster R-CNN or YOLO,
among others. In this paper, we study the behaviour of different configurations
of RetinaNet, Faster R-CNN and Mask R-CNN provided in Detectron2. First, we
evaluate the performance of the pre-trained models on the KITTI-MOTS and
MOTSChallenge datasets, both qualitatively and quantitatively (in terms of AP).
We observe a significant improvement in performance after fine-tuning these
models on the datasets of interest and optimizing hyperparameters. Finally, we
run inference in unusual situations using out-of-context datasets, and present
interesting results that help us better understand the networks.
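The abstract reports quantitative results as AP (average precision). As a rough illustration of what that metric measures, the following Python sketch computes AP for a single class in a single image at one IoU threshold: detections, given as hypothetical `(box, score)` pairs, are ranked by confidence, greedily matched to unmatched ground-truth boxes with IoU of at least 0.5, and the interpolated precision is averaged over 101 recall levels in the COCO style. The box format `(x1, y1, x2, y2)` and the function names `iou` and `average_precision` are assumptions made for illustration; this is not the authors' evaluation code or Detectron2's evaluator.

```python
from typing import List, Sequence, Tuple

# Assumed box format: (x1, y1, x2, y2) in pixel coordinates.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def average_precision(
    detections: Sequence[Tuple[Box, float]],
    ground_truth: Sequence[Box],
    iou_threshold: float = 0.5,
) -> float:
    """Hypothetical helper: AP for one class in one image, 101-point interpolation."""
    if not ground_truth:
        return 0.0
    # Rank detections by descending confidence score.
    ranked = sorted(detections, key=lambda d: d[1], reverse=True)
    matched = [False] * len(ground_truth)
    tp_flags: List[bool] = []
    for box, _score in ranked:
        # Find the best-overlapping ground-truth box that is still unmatched.
        best_iou, best_idx = 0.0, -1
        for i, gt in enumerate(ground_truth):
            if matched[i]:
                continue
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_iou, best_idx = overlap, i
        if best_idx >= 0 and best_iou >= iou_threshold:
            matched[best_idx] = True
            tp_flags.append(True)   # true positive
        else:
            tp_flags.append(False)  # false positive

    # Cumulative precision and recall after each ranked detection.
    precisions: List[float] = []
    recalls: List[float] = []
    tp = 0
    for k, is_tp in enumerate(tp_flags, start=1):
        tp += is_tp
        precisions.append(tp / k)
        recalls.append(tp / len(ground_truth))

    # Interpolate: precision at rank k becomes the max precision at any later rank.
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])

    # Average interpolated precision over recall levels 0.00, 0.01, ..., 1.00.
    total = 0.0
    for r in (i / 100 for i in range(101)):
        # Precision at the first rank whose recall reaches r; 0 if never reached.
        total += next((precisions[k] for k, rec in enumerate(recalls) if rec >= r), 0.0)
    return total / 101
```

With one ground-truth box and a single detection that overlaps it exactly, the sketch returns an AP of 1.0; ranking a higher-scoring false positive ahead of that detection lowers AP to 0.5. The COCO-style AP reported by Detectron2's COCO evaluator additionally averages this quantity over IoU thresholds from 0.50 to 0.95 (step 0.05), over categories, and over all images of the dataset.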
Related papers
- Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing addresses this by shifting data analysis to the edge.
Existing methods struggle to balance high model performance with low resource consumption.
We propose a novel co-design framework to optimize neural network architecture.
(2024-10-29T19:02:54Z)
- TCCT-Net: Two-Stream Network Architecture for Fast and Efficient Engagement Estimation via Behavioral Feature Signals [58.865901821451295]
We present a novel two-stream feature fusion "Tensor-Convolution and Convolution-Transformer Network" (TCCT-Net) architecture.
To better learn the meaningful patterns in the temporal-spatial domain, we design a "CT" stream that integrates a hybrid convolutional-transformer.
In parallel, to efficiently extract rich patterns from the temporal-frequency domain, we introduce a "TC" stream that uses Continuous Wavelet Transform (CWT) to represent information in a 2D tensor form.
(2024-04-15T06:01:48Z)
- Deep Learning Approaches for Human Action Recognition in Video Data [0.8080830346931087]
This study conducts an in-depth analysis of various deep learning models to address this challenge.
We focus on Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and Two-Stream ConvNets.
The results of this study underscore the potential of composite models in achieving robust human action recognition.
(2024-03-11T15:31:25Z)
- Analyzing Local Representations of Self-supervised Vision Transformers [34.56680159632432]
We present a comparative analysis of various self-supervised Vision Transformers (ViTs).
Inspired by large language models, we examine the abilities of ViTs to perform various computer vision tasks with little to no fine-tuning.
(2023-12-31T11:38:50Z)
- Influencer Detection with Dynamic Graph Neural Networks [56.1837101824783]
We investigate different dynamic Graph Neural Networks (GNNs) configurations for influencer detection.
We show that using deep multi-head attention in GNN and encoding temporal attributes significantly improves performance.
(2022-11-15T13:00:25Z)
- Batch-Ensemble Stochastic Neural Networks for Out-of-Distribution Detection [55.028065567756066]
Out-of-distribution (OOD) detection has recently received much attention from the machine learning community due to its importance in deploying machine learning models in real-world applications.
In this paper we propose an uncertainty quantification approach by modelling the distribution of features.
We incorporate an efficient ensemble mechanism, namely batch-ensemble, to construct the batch-ensemble neural networks (BE-SNNs) and overcome the feature collapse problem.
We show that BE-SNNs yield superior performance on several OOD benchmarks, such as the Two-Moons dataset, the FashionMNIST vs MNIST dataset, FashionM
(2022-06-26T16:00:22Z)
- Revisiting Facial Key Point Detection: An Efficient Approach Using Deep Neural Networks [0.0]
We develop efficient deep learning models in terms of model size, parameters, and inference time.
The MobileNetV2 architecture produced the lowest RMSE and inference time.
Manually optimized CNN architectures performed similarly to the AutoKeras-tuned architecture.
(2022-05-14T19:49:03Z)
- Comparison Analysis of Traditional Machine Learning and Deep Learning Techniques for Data and Image Classification [62.997667081978825]
The purpose of the study is to analyse and compare the most common machine learning and deep learning techniques used for computer vision 2D object classification tasks.
Firstly, we will present the theoretical background of the Bag of Visual Words model and Deep Convolutional Neural Networks (DCNN).
Secondly, we will implement a Bag of Visual Words model and the VGG16 CNN architecture.
(2022-04-11T11:34:43Z)
- Network Comparison Study of Deep Activation Feature Discriminability with Novel Objects [0.5076419064097732]
State-of-the-art computer vision algorithms have incorporated Deep Neural Networks (DNN) in feature-extracting roles, creating Deep Convolutional Activation Features (DeCAF).
This study analyzes the general discriminability of novel object visual appearances encoded into the DeCAF space of six of the leading visual recognition DNN architectures.
(2022-02-08T07:40:53Z)
- PredRNN: A Recurrent Neural Network for Spatiotemporal Predictive Learning [109.84770951839289]
We present PredRNN, a new recurrent network for learning visual dynamics from historical context.
We show that our approach obtains highly competitive results on three standard datasets.
(2021-03-17T08:28:30Z)
- Inferring Convolutional Neural Networks' accuracies from their architectural characterizations [0.0]
We study the relationships between a CNN's architecture and its performance.
We show that these architectural attributes can be predictive of the networks' performance in two specific computer vision-based physics problems.
We use machine learning models to predict whether a network can perform better than a certain threshold accuracy before training.
(2020-01-07T16:41:58Z)