Fast and Efficient Scene Categorization for Autonomous Driving using
VAEs
- URL: http://arxiv.org/abs/2210.14981v1
- Date: Wed, 26 Oct 2022 18:50:15 GMT
- Title: Fast and Efficient Scene Categorization for Autonomous Driving using
VAEs
- Authors: Saravanabalagi Ramachandran, Jonathan Horgan, Ganesh Sistu, and John
McDonald
- Abstract summary: Scene categorization is a useful precursor task that provides prior knowledge for advanced computer vision tasks.
We propose to generate a global descriptor that captures coarse features from the image and use a classification head to map the descriptors to three scene categories: Rural, Urban, and Suburban.
The proposed global descriptor is very compact with an embedding length of 128, significantly faster to compute, and robust to seasonal and illumination changes.
- Score: 2.694218293356451
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Scene categorization is a useful precursor task that provides prior knowledge
for many advanced computer vision tasks with a broad range of applications in
content-based image indexing and retrieval systems. Despite the success of
data-driven approaches in computer vision, such as object detection and
semantic segmentation, their application in learning high-level features
for scene recognition has not achieved the same level of success. We propose to
generate a fast and efficient intermediate interpretable generalized global
descriptor that captures coarse features from the image and use a
classification head to map the descriptors to three scene categories: Rural,
Urban, and Suburban. We train a Variational Autoencoder in an unsupervised
manner, map images to a constrained multi-dimensional latent space, and use the latent
vectors as compact embeddings that serve as global descriptors for images. The
experimental results show that the VAE latent vectors capture coarse
information from the image, supporting their use as global descriptors. The
proposed global descriptor is very compact with an embedding length of 128,
significantly faster to compute, and robust to seasonal and illumination
changes, while capturing sufficient scene information required for scene
categorization.
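As a rough illustration of the pipeline the abstract describes (an unsupervised VAE encoder producing a 128-dimensional latent vector that doubles as a global descriptor, followed by a small classification head for the three scene categories), a minimal PyTorch sketch follows. The layer sizes, input resolution, and omission of the decoder and KL term are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch, assuming layer sizes and input resolution (not from the paper).
import torch
import torch.nn as nn

class ConvVAEEncoder(nn.Module):
    """Maps an RGB image to the mean and log-variance of a 128-d latent."""
    def __init__(self, latent_dim: int = 128):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.fc_mu = nn.Linear(128, latent_dim)
        self.fc_logvar = nn.Linear(128, latent_dim)

    def forward(self, x):
        h = self.backbone(x)
        return self.fc_mu(h), self.fc_logvar(h)

class SceneHead(nn.Module):
    """Classification head: 128-d global descriptor -> {Rural, Urban, Suburban}."""
    def __init__(self, latent_dim: int = 128, num_classes: int = 3):
        super().__init__()
        self.classifier = nn.Linear(latent_dim, num_classes)

    def forward(self, z):
        return self.classifier(z)

# Usage: at inference the latent mean serves as the compact global descriptor.
encoder, head = ConvVAEEncoder(), SceneHead()
image = torch.randn(1, 3, 128, 128)   # dummy RGB image
mu, logvar = encoder(image)           # 128-d descriptor (mean) and log-variance
logits = head(mu)                     # scene-category scores
```

Using only the latent mean at inference keeps the descriptor at 128 floats per image, consistent with the compactness and speed claims in the abstract.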
Related papers
- LDCA: Local Descriptors with Contextual Augmentation for Few-Shot
Learning [0.0]
We introduce a novel approach termed "Local Descriptor with Contextual Augmentation (LDCA)".
LDCA bridges the gap between local and global understanding by leveraging an adaptive global contextual enhancement module.
Experiments underscore the efficacy of our method, showing a maximum absolute improvement of 20% over the next-best approach on fine-grained classification datasets.
arXiv Detail & Related papers (2024-01-24T14:44:48Z)
- Generalized Label-Efficient 3D Scene Parsing via Hierarchical Feature Aligned Pre-Training and Region-Aware Fine-tuning [55.517000360348725]
This work presents a framework for dealing with 3D scene understanding when the labeled scenes are quite limited.
To extract knowledge for novel categories from the pre-trained vision-language models, we propose a hierarchical feature-aligned pre-training and knowledge distillation strategy.
Experiments with both indoor and outdoor scenes demonstrated the effectiveness of our approach in both data-efficient learning and open-world few-shot learning.
arXiv Detail & Related papers (2023-12-01T15:47:04Z)
- A Threefold Review on Deep Semantic Segmentation: Efficiency-oriented, Temporal and Depth-aware design [77.34726150561087]
We conduct a survey on the most relevant and recent advances in Deep Semantic Segmentation in the context of vision for autonomous vehicles.
Our main objective is to provide a comprehensive discussion on the main methods, advantages, limitations, results and challenges faced from each perspective.
arXiv Detail & Related papers (2023-03-08T01:29:55Z)
- Learning-Based Dimensionality Reduction for Computing Compact and Effective Local Feature Descriptors [101.62384271200169]
A distinctive representation of image patches in form of features is a key component of many computer vision and robotics tasks.
We investigate multi-layer perceptrons (MLPs) to extract low-dimensional but high-quality descriptors.
We consider different applications, including visual localization, patch verification, image matching and retrieval.
arXiv Detail & Related papers (2022-09-27T17:59:04Z)
- Vision Transformers: From Semantic Segmentation to Dense Prediction [139.15562023284187]
We explore the global context learning potentials of vision transformers (ViTs) for dense visual prediction.
Our motivation is that through learning global context at full receptive field layer by layer, ViTs may capture stronger long-range dependency information.
We formulate a family of Hierarchical Local-Global (HLG) Transformers, characterized by local attention within windows and global attention across windows in a pyramidal architecture.
arXiv Detail & Related papers (2022-07-19T15:49:35Z)
- Self-attention on Multi-Shifted Windows for Scene Segmentation [14.47974086177051]
We explore the effective use of self-attention within multi-scale image windows to learn descriptive visual features.
We propose three different strategies to aggregate these feature maps to decode the feature representation for dense prediction.
Our models achieve very promising performance on four public scene segmentation datasets.
arXiv Detail & Related papers (2022-07-10T07:36:36Z)
- A Novel Image Descriptor with Aggregated Semantic Skeleton Representation for Long-term Visual Place Recognition [0.0]
We propose a novel image descriptor with aggregated semantic skeleton representation (SSR).
The SSR-VLAD of an image aggregates the semantic skeleton features of each category and encodes the spatial-temporal distribution of the image's semantic information.
We conduct a series of experiments on three public datasets of challenging urban scenes.
arXiv Detail & Related papers (2022-02-08T06:49:38Z)
- Superevents: Towards Native Semantic Segmentation for Event-based Cameras [13.099264910430986]
Most successful computer vision models transform low-level features, such as Gabor filter responses, into richer representations of intermediate or mid-level complexity for downstream visual tasks.
We present a novel method that employs lifetime augmentation for obtaining an event stream representation that is fed to a fully convolutional network to extract superevents.
arXiv Detail & Related papers (2021-05-13T05:49:41Z)
- Self-supervised Human Detection and Segmentation via Multi-view Consensus [116.92405645348185]
We propose a multi-camera framework in which geometric constraints are embedded in the form of multi-view consistency during training.
We show that our approach outperforms state-of-the-art self-supervised person detection and segmentation techniques on images that visually depart from those of standard benchmarks.
arXiv Detail & Related papers (2020-12-09T15:47:21Z)
- SceneEncoder: Scene-Aware Semantic Segmentation of Point Clouds with A Learnable Scene Descriptor [51.298760338410624]
We propose a SceneEncoder module to impose a scene-aware guidance to enhance the effect of global information.
The module predicts a scene descriptor, which learns to represent the categories of objects existing in the scene.
We also design a region similarity loss to propagate distinguishing features to their own neighboring points with the same label.
arXiv Detail & Related papers (2020-01-24T16:53:30Z)
- Contextual Encoder-Decoder Network for Visual Saliency Prediction [42.047816176307066]
We propose an approach based on a convolutional neural network pre-trained on a large-scale image classification task.
We combine the resulting representations with global scene information for accurately predicting visual saliency.
Compared to state-of-the-art approaches, the network is based on a lightweight image classification backbone.
arXiv Detail & Related papers (2019-02-18T16:15:25Z)