Deep learning for scene recognition from visual data: a survey
- URL: http://arxiv.org/abs/2007.01806v1
- Date: Fri, 3 Jul 2020 16:53:18 GMT
- Title: Deep learning for scene recognition from visual data: a survey
- Authors: Alina Matei, Andreea Glavan, and Estefania Talavera
- Abstract summary: This work aims to be a review of the state-of-the-art in scene recognition with deep learning models from visual data.
Scene recognition is still an emerging field in computer vision, which has been addressed from both single-image and dynamic-image perspectives.
- Score: 2.580765958706854
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The use of deep learning techniques has exploded during the last few years,
resulting in a direct contribution to the field of artificial intelligence.
This work aims to be a review of the state-of-the-art in scene recognition with
deep learning models from visual data. Scene recognition is still an emerging
field in computer vision, which has been addressed from both single-image and
dynamic-image perspectives. We first give an overview of available datasets for
image and video scene recognition. Later, we describe ensemble techniques
introduced by research papers in the field. Finally, we give some remarks on
our findings and discuss what we consider challenges in the field and future
lines of research. This paper aims to be a future guide for model selection for
the task of scene recognition.
Related papers
- Semantic-Based Active Perception for Humanoid Visual Tasks with Foveal Sensors [49.99728312519117]
The aim of this work is to establish how accurately a recent semantic-based active perception model is able to complete visual tasks that are regularly performed by humans.
This model exploits the ability of current object detectors to localize and classify a large number of object classes and to update a semantic description of a scene across multiple fixations.
In the task of scene exploration, the semantic-based method demonstrates superior performance compared to the traditional saliency-based model.
arXiv Detail & Related papers (2024-04-16T18:15:57Z)
- Knowledge-enhanced Multi-perspective Video Representation Learning for Scene Recognition [33.800842679024164]
We address the problem of video scene recognition, whose goal is to learn a high-level video representation to classify scenes in videos.
Most existing works identify scenes for videos only from visual or textual information in a temporal perspective.
We propose a novel two-stream framework to model video representations from multiple perspectives.
arXiv Detail & Related papers (2024-01-09T04:37:10Z)
- Vision-Language Pre-training: Basics, Recent Advances, and Future Trends [158.34830433299268]
Vision-language pre-training methods for multimodal intelligence have been developed in the last few years.
For each category, we present a comprehensive review of state-of-the-art methods, and discuss the progress that has been made and challenges still being faced.
In addition, we discuss advanced topics being actively explored in the research community, such as big foundation models, unified modeling, in-context few-shot learning, knowledge, robustness, and computer vision in the wild, to name a few.
arXiv Detail & Related papers (2022-10-17T17:11:36Z)
- Deep Learning for Visual Speech Analysis: A Survey [54.53032361204449]
This paper presents a review of recent progress in deep learning methods on visual speech analysis.
We cover different aspects of visual speech, including fundamental problems, challenges, benchmark datasets, a taxonomy of existing methods, and state-of-the-art performance.
arXiv Detail & Related papers (2022-05-22T14:44:53Z)
- Compositional Scene Representation Learning via Reconstruction: A Survey [48.33349317481124]
Compositional scene representation learning is a task that enables such abilities.
Deep neural networks have been proven to be advantageous in representation learning.
Learning via reconstruction is advantageous because it may utilize massive unlabeled data and avoid costly and laborious data annotation.
arXiv Detail & Related papers (2022-02-15T02:14:05Z)
- Deep Learning for Scene Classification: A Survey [48.57123373347695]
Scene classification is a longstanding, fundamental and challenging problem in computer vision.
The rise of large-scale datasets and the renaissance of deep learning techniques have brought remarkable progress in the field of scene representation and classification.
This paper provides a comprehensive survey of recent achievements in scene classification using deep learning.
arXiv Detail & Related papers (2021-01-26T03:06:50Z)
- Visual Relationship Detection using Scene Graphs: A Survey [1.3505077405741583]
A Scene Graph is a technique to better represent a scene and the various relationships present in it.
We present a detailed survey of the various techniques for scene graph generation, their efficacy in representing visual relationships, and how they have been used to solve various downstream tasks.
arXiv Detail & Related papers (2020-05-16T17:06:06Z)
- Text Recognition in the Wild: A Survey [33.22076515689926]
This literature review attempts to present the entire picture of the field of scene text recognition.
It provides a comprehensive reference for people entering this field, and could be helpful to inspire future research.
arXiv Detail & Related papers (2020-05-07T13:57:04Z)
- Image Segmentation Using Deep Learning: A Survey [58.37211170954998]
Image segmentation is a key topic in image processing and computer vision.
A substantial amount of work has been aimed at developing image segmentation approaches using deep learning models.
arXiv Detail & Related papers (2020-01-15T21:37:47Z)
This list is automatically generated from the titles and abstracts of the papers on this site.