Challenges and Trends in Egocentric Vision: A Survey
- URL: http://arxiv.org/abs/2503.15275v2
- Date: Thu, 03 Apr 2025 08:06:35 GMT
- Title: Challenges and Trends in Egocentric Vision: A Survey
- Authors: Xiang Li, Heqian Qiu, Lanxiao Wang, Hanwen Zhang, Chenghao Qi, Linfeng Han, Huiyu Xiong, Hongliang Li
- Abstract summary: Egocentric vision captures visual and multimodal data through cameras or sensors worn on the human body. This paper provides a comprehensive survey of the research on egocentric vision understanding. By summarizing the latest advancements, we anticipate the broad applications of egocentric vision technologies in fields such as augmented reality, virtual reality, and embodied intelligence.
- Score: 11.593894126370724
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the rapid development of artificial intelligence technologies and wearable devices, egocentric vision understanding has emerged as a new and challenging research direction, gradually attracting widespread attention from both academia and industry. Egocentric vision captures visual and multimodal data through cameras or sensors worn on the human body, offering a unique perspective that simulates human visual experiences. This paper provides a comprehensive survey of the research on egocentric vision understanding, systematically analyzing the components of egocentric scenes and categorizing the tasks into four main areas: subject understanding, object understanding, environment understanding, and hybrid understanding. We explore in detail the sub-tasks within each category. We also summarize the main challenges and trends currently existing in the field. Furthermore, this paper presents an overview of high-quality egocentric vision datasets, offering valuable resources for future research. By summarizing the latest advancements, we anticipate the broad applications of egocentric vision technologies in fields such as augmented reality, virtual reality, and embodied intelligence, and propose future research directions based on the latest developments in the field.
Related papers
- Fairness and Bias Mitigation in Computer Vision: A Survey [61.01658257223365]
Computer vision systems are increasingly being deployed in high-stakes real-world applications.
There is a dire need to ensure that they do not propagate or amplify any discriminatory tendencies in historical or human-curated data.
This paper presents a comprehensive survey on fairness that summarizes and sheds light on ongoing trends and successes in the context of computer vision.
arXiv Detail & Related papers (2024-08-05T13:44:22Z) - Aligning Cyber Space with Physical World: A Comprehensive Survey on Embodied AI [129.08019405056262]
Embodied Artificial Intelligence (Embodied AI) is crucial for achieving Artificial General Intelligence (AGI).
Multi-modal Large Models (MLMs) and World Models (WMs) have attracted significant attention due to their remarkable perception, interaction, and reasoning capabilities.
In this survey, we give a comprehensive exploration of the latest advancements in Embodied AI.
arXiv Detail & Related papers (2024-07-09T14:14:47Z) - Vision-based Learning for Drones: A Survey [1.280979348722635]
Drones as advanced cyber-physical systems are undergoing a transformative shift with the advent of vision-based learning.
This review offers a comprehensive overview of vision-based learning in drones, emphasizing its pivotal role in enhancing their operational capabilities.
We explore various applications of vision-based drones with learning capabilities, ranging from single-agent systems to more complex multi-agent and heterogeneous system scenarios.
arXiv Detail & Related papers (2023-12-08T12:57:13Z) - Unlocking the Emotional World of Visual Media: An Overview of the Science, Research, and Impact of Understanding Emotion [24.920797480215242]
This article provides a comprehensive overview of the field of emotion analysis in visual media.
We discuss the psychological foundations of emotion and the computational principles that underpin the understanding of emotions from images and videos.
We contend that this represents a "Holy Grail" research problem in computing and delineate pivotal directions for future inquiry.
arXiv Detail & Related papers (2023-07-25T12:47:21Z) - Vision+X: A Survey on Multimodal Learning in the Light of Data [64.03266872103835]
Multimodal machine learning that incorporates data from various sources has become an increasingly popular research area.
We analyze the commonness and uniqueness of each data format mainly ranging from vision, audio, text, and motions.
We investigate the existing literature on multimodal learning from both the representation learning and downstream application levels.
arXiv Detail & Related papers (2022-10-05T13:14:57Z) - Deep Learning to See: Towards New Foundations of Computer Vision [88.69805848302266]
This book criticizes the supposed scientific progress in the field of computer vision.
It proposes the investigation of vision within the framework of information-based laws of nature.
arXiv Detail & Related papers (2022-06-30T15:20:36Z) - Visual Sensation and Perception Computational Models for Deep Learning: State of the art, Challenges and Prospects [7.949330621850412]
Visual sensation and perception refers to the process of sensing, organizing, identifying, and interpreting visual information in environmental awareness and understanding.
Computational models inspired by visual perception have the characteristics of complexity and diversity, as they come from many subjects such as cognition science, information science, and artificial intelligence.
arXiv Detail & Related papers (2021-09-08T01:51:24Z) - Predicting the Future from First Person (Egocentric) Vision: A Survey [18.07516837332113]
This survey summarises the evolution of studies in the context of future prediction from egocentric vision.
It gives an overview of applications, devices, existing problems, commonly used datasets, models and input modalities.
Our analysis highlights that methods for future prediction from egocentric vision can have a significant impact in a range of applications.
arXiv Detail & Related papers (2021-07-28T14:58:13Z) - Deep Learning for Embodied Vision Navigation: A Survey [108.13766213265069]
The "embodied visual navigation" problem requires an agent to navigate a 3D environment relying mainly on its first-person observations.
This paper attempts to establish an outline of the current works in the field of embodied visual navigation by providing a comprehensive literature survey.
arXiv Detail & Related papers (2021-07-07T12:09:04Z) - A Review on Intelligent Object Perception Methods Combining Knowledge-based Reasoning and Machine Learning [60.335974351919816]
Object perception is a fundamental sub-field of Computer Vision.
Recent works seek ways to integrate knowledge engineering in order to expand the level of intelligence of the visual interpretation of objects.
arXiv Detail & Related papers (2019-12-26T13:26:49Z)