AEGIS: A real-time multimodal augmented reality computer vision based system to assist facial expression recognition for individuals with autism spectrum disorder
- URL: http://arxiv.org/abs/2010.11884v1
- Date: Thu, 22 Oct 2020 17:20:38 GMT
- Title: AEGIS: A real-time multimodal augmented reality computer vision based system to assist facial expression recognition for individuals with autism spectrum disorder
- Authors: James Ren Hou Lee, Alexander Wong
- Abstract summary: This paper presents the development of a multimodal augmented reality (AR) system which combines the use of computer vision and deep convolutional neural networks (CNN).
The proposed system, which we call AEGIS, is an assistive technology deployable on a variety of user devices including tablets, smartphones, video conference systems, or smartglasses.
We leverage both spatial and temporal information in order to provide an accurate expression prediction, which is then converted into its corresponding visualization and drawn on top of the original video frame.
- Score: 93.0013343535411
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The ability to interpret social cues comes naturally to most
people, but some individuals living with Autism Spectrum Disorder (ASD)
experience difficulty in this area. This paper presents the development of a
multimodal augmented
reality (AR) system which combines the use of computer vision and deep
convolutional neural networks (CNN) in order to assist individuals with the
detection and interpretation of facial expressions in social settings. The
proposed system, which we call AEGIS (Augmented-reality Expression Guided
Interpretation System), is an assistive technology deployable on a variety of
user devices including tablets, smartphones, video conference systems, or
smartglasses, showcasing its flexibility and wide range of use cases and
allowing easy integration into daily life. Given a streaming video camera
source, each real-world frame is passed into AEGIS, processed for facial
bounding boxes, and then fed into our novel deep convolutional time windowed
neural network (TimeConvNet). We leverage both spatial and temporal information
in order to provide an accurate expression prediction, which is then converted
into its corresponding visualization and drawn on top of the original video
frame. The system runs in real time, requires minimal setup, and is simple to
use. With the use of AEGIS, we can help individuals living with ASD learn to
better identify expressions and thus improve their social experiences.
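The pipeline described above (frame capture, face detection, windowed
expression classification, visualization overlay) can be sketched in a few
lines. This is a minimal illustration rather than the authors'
implementation: the Haar-cascade detector, the `WINDOW` length, the label
set, and the `expression_model` callable are hypothetical stand-ins for
AEGIS's actual components.

```python
# Minimal sketch of an AEGIS-style frame pipeline (hypothetical stand-ins,
# not the published implementation).
from collections import deque

import cv2

WINDOW = 8  # assumed temporal window length, in frames
EXPRESSIONS = ["angry", "disgust", "fear", "happy", "sad", "surprise", "neutral"]

face_detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
frame_buffer = deque(maxlen=WINDOW)  # holds the most recent face crops

def process_frame(frame, expression_model):
    """Detect faces, classify the buffered window, and draw the result."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for (x, y, w, h) in face_detector.detectMultiScale(gray, 1.3, 5):
        frame_buffer.append(cv2.resize(gray[y:y + h, x:x + w], (48, 48)))
        if len(frame_buffer) == WINDOW:
            # Spatial and temporal information: the classifier sees the
            # whole window of recent crops, not a single frame.
            label = EXPRESSIONS[expression_model(list(frame_buffer))]
            cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
            cv2.putText(frame, label, (x, y - 8),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 255, 0), 2)
    return frame
```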
Related papers
- Multimodal 3D Fusion and In-Situ Learning for Spatially Aware AI [10.335943413484815]
Seamless integration of virtual and physical worlds in augmented reality benefits from the system semantically "understanding" the physical environment.
We introduce a multimodal 3D object representation that unifies both semantic and linguistic knowledge with the geometric representation.
We demonstrate the usefulness of the proposed system through two real-world AR applications on Magic Leap 2: a) spatial search in physical environments with natural language and b) an intelligent inventory system that tracks object changes over time.
arXiv Detail & Related papers (2024-10-06T23:25:21Z)
- AIris: An AI-powered Wearable Assistive Device for the Visually Impaired [0.0]
We introduce AIris, an AI-powered wearable device that provides environmental awareness and interaction capabilities to visually impaired users.
We have created a functional prototype system that operates effectively in real-world conditions.
arXiv Detail & Related papers (2024-05-13T10:09:37Z)
- Neuro-Vision to Language: Enhancing Brain Recording-based Visual Reconstruction and Language Interaction [8.63068449082585]
Decoding non-invasive brain recordings is pivotal for advancing our understanding of human cognition.
Our framework integrates 3D brain structures with visual semantics using a Vision Transformer 3D.
We have enhanced the fMRI dataset with diverse fMRI-image-related textual data to support multimodal large model development.
arXiv Detail & Related papers (2024-04-30T10:41:23Z)
- Artificial General Intelligence (AGI)-Native Wireless Systems: A Journey Beyond 6G [58.440115433585824]
Building future wireless systems that support services like digital twins (DTs) is challenging to achieve through advances in conventional technologies like meta-surfaces.
While artificial intelligence (AI)-native networks promise to overcome some limitations of wireless technologies, developments still rely on AI tools like neural networks.
This paper revisits the concept of AI-native wireless systems, equipping them with the common sense necessary to transform them into artificial general intelligence (AGI)-native systems.
arXiv Detail & Related papers (2024-04-29T04:51:05Z)
- Agent AI: Surveying the Horizons of Multimodal Interaction [83.18367129924997]
"Agent AI" is a class of interactive systems that can perceive visual stimuli, language inputs, and other environmentally-grounded data.
We envision a future where people can easily create any virtual reality or simulated scene and interact with agents embodied within the virtual environment.
arXiv Detail & Related papers (2024-01-07T19:11:18Z)
- Deeply-Coupled Convolution-Transformer with Spatial-temporal Complementary Learning for Video-based Person Re-identification [91.56939957189505]
We propose a novel spatial-temporal complementary learning framework named Deeply-Coupled Convolution-Transformer (DCCT) for high-performance video-based person Re-ID.
Our framework can attain better performance than most state-of-the-art methods.
arXiv Detail & Related papers (2023-04-27T12:16:44Z)
- Facial Expressions Recognition with Convolutional Neural Networks [0.0]
We implement a system for facial expression recognition (FER) by leveraging neural networks.
We demonstrate a state-of-the-art single-network accuracy of 70.10% on the FER2013 dataset without using any additional training data.
arXiv Detail & Related papers (2021-07-19T06:41:00Z)
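FER2013 is a standard benchmark of 48x48 grayscale face crops labeled with
seven expression classes. As a rough illustration of what a single-network
classifier for it looks like, here is a minimal PyTorch sketch; the layer
sizes are assumptions, not the architecture evaluated in the paper above.

```python
# Minimal FER2013-style classifier (illustrative layer sizes only).
import torch
import torch.nn as nn

class FERNet(nn.Module):
    def __init__(self, num_classes: int = 7):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # 48 -> 24
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 24 -> 12
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2)) # 12 -> 6
        self.classifier = nn.Linear(128 * 6 * 6, num_classes)

    def forward(self, x):  # x: (batch, 1, 48, 48) grayscale crops
        return self.classifier(self.features(x).flatten(1))

logits = FERNet()(torch.randn(4, 1, 48, 48))  # -> shape (4, 7)
```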
- Continuous Emotion Recognition with Spatiotemporal Convolutional Neural Networks [82.54695985117783]
We investigate the suitability of state-of-the-art deep learning architectures for continuous emotion recognition using long video sequences captured in-the-wild.
We have developed and evaluated convolutional recurrent neural networks combining 2D-CNNs and long short-term memory units, and inflated 3D-CNN models, which are built by inflating the weights of a pre-trained 2D-CNN model during fine-tuning.
arXiv Detail & Related papers (2020-11-18T13:42:05Z)
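The "inflated 3D-CNN" mentioned above follows a standard recipe: repeat a
pre-trained 2D convolution kernel along a new temporal axis and rescale it
so that a temporally static clip yields the same activations as the original
2D model. A minimal sketch, where the temporal kernel size of 3 is an
illustrative choice:

```python
# Inflate a pre-trained 2D convolution into a 3D one by repeating the
# kernel along time and dividing by the temporal size.
import torch
import torch.nn as nn

def inflate_conv2d(conv2d: nn.Conv2d, time_dim: int = 3) -> nn.Conv3d:
    conv3d = nn.Conv3d(
        conv2d.in_channels, conv2d.out_channels,
        kernel_size=(time_dim, *conv2d.kernel_size),
        stride=(1, *conv2d.stride),
        padding=(time_dim // 2, *conv2d.padding),
        bias=conv2d.bias is not None)
    with torch.no_grad():
        # (out, in, kH, kW) -> (out, in, T, kH, kW), scaled by 1/T so a
        # temporally constant input produces unchanged activations.
        weight = conv2d.weight.unsqueeze(2).repeat(1, 1, time_dim, 1, 1)
        conv3d.weight.copy_(weight / time_dim)
        if conv2d.bias is not None:
            conv3d.bias.copy_(conv2d.bias)
    return conv3d

conv3d = inflate_conv2d(nn.Conv2d(3, 64, kernel_size=3, padding=1))
out = conv3d(torch.randn(1, 3, 8, 32, 32))  # (batch, C, T, H, W)
```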
- TimeConvNets: A Deep Time Windowed Convolution Neural Network Design for Real-time Video Facial Expression Recognition [93.0013343535411]
This study explores a novel deep time windowed convolutional neural network design (TimeConvNets) for the purpose of real-time video facial expression recognition.
We show that TimeConvNets can better capture the transient nuances of facial expressions and boost classification accuracy while maintaining a low inference time.
arXiv Detail & Related papers (2020-03-03T20:58:52Z)
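One simple way to realize a time-windowed design like the TimeConvNets entry
above is to stack a window of frames along the channel axis and classify the
stack with an ordinary 2D CNN, so spatial and temporal cues are learned
jointly at low inference cost. This is a hedged illustration of the idea
only; the window length, layer widths, and pooling are assumptions, not the
published TimeConvNet architecture.

```python
# Time-windowed CNN sketch: a window of T grayscale frames is treated as a
# T-channel image (assumed design, not the published architecture).
import torch
import torch.nn as nn

class TimeWindowedCNN(nn.Module):
    def __init__(self, window: int = 8, num_classes: int = 7):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(window, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(128, num_classes))

    def forward(self, window):  # window: (batch, T, H, W) stacked frames
        return self.net(window)

scores = TimeWindowedCNN()(torch.randn(2, 8, 48, 48))  # -> shape (2, 7)
```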
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences arising from its use.