Data Center Audio/Video Intelligence on Device (DAVID) -- An Edge-AI
Platform for Smart-Toys
- URL: http://arxiv.org/abs/2311.11030v1
- Date: Sat, 18 Nov 2023 10:38:35 GMT
- Title: Data Center Audio/Video Intelligence on Device (DAVID) -- An Edge-AI
Platform for Smart-Toys
- Authors: Gabriel Cosache, Francisco Salgado, Cosmin Rotariu, George Sterpu,
Rishabh Jain and Peter Corcoran
- Abstract summary: The DAVID Smart-Toy platform is one of the first Edge AI platform designs.
It incorporates advanced low-power data processing by neural inference models co-located with the relevant image or audio sensors.
There is also on-board capability for in-device text-to-speech generation.
- Score: 2.740631793745274
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: An overview is given of the DAVID Smart-Toy platform, one of the first Edge
AI platform designs to incorporate advanced low-power data processing by neural
inference models co-located with the relevant image or audio sensors. There is
also on-board capability for in-device text-to-speech generation. Two
alternative embodiments are presented: a smart Teddy-bear, and a roving
dog-like robot. The platform offers a speech-driven user interface and can
observe and interpret user actions and facial expressions via its computer
vision sensor node. A particular benefit of this design is that no personally
identifiable information passes beyond the neural inference nodes, thus
providing inbuilt compliance with data protection regulations.
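To make the privacy-by-design claim concrete, below is a minimal, hypothetical Python sketch of a vision sensor node that runs inference locally and forwards only derived, non-identifying labels downstream. This is not the actual DAVID firmware or API; the class and function names (SensorNode-style helpers, run_face_expression_model, etc.) are illustrative placeholders.

```python
# Minimal sketch (not the DAVID firmware): a sensor node that runs inference
# locally and forwards only derived, non-identifying metadata. All names here
# are hypothetical placeholders for the on-device components.
from dataclasses import dataclass
from typing import Dict

import numpy as np


@dataclass
class InferenceResult:
    """Only derived metadata leaves the node -- never the raw frame."""
    expression: str
    confidence: float


def run_face_expression_model(frame: np.ndarray) -> InferenceResult:
    """Placeholder for the neural inference model co-located with the sensor."""
    # A real deployment would invoke a quantised on-device model here.
    scores = {"neutral": 0.7, "happy": 0.2, "surprised": 0.1}
    label = max(scores, key=scores.get)
    return InferenceResult(expression=label, confidence=scores[label])


def sensor_node_step(frame: np.ndarray) -> Dict[str, object]:
    """Process one camera frame; emit only non-PII metadata downstream."""
    result = run_face_expression_model(frame)
    # The raw frame is discarded here and is never transmitted off the node.
    return {
        "event": "expression",
        "label": result.expression,
        "confidence": round(result.confidence, 2),
    }


if __name__ == "__main__":
    fake_frame = np.zeros((224, 224, 3), dtype=np.uint8)  # stand-in image
    print(sensor_node_step(fake_frame))
```

In this arrangement, anything downstream of the inference node (e.g. the speech-driven dialogue logic) sees only event labels such as "happy", which is the property the abstract highlights for data-protection compliance.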
Related papers
- AI-Generated Images as Data Source: The Dawn of Synthetic Era [61.879821573066216]
Generative AI has unlocked the potential to create synthetic images that closely resemble real-world photographs.
This paper explores the innovative concept of harnessing these AI-generated images as new data sources.
In contrast to real data, AI-generated data exhibit remarkable advantages, including unmatched abundance and scalability.
arXiv Detail & Related papers (2023-10-03T06:55:19Z) - RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic
Control [140.48218261864153]
We study how vision-language models trained on Internet-scale data can be incorporated directly into end-to-end robotic control.
Our approach leads to performant robotic policies and enables RT-2 to obtain a range of emergent capabilities from Internet-scale training.
arXiv Detail & Related papers (2023-07-28T21:18:02Z) - Large Language Models Empowered Autonomous Edge AI for Connected
Intelligence [51.269276328087855]
Edge artificial intelligence (Edge AI) is a promising solution to achieve connected intelligence.
This article presents a vision of autonomous edge AI systems that automatically organize, adapt, and optimize themselves to meet users' diverse requirements.
arXiv Detail & Related papers (2023-07-06T05:16:55Z) - Object Recognition System on a Tactile Device for Visually Impaired [1.2891210250935146]
The device will convert visual information into auditory feedback, enabling users to understand their environment in a way that suits their sensory needs.
When the device is touched at a specific position, it emits an audio signal that identifies the object present in the scene at that position to the visually impaired user.
arXiv Detail & Related papers (2023-07-05T11:37:17Z) - The System Model and the User Model: Exploring AI Dashboard Design [79.81291473899591]
We argue that sophisticated AI systems should have dashboards, just like all other complicated devices.
We conjecture that, for many systems, the two most important models will be of the user and of the system itself.
Finding ways to identify, interpret, and display these two models should be a core part of interface research for AI.
arXiv Detail & Related papers (2023-05-04T00:22:49Z) - edBB-Demo: Biometrics and Behavior Analysis for Online Educational
Platforms [17.38605546335716]
The edBB platform aims to study the challenges associated with user recognition and behavior understanding in digital platforms.
The information captured from the sensors during the student sessions is modelled in a multimodal learning framework.
arXiv Detail & Related papers (2022-11-16T20:53:56Z) - Knowledge Transfer For On-Device Speech Emotion Recognition with Neural
Structured Learning [19.220263739291685]
Speech emotion recognition (SER) has been a popular research topic in human-computer interaction (HCI).
We propose a neural structured learning (NSL) framework through building synthesized graphs.
Our experiments demonstrate that training a lightweight SER model on the target dataset with speech samples and graphs can not only produce small SER models, but also enhance the model performance.
arXiv Detail & Related papers (2022-10-26T18:38:42Z) - Privacy attacks for automatic speech recognition acoustic models in a
federated learning framework [5.1229352884025845]
We propose an approach to analyze information in neural network AMs based on a neural network footprint on the Indicator dataset.
Experiments on the TED-LIUM 3 corpus demonstrate that the proposed approaches are very effective and can achieve an equal error rate (EER) of 1-2%.
arXiv Detail & Related papers (2021-11-06T02:08:13Z) - INVIGORATE: Interactive Visual Grounding and Grasping in Clutter [56.00554240240515]
INVIGORATE is a robot system that interacts with humans through natural language and grasps a specified object in clutter.
We train separate neural networks for object detection, for visual grounding, for question generation, and for OBR detection and grasping.
We build a partially observable Markov decision process (POMDP) that integrates the learned neural network modules.
arXiv Detail & Related papers (2021-08-25T07:35:21Z) - A Deep Learning based Wearable Healthcare IoT Device for AI-enabled
Hearing Assistance Automation [6.283190933140046]
This research presents a novel AI-enabled Internet of Things (IoT) device capable of assisting people with hearing impairment or deafness to communicate with others in conversations.
A server application is created that leverages Google's online speech recognition service to convert the received conversations into text, which is then sent to a micro-display attached to the glasses so that deaf users can read the conversation contents.
arXiv Detail & Related papers (2020-05-16T19:42:16Z) - An End-to-End Visual-Audio Attention Network for Emotion Recognition in
User-Generated Videos [64.91614454412257]
We propose to recognize video emotions in an end-to-end manner based on convolutional neural networks (CNNs).
Specifically, we develop a deep Visual-Audio Attention Network (VAANet), a novel architecture that integrates spatial, channel-wise, and temporal attentions into a visual 3D CNN and temporal attentions into an audio 2D CNN.
arXiv Detail & Related papers (2020-02-12T15:33:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.