The MIT Voice Name System
- URL: http://arxiv.org/abs/2204.09657v1
- Date: Mon, 28 Mar 2022 19:09:26 GMT
- Title: The MIT Voice Name System
- Authors: Brian Subirana and Harry Levinson and Ferran Hueto and Prithvi
Rajasekaran and Alexander Gaidis and Esteve Tarragó and Peter
Oliveira-Soens
- Abstract summary: We aim to standardize voice interactions with a universal reach similar to that of systems such as phone numbering.
We focus on voice as a starting point to talk to any IoT object.
Privacy and security are key elements considered because of speech-to-text errors and the amount of personal information contained in a voice sample.
- Score: 53.473846742702854
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This RFC white paper summarizes our progress on the MIT Voice
Name System (VNS) and Huey. The VNS, similar in name and function to the
DNS, is a system for reserving and using "wake words" to activate
Artificial Intelligence (AI) devices. Just as you can say "Hey Siri" to
activate Apple's personal assistant, we propose using the VNS in smart
speakers and other devices to route wake requests based on commands such as
"turn off", "open grocery shopping list" or "271, start flash card review
of my computer vision class". We also introduce Huey, an unambiguous
natural language for interacting with AI devices. We aim to standardize
voice interactions with a universal reach similar to that of phone
numbering, which rests on an agreed worldwide approach to assigning and
using numbers, or the Internet's DNS, whose standard naming system has
helped popular services such as the World-Wide-Web, FTP, and email
flourish. Just as these standards are "neutral", we also aim to endow the
VNS with "wake neutrality" so that each participant can develop its own
digital voice. We focus on voice as a starting point for talking to any IoT
object and briefly explain how the VNS may be expanded to other AI
technologies enabling person-to-machine conversations (really
machine-to-machine), including computer vision or neural interfaces. We
also briefly describe considerations for a broader set of standards, MIT
Open AI (MOA), including a reference architecture to serve as a starting
point for the development of a general conversational commerce
infrastructure with standard "Wake Words", NLP commands such as "Shopping
Lists" or "Flash Card Reviews", and personalities such as Pi or 271.
Privacy and security are key elements considered because of speech-to-text
errors and the amount of personal information contained in a voice sample.
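To make the DNS analogy concrete, here is a minimal sketch of the reservation-and-lookup behavior the abstract describes: a registry reserves wake words and routes a spoken request to whoever registered the leading wake word. All class names, the endpoint scheme, and the first-come, first-served policy are illustrative assumptions, not the VNS specification.

```python
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass
class VNSRecord:
    wake_word: str   # e.g. "271" or a branded wake word
    endpoint: str    # where the activated device or agent is reached
    owner: str       # the registrant, by analogy with a DNS record holder

class VoiceNameRegistry:
    """Reserves wake words and resolves spoken requests, DNS-style."""

    def __init__(self) -> None:
        self._records: Dict[str, VNSRecord] = {}

    def reserve(self, record: VNSRecord) -> None:
        # First-come, first-served in this sketch; the paper's actual
        # assignment policy (cf. phone numbering) may differ.
        if record.wake_word in self._records:
            raise ValueError(f"wake word {record.wake_word!r} already reserved")
        self._records[record.wake_word] = record

    def resolve(self, utterance: str) -> Optional[VNSRecord]:
        # Route on the leading wake word; the rest of the utterance
        # ("start flash card review ...") is the command payload.
        wake_word = utterance.split(",")[0].strip()
        return self._records.get(wake_word)

registry = VoiceNameRegistry()
registry.reserve(VNSRecord("271", "device://classroom-assistant", "mit"))
hit = registry.resolve("271, start flash card review of my computer vision class")
```

Routing on the leading token mirrors the abstract's "271, ..." example, in which the name being woken is separated from the command it should execute.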
Related papers
- Data Center Audio/Video Intelligence on Device (DAVID) -- An Edge-AI
Platform for Smart-Toys [2.740631793745274]
The DAVID Smart-Toy platform is one of the first Edge AI platform designs.
It incorporates advanced low-power data processing by neural inference models co-located with the relevant image or audio sensors.
There is also on-board capability for in-device text-to-speech generation.
arXiv Detail & Related papers (2023-11-18T10:38:35Z)
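As a small illustration of the on-board text-to-speech capability mentioned in the DAVID summary above, the following sketch uses the offline pyttsx3 package as a stand-in for whatever engine the platform actually embeds; the spoken line and speaking rate are arbitrary.

```python
# Minimal on-device TTS sketch: pyttsx3 drives the local OS speech engine,
# so no cloud round-trip is needed (a stand-in for DAVID's actual engine).
import pyttsx3

engine = pyttsx3.init()
engine.setProperty("rate", 150)  # arbitrary child-friendly speaking rate
engine.say("Hello! Shall we play a game?")
engine.runAndWait()
```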
- A Vector Quantized Approach for Text to Speech Synthesis on Real-World Spontaneous Speech [94.64927912924087]
We train TTS systems using real-world speech from YouTube and podcasts.
The recent text-to-speech architecture is designed for multiple code generation and monotonic alignment.
We show that this recent text-to-speech architecture outperforms existing TTS systems in several objective and subjective measures.
arXiv Detail & Related papers (2023-02-08T17:34:32Z)
- LM-Nav: Robotic Navigation with Large Pre-Trained Models of Language, Vision, and Action [76.71101507291473]
We present a system, LM-Nav, for robotic navigation that enjoys the benefits of training on unannotated large datasets of trajectories.
We show that such a system can be constructed entirely out of pre-trained models for navigation (ViNG), image-language association (CLIP), and language modeling (GPT-3), without requiring any fine-tuning or language-annotated robot data.
arXiv Detail & Related papers (2022-07-10T10:41:50Z)
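The LM-Nav summary above describes a pure composition of pre-trained models. The sketch below shows only that data flow; the three functions are hypothetical stand-ins for GPT-3 (parsing landmarks), CLIP (grounding them in stored observations), and ViNG (navigating), so their bodies are toy placeholders.

```python
# Toy composition sketch of the LM-Nav pipeline; only the data flow between
# the three stages is meaningful, not the placeholder implementations.
from typing import List

def extract_landmarks(instruction: str) -> List[str]:
    """Stand-in for GPT-3: parse free-form directions into landmark phrases."""
    return [phrase.strip() for phrase in instruction.split("then")]

def ground_landmark(landmark: str, observations: List[str]) -> int:
    """Stand-in for CLIP: score stored observations against the phrase."""
    return max(range(len(observations)),
               key=lambda i: sum(w in observations[i] for w in landmark.split()))

def navigate_to(node: int) -> None:
    """Stand-in for ViNG: drive along the topological graph to the node."""
    print(f"navigating to graph node {node}")

observations = ["a red barn", "a stop sign", "a picnic table"]
for landmark in extract_landmarks("go to the stop sign then the picnic table"):
    navigate_to(ground_landmark(landmark, observations))
```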
- An Artificial Intelligence Browser Architecture (AIBA) For Our Kind and Others: A Voice Name System Speech implementation with two warrants, Wake Neutrality and Value Preservation of Privately Identifiable Information [0.0]
Conversational commerce is the first of many applications based on always-on artificial intelligence systems that decide on their own when to interact with the environment.
Current dominant systems are closed-garden solutions that lack wake neutrality and cannot fully exploit the PII data they hold because of IRB- and COUHES-type constraints.
We present a voice browser-and-server architecture that aims to address these two limitations by offering wake neutrality and the possibility of handling PII in ways that maximize its value.
arXiv Detail & Related papers (2022-03-29T11:49:41Z)
- Neural Approaches to Conversational Information Retrieval [94.77863916314979]
A conversational information retrieval (CIR) system is an information retrieval (IR) system with a conversational interface.
Recent progress in deep learning has brought tremendous improvements in natural language processing (NLP) and conversational AI.
This book surveys recent advances in CIR, focusing on neural approaches that have been developed in the last few years.
arXiv Detail & Related papers (2022-01-13T19:04:59Z)
- Stop Bugging Me! Evading Modern-Day Wiretapping Using Adversarial Perturbations [47.32228513808444]
Mass surveillance systems for voice over IP (VoIP) conversations pose a great risk to privacy.
We present an adversarial-learning-based framework for privacy protection for VoIP conversations.
arXiv Detail & Related papers (2020-10-24T06:56:35Z)
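The framework in the "Stop Bugging Me!" summary above rests on adversarial perturbations of speech. Below is a minimal gradient-sign (FGSM-style) sketch of that idea in TensorFlow; the tiny speaker classifier is a hypothetical stand-in, not the paper's model, and epsilon is an arbitrary perturbation budget.

```python
import tensorflow as tf

# Hypothetical stand-in classifier: 1 s of 16 kHz audio -> 10 speakers.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(16000, 1)),
    tf.keras.layers.Conv1D(8, 64, strides=16, activation="relu"),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(10),
])

def perturb(waveform, true_speaker, epsilon=1e-3):
    """Add an FGSM-style perturbation that degrades speaker identification."""
    x = tf.convert_to_tensor(waveform[None, :, None], dtype=tf.float32)
    y = tf.constant([true_speaker])
    with tf.GradientTape() as tape:
        tape.watch(x)
        logits = model(x)
        loss = tf.keras.losses.sparse_categorical_crossentropy(
            y, logits, from_logits=True)
    grad = tape.gradient(loss, x)
    # Ascend the loss: push the sample away from the correct speaker label.
    return tf.squeeze(x + epsilon * tf.sign(grad), axis=(0, 2))
```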
- End-to-End Learning of Speech 2D Feature-Trajectory for Prosthetic Hands [0.48951183832371004]
We propose an end-to-end convolutional neural network (CNN) that maps speech 2D features directly to trajectories for prosthetic hands.
The network is implemented in Python using the Keras library with a corresponding backend.
We optimized the CNN for NVIDIA Jetson TX2 developer kit.
arXiv Detail & Related papers (2020-09-22T02:31:00Z)
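Since the prosthetic-hands summary above names Python and Keras, a minimal sketch of a CNN regressing a 2D speech feature map to trajectory set-points might look as follows; the feature shape, layer sizes, and trajectory dimension are assumptions, not the paper's architecture.

```python
import tensorflow as tf
from tensorflow import keras

N_FRAMES, N_FEATURES = 98, 40   # assumed spectrogram shape (time x freq)
TRAJECTORY_DIM = 5              # assumed number of finger-joint targets

model = keras.Sequential([
    keras.layers.Input(shape=(N_FRAMES, N_FEATURES, 1)),
    keras.layers.Conv2D(16, 3, activation="relu"),
    keras.layers.MaxPooling2D(),
    keras.layers.Conv2D(32, 3, activation="relu"),
    keras.layers.GlobalAveragePooling2D(),
    keras.layers.Dense(TRAJECTORY_DIM),  # regress trajectory set-points
])
model.compile(optimizer="adam", loss="mse")  # train on (features, trajectory)
```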
- Implementation of Google Assistant & Amazon Alexa on Raspberry Pi [0.0]
This paper investigates the implementation of voice-enabled Google Assistant and Amazon Alexa on Raspberry Pi.
A voice-enabled system is one that takes voice as input, decodes or understands the meaning of that input, and generates an appropriate voice output.
arXiv Detail & Related papers (2020-06-15T08:46:48Z)
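The voice-enabled loop defined in the Raspberry Pi summary above (voice in, meaning decoded, voice out) can be sketched with the third-party SpeechRecognition and pyttsx3 packages; the real Google Assistant and Alexa SDKs differ, and the echoed reply is a placeholder for actual intent handling.

```python
import speech_recognition as sr
import pyttsx3

recognizer = sr.Recognizer()
tts = pyttsx3.init()

with sr.Microphone() as mic:
    audio = recognizer.listen(mic)             # 1. capture voice input
text = recognizer.recognize_google(audio)      # 2. decode it to text
reply = f"You said: {text}"                    # 3. placeholder intent handling
tts.say(reply)                                 # 4. speak the output
tts.runAndWait()
```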
- A Deep Learning based Wearable Healthcare IoT Device for AI-enabled Hearing Assistance Automation [6.283190933140046]
This research presents a novel AI-enabled Internet of Things (IoT) device capable of assisting those who suffer from impairment of hearing or deafness to communicate with others in conversations.
A server application leverages Google's online speech recognition service to convert the received conversations into text, which is then sent to a micro-display attached to the glasses so that deaf users can read the conversation contents.
arXiv Detail & Related papers (2020-05-16T19:42:16Z)
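The hearing-assistance summary above centers on a server that turns uploaded conversation audio into display text via Google's online recognizer. A minimal sketch of that server side follows; the Flask framing, endpoint name, and WAV-upload convention are assumptions, not the paper's transport.

```python
import io
import speech_recognition as sr
from flask import Flask, request, jsonify

app = Flask(__name__)
recognizer = sr.Recognizer()

@app.post("/transcribe")
def transcribe():
    # The glasses unit uploads a short WAV clip of the conversation.
    with sr.AudioFile(io.BytesIO(request.data)) as source:
        audio = recognizer.record(source)
    text = recognizer.recognize_google(audio)  # Google online STT
    return jsonify({"display_text": text})     # shown on the micro-display

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8000)
```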
- VGAI: End-to-End Learning of Vision-Based Decentralized Controllers for Robot Swarms [237.25930757584047]
We propose to learn decentralized controllers based on solely raw visual inputs.
This is the first framework that integrates the learning of two key components: communication and visual perception.
Our proposed learning framework combines a convolutional neural network (CNN) for each robot to extract messages from the visual inputs, and a graph neural network (GNN) over the entire swarm to transmit, receive and process these messages.
arXiv Detail & Related papers (2020-02-06T15:25:23Z)
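Finally, the VGAI summary above pairs a per-robot CNN with a swarm-wide GNN. The sketch below shows that pattern in TensorFlow with one linear message-aggregation hop; the image size, message width, and self-loop-only graph are illustrative assumptions rather than the paper's configuration.

```python
import tensorflow as tf

N_ROBOTS, MSG_DIM, CONTROL_DIM = 4, 16, 2

# Shared CNN: one robot's 64x64 RGB image -> a message vector.
cnn = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(64, 64, 3)),
    tf.keras.layers.Conv2D(8, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(MSG_DIM),
])
control_head = tf.keras.layers.Dense(CONTROL_DIM)

def swarm_controls(images, adjacency):
    """images: (N_ROBOTS, 64, 64, 3); adjacency: (N_ROBOTS, N_ROBOTS)."""
    messages = cnn(images)                     # extract per-robot messages
    received = tf.matmul(adjacency, messages)  # one hop of graph aggregation
    return control_head(tf.concat([messages, received], axis=-1))

controls = swarm_controls(
    tf.random.uniform((N_ROBOTS, 64, 64, 3)),
    tf.eye(N_ROBOTS),  # placeholder communication graph: self-loops only
)
```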