Implementation of Google Assistant & Amazon Alexa on Raspberry Pi
- URL: http://arxiv.org/abs/2006.08220v1
- Date: Mon, 15 Jun 2020 08:46:48 GMT
- Title: Implementation of Google Assistant & Amazon Alexa on Raspberry Pi
- Authors: Shailesh D. Arya, Dr. Samir Patel
- Abstract summary: This paper investigates the implementation of voice-enabled Google Assistant and Amazon Alexa on Raspberry Pi.
A voice-enabled system is essentially a system that processes voice as input, decodes or understands the meaning of that input, and generates an appropriate voice output.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper investigates the implementation of voice-enabled Google Assistant
and Amazon Alexa on Raspberry Pi. Virtual assistants are becoming a new trend in
how we interact with and compute on physical devices. A voice-enabled
system is essentially a system that processes voice as input, decodes or
understands the meaning of that input, and generates an appropriate voice
output. In this paper, we develop a smart speaker prototype that provides the
functionality of both assistants on the same Raspberry Pi. Users can invoke
either virtual assistant by saying its hot word and can leverage the best
services of both ecosystems. This paper also explains the complex architecture
of Google Assistant and Amazon Alexa and how each assistant works. Later, this
system can be used to control smart home IoT devices.
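The paper itself contains no code, but the dual hot-word dispatch it describes (one device, two hot words, two assistant back-ends) can be sketched roughly as follows. This is a minimal illustration only: detect_hotword, run_google_assistant, and run_alexa are hypothetical placeholders standing in for a keyword-spotting engine, the Google Assistant SDK client, and the Alexa Voice Service (AVS) client, not the authors' implementation.
```python
# Hypothetical sketch of the dual-assistant dispatch loop described in the abstract.
# detect_hotword(), run_google_assistant(), and run_alexa() are placeholder names,
# not real SDK calls; in practice they would wrap the Google Assistant SDK and the
# Alexa Voice Service (AVS) client running on the same Raspberry Pi.

def detect_hotword(audio_frame):
    """Return 'google', 'alexa', or None depending on which hot word was heard."""
    raise NotImplementedError  # a keyword-spotting engine would go here

def run_google_assistant():
    """Hand the conversation off to the Google Assistant SDK client."""
    raise NotImplementedError

def run_alexa():
    """Hand the conversation off to the Alexa Voice Service (AVS) client."""
    raise NotImplementedError

def main_loop(microphone_frames):
    # Listen continuously; whichever hot word fires decides which ecosystem answers.
    for frame in microphone_frames:
        hotword = detect_hotword(frame)
        if hotword == "google":
            run_google_assistant()
        elif hotword == "alexa":
            run_alexa()
```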
Related papers
- AudioPaLM: A Large Language Model That Can Speak and Listen [79.44757696533709]
We introduce AudioPaLM, a large language model for speech understanding and generation.
AudioPaLM fuses text-based and speech-based language models.
It can process and generate text and speech with applications including speech recognition and speech-to-speech translation.
arXiv Detail & Related papers (2023-06-22T14:37:54Z) - Rewriting the Script: Adapting Text Instructions for Voice Interaction [39.54213483588498]
We study the limitations of the dominant approach voice assistants take to complex task guidance.
We propose eight ways in which voice assistants can transform written sources into forms that are readily communicated through spoken conversation.
arXiv Detail & Related papers (2023-06-16T17:43:00Z) - AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking
Head [82.69233563811487]
Large language models (LLMs) have exhibited remarkable capabilities across a variety of domains and tasks, challenging our understanding of learning and cognition.
We propose a multi-modal AI system named AudioGPT, which complements LLMs with foundation models to process complex audio information.
arXiv Detail & Related papers (2023-04-25T17:05:38Z) - Virtual Mouse And Assistant: A Technological Revolution Of Artificial
Intelligence [0.0]
The purpose of this paper is to enhance the performance of the virtual assistant.
Virtual assistants can complete practically any smartphone or PC task that you could complete on your own.
arXiv Detail & Related papers (2023-03-11T05:00:06Z) - SkillFence: A Systems Approach to Practically Mitigating Voice-Based
Confusion Attacks [9.203566746598439]
Recent work has shown that commercial systems like Amazon Alexa and Google Home are vulnerable to voice-based confusion attacks.
We propose a systems-oriented defense against this class of attacks and demonstrate its functionality for Amazon Alexa.
We build SkillFence, a browser extension that existing voice assistant users can install to ensure that only legitimate skills run in response to their commands.
arXiv Detail & Related papers (2022-12-16T22:22:04Z) - The MIT Voice Name System [53.473846742702854]
We aim to standardize voice interactions so they have universal reach similar to that of other systems such as phone numbering.
We focus on voice as a starting point to talk to any IoT object.
Privacy and security are key elements considered because of speech-to-text errors and the amount of personal information contained in a voice sample.
arXiv Detail & Related papers (2022-03-28T19:09:26Z) - Dynamic Acoustic Unit Augmentation With BPE-Dropout for Low-Resource
End-to-End Speech Recognition [62.94773371761236]
We consider building an effective end-to-end ASR system in low-resource setups with a high OOV rate.
We propose a method of dynamic acoustic unit augmentation based on the BPE-dropout technique.
Our monolingual Turkish Conformer achieves a competitive result, with a 22.2% character error rate (CER) and a 38.9% word error rate (WER).
arXiv Detail & Related papers (2021-03-12T10:10:13Z) - Self-supervised reinforcement learning for speaker localisation with the
iCub humanoid robot [58.2026611111328]
Looking at a person's face is one of the mechanisms that humans rely on when it comes to filtering speech in noisy environments.
Having a robot that can look toward a speaker could benefit ASR performance in challenging environments.
We propose a self-supervised reinforcement learning-based framework inspired by the early development of humans.
arXiv Detail & Related papers (2020-11-12T18:02:15Z) - VQVC+: One-Shot Voice Conversion by Vector Quantization and U-Net
architecture [71.45920122349628]
Auto-encoder-based VC methods disentangle the speaker and the content in input speech without being given the speaker's identity.
We use the U-Net architecture within an auto-encoder-based VC system to improve audio quality.
arXiv Detail & Related papers (2020-06-07T14:01:16Z) - A.I. based Embedded Speech to Text Using Deepspeech [3.2221306786493065]
This paper shows the implementation process of speech recognition on a low-end computational device.
DeepSpeech is an open-source speech recognition engine that uses a neural network to convert a speech spectrogram into a text transcript.
The paper reports experiments with DeepSpeech versions 0.1.0, 0.1.1, and 0.6.0; version 0.6.0 shows some improvement, processing speech-to-text faster on old hardware such as the Raspberry Pi 3 B+.
arXiv Detail & Related papers (2020-02-25T08:27:41Z)
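As context for the DeepSpeech entry above, here is a minimal transcription sketch. It uses the newer DeepSpeech 0.9.x Python API rather than the 0.1.x/0.6.0 releases evaluated in that paper (those expose different constructor and decoder signatures), and the model and audio file names are placeholders.
```python
# Minimal DeepSpeech transcription sketch (0.9.x Python API).
# Model and audio file names are placeholders.
import wave
import numpy as np
from deepspeech import Model

model = Model("deepspeech-0.9.3-models.pbmm")                  # acoustic model
model.enableExternalScorer("deepspeech-0.9.3-models.scorer")   # optional language-model scorer

# DeepSpeech expects 16 kHz, 16-bit mono PCM audio.
with wave.open("utterance_16khz_mono.wav", "rb") as wav:
    pcm = np.frombuffer(wav.readframes(wav.getnframes()), dtype=np.int16)

print(model.stt(pcm))  # print the decoded transcript
```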