PicTalky: Augmentative and Alternative Communication Software for
Language Developmental Disabilities
- URL: http://arxiv.org/abs/2109.12941v1
- Date: Mon, 27 Sep 2021 10:46:14 GMT
- Authors: Chanjun Park, Yoonna Jang, Seolhwa Lee, Jaehyung Seo, Kisu Yang,
Heuiseok Lim
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Augmentative and alternative communication (AAC) is a practical means of
communication for people with language disabilities. In this study, we propose
PicTalky, which is an AI-based AAC system that helps children with language
developmental disabilities to improve their communication skills and language
comprehension abilities. PicTalky can process both text and pictograms more
accurately by connecting a series of neural-based NLP modules. Moreover, we
perform quantitative and qualitative analyses on the essential features of
PicTalky. It is expected that those suffering from language problems will be
able to express their intentions or desires more easily and improve their
quality of life by using this service. We have made the models freely available
alongside a demonstration of the Web interface. Furthermore, we implemented
robotics AAC for the first time by applying PicTalky to the NAO robot.
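The abstract describes PicTalky as a chain of neural NLP modules that turns text into pictograms. As a purely illustrative sketch (the module names, pictogram lexicon, and pipeline structure below are assumptions, not taken from the paper), such a text-to-pictogram pipeline might be composed like this:

```python
# Hypothetical sketch of an AAC pipeline in the spirit of PicTalky:
# a chain of processing modules that turns free text into a pictogram
# sequence. In the real system each stage would be a neural NLP module;
# here each is a stub, and the lexicon is a toy example.

from typing import List

def normalize(text: str) -> str:
    """Lowercase and collapse whitespace; a real system would run
    neural grammar correction at this stage."""
    return " ".join(text.lower().split())

def tokenize(text: str) -> List[str]:
    """Split normalized text into word tokens."""
    return text.split()

# Toy lexicon mapping words to pictogram identifiers (illustrative only).
PICTOGRAMS = {"i": "PICTO:self", "want": "PICTO:want", "water": "PICTO:water"}

def map_to_pictograms(tokens: List[str]) -> List[str]:
    """Replace each known word with its pictogram ID; pass unknown
    words through as plain text."""
    return [PICTOGRAMS.get(tok, tok) for tok in tokens]

def pipeline(text: str) -> List[str]:
    """Compose the modules in sequence, mirroring how a series of
    NLP modules could be connected end to end."""
    return map_to_pictograms(tokenize(normalize(text)))

print(pipeline("I want water"))  # ['PICTO:self', 'PICTO:want', 'PICTO:water']
```

The point of the composition is that each stage has a single responsibility, so an individual stub can later be swapped for a trained model without changing the rest of the chain.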
Related papers
- Large Language Model Based Generative Error Correction: A Challenge and Baselines for Speech Recognition, Speaker Tagging, and Emotion Recognition
We introduce the generative speech transcription error correction (GenSEC) challenge.
This challenge comprises three post-ASR language modeling tasks: (i) post-ASR transcription correction, (ii) speaker tagging, and (iii) emotion recognition.
We discuss insights from baseline evaluations, as well as lessons learned for designing future evaluations.
arXiv Detail & Related papers (2024-09-15T16:32:49Z)
- MindSpeech: Continuous Imagined Speech Decoding using High-Density fNIRS and Prompt Tuning for Advanced Human-AI Interaction
This paper reports a novel method for human-AI interaction by developing a direct brain-AI interface.
We discuss a novel AI model, called MindSpeech, which enables open-vocabulary, continuous decoding of imagined speech.
We demonstrate significant improvements in key metrics, such as BLEU-1 and BERT P scores, for three out of four participants.
arXiv Detail & Related papers (2024-07-25T16:39:21Z)
- TwIPS: A Large Language Model Powered Texting Application to Simplify Conversational Nuances for Autistic Users
Autistic individuals often experience difficulties in conveying and interpreting emotional tone and non-literal nuances.
We present TwIPS, a prototype texting application powered by a large language model (LLM).
We leverage an AI-based simulation and a conversational script to evaluate TwIPS with 8 autistic participants in an in-lab setting.
arXiv Detail & Related papers (2024-07-25T04:15:54Z)
- FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs
FunAudioLLM is a model family designed to enhance natural voice interactions between humans and large language models (LLMs).
At its core are two innovative models: SenseVoice, which handles multilingual speech recognition, emotion recognition, and audio event detection; and CosyVoice, which facilitates natural speech generation with control over multiple languages, timbre, speaking style, and speaker identity.
The models related to SenseVoice and CosyVoice have been open-sourced on ModelScope and Hugging Face, along with the corresponding training, inference, and fine-tuning code released on GitHub.
arXiv Detail & Related papers (2024-07-04T16:49:02Z)
- Speech-Gesture GAN: Gesture Generation for Robots and Embodied Agents
Embodied agents, in the form of virtual agents or social robots, are rapidly becoming more widespread.
We propose a novel framework that can generate sequences of joint angles from speech text and speech audio utterances.
arXiv Detail & Related papers (2023-09-17T18:46:25Z)
- Visual-Aware Text-to-Speech
We present a new visual-aware text-to-speech (VA-TTS) task to synthesize speech conditioned on both textual inputs and visual feedback of the listener in face-to-face communication.
We devise a baseline model to fuse phoneme linguistic information and listener visual signals for speech synthesis.
arXiv Detail & Related papers (2023-06-21T05:11:39Z)
- Ada-TTA: Towards Adaptive High-Quality Text-to-Talking Avatar Synthesis
We aim to synthesize high-quality talking portrait videos corresponding to the input text.
This task has broad application prospects in the digital human industry but has not yet been technically achieved.
We introduce Adaptive Text-to-Talking Avatar (Ada-TTA), which designs a generic zero-shot multi-speaker text-to-speech model.
arXiv Detail & Related papers (2023-06-06T08:50:13Z)
- Few-Shot Cross-Lingual TTS Using Transferable Phoneme Embedding
This paper studies a transferable phoneme embedding framework that aims to deal with the cross-lingual text-to-speech problem under the few-shot setting.
We propose a framework that consists of a phoneme-based TTS model and a codebook module to project phonemes from different languages into a learned latent space.
arXiv Detail & Related papers (2022-06-27T11:24:40Z)
- FreeTalky: Don't Be Afraid! Conversations Made Easier by a Humanoid Robot using Persona-based Dialogue
We propose a deep learning-based foreign language learning platform, named FreeTalky, for people who experience anxiety when dealing with foreign languages.
A persona-based dialogue system embedded in NAO provides an interesting and consistent multi-turn dialogue for users.
arXiv Detail & Related papers (2021-12-08T05:48:11Z)
- Introducing the Talk Markup Language (TalkML): Adding a little social intelligence to industrial speech interfaces
Natural language understanding is one of the more disappointing failures of AI research.
This paper describes how we have taken ideas from other disciplines and implemented them.
arXiv Detail & Related papers (2021-05-24T14:25:35Z)
- Structural and Functional Decomposition for Personality Image Captioning in a Communication Game
Personality image captioning (PIC) aims to describe an image with a natural language caption given a personality trait.
We introduce a novel formulation for PIC based on a communication game between a speaker and a listener.
arXiv Detail & Related papers (2020-11-17T10:19:27Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.