TwIPS: A Large Language Model Powered Texting Application to Simplify Conversational Nuances for Autistic Users
- URL: http://arxiv.org/abs/2407.17760v1
- Date: Thu, 25 Jul 2024 04:15:54 GMT
- Title: TwIPS: A Large Language Model Powered Texting Application to Simplify Conversational Nuances for Autistic Users
- Authors: Rukhshan Haroon, Fahad Dogar
- Abstract summary: Autistic individuals often experience difficulties in conveying and interpreting emotional tone and non-literal nuances.
We present TwIPS, a prototype texting application powered by a large language model (LLM).
We leverage an AI-based simulation and a conversational script to evaluate TwIPS with 8 autistic participants in an in-lab setting.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Autistic individuals often experience difficulties in conveying and interpreting emotional tone and non-literal nuances. Many also mask their communication style to avoid being misconstrued by others, spending considerable time and mental effort in the process. To address these challenges in text-based communication, we present TwIPS, a prototype texting application powered by a large language model (LLM), which can assist users with: a) deciphering tone and meaning of incoming messages, b) ensuring the emotional tone of their message is in line with their intent, and c) coming up with alternate phrasing for messages that could be misconstrued and received negatively by others. We leverage an AI-based simulation and a conversational script to evaluate TwIPS with 8 autistic participants in an in-lab setting. Our findings show TwIPS enables a convenient way for participants to seek clarifications, provides a better alternative to tone indicators, and facilitates constructive reflection on writing technique and style. We also examine how autistic users utilize language for self-expression and interpretation in instant messaging, and gather feedback for enhancing our prototype. We conclude with a discussion around balancing user-autonomy with AI-mediation, establishing appropriate trust levels in AI systems, and customization needs of autistic users in the context of AI-assisted communication.
Related papers
- Detecting Mental Manipulation in Speech via Synthetic Multi-Speaker Dialogue [12.181747090385612]
Mental manipulation is the strategic use of language to covertly influence or exploit others.
We present the first study of mental manipulation detection in spoken dialogues.
Using few-shot large audio-language models and human annotation, we evaluate how modality affects detection accuracy and perception.
arXiv Detail & Related papers (2026-01-13T09:02:08Z) - Enhancing Public Speaking Skills in Engineering Students Through AI [0.48861336570452174]
This research-to-practice full paper was inspired by the persistent challenge in effective communication among engineering students.
Public speaking is a necessary skill for future engineers as they have to communicate technical knowledge with diverse stakeholders.
This study integrates research on verbal and non-verbal cues in public speaking to develop an AI-driven assessment model for engineering students.
arXiv Detail & Related papers (2025-11-07T05:44:15Z) - Empathic Prompting: Non-Verbal Context Integration for Multimodal LLM Conversations [45.06725378575657]
We present Empathic Prompting, a framework for multimodal human-AI interaction that enriches Large Language Model (LLM) conversations with implicit non-verbal context.
The system integrates a commercial facial expression recognition service to capture users' emotional cues and embeds them as contextual signals during prompting.
arXiv Detail & Related papers (2025-10-23T17:08:03Z) - Towards Inclusive Communication: A Unified Framework for Generating Spoken Language from Sign, Lip, and Audio [52.859261069569165]
We propose the first unified framework capable of handling diverse combinations of sign language, lip movements, and audio for spoken-language text generation.
We focus on three main objectives: (i) designing a unified, modality-agnostic architecture capable of effectively processing heterogeneous inputs; (ii) exploring the underexamined synergy among modalities, particularly the role of lip movements as non-manual cues in sign language comprehension; and (iii) achieving performance on par with or better than state-of-the-art models specialized for individual tasks.
arXiv Detail & Related papers (2025-08-28T06:51:42Z) - Dual Information Speech Language Models for Emotional Conversations [48.094826104102204]
Speech-language models (SLMs), which use speech as input, are emerging as a promising solution.
We identify entangled information and improper training strategies as key issues.
Our approach disentangles paralinguistic and linguistic information, enabling SLMs to interpret speech through structured representations.
arXiv Detail & Related papers (2025-08-11T15:33:44Z) - Gesture-Aware Zero-Shot Speech Recognition for Patients with Language Disorders [10.664605070306417]
We propose a gesture-aware Automatic Speech Recognition (ASR) system with zero-shot learning for individuals with speech impairments.
Experiment results and analyses show that including gesture information significantly enhances semantic understanding.
arXiv Detail & Related papers (2025-02-18T14:15:55Z) - Leveraging Chain of Thought towards Empathetic Spoken Dialogue without Corresponding Question-Answering Data [33.85748258158527]
Empathetic dialogue is crucial for natural human-computer interaction.
Large language models (LLMs) have revolutionized dialogue generation by harnessing their powerful capabilities.
We propose a novel approach that circumvents the need for question-answering data.
arXiv Detail & Related papers (2025-01-19T04:10:53Z) - Interactive Dialogue Agents via Reinforcement Learning on Hindsight Regenerations [58.65755268815283]
Many real dialogues are interactive, meaning an agent's utterances will influence their conversational partner, elicit information, or change their opinion.
We use this fact to rewrite and augment existing suboptimal data, and train via offline reinforcement learning (RL) an agent that outperforms both prompting and learning from unaltered human demonstrations.
Our results in a user study with real humans show that our approach greatly outperforms existing state-of-the-art dialogue agents.
arXiv Detail & Related papers (2024-11-07T21:37:51Z) - Predictive Speech Recognition and End-of-Utterance Detection Towards Spoken Dialog Systems [55.99999020778169]
We study a function that can predict the forthcoming words and estimate the time remaining until the end of an utterance.
We develop a cross-attention-based algorithm that incorporates both acoustic and linguistic information.
Results demonstrate the proposed model's ability to predict upcoming words and estimate future EOU events up to 300ms prior to the actual EOU.
arXiv Detail & Related papers (2024-09-30T06:29:58Z) - Toward a Dialogue System Using a Large Language Model to Recognize User Emotions with a Camera [0.0]
Methods for AI agents to recognize emotions from the user's facial expressions have not been studied.
We examined whether or not LLM-based AI agents can interact with users according to their emotional states by capturing the user in dialogue with a camera.
Results confirmed that AI agents can have conversations according to the emotional state for emotional states with relatively high scores, such as Happy and Angry.
arXiv Detail & Related papers (2024-08-15T07:03:00Z) - WordDecipher: Enhancing Digital Workspace Communication with Explainable AI for Non-native English Speakers [11.242099987201573]
Non-native English speakers (NNES) face challenges in digital workspace communication.
Current AI-assisted writing tools are equipped with fluency enhancement and rewriting suggestions.
We propose WordDecipher, an explainable AI-assisted writing tool to enhance digital workspace communication.
arXiv Detail & Related papers (2024-04-10T13:40:29Z) - Paralinguistics-Enhanced Large Language Modeling of Spoken Dialogue [71.15186328127409]
Paralinguistics-enhanced Generative Pretrained Transformer (ParalinGPT)
Model takes the conversational context of text, speech embeddings, and paralinguistic attributes as input prompts within a serialized multitasking framework.
We utilize the Switchboard-1 corpus, including its sentiment labels as the paralinguistic attribute, as our spoken dialogue dataset.
arXiv Detail & Related papers (2023-12-23T18:14:56Z) - Utilizing Speech Emotion Recognition and Recommender Systems for Negative Emotion Handling in Therapy Chatbots [0.0]
This paper proposes an approach to enhance therapy chatbots with auditory perception, enabling them to understand users' feelings and provide human-like empathy.
The proposed method incorporates speech emotion recognition (SER) techniques using CNN models and the ShEMO dataset.
To provide a more immersive and empathetic user experience, a text-to-speech model called GlowTTS is integrated.
arXiv Detail & Related papers (2023-11-18T16:35:55Z) - A Vector Quantized Approach for Text to Speech Synthesis on Real-World Spontaneous Speech [94.64927912924087]
We train TTS systems using real-world speech from YouTube and podcasts.
Recent Text-to-Speech architecture is designed for multiple code generation and monotonic alignment.
We show that Recent Text-to-Speech architecture outperforms existing TTS systems in several objective and subjective measures.
arXiv Detail & Related papers (2023-02-08T17:34:32Z) - Multimodal Emotion Recognition using Transfer Learning from Speaker Recognition and BERT-based models [53.31917090073727]
We propose a neural network-based emotion recognition framework that uses a late fusion of transfer-learned and fine-tuned models from speech and text modalities.
We evaluate the effectiveness of our proposed multimodal approach on the interactive emotional dyadic motion capture dataset.
arXiv Detail & Related papers (2022-02-16T00:23:42Z) - PicTalky: Augmentative and Alternative Communication Software for Language Developmental Disabilities [2.2944351895226953]
Augmentative and alternative communication (AAC) is a practical means of communication for people with language disabilities.
We propose PicTalky, which is an AI-based AAC system that helps children with language developmental disabilities to improve their communication skills and language comprehension abilities.
arXiv Detail & Related papers (2021-09-27T10:46:14Z) - Hierarchical Summarization for Longform Spoken Dialog [1.995792341399967]
Despite the pervasiveness of spoken dialog, automated speech understanding and quality information extraction remains markedly poor.
Compared to understanding text, auditory communication poses many additional challenges such as speaker disfluencies, informal prose styles, and lack of structure.
We propose a two stage ASR and text summarization pipeline and propose a set of semantic segmentation and merging algorithms to resolve these speech modeling challenges.
arXiv Detail & Related papers (2021-08-21T23:31:31Z) - An Attribute-Aligned Strategy for Learning Speech Representation [57.891727280493015]
We propose an attribute-aligned learning strategy to derive speech representation that can flexibly address these issues by attribute-selection mechanism.
Specifically, we propose a layered-representation variational autoencoder (LR-VAE), which factorizes speech representation into attribute-sensitive nodes.
Our proposed method achieves competitive performances on identity-free SER and a better performance on emotionless SV.
arXiv Detail & Related papers (2021-06-05T06:19:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.