Evaluating the Impact of AI-Powered Audiovisual Personalization on Learner Emotion, Focus, and Learning Outcomes
- URL: http://arxiv.org/abs/2505.03033v1
- Date: Mon, 05 May 2025 21:19:50 GMT
- Title: Evaluating the Impact of AI-Powered Audiovisual Personalization on Learner Emotion, Focus, and Learning Outcomes
- Authors: George Xi Wang, Jingying Deng, Safinah Ali,
- Abstract summary: We introduce an AI-powered system that uses LLMs to generate personalized multisensory study environments. Our primary research question investigates how combinations of personalized audiovisual elements affect learner cognitive load and engagement. The findings aim to advance emotionally responsive educational technologies and extend the application of multimodal LLMs into the sensory dimension of self-directed learning.
- Score: 5.753241925582828
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Independent learners often struggle with sustaining focus and emotional regulation in unstructured or distracting settings. Although some rely on ambient aids such as music, ASMR, or visual backgrounds to support concentration, these tools are rarely integrated into cohesive, learner-centered systems. Moreover, existing educational technologies focus primarily on content adaptation and feedback, overlooking the emotional and sensory context in which learning takes place. Large language models have demonstrated powerful multimodal capabilities including the ability to generate and adapt text, audio, and visual content. Educational research has yet to fully explore their potential in creating personalized audiovisual learning environments. To address this gap, we introduce an AI-powered system that uses LLMs to generate personalized multisensory study environments. Users select or generate customized visual themes (e.g., abstract vs. realistic, static vs. animated) and auditory elements (e.g., white noise, ambient ASMR, familiar vs. novel sounds) to create immersive settings aimed at reducing distraction and enhancing emotional stability. Our primary research question investigates how combinations of personalized audiovisual elements affect learner cognitive load and engagement. Using a mixed-methods design that incorporates biometric measures and performance outcomes, this study evaluates the effectiveness of LLM-driven sensory personalization. The findings aim to advance emotionally responsive educational technologies and extend the application of multimodal LLMs into the sensory dimension of self-directed learning.
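The paper does not include an implementation, but the personalization flow it describes (user-selected visual and auditory preferences turned into an LLM-generated environment specification) can be pictured with a minimal sketch. Everything below is hypothetical: the `EnvironmentPreferences` fields, `build_prompt`, `parse_environment`, and the JSON schema are illustrative assumptions, not the authors' system.

```python
# Illustrative sketch only; all names and the JSON schema are assumptions
# chosen to mirror the options described in the abstract (visual theme,
# animation, auditory element, familiarity).

import json
from dataclasses import dataclass


@dataclass
class EnvironmentPreferences:
    visual_style: str   # e.g. "abstract" or "realistic"
    motion: str         # e.g. "static" or "animated"
    audio: str          # e.g. "white noise", "ambient ASMR"
    familiarity: str    # e.g. "familiar" or "novel"


def build_prompt(prefs: EnvironmentPreferences, task: str) -> str:
    """Compose an LLM prompt asking for a study-environment specification."""
    return (
        "You are designing a distraction-reducing study environment.\n"
        f"Learning task: {task}\n"
        f"Visual theme: {prefs.visual_style}, {prefs.motion}\n"
        f"Audio: {prefs.audio} ({prefs.familiarity} sounds)\n"
        "Return JSON with keys 'scene_description', 'color_palette', "
        "'audio_layers' (list), and 'loop_minutes'."
    )


def parse_environment(llm_response: str) -> dict:
    """Validate the minimal fields expected from the (hypothetical) LLM reply."""
    env = json.loads(llm_response)
    for key in ("scene_description", "color_palette", "audio_layers", "loop_minutes"):
        if key not in env:
            raise ValueError(f"missing field: {key}")
    return env


if __name__ == "__main__":
    prefs = EnvironmentPreferences("abstract", "animated", "white noise", "familiar")
    print(build_prompt(prefs, "reading a statistics chapter"))
```

In a full system the prompt would be sent to a multimodal LLM and the returned specification would drive visual and audio generation; the biometric and performance measures described above would then evaluate the resulting environment.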
Related papers
- Examining the Role of LLM-Driven Interactions on Attention and Cognitive Engagement in Virtual Classrooms [9.241265477406078]
We investigate how peer question-asking behaviors influenced student engagement, attention, cognitive load, and learning outcomes. Our results suggest that peer questions did not introduce extraneous cognitive load directly, as the cognitive load is strongly correlated with increased attention to the learning material.
arXiv Detail & Related papers (2025-05-12T09:21:19Z)
- Contrastive Decoupled Representation Learning and Regularization for Speech-Preserving Facial Expression Manipulation [58.189703277322224]
Speech-preserving facial expression manipulation (SPFEM) aims to modify a talking head to display a specific reference emotion. Emotion and content information in the reference and source inputs can provide direct and accurate supervision signals for SPFEM models. We propose to learn content and emotion priors as guidance, augmented with contrastive learning, to learn decoupled content and emotion representations.
arXiv Detail & Related papers (2025-04-08T04:34:38Z)
- Investigating and Enhancing Vision-Audio Capability in Omnimodal Large Language Models [20.210120763433167]
We propose a Self-Knowledge Distillation (Self-KD) training method where the vision-text component of the OLLM serves as the teacher and the vision-audio component as the student (a minimal, illustrative loss sketch appears after this list). Our experimental results demonstrate that Self-KD is an effective method for enhancing the vision-audio capabilities of OLLMs.
arXiv Detail & Related papers (2025-02-27T02:19:09Z)
- Enriching Multimodal Sentiment Analysis through Textual Emotional Descriptions of Visual-Audio Content [56.62027582702816]
Multimodal Sentiment Analysis seeks to unravel human emotions by amalgamating text, audio, and visual data. Yet, discerning subtle emotional nuances within audio and video expressions poses a formidable challenge. We introduce DEVA, a progressive fusion framework founded on textual sentiment descriptions.
arXiv Detail & Related papers (2024-12-12T11:30:41Z)
- Emotion Based Prediction in the Context of Optimized Trajectory Planning for Immersive Learning [0.0]
In the virtual elements of immersive learning, the use of Google Expedition and touch-screen-based emotion is examined.
Pedagogical application, affordances, and cognitive load are the corresponding measures involved.
arXiv Detail & Related papers (2023-12-18T09:24:35Z)
- MISAR: A Multimodal Instructional System with Augmented Reality [38.79160527414268]
Augmented reality (AR) requires seamless integration of visual, auditory, and linguistic channels for optimized human-computer interaction.
Our study introduces an innovative method harnessing large language models (LLMs) to assimilate information from visual, auditory, and contextual modalities.
arXiv Detail & Related papers (2023-10-18T04:15:12Z)
- Imagination-Augmented Natural Language Understanding [71.51687221130925]
We introduce an Imagination-Augmented Cross-modal Encoder (iACE) to solve natural language understanding tasks.
iACE enables visual imagination with external knowledge transferred from the powerful generative and pre-trained vision-and-language models.
Experiments on GLUE and SWAG show that iACE achieves consistent improvement over visually-supervised pre-trained models.
arXiv Detail & Related papers (2022-04-18T19:39:36Z)
- An Overview of Deep-Learning-Based Audio-Visual Speech Enhancement and Separation [57.68765353264689]
Speech enhancement and speech separation are two related tasks.
Traditionally, these tasks have been tackled using signal processing and machine learning techniques.
More recently, deep learning has been exploited to achieve strong performance.
arXiv Detail & Related papers (2020-08-21T17:24:09Z)
- Does Visual Self-Supervision Improve Learning of Speech Representations for Emotion Recognition? [63.564385139097624]
This work investigates visual self-supervision via face reconstruction to guide the learning of audio representations.
We show that a multi-task combination of the proposed visual and audio self-supervision is beneficial for learning richer features.
We evaluate our learned audio representations for discrete emotion recognition, continuous affect recognition and automatic speech recognition.
arXiv Detail & Related papers (2020-05-04T11:33:40Z)
- Visually Guided Self Supervised Learning of Speech Representations [62.23736312957182]
We propose a framework for learning audio representations guided by the visual modality in the context of audiovisual speech.
We employ a generative audio-to-video training scheme in which we animate a still image corresponding to a given audio clip and optimize the generated video to be as close as possible to the real video of the speech segment.
We achieve state-of-the-art results for emotion recognition and competitive results for speech recognition.
arXiv Detail & Related papers (2020-01-13T14:53:22Z)
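As a side note on the Self-Knowledge Distillation entry above (Investigating and Enhancing Vision-Audio Capability in Omnimodal Large Language Models): the teacher/student idea can be illustrated with a minimal, assumed loss sketch in PyTorch. The tensor shapes, temperature value, and helper name `self_kd_loss` are illustrative assumptions; the paper's actual model and training recipe are not reproduced here.

```python
# Minimal sketch of a self-knowledge-distillation loss, assuming the
# vision-text path (teacher) and vision-audio path (student) produce
# next-token logits over the same vocabulary for the same target sequence.

import torch
import torch.nn.functional as F


def self_kd_loss(student_logits: torch.Tensor,
                 teacher_logits: torch.Tensor,
                 temperature: float = 2.0) -> torch.Tensor:
    """KL divergence from the teacher (vision-text) distribution to the
    student (vision-audio) distribution, with temperature smoothing."""
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    p_teacher = F.softmax(teacher_logits.detach() / temperature, dim=-1)
    # Flatten (batch, seq, vocab) -> (batch*seq, vocab) so 'batchmean'
    # averages over all token positions.
    kld = F.kl_div(log_p_student.flatten(0, 1), p_teacher.flatten(0, 1),
                   reduction="batchmean")
    # Standard KD scaling by T^2 keeps gradient magnitudes comparable.
    return kld * temperature ** 2


# Usage with dummy tensors of shape (batch, sequence, vocabulary).
student = torch.randn(2, 8, 32000, requires_grad=True)
teacher = torch.randn(2, 8, 32000)
loss = self_kd_loss(student, teacher)
loss.backward()
```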
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.