A Real-Time Gesture-Based Control Framework
- URL: http://arxiv.org/abs/2504.19460v1
- Date: Mon, 28 Apr 2025 03:57:28 GMT
- Title: A Real-Time Gesture-Based Control Framework
- Authors: Mahya Khazaei, Ali Bahrani, George Tzanetakis
- Abstract summary: We introduce a real-time, human-in-the-loop gesture control framework. It can dynamically adapt audio and music based on human movement. The system is designed for live performances, interactive installations, and personal use.
- Score: 2.432598153985671
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We introduce a real-time, human-in-the-loop gesture control framework that can dynamically adapt audio and music based on human movement by analyzing live video input. By creating a responsive connection between visual and auditory stimuli, this system enables dancers and performers to not only respond to music but also influence it through their movements. Designed for live performances, interactive installations, and personal use, it offers an immersive experience where users can shape the music in real time. The framework integrates computer vision and machine learning techniques to track and interpret motion, allowing users to manipulate audio elements such as tempo, pitch, effects, and playback sequence. With ongoing training, it achieves user-independent functionality, requiring as few as 50 to 80 samples to label simple gestures. This framework combines gesture training, cue mapping, and audio manipulation to create a dynamic, interactive experience. Gestures are interpreted as input signals, mapped to sound control commands, and used to naturally adjust music elements, showcasing the seamless interplay between human interaction and machine response.
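To make the described pipeline concrete, here is a minimal sketch of the gesture-training and cue-mapping loop outlined in the abstract. It assumes MediaPipe Pose for landmark tracking, a k-NN classifier trained on the reported 50 to 80 labelled samples per gesture, and OSC messages sent to an external audio engine; `CUE_MAP`, `run_control_loop`, and the OSC addresses are illustrative placeholders, not the authors' implementation.
```python
# Minimal sketch of the gesture -> audio-cue loop described in the abstract.
# Assumptions (not from the paper): MediaPipe Pose for landmark tracking,
# a k-NN classifier trained on ~50-80 labelled pose frames per gesture,
# and OSC messages to an external audio engine.
import cv2
import mediapipe as mp
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from pythonosc.udp_client import SimpleUDPClient

# Hypothetical mapping from gesture labels to audio control messages.
CUE_MAP = {
    "raise_arms": ("/tempo", 1.1),          # speed playback up by 10%
    "lower_arms": ("/tempo", 0.9),          # slow playback down by 10%
    "spin":       ("/effect/reverb", 0.8),  # increase reverb wet level
}

def landmarks_to_vector(landmarks):
    """Flatten the 33 pose landmarks into a single feature vector."""
    return np.array([[p.x, p.y, p.z] for p in landmarks]).ravel()

def train_classifier(samples, labels):
    """Fit a simple k-NN gesture classifier on labelled pose vectors."""
    clf = KNeighborsClassifier(n_neighbors=5)
    clf.fit(np.stack(samples), labels)
    return clf

def run_control_loop(clf, osc_host="127.0.0.1", osc_port=9000):
    """Track the performer, classify each frame's pose, and emit audio cues."""
    client = SimpleUDPClient(osc_host, osc_port)
    pose = mp.solutions.pose.Pose()
    cap = cv2.VideoCapture(0)
    try:
        while cap.isOpened():
            ok, frame = cap.read()
            if not ok:
                break
            results = pose.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            if results.pose_landmarks is None:
                continue  # no performer detected in this frame
            vec = landmarks_to_vector(results.pose_landmarks.landmark)
            gesture = clf.predict(vec.reshape(1, -1))[0]
            if gesture in CUE_MAP:
                address, value = CUE_MAP[gesture]
                client.send_message(address, value)  # forward cue to audio engine
    finally:
        cap.release()
```
In a live setting, `train_classifier` would be run once on the small labelled sample set (or reused across performers once the model is user-independent), while `run_control_loop` runs continuously, turning each recognized gesture into a sound control command.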
Related papers
- X-Dyna: Expressive Dynamic Human Image Animation [49.896933584815926]
X-Dyna is a zero-shot, diffusion-based pipeline for animating a single human image. It generates realistic, context-aware dynamics for both the subject and the surrounding environment.
arXiv Detail & Related papers (2025-01-17T08:10:53Z)
- MuVi: Video-to-Music Generation with Semantic Alignment and Rhythmic Synchronization [52.498942604622165]
This paper presents MuVi, a framework to generate music that aligns with video content.
MuVi analyzes video content through a specially designed visual adaptor to extract contextually and temporally relevant features.
We show that MuVi demonstrates superior performance in both audio quality and temporal synchronization.
arXiv Detail & Related papers (2024-10-16T18:44:56Z)
- Creativity and Visual Communication from Machine to Musician: Sharing a Score through a Robotic Camera [4.9485163144728235]
This paper explores the integration of visual communication and musical interaction by implementing a robotic camera within a "Guided Harmony" musical game.
The robotic system interprets and responds to nonverbal cues from musicians, creating a collaborative and adaptive musical experience.
arXiv Detail & Related papers (2024-09-09T16:34:36Z)
- ConvoFusion: Multi-Modal Conversational Diffusion for Co-Speech Gesture Synthesis [50.69464138626748]
We present ConvoFusion, a diffusion-based approach for multi-modal gesture synthesis.
Our method proposes two guidance objectives that allow the users to modulate the impact of different conditioning modalities.
Our method is versatile in that it can be trained to generate either monologue gestures or conversational gestures.
arXiv Detail & Related papers (2024-03-26T17:59:52Z)
- CustomListener: Text-guided Responsive Interaction for User-friendly Listening Head Generation [9.741109135330262]
Listening head generation aims to synthesize a non-verbal responsive listener head by modeling the correlation between the speaker and the listener in dynamic conversation.
We propose a user-friendly framework called CustomListener to realize the free-form text prior guided listener generation.
arXiv Detail & Related papers (2024-03-01T04:31:56Z)
- Synthesizing Physical Character-Scene Interactions [64.26035523518846]
For realistic character animation, it is necessary to synthesize interactions between virtual characters and their surroundings.
We present a system that uses adversarial imitation learning and reinforcement learning to train physically-simulated characters.
Our approach takes physics-based character motion generation a step closer to broad applicability.
arXiv Detail & Related papers (2023-02-02T05:21:32Z)
- Flat latent manifolds for music improvisation between human and machine [9.571383193449648]
We consider a music-generating algorithm as a counterpart to a human musician, in a setting where reciprocal improvisation is to lead to new experiences.
In the learned model, we generate novel musical sequences by quantification in latent space.
We provide empirical evidence for our method via a set of experiments on music and we deploy our model for an interactive jam session with a professional drummer.
arXiv Detail & Related papers (2022-02-23T09:00:17Z)
- A Human-Computer Duet System for Music Performance [7.777761975348974]
We create a virtual violinist who can collaborate with a human pianist to perform chamber music automatically without any intervention.
The system incorporates the techniques from various fields, including real-time music tracking, pose estimation, and body movement generation.
The proposed system has been validated in public concerts.
arXiv Detail & Related papers (2020-09-16T17:19:23Z)
- Learning to Generate Diverse Dance Motions with Transformer [67.43270523386185]
We introduce a complete system for dance motion synthesis.
A massive dance motion data set is created from YouTube videos.
A novel two-stream motion transformer generative model can generate motion sequences with high flexibility.
arXiv Detail & Related papers (2020-08-18T22:29:40Z)
- Music Gesture for Visual Sound Separation [121.36275456396075]
"Music Gesture" is a keypoint-based structured representation to explicitly model the body and finger movements of musicians when they perform music.
We first adopt a context-aware graph network to integrate visual semantic context with body dynamics, and then apply an audio-visual fusion model to associate body movements with the corresponding audio signals.
arXiv Detail & Related papers (2020-04-20T17:53:46Z)