psifx -- Psychological and Social Interactions Feature Extraction Package
- URL: http://arxiv.org/abs/2407.10266v2
- Date: Tue, 16 Jul 2024 09:30:03 GMT
- Title: psifx -- Psychological and Social Interactions Feature Extraction Package
- Authors: Guillaume Rochette, Matthew J. Vowels,
- Abstract summary: psifx is a plug-and-play multi-modal feature extraction toolkit.
It aims to facilitate and democratize the use of state-of-the-art machine learning techniques for human sciences research.
- Score: 3.560429497877327
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: psifx is a plug-and-play multi-modal feature extraction toolkit, aiming to facilitate and democratize the use of state-of-the-art machine learning techniques for human sciences research. It is motivated by a need (a) to automate and standardize data annotation processes, otherwise involving expensive, lengthy, and inconsistent human labor, such as the transcription or coding of behavior changes from audio and video sources; (b) to develop and distribute open-source community-driven psychology research software; and (c) to enable large-scale access and ease of use for non-expert users. The framework contains an array of tools for tasks such as speaker diarization, closed-caption transcription and translation from audio, as well as body, hand, and facial pose estimation and gaze tracking from video. The package has been designed with a modular and task-oriented approach, enabling the community to add or update new tools easily. We strongly hope that this package will provide psychologists with a simple and practical solution for efficiently extracting a range of audio, linguistic, and visual features from audio and video, thereby creating new opportunities for in-depth study of real-time behavioral phenomena.
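The abstract describes a modular, task-oriented design in which each tool handles a single task and writes outputs that downstream analyses can consume. The following minimal Python sketch illustrates that general pattern only; the names (Tool, DiarizationTool, TranscriptionTool, run_pipeline) are hypothetical placeholders and do not correspond to psifx's actual API or command-line interface.

    from pathlib import Path
    from typing import Protocol


    class Tool(Protocol):
        """A task-oriented tool: reads input files and writes output files."""
        def run(self, in_path: Path, out_path: Path) -> None: ...


    class DiarizationTool:
        def run(self, in_path: Path, out_path: Path) -> None:
            # A real tool would wrap a pretrained speaker-diarization model here.
            out_path.write_text(f"placeholder diarization of {in_path.name}\n")


    class TranscriptionTool:
        def run(self, in_path: Path, out_path: Path) -> None:
            # A real tool would wrap an automatic speech recognition model here.
            out_path.write_text(f"placeholder transcript of {in_path.name}\n")


    def run_pipeline(audio: Path, workdir: Path) -> None:
        """Chain independent tools; each step can be swapped or updated in isolation."""
        workdir.mkdir(parents=True, exist_ok=True)
        DiarizationTool().run(audio, workdir / "speakers.rttm")
        TranscriptionTool().run(audio, workdir / "transcript.vtt")


    if __name__ == "__main__":
        run_pipeline(Path("session.wav"), Path("features"))

Because each tool only exchanges files, a lab can replace one step (for example, the transcription backend) without touching the rest of the pipeline, which is the modularity the abstract emphasizes.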
Related papers
- Adaptive Language-Guided Abstraction from Contrastive Explanations [53.48583372522492]
It is necessary to determine which features of the environment are relevant before determining how these features should be used to compute reward.
End-to-end methods for joint feature and reward learning often yield brittle reward functions that are sensitive to spurious state features.
This paper describes a method named ALGAE, which alternates between using language models to iteratively identify human-meaningful features and learning how those features should be used to compute reward.
arXiv Detail & Related papers (2024-09-12T16:51:58Z)
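A rough sketch of the alternation described in the ALGAE summary above, under stated assumptions: the language-model query is replaced by a stub returning hand-written feature functions, and reward weights are fitted with a crude least-squares stand-in for proper preference-based reward learning; none of these names or details come from the paper.

    import numpy as np


    def propose_features(explanations):
        # Stub for a language-model query that turns contrastive explanations
        # (e.g. "I preferred A because it stayed farther from the ledge")
        # into candidate human-meaningful feature functions over states.
        return [lambda s: -float(s[0]),  # hypothetical: distance-to-ledge penalty
                lambda s: float(s[1])]   # hypothetical: progress toward goal


    def fit_reward_weights(features, preferred, rejected):
        # Fit weights so preferred states score higher than rejected ones.
        phi = lambda s: np.array([f(s) for f in features])
        diffs = np.array([phi(p) - phi(r) for p, r in zip(preferred, rejected)])
        weights, *_ = np.linalg.lstsq(diffs, np.ones(len(diffs)), rcond=None)
        return weights


    def algae_like_loop(explanations, preferred, rejected, rounds=3):
        # Alternate between language-guided feature identification and
        # learning how those features combine into a reward.
        features, weights = [], None
        for _ in range(rounds):
            features = propose_features(explanations)
            weights = fit_reward_weights(features, preferred, rejected)
            # In the real method, new explanations would be gathered between rounds.
        return features, weights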
- Intelligent Interface: Enhancing Lecture Engagement with Didactic Activity Summaries [0.054204929130712134]
The prototype utilizes machine-learning-based techniques to recognise selected didactic and behavioural features of teachers within lecture video recordings.
The system offers flexibility for (future) integration of new/additional machine-learning models and software modules for image and video analysis.
arXiv Detail & Related papers (2024-06-20T12:45:23Z)
- Supporting Experts with a Multimodal Machine-Learning-Based Tool for Human Behavior Analysis of Conversational Videos [40.30407535831779]
We developed Providence, a visual-programming tool built on design considerations derived from a formative study with experts.
It enables experts to combine various machine learning algorithms to capture human behavioral cues without writing code.
Our study showed favorable usability and satisfactory output, with less cognitive load imposed when accomplishing conversation scene-search tasks.
arXiv Detail & Related papers (2024-02-17T00:27:04Z)
- CLOVA: A Closed-Loop Visual Assistant with Tool Usage and Update [69.59482029810198]
CLOVA is a Closed-Loop Visual Assistant that operates within a framework encompassing inference, reflection, and learning phases.
Results demonstrate that CLOVA surpasses existing tool-usage methods by 5% in visual question answering and multiple-image reasoning, by 10% in knowledge tagging, and by 20% in image editing.
arXiv Detail & Related papers (2023-12-18T03:34:07Z)
- Learning Multi-modal Representations by Watching Hundreds of Surgical Video Lectures [51.78027546947034]
Recent advancements in surgical computer vision have been driven by vision-only models, which lack language semantics.
We propose leveraging surgical video lectures from e-learning platforms to provide effective vision and language supervisory signals.
We address surgery-specific linguistic challenges using multiple automatic speech recognition systems for text transcriptions.
arXiv Detail & Related papers (2023-07-27T22:38:12Z)
- Learning to Listen: Modeling Non-Deterministic Dyadic Facial Motion [89.01668641930206]
We present a framework for modeling interactional communication in dyadic conversations.
We autoregressively output multiple possibilities of corresponding listener motion.
Our method organically captures the multimodal and non-deterministic nature of nonverbal dyadic interactions.
arXiv Detail & Related papers (2022-04-18T17:58:04Z)
- Open-Source Tools for Behavioral Video Analysis: Setup, Methods, and Development [2.248500763940652]
Methods for video analysis are transforming behavioral quantification to be more precise, scalable, and reproducible.
Open-source tools for video analysis have led to new experimental approaches to understand behavior.
We review currently available open source tools for video analysis, how to set them up in a lab that is new to video recording methods, and some issues that should be addressed.
arXiv Detail & Related papers (2022-04-06T14:06:43Z)
- Agents that Listen: High-Throughput Reinforcement Learning with Multiple Sensory Systems [6.952659395337689]
We introduce a new version of the VizDoom simulator to create a highly efficient learning environment that provides raw audio observations.
We train our agent to play the full game of Doom and find that it can consistently defeat a traditional vision-based adversary.
arXiv Detail & Related papers (2021-07-05T18:00:50Z)
- Py-Feat: Python Facial Expression Analysis Toolbox [0.0]
We introduce Py-Feat, an open-source Python toolbox that provides support for detecting, preprocessing, analyzing, and visualizing facial expression data.
We hope this platform will facilitate increased use of facial expression data in human behavior research.
arXiv Detail & Related papers (2021-04-08T04:52:21Z)
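A minimal usage sketch for the Py-Feat toolbox summarized above, assuming the Detector interface described in the project's documentation; exact method names and output columns may differ between versions.

    from feat import Detector  # pip install py-feat

    detector = Detector()                            # loads default face, landmark, AU, and emotion models
    fex = detector.detect_image("frame.jpg")         # returns a Fex object, a pandas-DataFrame-like table
    print(fex.head())                                # one row per detected face: boxes, landmarks, AUs, emotions
    fex.to_csv("frame_detections.csv", index=False)  # persist detections for downstream analysis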
- Visual Imitation Made Easy [102.36509665008732]
We present an alternate interface for imitation that simplifies the data collection process while allowing for easy transfer to robots.
We use commercially available reacher-grabber assistive tools both as a data collection device and as the robot's end-effector.
We experimentally evaluate on two challenging tasks: non-prehensile pushing and prehensile stacking, with 1000 diverse demonstrations for each task.
arXiv Detail & Related papers (2020-08-11T17:58:50Z)
- Visually Guided Self Supervised Learning of Speech Representations [62.23736312957182]
We propose a framework for learning audio representations guided by the visual modality in the context of audiovisual speech.
We employ a generative audio-to-video training scheme in which we animate a still image corresponding to a given audio clip and optimize the generated video to be as close as possible to the real video of the speech segment.
We achieve state-of-the-art results for emotion recognition and competitive results for speech recognition.
arXiv Detail & Related papers (2020-01-13T14:53:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.