Related papers: psifx -- Psychological and Social Interactions Feature Extraction Package

Related papers

Dynamic Scoring with Enhanced Semantics for Training-Free Human-Object Interaction Detection [51.52749744031413]
Human-Object Interaction (HOI) detection aims to identify humans and objects within images and interpret their interactions.<n>Existing HOI methods rely heavily on large datasets with manual annotations to learn interactions from visual cues.<n>We propose a novel training-free HOI detection framework for Dynamic Scoring with enhanced semantics.
arXiv Detail & Related papers (2025-07-23T12:30:19Z)
TinyTroupe: An LLM-powered Multiagent Persona Simulation Toolkit [7.56072680903655]
We introduce TinyTroupe, a simulation toolkit enabling detailed persona definitions.<n>TinyTroupe's components are presented using representative working examples.<n>The library is available as open source at https://github.com/tinytroupe.
arXiv Detail & Related papers (2025-07-13T21:00:27Z)
Adaptive Tool Use in Large Language Models with Meta-Cognition Trigger [49.81945268343162]
We propose MeCo, an adaptive decision-making strategy for external tool use.<n>MeCo quantifies metacognitive scores by capturing high-level cognitive signals in the representation space.<n>MeCo is fine-tuning-free and incurs minimal cost.
arXiv Detail & Related papers (2025-02-18T15:45:01Z)
Adaptive Language-Guided Abstraction from Contrastive Explanations [53.48583372522492]
It is necessary to determine which features of the environment are relevant before determining how these features should be used to compute reward. End-to-end methods for joint feature and reward learning often yield brittle reward functions that are sensitive to spurious state features. This paper describes a method named ALGAE which alternates between using language models to iteratively identify human-meaningful features.
arXiv Detail & Related papers (2024-09-12T16:51:58Z)
Intelligent Interface: Enhancing Lecture Engagement with Didactic Activity Summaries [0.054204929130712134]
The prototype utilizes machine learning-based techniques to recognise selected didactic and behavioural teachers' features within lecture video recordings. The system offers flexibility for (future) integration of new/additional machine-learning models and software modules for image and video analysis.
arXiv Detail & Related papers (2024-06-20T12:45:23Z)
ChatHuman: Chatting about 3D Humans with Tools [57.29285473727107]
ChatHuman is a language-driven system that integrates the capabilities of specialized methods into a unified framework.<n>ChatHuman functions as an assistant proficient in utilizing, analyzing, and interacting with tools specific to 3D human tasks.
arXiv Detail & Related papers (2024-05-07T17:59:31Z)
LVLM-Interpret: An Interpretability Tool for Large Vision-Language Models [50.259006481656094]
We present a novel interactive application aimed towards understanding the internal mechanisms of large vision-language models. Our interface is designed to enhance the interpretability of the image patches, which are instrumental in generating an answer. We present a case study of how our application can aid in understanding failure mechanisms in a popular large multi-modal model: LLaVA.
arXiv Detail & Related papers (2024-04-03T23:57:34Z)
The AffectToolbox: Affect Analysis for Everyone [10.526991118781913]
AffectToolbox is a novel software system that aims to support researchers in developing affect-sensitive studies and prototypes. The proposed system addresses the challenges posed by existing frameworks, which often require profound programming knowledge and cater primarily to power-users or skilled developers. The architecture encompasses a variety of models for emotion recognition on multiple affective channels and modalities, as well as an elaborate fusion system to merge multi-modal assessments into a unified result.
arXiv Detail & Related papers (2024-02-23T08:55:47Z)
Supporting Experts with a Multimodal Machine-Learning-Based Tool for Human Behavior Analysis of Conversational Videos [40.30407535831779]
We developed Providence, a visual-programming-based tool based on design considerations derived from a formative study with experts. It enables experts to combine various machine learning algorithms to capture human behavioral cues without writing code. Our study showed its preferable usability and satisfactory output with less cognitive load imposed in accomplishing scene search tasks of conversations.
arXiv Detail & Related papers (2024-02-17T00:27:04Z)
CLOVA: A Closed-Loop Visual Assistant with Tool Usage and Update [69.59482029810198]
CLOVA is a Closed-Loop Visual Assistant that operates within a framework encompassing inference, reflection, and learning phases. Results demonstrate that CLOVA surpasses existing tool-usage methods by 5% in visual question answering and multiple-image reasoning, by 10% in knowledge tagging, and by 20% in image editing.
arXiv Detail & Related papers (2023-12-18T03:34:07Z)
Enhancing HOI Detection with Contextual Cues from Large Vision-Language Models [56.257840490146]
ConCue is a novel approach for improving visual feature extraction in HOI detection. We develop a transformer-based feature extraction module with a multi-tower architecture that integrates contextual cues into both instance and interaction detectors.
arXiv Detail & Related papers (2023-11-26T09:11:32Z)
Learning Multi-modal Representations by Watching Hundreds of Surgical Video Lectures [51.78027546947034]
Recent advancements in surgical computer vision have been driven by vision-only models, which lack language semantics. We propose leveraging surgical video lectures from e-learning platforms to provide effective vision and language supervisory signals. We address surgery-specific linguistic challenges using multiple automatic speech recognition systems for text transcriptions.
arXiv Detail & Related papers (2023-07-27T22:38:12Z)
Co-Located Human-Human Interaction Analysis using Nonverbal Cues: A Survey [71.43956423427397]
We aim to identify the nonverbal cues and computational methodologies resulting in effective performance. This survey differs from its counterparts by involving the widest spectrum of social phenomena and interaction settings. Some major observations are: the most often used nonverbal cue, computational method, interaction environment, and sensing approach are speaking activity, support vector machines, and meetings composed of 3-4 persons equipped with microphones and cameras, respectively.
arXiv Detail & Related papers (2022-07-20T13:37:57Z)
Learning to Listen: Modeling Non-Deterministic Dyadic Facial Motion [89.01668641930206]
We present a framework for modeling interactional communication in dyadic conversations. We autoregressively output multiple possibilities of corresponding listener motion. Our method organically captures the multimodal and non-deterministic nature of nonverbal dyadic interactions.
arXiv Detail & Related papers (2022-04-18T17:58:04Z)
Open-Source Tools for Behavioral Video Analysis: Setup, Methods, and Development [2.248500763940652]
Methods for video analysis are transforming behavioral quantification to be more precise, scalable, and reproducible. Open-source tools for video analysis have led to new experimental approaches to understand behavior. We review currently available open source tools for video analysis, how to set them up in a lab that is new to video recording methods, and some issues that should be addressed.
arXiv Detail & Related papers (2022-04-06T14:06:43Z)
Agents that Listen: High-Throughput Reinforcement Learning with Multiple Sensory Systems [6.952659395337689]
We introduce a new version of VizDoom simulator to create a highly efficient learning environment that provides raw audio observations. We train our agent to play the full game of Doom and find that it can consistently defeat a traditional vision-based adversary.
arXiv Detail & Related papers (2021-07-05T18:00:50Z)
Py-Feat: Python Facial Expression Analysis Toolbox [0.0]
We introduce Py-Feat, an open-source Python toolbox that provides support for detecting, preprocessing, analyzing, and visualizing facial expression data. We hope this platform will facilitate increased use of facial expression data in human behavior research.
arXiv Detail & Related papers (2021-04-08T04:52:21Z)
Visual Imitation Made Easy [102.36509665008732]
We present an alternate interface for imitation that simplifies the data collection process while allowing for easy transfer to robots. We use commercially available reacher-grabber assistive tools both as a data collection device and as the robot's end-effector. We experimentally evaluate on two challenging tasks: non-prehensile pushing and prehensile stacking, with 1000 diverse demonstrations for each task.
arXiv Detail & Related papers (2020-08-11T17:58:50Z)
Visually Guided Self Supervised Learning of Speech Representations [62.23736312957182]
We propose a framework for learning audio representations guided by the visual modality in the context of audiovisual speech. We employ a generative audio-to-video training scheme in which we animate a still image corresponding to a given audio clip and optimize the generated video to be as close as possible to the real video of the speech segment. We achieve state of the art results for emotion recognition and competitive results for speech recognition.
arXiv Detail & Related papers (2020-01-13T14:53:22Z)

This list is automatically generated from the titles and abstracts of the papers in this site.