Related papers: Supporting Experts with a Multimodal Machine-Learning-Based Tool for Human Behavior Analysis of Conversational Videos

URL: http://arxiv.org/abs/2402.11145v1
Date: Sat, 17 Feb 2024 00:27:04 GMT
Title: Supporting Experts with a Multimodal Machine-Learning-Based Tool for Human Behavior Analysis of Conversational Videos
Authors: Riku Arakawa and Kiyosu Maeda and Hiromu Yakura
Abstract summary: We developed Providence, a visual-programming-based tool based on design considerations derived from a formative study with experts. It enables experts to combine various machine learning algorithms to capture human behavioral cues without writing code. Our study showed its preferable usability and satisfactory output with less cognitive load imposed in accomplishing scene search tasks of conversations.
Score: 40.30407535831779
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Multimodal scene search of conversations is essential for unlocking valuable insights into social dynamics and enhancing our communication. While experts in conversational analysis have their own knowledge and skills to find key scenes, a lack of comprehensive, user-friendly tools that streamline the processing of diverse multimodal queries impedes efficiency and objectivity. To address this, we developed Providence, a visual-programming-based tool based on design considerations derived from a formative study with experts. It enables experts to combine various machine learning algorithms to capture human behavioral cues without writing code. Our study showed its preferable usability and satisfactory output with less cognitive load imposed in accomplishing scene search tasks of conversations, verifying the importance of its customizability and transparency. Furthermore, through the in-the-wild trial, we confirmed that the objectivity and reusability of the tool transform experts' workflow, suggesting the advantage of expert-AI teaming in a highly human-contextual domain.

Related papers

Data Therapist: Eliciting Domain Knowledge from Subject Matter Experts Using Large Language Models [17.006423792670414]
We present the Data Therapist, a web-based tool that helps domain experts externalize implicit knowledge through a mixed-initiative process. The resulting structured knowledge base can inform both human and automated visualization design.
arXiv Detail & Related papers (2025-05-01T11:10:17Z)
InterChat: Enhancing Generative Visual Analytics using Multimodal Interactions [22.007942964950217]
We develop InterChat, a generative visual analytics system that combines direct manipulation of visual elements with natural language inputs. This integration enables precise intent communication and supports progressive, visually driven exploratory data analyses.
arXiv Detail & Related papers (2025-03-06T05:35:19Z)
PromptHive: Bringing Subject Matter Experts Back to the Forefront with Collaborative Prompt Engineering for Educational Content Creation [8.313693615194309]
In this work, we introduce PromptHive, a collaborative interface for prompt authoring, designed to better connect domain knowledge with prompt engineering. We conducted an evaluation study with ten subject matter experts in math and validated our design through two collaborative prompt-writing sessions and a learning gain study with 358 learners. Our results elucidate the prompt iteration process and validate the tool's usability, enabling non-AI experts to craft prompts that generate content comparable to human-authored materials.
arXiv Detail & Related papers (2024-10-21T22:18:24Z)
Vital Insight: Assisting Experts' Context-Driven Sensemaking of Multi-modal Personal Tracking Data Using Visualization and Human-In-The-Loop LLM Agents [29.73055078727462]
Vital Insight is a novel, LLM-assisted, prototype system to enable human-in-the-loop inference (sensemaking) and visualizations of multi-modal passive sensing data from smartphones and wearables. We observe experts' interactions with it and develop an expert sensemaking model that explains how experts move between direct data representations and AI-supported inferences.
arXiv Detail & Related papers (2024-10-18T21:56:35Z)
Data Analysis in the Era of Generative AI [56.44807642944589]
This paper explores the potential of AI-powered tools to reshape data analysis, focusing on design considerations and challenges. We explore how the emergence of large language and multimodal models offers new opportunities to enhance various stages of data analysis workflow. We then examine human-centered design principles that facilitate intuitive interactions, build user trust, and streamline the AI-assisted analysis workflow across multiple apps.
arXiv Detail & Related papers (2024-09-27T06:31:03Z)
Towards Detecting and Mitigating Cognitive Bias in Spoken Conversational Search [14.916529791823868]
This paper draws upon insights from information seeking, psychology, cognitive science, and wearable sensors to provoke novel conversations in the community. We propose a framework including multimodal instruments and methods for experimental designs and settings.
arXiv Detail & Related papers (2024-05-21T03:50:32Z)
Enhancing HOI Detection with Contextual Cues from Large Vision-Language Models [56.257840490146]
ConCue is a novel approach for improving visual feature extraction in HOI detection. We develop a transformer-based feature extraction module with a multi-tower architecture that integrates contextual cues into both instance and interaction detectors.
arXiv Detail & Related papers (2023-11-26T09:11:32Z)
Re-mine, Learn and Reason: Exploring the Cross-modal Semantic Correlations for Language-guided HOI detection [57.13665112065285]
Human-Object Interaction (HOI) detection is a challenging computer vision task. We present a framework that enhances HOI detection by incorporating structured text knowledge.
arXiv Detail & Related papers (2023-07-25T14:20:52Z)
An Interdisciplinary Perspective on Evaluation and Experimental Design for Visual Text Analytics: Position Paper [24.586485898038312]
In this paper, we focus on the issues of evaluating visual text analytics approaches. We identify four key groups of challenges for evaluating visual text analytics approaches.
arXiv Detail & Related papers (2022-09-23T11:47:37Z)
Co-Located Human-Human Interaction Analysis using Nonverbal Cues: A Survey [71.43956423427397]
We aim to identify the nonverbal cues and computational methodologies resulting in effective performance. This survey differs from its counterparts by involving the widest spectrum of social phenomena and interaction settings. Some major observations are: the most often used nonverbal cue, computational method, interaction environment, and sensing approach are speaking activity, support vector machines, meetings composed of 3-4 persons, and microphones and cameras, respectively.
arXiv Detail & Related papers (2022-07-20T13:37:57Z)
Estimating Presentation Competence using Multimodal Nonverbal Behavioral Cues [7.340483819263093]
Public speaking and presentation competence play an essential role in many areas of social interaction. One approach that can promote efficient development of presentation competence is the automated analysis of human behavior during a speech. In this work, we investigate the contribution of different nonverbal behavioral cues, namely, facial, body pose-based, and audio-related features, to estimate presentation competence.
arXiv Detail & Related papers (2021-05-06T13:09:41Z)
On Interactive Machine Learning and the Potential of Cognitive Feedback [2.320417845168326]
We introduce interactive machine learning and explain its advantages and limitations within the context of defense applications. We define the three techniques by which cognitive feedback may be employed: self reporting, implicit cognitive feedback, and modeled cognitive feedback.
arXiv Detail & Related papers (2020-03-23T16:28:14Z)
A Review on Intelligent Object Perception Methods Combining Knowledge-based Reasoning and Machine Learning [60.335974351919816]
Object perception is a fundamental sub-field of Computer Vision. Recent works seek ways to integrate knowledge engineering in order to expand the level of intelligence of the visual interpretation of objects.
arXiv Detail & Related papers (2019-12-26T13:26:49Z)
