ConfLab: A Rich Multimodal Multisensor Dataset of Free-Standing Social
Interactions In-the-Wild
- URL: http://arxiv.org/abs/2205.05177v1
- Date: Tue, 10 May 2022 21:30:10 GMT
- Title: ConfLab: A Rich Multimodal Multisensor Dataset of Free-Standing Social
Interactions In-the-Wild
- Authors: Chirag Raman, Jose Vargas-Quiros, Stephanie Tan, Ekin Gedik, Ashraful
Islam, Hayley Hung
- Abstract summary: We describe an instantiation of a new concept for multimodal multisensor data collection of real-life, in-the-wild free-standing social interactions.
ConfLab contains high-fidelity data of 49 people during a real-life professional networking event.
- Score: 10.686716372324096
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We describe an instantiation of a new concept for multimodal multisensor data
collection of real-life, in-the-wild free-standing social interactions in the
form of a Conference Living Lab (ConfLab). ConfLab contains high-fidelity data
of 49 people during a real-life professional networking event capturing a
diverse mix of status, acquaintanceship, and networking motivations at an
international conference. Recording such a dataset is challenging due to the
delicate trade-off between participant privacy and fidelity of the data, and
the technical and logistic challenges involved. We improve upon prior datasets
in the fidelity of most of our modalities: 8-camera overhead setup, personal
wearable sensors recording body motion (9-axis IMU), Bluetooth-based proximity,
and low-frequency audio. Additionally, we use a state-of-the-art hardware
synchronization solution and a time-efficient continuous technique for annotating
body keypoints and actions at high frequencies. We argue that our improvements
are essential for a deeper study of interaction dynamics at finer time scales.
Our research tasks showcase some of the open challenges related to in-the-wild
privacy-preserving social data analysis: keypoint detection from overhead
camera views, skeleton-based no-audio speaker detection, and F-formation
detection. With the ConfLab dataset, we aim to bridge the gap between
traditional computer vision tasks and in-the-wild ecologically valid
socially-motivated tasks.
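To make the second research task concrete, here is a minimal, hypothetical sketch of skeleton-based no-audio speaker detection: windowed body-motion energy is computed from 2D keypoint tracks and fed to a plain logistic-regression classifier. The data layout, frame rate, feature choice, and classifier are all illustrative assumptions, not the authors' pipeline.

```python
# Minimal sketch of skeleton-based, no-audio speaker detection (illustrative
# assumptions throughout; this is not the ConfLab authors' pipeline).
import numpy as np
from sklearn.linear_model import LogisticRegression

def motion_features(keypoints: np.ndarray, fps: int = 60, win_s: float = 2.0) -> np.ndarray:
    """Windowed motion-energy features from one person's (frames, joints, 2) track."""
    win = int(fps * win_s)
    # Frame-to-frame joint displacement magnitudes: shape (frames - 1, joints).
    vel = np.linalg.norm(np.diff(keypoints, axis=0), axis=-1)
    n_win = vel.shape[0] // win
    vel = vel[: n_win * win].reshape(n_win, win, -1)
    # Per-window mean and std of motion for each joint, concatenated.
    return np.concatenate([vel.mean(axis=1), vel.std(axis=1)], axis=1)

# Hypothetical stand-in data: 4 people, 10 s of 17-joint tracks at an assumed
# 60 fps, plus random binary speaking labels per 2 s window (real labels would
# come from the dataset's continuous action annotations).
rng = np.random.default_rng(0)
tracks = [rng.normal(size=(600, 17, 2)).cumsum(axis=0) for _ in range(4)]
X = np.concatenate([motion_features(t) for t in tracks])
y = rng.integers(0, 2, size=X.shape[0])

clf = LogisticRegression(max_iter=1000).fit(X, y)
print("train accuracy:", clf.score(X, y))
```

The point of the sketch is the shape of the problem: speaking status is predicted from body motion alone, which is what makes overhead keypoints without high-fidelity audio a privacy-preserving design.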
Related papers
- RoboSense: Large-scale Dataset and Benchmark for Egocentric Robot Perception and Navigation in Crowded and Unstructured Environments [62.5830455357187]
We set up an egocentric multi-sensor data collection platform based on three main types of sensors (camera, LiDAR, and fisheye).
A large-scale multimodal dataset, named RoboSense, is constructed to facilitate egocentric robot perception.
arXiv Detail & Related papers (2024-08-28T03:17:40Z)
- CoPeD-Advancing Multi-Robot Collaborative Perception: A Comprehensive Dataset in Real-World Environments [8.177157078744571]
This paper presents a pioneering and comprehensive real-world multi-robot collaborative perception dataset.
It features raw sensor inputs, pose estimation, and optional high-level perception annotation.
We believe this work will unlock the potential research of high-level scene understanding through multi-modal collaborative perception in multi-robot settings.
arXiv Detail & Related papers (2024-05-23T15:59:48Z)
- Double Mixture: Towards Continual Event Detection from Speech [60.33088725100812]
Speech event detection is crucial for multimedia retrieval, involving the tagging of both semantic and acoustic events.
This paper tackles two primary challenges in speech event detection: the continual integration of new events without forgetting previous ones, and the disentanglement of semantic from acoustic events.
We propose a novel method, 'Double Mixture,' which merges speech expertise with robust memory mechanisms to enhance adaptability and prevent forgetting.
arXiv Detail & Related papers (2024-04-20T06:32:00Z)
- NOTSOFAR-1 Challenge: New Datasets, Baseline, and Tasks for Distant Meeting Transcription [21.236634241186458]
We introduce the first Natural Office Talkers in Settings of Far-field Audio Recordings ("NOTSOFAR-1") Challenge alongside datasets and a baseline system.
The challenge focuses on distant speaker diarization and automatic speech recognition (DASR) in far-field meeting scenarios.
arXiv Detail & Related papers (2024-01-16T23:50:26Z)
- Enhancing HOI Detection with Contextual Cues from Large Vision-Language Models [56.257840490146]
ConCue is a novel approach for improving visual feature extraction in HOI detection.
We develop a transformer-based feature extraction module with a multi-tower architecture that integrates contextual cues into both instance and interaction detectors.
arXiv Detail & Related papers (2023-11-26T09:11:32Z)
- SpikeMOT: Event-based Multi-Object Tracking with Sparse Motion Features [52.213656737672935]
SpikeMOT is an event-based multi-object tracker.
SpikeMOT uses spiking neural networks to extract sparse spatiotemporal features from event streams associated with objects.
arXiv Detail & Related papers (2023-09-29T05:13:43Z)
- Two-stream Multi-level Dynamic Point Transformer for Two-person Interaction Recognition [45.0131792009999]
We propose a point cloud-based network named Two-stream Multi-level Dynamic Point Transformer for two-person interaction recognition.
Our model addresses the challenge of recognizing two-person interactions by incorporating local-region spatial information, appearance information, and motion information.
Our network outperforms state-of-the-art approaches in most standard evaluation settings.
arXiv Detail & Related papers (2023-07-22T03:51:32Z)
- Contactless Human Activity Recognition using Deep Learning with Flexible and Scalable Software Define Radio [1.3106429146573144]
This study investigates the use of Wi-Fi channel state information (CSI) as a novel method of ambient sensing.
This approach avoids the additional costly hardware required by vision-based systems, which are also privacy-intrusive.
This study presents a Wi-Fi CSI-based HAR system that assesses and contrasts deep learning approaches.
arXiv Detail & Related papers (2023-04-18T10:20:14Z)
- Co-Located Human-Human Interaction Analysis using Nonverbal Cues: A Survey [71.43956423427397]
We aim to identify the nonverbal cues and computational methodologies resulting in effective performance.
This survey differs from its counterparts by involving the widest spectrum of social phenomena and interaction settings.
Some major observations are: the most often used nonverbal cue, computational method, interaction environment, and sensing approach are speaking activity, support vector machines, meetings composed of 3-4 persons, and microphones together with cameras, respectively.
arXiv Detail & Related papers (2022-07-20T13:37:57Z)
- A Wireless-Vision Dataset for Privacy Preserving Human Activity Recognition [53.41825941088989]
A new WiFi-based and video-based neural network (WiNN) is proposed to improve the robustness of activity recognition.
Our results show that the WiVi dataset satisfies the primary demand, and all three branches of the proposed pipeline maintain more than 80% activity recognition accuracy.
arXiv Detail & Related papers (2022-05-24T10:49:11Z)