UpStory: the Uppsala Storytelling dataset
- URL: http://arxiv.org/abs/2407.04352v1
- Date: Fri, 5 Jul 2024 08:46:16 GMT
- Title: UpStory: the Uppsala Storytelling dataset
- Authors: Marc Fraile, Natalia Calvo-Barajas, Anastasia Sophia Apeiron, Giovanna Varni, Joakim Lindblad, Nataša Sladoje, Ginevra Castellano
- Abstract summary: UpStory is a novel dataset of naturalistic dyadic interactions between primary school aged children.
The dataset contains data for 35 pairs, totalling 3h 40m of audio and video recordings.
An anonymized version of the dataset is made publicly available, containing per-frame head pose, body pose, and face features.
- Score: 2.7895834501191823
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Friendship and rapport play an important role in the formation of constructive social interactions, and have been widely studied in educational settings due to their impact on student outcomes. Given the growing interest in automating the analysis of such phenomena through Machine Learning (ML), access to annotated interaction datasets is highly valuable. However, no dataset on dyadic child-child interactions explicitly capturing rapport currently exists. Moreover, despite advances in the automatic analysis of human behaviour, no previous work has addressed the prediction of rapport in child-child dyadic interactions in educational settings. We present UpStory -- the Uppsala Storytelling dataset: a novel dataset of naturalistic dyadic interactions between primary school aged children, with an experimental manipulation of rapport. Pairs of children aged 8-10 participate in a task-oriented activity: designing a story together, while being allowed free movement within the play area. We promote balanced collection of different levels of rapport by using a within-subjects design: self-reported friendships are used to pair each child twice, either minimizing or maximizing pair separation in the friendship network. The dataset contains data for 35 pairs, totalling 3h 40m of audio and video recordings. It includes two video sources covering the play area, as well as separate voice recordings for each child. An anonymized version of the dataset is made publicly available, containing per-frame head pose, body pose, and face features; as well as per-pair information, including the level of rapport. Finally, we provide ML baselines for the prediction of rapport.
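The within-subjects design described above pairs each child twice, once minimizing and once maximizing pair separation in the self-reported friendship network. The following is a minimal, hypothetical sketch of that idea, not the authors' actual pairing procedure: it treats friendships as an undirected graph, measures separation as shortest-path hop count, and greedily forms extreme-separation pairs. All names (`bfs_distances`, `pair_children`, the toy `friends` network) are illustrative assumptions.

```python
from collections import deque
from itertools import combinations

def bfs_distances(adj, start):
    """Shortest-path hop counts from `start` in an undirected friendship graph."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def pair_children(adj, maximize):
    """Greedily pair children by friendship-network separation (illustrative only)."""
    children = sorted(adj)
    # Precompute pairwise separations; unreachable pairs get a large sentinel distance.
    sep = {}
    for c in children:
        d = bfs_distances(adj, c)
        for other in children:
            if other != c:
                sep[frozenset((c, other))] = d.get(other, len(children))
    pairs = []
    unpaired = set(children)
    # Repeatedly pick the most extreme-separation pair among the unpaired.
    while len(unpaired) > 1:
        candidates = [frozenset(p) for p in combinations(sorted(unpaired), 2)]
        best = (max if maximize else min)(candidates, key=lambda p: sep[p])
        pairs.append(tuple(sorted(best)))
        unpaired -= best
    return pairs

# Toy friendship network: A-B-C form a chain, D-E are friends, F is isolated.
friends = {
    "A": {"B"}, "B": {"A", "C"}, "C": {"B"},
    "D": {"E"}, "E": {"D"}, "F": set(),
}
close = pair_children(friends, maximize=False)  # low-separation ("friend") pairs
far = pair_children(friends, maximize=True)     # high-separation pairs
```

A greedy heuristic like this is only one way to balance rapport levels across pairs; the paper's actual matching over the real friendship network may differ.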
Related papers
- Holistic Understanding of 3D Scenes as Universal Scene Description [56.69740649781989]
3D scene understanding is a long-standing challenge in computer vision and a key component in enabling mixed reality, wearable computing, and embodied AI.
We introduce an expertly curated dataset in the Universal Scene Description (USD) format featuring high-quality manual annotations.
With its broad and high-quality annotations, the data provides the basis for holistic 3D scene understanding models.
arXiv Detail & Related papers (2024-12-02T11:33:55Z) - Towards Student Actions in Classroom Scenes: New Dataset and Baseline [43.268586725768465]
We present a new multi-label student action video (SAV) dataset for complex classroom scenes.
The dataset consists of 4,324 carefully trimmed video clips from 758 different classrooms, each labeled with 15 different actions displayed by students in classrooms.
arXiv Detail & Related papers (2024-09-02T03:44:24Z) - InterAct: Capture and Modelling of Realistic, Expressive and Interactive Activities between Two Persons in Daily Scenarios [12.300105542672163]
We capture 241 motion sequences where two persons perform a realistic scenario over the whole sequence.
The audios, body motions, and facial expressions of both persons are all captured in our dataset.
We also demonstrate the first diffusion model based approach that directly estimates the interactive motions between two persons from their audios alone.
arXiv Detail & Related papers (2024-05-19T22:35:02Z) - Learning From Free-Text Human Feedback -- Collect New Datasets Or Extend Existing Ones? [57.16050211534735]
We investigate the types and frequency of free-text human feedback in commonly used dialog datasets.
Our findings provide new insights into the composition of the datasets examined, including error types, user response types, and the relations between them.
arXiv Detail & Related papers (2023-10-24T12:01:11Z) - Action Class Relation Detection and Classification Across Multiple Video Datasets [1.15520000056402]
We consider two new machine learning tasks: action class relation detection and classification.
We propose a unified model to predict relations between action classes, using language and visual information associated with classes.
Experimental results show that (i) pre-trained recent neural network models for texts and videos contribute to high predictive performance, (ii) the relation prediction based on action label texts is more accurate than based on videos, and (iii) a blending approach can further improve the predictive performance in some cases.
arXiv Detail & Related papers (2023-08-15T03:56:46Z) - BEHAVE: Dataset and Method for Tracking Human Object Interactions [105.77368488612704]
We present the first full-body human-object interaction dataset with multi-view RGBD frames and corresponding 3D SMPL and object fits, along with the annotated contacts between them.
We use this data to learn a model that can jointly track humans and objects in natural environments with an easy-to-use portable multi-camera setup.
arXiv Detail & Related papers (2022-04-14T13:21:19Z) - JRDB-Act: A Large-scale Multi-modal Dataset for Spatio-temporal Action, Social Group and Activity Detection [54.696819174421584]
We introduce JRDB-Act, a multi-modal dataset that reflects a real distribution of human daily life actions in a university campus environment.
JRDB-Act has been densely annotated with atomic actions and comprises over 2.8M action labels.
JRDB-Act comes with social group identification annotations conducive to the task of grouping individuals based on their interactions in the scene.
arXiv Detail & Related papers (2021-06-16T14:43:46Z) - Watch and Learn: Mapping Language and Noisy Real-world Videos with Self-supervision [54.73758942064708]
We teach machines to understand visuals and natural language by learning the mapping between sentences and noisy video snippets without explicit annotations.
For training and evaluation, we contribute a new dataset, ApartmenTour, that contains a large number of online videos and subtitles.
arXiv Detail & Related papers (2020-11-19T03:43:56Z) - Vyaktitv: A Multimodal Peer-to-Peer Hindi Conversations based Dataset for Personality Assessment [50.15466026089435]
We present a novel peer-to-peer Hindi conversation dataset, Vyaktitv.
It consists of high-quality audio and video recordings of the participants, with Hinglish textual transcriptions for each conversation.
The dataset also contains a rich set of socio-demographic features for all participants, such as income and cultural orientation, among several others.
arXiv Detail & Related papers (2020-08-31T17:44:28Z) - RKT : Relation-Aware Self-Attention for Knowledge Tracing [2.9778695679660188]
We propose a novel Relation-aware self-attention model for Knowledge Tracing (RKT).
We introduce a relation-aware self-attention layer that incorporates the contextual information.
Our model outperforms state-of-the-art knowledge tracing methods.
arXiv Detail & Related papers (2020-08-28T16:47:03Z) - Dyadic Speech-based Affect Recognition using DAMI-P2C Parent-child Multimodal Interaction Dataset [29.858195646762297]
We design end-to-end deep learning methods to recognize each person's affective expression in an audio stream with two speakers.
Our results show that the proposed weighted-pooling attention solutions are able to learn to focus on the regions containing target speaker's affective information.
arXiv Detail & Related papers (2020-08-20T20:53:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.