DBATES: DataBase of Audio features, Text, and visual Expressions in
competitive debate Speeches
- URL: http://arxiv.org/abs/2103.14189v1
- Date: Fri, 26 Mar 2021 00:43:49 GMT
- Title: DBATES: DataBase of Audio features, Text, and visual Expressions in
competitive debate Speeches
- Authors: Taylan K. Sen, Gazi Naven, Luke Gerstner, Daryl Bagley, Raiyan Abdul
Baten, Wasifur Rahman, Kamrul Hasan, Kurtis G. Haut, Abdullah Mamun, Samiha
Samrose, Anne Solbu, R. Eric Barnes, Mark G. Frank, Ehsan Hoque
- Abstract summary: We present a database of multimodal communication features extracted from debate speeches in the 2019 North American Universities Debate Championships (NAUDC).
Feature sets were extracted from the visual (facial expression, gaze, and head pose), audio (PRAAT), and textual (word sentiment and linguistic category) modalities.
We observe that the fully multimodal model performs best compared with models trained on various compositions of modalities.
- Score: 2.5347738801524775
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: In this work, we present a database of multimodal communication features
extracted from debate speeches in the 2019 North American Universities Debate
Championships (NAUDC). Feature sets were extracted from the visual (facial
expression, gaze, and head pose), audio (PRAAT), and textual (word sentiment
and linguistic category) modalities of raw video recordings of competitive
collegiate debaters (N=717 6-minute recordings from 140 unique debaters). Each
speech has an associated competition debate score (range: 67-96) from expert
judges as well as competitor demographic and per-round reflection surveys. We
observe the fully multimodal model performs best in comparison to models
trained on various compositions of modalities. We also find that the weights of
some features (such as the expression of joy and the use of the word "we") change
in direction between the aforementioned models. We use these results to
highlight the value of a multimodal dataset for studying competitive,
collegiate debate.
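To make the modality comparison above concrete, the following is a minimal sketch of how such models might be compared, assuming the per-speech features have already been extracted and merged into one table (facial expression/gaze/head-pose statistics, PRAAT acoustic measures, and sentiment/linguistic-category counts). The file name, column prefixes, and the choice of ridge regression are illustrative assumptions, not the authors' published pipeline.

```python
# Minimal sketch (not the authors' pipeline): compare a fully multimodal
# regressor against unimodal regressors for predicting expert debate scores.
# "dbates_features.csv", the column prefixes, and ridge regression are
# hypothetical choices for illustration only.
import pandas as pd
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# One row per 6-minute speech: visual_* (facial expression, gaze, head pose),
# audio_* (PRAAT features), text_* (sentiment and linguistic categories),
# plus the expert-judge debate score (range 67-96) as the target.
df = pd.read_csv("dbates_features.csv")
y = df["debate_score"].to_numpy()

feature_sets = {
    "visual": [c for c in df.columns if c.startswith("visual_")],
    "audio": [c for c in df.columns if c.startswith("audio_")],
    "text": [c for c in df.columns if c.startswith("text_")],
}
feature_sets["multimodal"] = sum(feature_sets.values(), [])

for name, cols in feature_sets.items():
    model = make_pipeline(StandardScaler(), Ridge(alpha=1.0))
    # 5-fold cross-validated R^2 as a rough measure of predictive power.
    r2 = cross_val_score(model, df[cols].to_numpy(), y, cv=5, scoring="r2")
    print(f"{name:10s} mean R^2 = {r2.mean():.3f} (+/- {r2.std():.3f})")
```

Under a setup like this, sign changes in a feature's weight between unimodal and multimodal fits (as the paper reports for joy expressions and the word "we") could be inspected by refitting each model on the full data and reading model.named_steps["ridge"].coef_.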
Related papers
- DebateBench: A Challenging Long Context Reasoning Benchmark For Large Language Models [1.8197265299982013]
We introduce DebateBench, a novel dataset consisting of an extensive collection of transcripts and metadata from some of the world's most prestigious competitive debates.
The dataset consists of British Parliamentary debates from prestigious debating tournaments on diverse topics, annotated with detailed speech-level scores and house rankings sourced from official adjudication data.
We curate 256 speeches across 32 debates; each debate is over an hour long, and each input averages 32,000 tokens.
arXiv Detail & Related papers (2025-02-10T09:23:03Z)
- Dynamic-SUPERB: Towards A Dynamic, Collaborative, and Comprehensive Instruction-Tuning Benchmark for Speech [107.81472531864195]
Text language models have shown remarkable zero-shot capability in generalizing to unseen tasks when provided with well-formulated instructions.
We present Dynamic-SUPERB, a benchmark for building universal speech models capable of leveraging instruction tuning to perform multiple tasks in a zero-shot fashion.
arXiv Detail & Related papers (2023-09-18T06:43:30Z)
- Are words equally surprising in audio and audio-visual comprehension? [13.914373331208774]
We compare the ERP signature (N400) associated with each word in audio-only and audio-visual presentations of the same verbal stimuli.
Our results indicate that cognitive effort differs significantly between multimodal and unimodal settings.
This highlights the significant impact of local lexical context on cognitive processing in a multimodal environment.
arXiv Detail & Related papers (2023-07-14T11:17:37Z)
- SpokenWOZ: A Large-Scale Speech-Text Benchmark for Spoken Task-Oriented Dialogue Agents [72.42049370297849]
SpokenWOZ is a large-scale speech-text dataset for spoken task-oriented dialogue (TOD).
Cross-turn slot and reasoning slot detection are new challenges for SpokenWOZ.
arXiv Detail & Related papers (2023-05-22T13:47:51Z)
- Affective Faces for Goal-Driven Dyadic Communication [16.72177738101024]
We introduce a video framework for modeling the association between verbal and non-verbal communication during dyadic conversation.
Our approach retrieves a video of a listener whose facial expressions would be socially appropriate given the context.
arXiv Detail & Related papers (2023-01-26T05:00:09Z)
- Explaining Image Classification with Visual Debates [26.76139301708958]
We propose a novel debate framework for understanding and explaining a continuous image classifier's reasoning for making a particular prediction.
Our framework encourages players to put forward diverse arguments during the debates, picking up the reasoning trails missed by their opponents.
We demonstrate and evaluate a practical realization of our Visual Debates on the geometric SHAPE and MNIST datasets.
arXiv Detail & Related papers (2022-10-17T12:35:52Z)
- The Ability of Self-Supervised Speech Models for Audio Representations [53.19715501273934]
Self-supervised learning (SSL) speech models have achieved unprecedented success in speech representation learning.
We conduct extensive experiments on abundant speech and non-speech audio datasets to evaluate the representation ability of state-of-the-art SSL speech models.
Results show that SSL speech models can extract meaningful features from a wide range of non-speech audio, though they may also fail on certain types of datasets.
arXiv Detail & Related papers (2022-09-26T15:21:06Z)
- DebateSum: A large-scale argument mining and summarization dataset [0.0]
DebateSum consists of 187,386 unique pieces of evidence with corresponding argument and extractive summaries.
We train several transformer summarization models to benchmark summarization performance on DebateSum.
We present a search engine for this dataset which is utilized extensively by members of the National Speech and Debate Association.
arXiv Detail & Related papers (2020-11-14T10:06:57Z)
- Filling the Gap of Utterance-aware and Speaker-aware Representation for Multi-turn Dialogue [76.88174667929665]
A multi-turn dialogue is composed of multiple utterances from two or more different speaker roles.
In existing retrieval-based multi-turn dialogue modeling, pre-trained language models (PrLMs) used as encoders represent dialogues only coarsely.
We propose a novel model to fill such a gap by modeling the effective utterance-aware and speaker-aware representations entailed in a dialogue history.
arXiv Detail & Related papers (2020-09-14T15:07:19Z)
- "Notic My Speech" -- Blending Speech Patterns With Multimedia [65.91370924641862]
We propose a view-temporal attention mechanism to model both the view dependence and the visemic importance in speech recognition and understanding.
Our proposed method outperformed the existing work by 4.99% in terms of the viseme error rate.
We show that there is a strong correlation between our model's understanding of multi-view speech and human perception.
arXiv Detail & Related papers (2020-06-12T06:51:55Z)
- KdConv: A Chinese Multi-domain Dialogue Dataset Towards Multi-turn Knowledge-driven Conversation [66.99734491847076]
We propose a Chinese multi-domain knowledge-driven conversation dataset, KdConv, which grounds the topics in multi-turn conversations to knowledge graphs.
Our corpus contains 4.5K conversations from three domains (film, music, and travel), and 86K utterances with an average turn number of 19.0.
arXiv Detail & Related papers (2020-04-08T16:25:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.