A Multimodal Machine Learning Framework for Teacher Vocal Delivery Evaluation
- URL: http://arxiv.org/abs/2107.07956v1
- Date: Thu, 15 Jul 2021 05:09:39 GMT
- Title: A Multimodal Machine Learning Framework for Teacher Vocal Delivery Evaluation
- Authors: Hang Li, Yu Kang, Yang Hao, Wenbiao Ding, Zhongqin Wu, Zitao Liu
- Abstract summary: We present a novel machine learning approach that utilizes pairwise comparisons and a multimodal fusing algorithm to generate objective evaluation results of the teacher vocal delivery in terms of fluency and passion.
- Score: 21.07429789279818
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The quality of vocal delivery is one of the key indicators for evaluating
teacher enthusiasm, which has been widely accepted to be connected to the
overall course quality. However, existing evaluation of vocal delivery is
mainly conducted through manual ratings, which face two core challenges:
subjectivity and high time cost. In this paper, we present a novel machine
learning approach that utilizes pairwise comparisons and a multimodal
orthogonal fusing algorithm to generate large-scale objective evaluation
results of the teacher vocal delivery in terms of fluency and passion. We
collect two datasets from real-world education scenarios and the experiment
results demonstrate the effectiveness of our algorithm. To encourage
reproducible results, we make our code publicly available at
\url{https://github.com/tal-ai/ML4VocalDelivery.git}.
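The pairwise-comparison idea from the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's actual method: simple concatenation stands in for the multimodal orthogonal fusing algorithm, a linear scorer trained with a pairwise hinge loss stands in for the learned ranker, and all feature dimensions and data are synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)

def fuse(audio_feat, text_feat):
    # Stand-in fusion: plain concatenation. The paper describes an
    # orthogonal fusing algorithm; this is only illustrative.
    return np.concatenate([audio_feat, text_feat])

def pairwise_hinge_loss(w, better, worse, margin=1.0):
    # Encourage score(better) > score(worse) + margin.
    return max(0.0, margin - w @ better + w @ worse)

def train(pairs, dim, lr=0.1, epochs=50):
    # Learn a linear scorer from pairwise comparisons
    # ("clip A was delivered more fluently than clip B").
    w = np.zeros(dim)
    for _ in range(epochs):
        for better, worse in pairs:
            if pairwise_hinge_loss(w, better, worse) > 0.0:
                # Subgradient step of the hinge loss.
                w += lr * (better - worse)
    return w

# Toy data: clips judged "more fluent" have larger feature values.
pairs = []
for _ in range(100):
    a_hi, t_hi = rng.normal(1.0, 0.3, 4), rng.normal(1.0, 0.3, 4)
    a_lo, t_lo = rng.normal(-1.0, 0.3, 4), rng.normal(-1.0, 0.3, 4)
    pairs.append((fuse(a_hi, t_hi), fuse(a_lo, t_lo)))

w = train(pairs, dim=8)
acc = float(np.mean([w @ b > w @ ws for b, ws in pairs]))
print(round(acc, 2))
```

Training from pairwise preferences rather than absolute scores is what sidesteps rater subjectivity: annotators only decide which of two clips sounds better, and the model recovers a consistent global scoring function.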
Related papers
- Narrative Action Evaluation with Prompt-Guided Multimodal Interaction [60.281405999483]
Narrative action evaluation (NAE) aims to generate professional commentary that evaluates the execution of an action.
NAE is a more challenging task because it requires both narrative flexibility and evaluation rigor.
We propose a prompt-guided multimodal interaction framework to facilitate the interaction between different modalities of information.
arXiv Detail & Related papers (2024-04-22T17:55:07Z) - Fuse after Align: Improving Face-Voice Association Learning via Multimodal Encoder [22.836016610542387]
This paper introduces a novel framework within an unsupervised setting for learning voice-face associations.
By employing a multimodal encoder after contrastive learning and addressing the problem through binary classification, we can learn the implicit information within the embeddings in a more effective and varied manner.
Empirical evidence demonstrates that our framework achieves state-of-the-art results in voice-face matching, verification, and retrieval tasks.
arXiv Detail & Related papers (2024-04-15T07:05:14Z) - One-Shot Learning as Instruction Data Prospector for Large Language Models [108.81681547472138]
Nuggets uses one-shot learning to select high-quality instruction data from extensive datasets.
We show that instruction tuning with the top 1% of examples curated by Nuggets substantially outperforms conventional methods employing the entire dataset.
arXiv Detail & Related papers (2023-12-16T03:33:12Z) - Unity is Strength: Cross-Task Knowledge Distillation to Improve Code
Review Generation [0.9208007322096533]
We propose a novel deep-learning architecture, DISCOREV, based on cross-task knowledge distillation.
In our approach, the fine-tuning of the comment generation model is guided by the code refinement model.
Our results show that our approach generates better review comments as measured by the BLEU score.
arXiv Detail & Related papers (2023-09-06T21:10:33Z) - Large Language Models are Diverse Role-Players for Summarization
Evaluation [82.31575622685902]
A document summary's quality can be assessed by human annotators on various criteria, both objective ones like grammar and correctness, and subjective ones like informativeness, succinctness, and appeal.
Most automatic evaluation methods, such as BLEU/ROUGE, may not be able to adequately capture these dimensions.
We propose a new LLM-based evaluation framework that compares generated text and reference text from both objective and subjective aspects.
arXiv Detail & Related papers (2023-03-27T10:40:59Z) - Cross-modal Audio-visual Co-learning for Text-independent Speaker
Verification [55.624946113550195]
This paper proposes a cross-modal speech co-learning paradigm.
Two cross-modal boosters are introduced based on an audio-visual pseudo-siamese structure to learn the modality-transformed correlation.
Experimental results on the LRSLip3, GridLip, LomGridLip, and VoxLip datasets demonstrate that our proposed method achieves 60% and 20% average relative performance improvement.
arXiv Detail & Related papers (2023-02-22T10:06:37Z) - Audio Representation Learning by Distilling Video as Privileged
Information [25.71206255965502]
We propose a novel approach for deep audio representation learning using audio-visual data when the video modality is absent at inference.
We adopt teacher-student knowledge distillation under the framework of learning using privileged information (LUPI).
We show considerable improvements over sole audio-based recognition as well as prior works that use LUPI.
arXiv Detail & Related papers (2023-02-06T15:09:34Z) - Mixtures of Deep Neural Experts for Automated Speech Scoring [11.860560781894458]
The paper addresses the task of automatically assessing second language proficiency from language learners' spoken responses to test prompts.
The approach relies on two separate modules: (1) an automatic speech recognition system that yields text transcripts of the spoken interactions involved, and (2) a multiple classifier system based on deep learners that ranks the transcripts into proficiency classes.
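The two-module pipeline described above can be sketched as follows. This is a hypothetical toy, not the paper's system: `transcribe` is a stub standing in for a real ASR module, and the two "experts" are trivial heuristics standing in for the deep classifiers; the proficiency labels and thresholds are invented for illustration.

```python
from collections import Counter

def transcribe(audio_clip):
    # Stand-in for module (1), an ASR system. A real system would
    # decode the audio; here the clip already carries its transcript.
    return audio_clip["transcript"]

def expert_length(text):
    # Toy expert: longer responses suggest higher proficiency.
    return "high" if len(text.split()) > 5 else "low"

def expert_vocab(text):
    # Toy expert: more distinct words suggest higher proficiency.
    return "high" if len(set(text.split())) > 4 else "low"

def score(audio_clip, experts=(expert_length, expert_vocab)):
    # Module (2): a multiple-classifier system combined by
    # majority vote over the experts' predicted classes.
    text = transcribe(audio_clip)
    votes = Counter(e(text) for e in experts)
    return votes.most_common(1)[0][0]

clip = {"transcript": "the weather today is sunny and quite warm"}
print(score(clip))
```

The point of the two-stage design is separation of concerns: the ASR module converts speech to text once, and the classifier ensemble operates purely on transcripts, so either module can be swapped or retrained independently.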
arXiv Detail & Related papers (2021-06-23T15:44:50Z) - Hierarchical Bi-Directional Self-Attention Networks for Paper Review
Rating Recommendation [81.55533657694016]
We propose a Hierarchical bi-directional self-attention Network framework (HabNet) for paper review rating prediction and recommendation.
Specifically, we leverage the hierarchical structure of the paper reviews with three levels of encoders: sentence encoder (level one), intra-review encoder (level two) and inter-review encoder (level three).
We are able to identify useful predictors to make the final acceptance decision, as well as to help discover the inconsistency between numerical review ratings and text sentiment conveyed by reviewers.
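The three-level hierarchy can be sketched as follows. This is only a structural illustration: HabNet uses bi-directional self-attention encoders at each level, whereas this sketch substitutes mean pooling, and all dimensions and data are synthetic.

```python
import numpy as np

rng = np.random.default_rng(1)

def encode_sentence(word_vecs):
    # Level one: pool word vectors into a sentence vector.
    return word_vecs.mean(axis=0)

def encode_review(sent_vecs):
    # Level two: pool sentence vectors into a review vector.
    return np.stack(sent_vecs).mean(axis=0)

def encode_paper(review_vecs):
    # Level three: pool review vectors into a paper-level vector,
    # which would feed the final acceptance predictor.
    return np.stack(review_vecs).mean(axis=0)

# Toy paper: 2 reviews, each with 3 sentences of 5 words, dim 4.
paper = [[rng.normal(size=(5, 4)) for _ in range(3)] for _ in range(2)]
review_vecs = [encode_review([encode_sentence(s) for s in r])
               for r in paper]
paper_vec = encode_paper(review_vecs)
print(paper_vec.shape)
```

Keeping one vector per level is what lets the model compare a review's text sentiment against its numerical rating, since each review retains its own intermediate representation before aggregation.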
arXiv Detail & Related papers (2020-11-02T08:07:50Z) - Noisy Self-Knowledge Distillation for Text Summarization [83.49809205891496]
We apply self-knowledge distillation to text summarization which we argue can alleviate problems with maximum-likelihood training.
Our student summarization model is trained with guidance from a teacher which generates smoothed labels to help regularize training.
We demonstrate experimentally on three benchmarks that our framework boosts the performance of both pretrained and non-pretrained summarizers.
arXiv Detail & Related papers (2020-09-15T12:53:09Z) - Key Phrase Classification in Complex Assignments [5.067828201066184]
We show that classifying key phrases is ambiguous even for humans, yielding a Cohen's kappa of 0.77 on a new dataset.
Both pretrained language models and simple TF-IDF SVM classifiers produce similar results, with the former averaging 0.6 F1 points higher than the latter.
arXiv Detail & Related papers (2020-03-16T04:25:37Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.