Fairness in Rating Prediction by Awareness of Verbal and Gesture Quality
of Public Speeches
- URL: http://arxiv.org/abs/2012.06157v2
- Date: Wed, 16 Dec 2020 20:48:35 GMT
- Title: Fairness in Rating Prediction by Awareness of Verbal and Gesture Quality
of Public Speeches
- Authors: Rupam Acharyya, Ankani Chattoraj, Shouman Das, Md. Iftekhar Tanveer,
Ehsan Hoque
- Abstract summary: We formalize a novel HEterogeneity Metric, HEM, that quantifies the quality of a talk both in the verbal and non-verbal domain.
We show that there is an interesting relationship between HEM and the ratings of TED talks given to speakers by viewers.
We incorporate the HEM metric into the loss function of a neural network with the goal of reducing unfairness in rating predictions with respect to race and gender.
- Score: 5.729787815551408
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: The role of verbal and non-verbal cues towards great public speaking has been
a topic of exploration for many decades. We identify a commonality across
present theories: the element of "variety or heterogeneity" in channels or
modes of communication (e.g. resorting to stories, scientific facts, emotional
connections, facial expressions etc.) which is essential for effectively
communicating information. We use this observation to formalize a novel
HEterogeneity Metric, HEM, that quantifies the quality of a talk both in the
verbal and non-verbal domain (transcript and facial gestures). We use TED talks
as an input repository of public speeches because it features speakers from
a diverse community and has a wide outreach. We show that there is an
interesting relationship between HEM and the ratings of TED talks given to
speakers by viewers. This emphasizes that HEM inherently and successfully
represents the quality of a talk based on "variety or heterogeneity". Further,
we discover that HEM captures the prevalent bias in ratings with respect to
race and gender, which we call sensitive attributes (because predictions based
on them might result in unfair outcomes). We incorporate the HEM metric into
the loss function of a neural network with the goal of reducing unfairness in
rating predictions with respect to race and gender. Our results show that the
modified loss function improves fairness in prediction without considerably
affecting the prediction accuracy of the neural network. Our work ties
together a novel metric for public speeches in both verbal and non-verbal
domain with the computational power of a neural network to design a fair
prediction system for speakers.
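The abstract does not reproduce the exact loss function; as a minimal sketch of the general idea, a rating-prediction loss could be augmented with a penalty on the gap in HEM-adjusted residuals between sensitive groups. The function name, the penalty form, and the weight `lam` below are all illustrative assumptions, not the authors' actual formulation:

```python
import numpy as np

def fair_rating_loss(preds, targets, hem_scores, sensitive, lam=0.5):
    """Hypothetical fairness-regularized rating loss (illustrative only).

    preds, targets : predicted and observed ratings (1-D arrays)
    hem_scores     : per-talk heterogeneity (HEM) values
    sensitive      : binary group labels for a sensitive attribute
    lam            : weight of the fairness penalty (assumed hyperparameter)
    """
    preds = np.asarray(preds, dtype=float)
    targets = np.asarray(targets, dtype=float)
    hem_scores = np.asarray(hem_scores, dtype=float)
    sensitive = np.asarray(sensitive)

    # Standard accuracy term: mean squared error of the rating prediction.
    mse = np.mean((preds - targets) ** 2)

    # Residual rating not explained by talk quality as measured by HEM;
    # the penalty shrinks the gap in this residual across the two groups.
    residual = preds - hem_scores
    gap = abs(residual[sensitive == 0].mean() - residual[sensitive == 1].mean())

    return mse + lam * gap
```

Minimizing such a loss trades a small amount of accuracy (the `mse` term) for predictions whose quality-adjusted residuals are balanced across sensitive groups, which matches the fairness-vs-accuracy trade-off the abstract reports.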
Related papers
- SIFToM: Robust Spoken Instruction Following through Theory of Mind [51.326266354164716]
We present a cognitively inspired model, Speech Instruction Following through Theory of Mind (SIFToM), to enable robots to pragmatically follow human instructions under diverse speech conditions.
Results show that the SIFToM model outperforms state-of-the-art speech and language models, approaching human-level accuracy on challenging speech instruction following tasks.
arXiv Detail & Related papers (2024-09-17T02:36:10Z)
- Effect of Attention and Self-Supervised Speech Embeddings on Non-Semantic Speech Tasks [3.570593982494095]
We look at speech emotion understanding as a perception task, which is a more realistic setting.
We leverage the rich ComParE dataset of multilingual speakers and a multi-label regression target of 'emotion share', the perceived share of each emotion.
Our results show that HuBERT-Large with a self-attention-based light-weight sequence model provides 4.6% improvement over the reported baseline.
arXiv Detail & Related papers (2023-08-28T07:11:27Z)
- Accurate Emotion Strength Assessment for Seen and Unseen Speech Based on Data-Driven Deep Learning [70.30713251031052]
We propose a data-driven deep learning model, i.e. StrengthNet, to improve the generalization of emotion strength assessment for seen and unseen speech.
Experiments show that the predicted emotion strength of the proposed StrengthNet is highly correlated with ground truth scores for both seen and unseen speech.
arXiv Detail & Related papers (2022-06-15T01:25:32Z)
- Self-Supervised Speech Representation Learning: A Review [105.1545308184483]
Self-supervised representation learning methods promise a single universal model that would benefit a wide variety of tasks and domains.
Speech representation learning is experiencing similar progress in three main categories: generative, contrastive, and predictive methods.
This review presents approaches for self-supervised speech representation learning and their connection to other research areas.
arXiv Detail & Related papers (2022-05-21T16:52:57Z)
- Probing Speech Emotion Recognition Transformers for Linguistic Knowledge [7.81884995637243]
We investigate the extent to which linguistic information is exploited during speech emotion recognition fine-tuning.
We synthesise prosodically neutral speech utterances while varying the sentiment of the text.
Valence predictions of the transformer model are very reactive to positive and negative sentiment content, as well as negations, but not to intensifiers or reducers.
arXiv Detail & Related papers (2022-04-01T12:47:45Z)
- Towards Language Modelling in the Speech Domain Using Sub-word Linguistic Units [56.52704348773307]
We propose a novel LSTM-based generative speech LM based on linguistic units including syllables and phonemes.
With a limited dataset, orders of magnitude smaller than that required by contemporary generative models, our model closely approximates babbling speech.
We show the effect of training with auxiliary text LMs, multitask learning objectives, and auxiliary articulatory features.
arXiv Detail & Related papers (2021-10-31T22:48:30Z)
- Calibrate your listeners! Robust communication-based training for pragmatic speakers [30.731870275051957]
We propose a method that uses a population of neural listeners to regularize speaker training.
We show that language drift originates from the poor uncertainty calibration of a neural listener.
We evaluate both population-based objectives on reference games, and show that the ensemble method with better calibration enables the speaker to generate pragmatic utterances.
arXiv Detail & Related papers (2021-10-11T17:07:38Z)
- FSER: Deep Convolutional Neural Networks for Speech Emotion Recognition [0.015863809575305417]
We introduce FSER, a speech emotion recognition model trained on four valid speech databases.
On each benchmark dataset, FSER outperforms the best models introduced so far, achieving a state-of-the-art performance.
FSER could potentially be used to improve mental and emotional health care.
arXiv Detail & Related papers (2021-09-15T05:03:24Z)
- My Teacher Thinks The World Is Flat! Interpreting Automatic Essay Scoring Mechanism [71.34160809068996]
Recent work shows that automated scoring systems are prone to even common-sense adversarial samples.
We utilize recent advances in interpretability to find the extent to which features such as coherence, content and relevance are important for automated scoring mechanisms.
We also find that since the models are not semantically grounded with world-knowledge and common sense, adding false facts such as "the world is flat" actually increases the score instead of decreasing it.
arXiv Detail & Related papers (2020-12-27T06:19:20Z)
- "Notic My Speech" -- Blending Speech Patterns With Multimedia [65.91370924641862]
We propose a view-temporal attention mechanism to model both the view dependence and the visemic importance in speech recognition and understanding.
Our proposed method outperformed the existing work by 4.99% in terms of the viseme error rate.
We show that there is a strong correlation between our model's understanding of multi-view speech and human perception.
arXiv Detail & Related papers (2020-06-12T06:51:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.