Aggression in Hindi and English Speech: Acoustic Correlates and
Automatic Identification
- URL: http://arxiv.org/abs/2204.02814v1
- Date: Wed, 6 Apr 2022 13:29:25 GMT
- Title: Aggression in Hindi and English Speech: Acoustic Correlates and
Automatic Identification
- Authors: Ritesh Kumar, Atul Kr. Ojha, Bornini Lahiri, Chingrimnng Lungleng
- Abstract summary: The study is based on a corpus of slightly over 10 hours of political discourse.
We develop two automatic classification systems for identifying aggression in English and Hindi speech.
- Score: 0.802904964931021
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the present paper, we will present the results of an acoustic analysis of
political discourse in Hindi and discuss some of the conventionalised acoustic
features of aggressive speech regularly employed by the speakers of Hindi and
English. The study is based on a corpus of slightly over 10 hours of political
discourse and includes debates on news channels and political speeches. Using
this study, we develop two automatic classification systems for identifying
aggression in English and Hindi speech, based solely on an acoustic model. The
Hindi classifier, trained using 50 hours of annotated speech, and English
classifier, trained using 40 hours of annotated speech, achieve a respectable
accuracy of over 73% and 66% respectively. In this paper, we discuss the
development of this annotated dataset, the experiments for developing the
classifier and discuss the errors that it makes.
Related papers
- On the Fallacy of Global Token Perplexity in Spoken Language Model Evaluation [88.77441715819366]
Generative spoken language models pretrained on large-scale raw audio can continue a speech prompt with appropriate content.
We propose a variety of likelihood- and generative-based evaluation methods that serve in place of naive global token perplexity.
arXiv Detail & Related papers (2026-01-09T22:01:56Z)
- Language-agnostic, automated assessment of listeners' speech recall using large language models [0.0]
This research leverages modern large language models (LLMs) in native English speakers and native speakers of 10 other languages.
Participants listened to and freely recalled short stories (in quiet/clear and in babble noise) in their native language.
LLM prompt engineering combined with semantic similarity analyses to score speech recall revealed sensitivity to known effects of temporal order, primacy/recency, and background noise.
arXiv Detail & Related papers (2025-03-02T22:28:41Z)
- EARS: An Anechoic Fullband Speech Dataset Benchmarked for Speech Enhancement and Dereverberation [83.29199726650899]
The EARS dataset comprises 107 speakers from diverse backgrounds, totaling 100 hours of clean, anechoic speech data.
The dataset covers a large range of different speaking styles, including emotional speech, different reading styles, non-verbal sounds, and conversational freeform speech.
We benchmark various methods for speech enhancement and dereverberation on the dataset and evaluate their performance through a set of instrumental metrics.
arXiv Detail & Related papers (2024-06-10T11:28:29Z)
- Annotated Speech Corpus for Low Resource Indian Languages: Awadhi, Bhojpuri, Braj and Magahi [2.84214511742034]
We develop a speech corpus for four low-resource Indo-Aryan languages -- Awadhi, Bhojpuri, Braj and Magahi.
The total size of the corpus currently stands at approximately 18 hours.
We discuss our methodology for data collection in these languages, most of which was done in the middle of the COVID-19 pandemic.
arXiv Detail & Related papers (2022-06-26T17:28:38Z)
- Automatic Dialect Density Estimation for African American English [74.44807604000967]
We explore automatic prediction of dialect density of the African American English (AAE) dialect.
Dialect density is defined as the percentage of words in an utterance that contain characteristics of the non-standard dialect.
We show a significant correlation between our predicted and ground truth dialect density measures for AAE speech in this database.
arXiv Detail & Related papers (2022-04-03T01:34:48Z)
- Prosody Labelled Dataset for Hindi using Semi-Automated Approach [0.19733467999508417]
This study aims to develop a semi-automatically labelled prosody database for Hindi.
No single standard for prosody labelling exists in Hindi.
The accuracy of the trained models for pitch accent, intermediate phrase boundaries and accentual phrase boundaries is 73.40%, 93.20%, and 43% respectively.
arXiv Detail & Related papers (2021-12-11T13:11:36Z)
- Prediction of Listener Perception of Argumentative Speech in a Crowdsourced Data Using (Psycho-)Linguistic and Fluency Features [24.14001104126045]
We aim to predict TED talk-style affective ratings in a crowdsourced dataset of argumentative speech.
We present an effective approach to the classification task of predicting these categories through fine-tuning a model pre-trained on a large dataset of TED talks public speeches.
arXiv Detail & Related papers (2021-11-13T15:07:13Z)
- Perception Point: Identifying Critical Learning Periods in Speech for Bilingual Networks [58.24134321728942]
We compare and identify cognitive aspects on deep neural-based visual lip-reading models.
We observe a strong correlation between these theories in cognitive psychology and our unique modeling.
arXiv Detail & Related papers (2021-10-13T05:30:50Z)
- Leveraging Pre-trained Language Model for Speech Sentiment Analysis [58.78839114092951]
We explore the use of pre-trained language models to learn sentiment information of written texts for speech sentiment analysis.
We propose a pseudo label-based semi-supervised training strategy using a language model on an end-to-end speech sentiment approach.
arXiv Detail & Related papers (2021-06-11T20:15:21Z)
- Towards Modelling Coherence in Spoken Discourse [48.80477600384429]
Coherence in spoken discourse is dependent on the prosodic and acoustic patterns in speech.
We model coherence in spoken discourse with audio-based coherence models.
arXiv Detail & Related papers (2020-12-31T20:18:29Z)
- Unsupervised Cross-lingual Representation Learning for Speech Recognition [63.85924123692923]
XLSR learns cross-lingual speech representations by pretraining a single model from the raw waveform of speech in multiple languages.
We build on wav2vec 2.0 which is trained by solving a contrastive task over masked latent speech representations.
Experiments show that cross-lingual pretraining significantly outperforms monolingual pretraining.
arXiv Detail & Related papers (2020-06-24T18:25:05Z)
- The Perceptimatic English Benchmark for Speech Perception Models [11.646802225841153]
The benchmark consists of ABX stimuli along with the responses of 91 American English-speaking listeners.
We show that DeepSpeech, a standard English speech recognizer, is more specialized on English phoneme discrimination than English listeners.
arXiv Detail & Related papers (2020-05-07T12:35:44Z)
- Speaker Recognition in Bengali Language from Nonlinear Features [0.0]
The study of Bengali speech recognition and speaker identification is scarce in the literature.
In this work, we have extracted some acoustic features of speech using nonlinear multifractal analysis.
The Multifractal Detrended Fluctuation Analysis reveals essentially the complexity associated with the speech signals taken.
arXiv Detail & Related papers (2020-04-15T22:38:54Z)