Experiences with the Introduction of AI-based Tools for Moderation Automation of Voice-based Participatory Media Forums
- URL: http://arxiv.org/abs/2108.04208v1
- Date: Mon, 9 Aug 2021 17:50:33 GMT
- Title: Experiences with the Introduction of AI-based Tools for Moderation Automation of Voice-based Participatory Media Forums
- Authors: Aman Khullar, Paramita Panjal, Rachit Pandey, Abhishek Burnwal,
Prashit Raj, Ankit Akash Jha, Priyadarshi Hitesh, R Jayanth Reddy, Himanshu,
Aaditeshwar Seth
- Abstract summary: We introduce AI tools to filter out blank or noisy audios, use speech recognition to transcribe the voice messages into text, and use natural language processing techniques to extract metadata from the audio transcripts.
We present our findings in terms of the time and cost savings made through the introduction of these tools, and describe the moderators' feedback on the acceptability of AI-based automation in their workflow.
Our work forms a case study in the use of AI for the automation of several routine tasks, and can be especially relevant for other researchers and practitioners involved with the use of voice-based technologies in developing regions of the world.
- Score: 0.5243067689245634
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Voice-based discussion forums, where users record audio messages that are
then published for other users to listen to and comment on, are often moderated
to ensure that the published audios are of good quality, relevant, and compliant
with the editorial guidelines of the forum. There is room for the introduction of
AI-based tools in the moderation process, such as to identify and filter out
blank or noisy audios, to use speech recognition to transcribe the voice messages
into text, and to use natural language processing techniques to extract relevant
metadata from the audio transcripts. We design such tools and deploy them
within a social enterprise working in India that runs several voice-based
discussion forums. We present our findings in terms of the time and cost
savings made through the introduction of these tools, and describe the
moderators' feedback on the acceptability of AI-based automation in their
workflow. Our work forms a case study in the use of AI for the automation of
several routine tasks, and can be especially relevant for other researchers and
practitioners involved with the use of voice-based technologies in developing
regions of the world.
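For concreteness, here is a minimal Python sketch of the three-stage pipeline the abstract describes: an energy-based filter for blank or noisy audios, a speech recognition step, and metadata extraction from the transcript. The thresholds, the SpeechRecognition/Google backend, and the topic keywords are illustrative assumptions, not the tooling the paper actually deployed.

```python
# Hedged sketch of the pipeline in the abstract; thresholds, the ASR backend,
# and the topic keywords are assumptions, not the authors' deployed tooling.
import wave

import numpy as np
import speech_recognition as sr  # pip install SpeechRecognition

SILENCE_RMS = 500.0    # assumed RMS threshold below which audio counts as blank
MIN_DURATION_S = 2.0   # assumed minimum useful message length

TOPIC_KEYWORDS = {     # assumed editorial categories for metadata tagging
    "agriculture": ["crop", "farm", "seed"],
    "health": ["clinic", "medicine", "vaccine"],
}

def is_blank_or_noisy(path: str) -> bool:
    """Reject audios too short or too quiet to be worth a moderator's time."""
    with wave.open(path, "rb") as w:  # assumes 16-bit PCM WAV
        raw = w.readframes(w.getnframes())
        duration = w.getnframes() / w.getframerate()
    samples = np.frombuffer(raw, dtype=np.int16).astype(np.float64)
    rms = float(np.sqrt(np.mean(samples ** 2))) if samples.size else 0.0
    return duration < MIN_DURATION_S or rms < SILENCE_RMS

def transcribe(path: str) -> str:
    """ASR step; any hosted or on-premise engine could stand in here."""
    recognizer = sr.Recognizer()
    with sr.AudioFile(path) as source:
        audio = recognizer.record(source)
    return recognizer.recognize_google(audio)  # needs network access

def extract_metadata(transcript: str) -> dict:
    """Toy NLP step: keyword matching as a stand-in for metadata extraction."""
    text = transcript.lower()
    topics = [t for t, kws in TOPIC_KEYWORDS.items()
              if any(k in text for k in kws)]
    return {"topics": topics, "n_words": len(text.split())}

def moderate(path: str) -> dict | None:
    if is_blank_or_noisy(path):
        return None  # filtered out before reaching a human moderator
    transcript = transcribe(path)
    return {"transcript": transcript, **extract_metadata(transcript)}
```

A moderator would then review only the items that survive the filter, with the transcript and topic tags pre-filled.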
Related papers
- Large Language Model Based Generative Error Correction: A Challenge and Baselines for Speech Recognition, Speaker Tagging, and Emotion Recognition [110.8431434620642]
We introduce the generative speech transcription error correction (GenSEC) challenge.
This challenge comprises three post-ASR language modeling tasks: (i) post-ASR transcription correction, (ii) speaker tagging, and (iii) emotion recognition.
We discuss insights from baseline evaluations, as well as lessons learned for designing future evaluations.
arXiv Detail & Related papers (2024-09-15T16:32:49Z)
- Automatic Speech Recognition for Hindi [0.6292138336765964]
The research involved developing a web application and designing a web interface for speech recognition.
The web application manages large volumes of audio files and their transcriptions, facilitating human correction of ASR transcripts.
The web interface for speech recognition records 16 kHz mono audio from any device running the web app, performs voice activity detection (VAD), and sends the audio to the recognition engine.
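As a hedged illustration of the capture-then-VAD flow this summary describes (the paper does not name its VAD implementation), here is a sketch using the webrtcvad package on 16 kHz mono, 16-bit PCM:

```python
# Hypothetical client-side flow for the summary above: 16 kHz mono capture,
# frame-level VAD, then ship speech-only audio to the recognition engine.
# webrtcvad is a stand-in; the paper does not name its VAD implementation.
import wave

import webrtcvad  # pip install webrtcvad

SAMPLE_RATE = 16000                               # 16 kHz mono, per the summary
FRAME_MS = 30                                     # webrtcvad takes 10/20/30 ms frames
FRAME_BYTES = SAMPLE_RATE * FRAME_MS // 1000 * 2  # 16-bit PCM samples

def speech_only(path: str, aggressiveness: int = 2) -> bytes:
    """Keep only the frames the VAD marks as speech."""
    vad = webrtcvad.Vad(aggressiveness)
    with wave.open(path, "rb") as w:
        assert w.getframerate() == SAMPLE_RATE and w.getnchannels() == 1
        data = w.readframes(w.getnframes())
    kept = bytearray()
    for i in range(0, len(data) - FRAME_BYTES + 1, FRAME_BYTES):
        frame = data[i:i + FRAME_BYTES]
        if vad.is_speech(frame, SAMPLE_RATE):
            kept.extend(frame)
    return bytes(kept)  # this payload would go to the ASR engine
```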
arXiv Detail & Related papers (2024-06-26T07:39:20Z)
- App for Resume-Based Job Matching with Speech Interviews and Grammar Analysis: A Review [0.11249583407496219]
We explore the feasibility of an end-to-end system providing speech and text based natural language processing for job interview preparation.
We also explore existing recommender-based systems and note their limitations.
arXiv Detail & Related papers (2023-11-20T18:03:08Z)
- Learning Speech Representation From Contrastive Token-Acoustic Pretraining [57.08426714676043]
We propose "Contrastive Token-Acoustic Pretraining (CTAP)", which uses two encoders to bring phoneme and speech into a joint multimodal space.
The proposed CTAP model is trained on 210k speech and phoneme pairs, achieving minimally-supervised TTS, VC, and ASR.
arXiv Detail & Related papers (2023-09-01T12:35:43Z)
- Speech Aware Dialog System Technology Challenge (DSTC11) [12.841429336655736]
Most research on task-oriented dialog modeling is based on written text input.
We created three spoken versions of the popular written-domain MultiWoz task -- (a) TTS-Verbatim: written user inputs were converted into speech waveforms using a TTS system, (b) Human-Verbatim: humans spoke the user inputs verbatim, and (c) Human-paraphrased: humans paraphrased the user inputs.
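To make the TTS-Verbatim construction concrete, a tiny sketch that renders a written MultiWoz-style user turn to a waveform; gTTS is only a stand-in, since the challenge used its own TTS system:

```python
# Sketch of the TTS-Verbatim construction: written user turns are rendered to
# speech. gTTS is only a stand-in; DSTC11 used its own TTS system.
from gtts import gTTS  # pip install gTTS

user_turns = ["I need a cheap hotel in the north of town."]  # invented example turn
for i, turn in enumerate(user_turns):
    gTTS(turn, lang="en").save(f"tts_verbatim_{i:05d}.mp3")
```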
arXiv Detail & Related papers (2022-12-16T20:30:33Z)
- SpeechUT: Bridging Speech and Text with Hidden-Unit for Encoder-Decoder Based Speech-Text Pre-training [106.34112664893622]
We propose a unified-modal speech-unit-text pre-training model, SpeechUT, to connect the representations of a speech encoder and a text decoder with a shared unit encoder.
Our proposed SpeechUT is fine-tuned and evaluated on automatic speech recognition (ASR) and speech translation (ST) tasks.
arXiv Detail & Related papers (2022-10-07T17:57:45Z)
- Zero-Shot Text-to-Speech for Text-Based Insertion in Audio Narration [62.75234183218897]
We propose a one-stage context-aware framework to generate natural and coherent target speech without any training data of the speaker.
We generate the mel-spectrogram of the edited speech with a transformer-based decoder.
It outperforms a recent zero-shot TTS engine by a large margin.
arXiv Detail & Related papers (2021-09-12T04:17:53Z)
- SpeechBrain: A General-Purpose Speech Toolkit [73.0404642815335]
SpeechBrain is an open-source and all-in-one speech toolkit.
It is designed to facilitate the research and development of neural speech processing technologies.
It achieves competitive or state-of-the-art performance in a wide range of speech benchmarks.
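Since SpeechBrain is an open toolkit, a minimal usage sketch fits here; the model identifier below is one of the project's published LibriSpeech recipes, so check the model hub for current names:

```python
# Minimal SpeechBrain usage sketch. The model ID is one of the project's
# published LibriSpeech recipes; in SpeechBrain >= 1.0 the import moves to
# speechbrain.inference.
from speechbrain.pretrained import EncoderDecoderASR

asr = EncoderDecoderASR.from_hparams(
    source="speechbrain/asr-crdnn-rnnlm-librispeech",
    savedir="pretrained_asr",
)
print(asr.transcribe_file("message.wav"))
```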
arXiv Detail & Related papers (2021-06-08T18:22:56Z)
- Using Voice and Biofeedback to Predict User Engagement during Requirements Interviews [11.277063517143565]
We propose to utilize biometric data, in terms of physiological and voice features, to complement interviews with information about user engagement.
We evaluate our approach by interviewing users while gathering their physiological data using an Empatica E4 wristband.
Our results show that we can predict users' engagement by training supervised machine learning algorithms on biometric data.
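A hedged sketch of the supervised step: the feature names and random stand-in data below are illustrative, not the paper's actual Empatica E4 signals or engagement labels:

```python
# Hedged sketch of the supervised step: random stand-in features and labels,
# not the paper's Empatica E4 physiological and voice data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.random((120, 4))     # e.g., windowed [EDA, HR, pitch, energy] features
y = rng.integers(0, 2, 120)  # engaged / not-engaged labels

clf = RandomForestClassifier(n_estimators=100, random_state=0)
print(cross_val_score(clf, X, y, cv=5).mean())
```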
arXiv Detail & Related papers (2021-04-06T10:34:36Z)
- Voice Privacy with Smart Digital Assistants in Educational Settings [1.8369974607582578]
We design and evaluate a practical and efficient framework for voice privacy at the source.
The approach combines speaker identification (SID) and speech conversion methods to randomly disguise the identity of users right on the device that records the speech.
We evaluate the ASR performance of the conversion in terms of word error rate and show the promise of this framework in preserving the content of the input speech.
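As a loose illustration only: the paper pairs speaker identification with proper voice conversion, but a random pitch shift is enough to sketch the idea of disguising speaker identity at the recording device:

```python
# Loose illustration: a random pitch shift per recording gestures at on-device
# identity disguise; the paper's actual method is SID plus voice conversion.
import random

import librosa        # pip install librosa soundfile
import soundfile as sf

y, sr = librosa.load("input.wav", sr=16000)
steps = random.uniform(-4.0, 4.0)  # a fresh random disguise for each recording
y_disguised = librosa.effects.pitch_shift(y, sr=sr, n_steps=steps)
sf.write("disguised.wav", y_disguised, sr)
```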
arXiv Detail & Related papers (2021-03-24T19:58:45Z)
- Speech Enhancement using Self-Adaptation and Multi-Head Self-Attention [70.82604384963679]
This paper investigates a self-adaptation method for speech enhancement using auxiliary speaker-aware features.
We extract a speaker representation used for adaptation directly from the test utterance.
arXiv Detail & Related papers (2020-02-14T05:05:36Z)