Towards a Universal Method for Meaningful Signal Detection
- URL: http://arxiv.org/abs/2408.00016v3
- Date: Mon, 7 Oct 2024 07:54:37 GMT
- Title: Towards a Universal Method for Meaningful Signal Detection
- Authors: Louis Mahon
- Abstract summary: It is known that human speech and certain animal vocalizations can convey meaningful content because we can decipher the content that a given utterance does convey.
This paper explores an alternative approach to determining whether a signal is meaningful, one that analyzes only the signal itself and is independent of what the conveyed meaning might be.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: It is known that human speech and certain animal vocalizations can convey meaningful content because we can decipher the content that a given utterance does convey. This paper explores an alternative approach to determining whether a signal is meaningful, one that analyzes only the signal itself and is independent of what the conveyed meaning might be. We devise a method that takes a waveform as input and outputs a score indicating its degree of "meaningfulness". We cluster contiguous portions of the input to minimize the total description length, and then take the code length of the assigned cluster labels as the meaningfulness score. We evaluate our method empirically against several baselines, and show that it is the only one to give a high score to human speech in various languages and with various speakers, a moderate score to animal vocalizations from birds and orcas, and a low score to ambient noise from various sources.
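The description-length scoring idea in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it frames the waveform, clusters the frames with a toy k-means for several cluster counts, picks the clustering that minimizes a crude two-part description length (centroid cost plus the Shannon code length of the label sequence), and returns that label code length as the score. All names, constants, and the specific costs are assumptions.

```python
import numpy as np

def meaningfulness_score(signal, frame_len=64, max_k=4, seed=0):
    """Toy sketch of an MDL-style meaningfulness score.

    Frames the signal, clusters contiguous frames for k = 1..max_k,
    chooses the k minimising (centroid cost + label code length), and
    returns the label code length for that clustering.
    """
    rng = np.random.default_rng(seed)
    n = len(signal) // frame_len  # must have n >= max_k frames
    frames = np.asarray(signal[: n * frame_len], float).reshape(n, frame_len)

    def kmeans(X, k, iters=20):
        # Naive Lloyd's algorithm with random initial centroids.
        centers = X[rng.choice(len(X), k, replace=False)]
        for _ in range(iters):
            d = ((X[:, None, :] - centers[None]) ** 2).sum(-1)  # (n, k)
            labels = d.argmin(1)
            for j in range(k):
                if (labels == j).any():
                    centers[j] = X[labels == j].mean(0)
        return labels, centers

    def label_code_length(labels):
        # Shannon code length of the labels under their empirical distribution.
        _, counts = np.unique(labels, return_counts=True)
        p = counts / counts.sum()
        return float(-(counts * np.log2(p)).sum())

    best = None
    for k in range(1, max_k + 1):
        labels, centers = kmeans(frames, k)
        model_cost = 32.0 * centers.size       # crude bit cost of the centroids
        data_cost = label_code_length(labels)  # bit cost of the label sequence
        total = model_cost + data_cost
        if best is None or total < best[0]:
            best = (total, data_cost)
    return best[1]
```

The two-part cost is the standard MDL trade-off: more clusters fit the frames better but cost more bits to describe, so unstructured noise should not benefit from a large codebook.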
Related papers
- Feature Representations for Automatic Meerkat Vocalization Classification
This paper investigates feature representations for automatic meerkat vocalization analysis.
Call type classification studies conducted on two data sets reveal that feature extraction methods developed for human speech processing can be effectively employed for automatic meerkat call analysis.
arXiv Detail & Related papers (2024-08-27T10:51:51Z)
- Towards Dog Bark Decoding: Leveraging Human Speech Processing for Automated Bark Classification
We explore the use of self-supervised speech representation models pre-trained on human speech to address dog bark classification tasks.
We show that using speech embedding representations significantly improves over simpler classification baselines.
We also find that models pre-trained on large human speech acoustics can provide additional performance boosts on several tasks.
arXiv Detail & Related papers (2024-04-29T14:41:59Z)
- Towards Lexical Analysis of Dog Vocalizations via Online Videos
This study presents a data-driven investigation into the semantics of dog vocalizations by correlating different sound types with consistent semantics.
We first present a new dataset of Shiba Inu sounds, along with contextual information such as location and activity, collected from YouTube.
Based on the analysis of conditioned probability between dog vocalizations and corresponding location and activity, we discover supporting evidence for previous research on the semantic meaning of various dog sounds.
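The conditioned-probability analysis mentioned above can be sketched as a simple co-occurrence computation. This is an illustrative sketch, not the study's code; the function name and the example (sound type, context) pairs are hypothetical.

```python
from collections import Counter

def conditional_probs(pairs):
    """Estimate P(context | sound_type) from (sound_type, context)
    co-occurrence pairs by normalising joint counts by the marginal
    count of each sound type."""
    joint = Counter(pairs)
    marginal = Counter(s for s, _ in pairs)
    return {(s, c): n / marginal[s] for (s, c), n in joint.items()}

# Hypothetical observations: each pair is (sound_type, context).
probs = conditional_probs([
    ("growl", "play"), ("growl", "play"), ("growl", "alert"),
    ("whine", "alone"),
])
```

A sound type whose conditional distribution concentrates on one context (here, "growl" on "play") is the kind of evidence the study reads as a consistent semantic association.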
arXiv Detail & Related papers (2023-09-21T23:53:14Z)
- Can Language Models Learn to Listen?
We present a framework for generating appropriate facial responses from a listener in dyadic social interactions based on the speaker's words.
Our approach autoregressively predicts the listener's response: a sequence of listener facial gestures, quantized using a VQ-VAE.
We show that our generated listener motion is fluent and reflective of language semantics through quantitative metrics and a qualitative user study.
arXiv Detail & Related papers (2023-08-21T17:59:02Z)
- EXPRESSO: A Benchmark and Analysis of Discrete Expressive Speech Resynthesis
We introduce Expresso, a high-quality expressive speech dataset for textless speech synthesis.
This dataset includes both read speech and improvised dialogues rendered in 26 spontaneous expressive styles.
We evaluate resynthesis quality with automatic metrics for different self-supervised discrete encoders.
arXiv Detail & Related papers (2023-08-10T17:41:19Z)
- Unsupervised Multimodal Word Discovery based on Double Articulation Analysis with Co-occurrence cues
Human infants acquire their verbal lexicon with minimal prior knowledge of language.
This study proposes a novel fully unsupervised learning method for discovering speech units.
The proposed method can acquire words and phonemes from speech signals using unsupervised learning.
arXiv Detail & Related papers (2022-01-18T07:31:59Z)
- Textless Speech Emotion Conversion using Decomposed and Discrete Representations
We decompose speech into discrete and disentangled learned representations, consisting of content units, F0, speaker, and emotion.
First, we modify the speech content by translating the content units to a target emotion, and then predict the prosodic features based on these units.
Finally, the speech waveform is generated by feeding the predicted representations into a neural vocoder.
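The three-stage pipeline described above can be sketched as a plain composition of models. This is an illustrative skeleton only; the callables stand in for the paper's learned translation, prosody-prediction, and vocoder models, and all names are assumptions.

```python
def emotion_conversion(content_units, f0, speaker, target_emotion,
                       translate, predict_prosody, vocoder):
    """Sketch of the decompose-translate-resynthesise pipeline."""
    units = translate(content_units, target_emotion)         # 1. translate content units
    prosody = predict_prosody(units, f0)                     # 2. predict prosodic features
    return vocoder(units, prosody, speaker, target_emotion)  # 3. synthesise the waveform
```

Keeping content, F0, speaker, and emotion as separate inputs is what lets the pipeline change one factor (emotion) while leaving the others intact.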
arXiv Detail & Related papers (2021-11-14T18:16:42Z)
- Leveraging Pre-trained Language Model for Speech Sentiment Analysis
We explore the use of pre-trained language models to learn sentiment information of written texts for speech sentiment analysis.
We propose a pseudo label-based semi-supervised training strategy using a language model on an end-to-end speech sentiment approach.
arXiv Detail & Related papers (2021-06-11T20:15:21Z)
- Unsupervised Sound Localization via Iterative Contrastive Learning
We propose an iterative contrastive learning framework that requires no data annotations.
We then use the pseudo-labels to learn the correlation between the visual and audio signals sampled from the same video.
Our iterative strategy gradually encourages the localization of the sounding objects and reduces the correlation between the non-sounding regions and the reference audio.
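Learning the audio-visual correlation from pseudo-labels is typically done with a contrastive objective; a common choice is an InfoNCE-style loss where embeddings from the same video are positives. The sketch below assumes that framing and is not the paper's exact objective.

```python
import numpy as np

def info_nce(audio_emb, visual_emb, temp=0.1):
    """Contrastive loss over paired audio/visual embeddings: row i of each
    matrix comes from the same video, so the diagonal entries are the
    positive pairs and everything else is a negative."""
    a = audio_emb / np.linalg.norm(audio_emb, axis=1, keepdims=True)
    v = visual_emb / np.linalg.norm(visual_emb, axis=1, keepdims=True)
    logits = a @ v.T / temp
    # Row-wise log-softmax; the target for row i is column i.
    log_probs = logits - np.log(np.exp(logits).sum(1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))
```

Minimising this pulls matched audio/visual pairs together and pushes mismatched pairs apart, which is the mechanism behind "reducing the correlation between non-sounding regions and the reference audio".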
arXiv Detail & Related papers (2021-04-01T07:48:29Z)
- Cross-modal variational inference for bijective signal-symbol translation
In this paper, we propose an approach for signal/symbol translation by turning this problem into a density estimation task.
We estimate this joint distribution with two different variational auto-encoders, one for each domain, whose inner representations are forced to match with an additive constraint.
In this article, we test our models on pitch, octave and dynamics symbols, which comprise a fundamental step towards music transcription and label-constrained audio generation.
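The additive constraint that forces the two auto-encoders' inner representations to match can be sketched as a combined loss. This is a schematic sketch under stated assumptions, not the paper's objective: the two reconstruction losses are taken as given scalars and the penalty is a squared Euclidean distance between latent codes.

```python
import numpy as np

def paired_vae_loss(recon_a, recon_b, z_a, z_b, lam=1.0):
    """Two domain-specific VAE losses plus an additive penalty pulling
    the latent codes of paired signal/symbol inputs together."""
    match = float(((np.asarray(z_a, float) - np.asarray(z_b, float)) ** 2).sum())
    return recon_a + recon_b + lam * match
```

Because both domains are trained against the same latent target, a point in one latent space can be decoded by the other domain's decoder, which is what makes the translation bijective in spirit.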
arXiv Detail & Related papers (2020-02-10T15:25:48Z)
- Continuous speech separation: dataset and analysis
In natural conversations, a speech signal is continuous, containing both overlapped and overlap-free components.
This paper describes a dataset and protocols for evaluating continuous speech separation algorithms.
arXiv Detail & Related papers (2020-01-30T18:01:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.