Can We Trust Machine Learning? The Reliability of Features from Open-Source Speech Analysis Tools for Speech Modeling
- URL: http://arxiv.org/abs/2506.11072v1
- Date: Mon, 02 Jun 2025 18:55:53 GMT
- Authors: Tahiya Chowdhury, Veronica Romero
- Abstract summary: Machine learning-based behavioral models rely on features extracted from audio-visual recordings. Machine learning tools often lack validation to ensure reliability in capturing behaviorally relevant information. We evaluate speech features extracted from two widely used speech analysis tools, OpenSMILE and Praat, to assess their reliability when considering adolescents with autism.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Machine learning-based behavioral models rely on features extracted from audio-visual recordings. The recordings are processed using open-source tools to extract speech features for classification models. These tools often lack validation to ensure reliability in capturing behaviorally relevant information. This gap raises concerns about reproducibility and fairness across diverse populations and contexts. Speech processing tools, when used outside of their design context, can fail to capture behavioral variations equitably and can then contribute to bias. We evaluate speech features extracted from two widely used speech analysis tools, OpenSMILE and Praat, to assess their reliability when considering adolescents with autism. We observed considerable variation in features across tools, which influenced model performance across context and demographic groups. We encourage domain-relevant verification to enhance the reliability of machine learning models in clinical applications.
Related papers
- AuTAgent: A Reinforcement Learning Framework for Tool-Augmented Audio Reasoning [36.67330306977483]
Large Audio Language Models (LALMs) excel at perception but struggle with complex reasoning requiring precise acoustic measurements.
We propose AuTAgent, a reinforcement learning framework that learns when and which tools to invoke.
arXiv Detail & Related papers (2026-02-14T09:12:20Z)
- AdaReasoner: Dynamic Tool Orchestration for Iterative Visual Reasoning [66.24374176797075]
We introduce AdaReasoner, a family of multimodal models that learn tool use as a general reasoning skill rather than as tool-specific or explicitly supervised behavior.
AdaReasoner is enabled by (i) a scalable data curation pipeline exposing models to long-horizon, multi-step tool interactions; (ii) Tool-GRPO, a reinforcement learning algorithm that prioritizes tool selection and sequencing based on end-task success; and (iii) an adaptive learning mechanism that dynamically regulates tool usage.
arXiv Detail & Related papers (2026-01-26T16:04:43Z)
- Internal Causal Mechanisms Robustly Predict Language Model Out-of-Distribution Behaviors [61.92704516732144]
We show that the most robust features for correctness prediction are those that play a distinctive causal role in the model's behavior.
We propose two methods that leverage causal mechanisms to predict the correctness of model outputs.
arXiv Detail & Related papers (2025-05-17T00:31:39Z)
- Adaptive Tool Use in Large Language Models with Meta-Cognition Trigger [49.81945268343162]
We propose MeCo, an adaptive decision-making strategy for external tool use.
MeCo captures high-level cognitive signals in the representation space, guiding when to invoke tools.
Our experiments show that MeCo accurately detects LLMs' internal cognitive signals and significantly improves tool-use decision-making.
arXiv Detail & Related papers (2025-02-18T15:45:01Z)
- Visual Exploration of Stopword Probabilities in Topic Models [1.9107347888374506]
Stopword removal is a critical stage in many Machine Learning methods.
Inappropriately chosen or hastily omitted stopwords not only lead to suboptimal performance but also significantly affect the quality of models.
This paper proposes a novel extraction method that provides a corpus-specific probabilistic estimation of stopword likelihood.
arXiv Detail & Related papers (2025-01-17T11:59:56Z)
- Beyond the Labels: Unveiling Text-Dependency in Paralinguistic Speech Recognition Datasets [0.5999777817331317]
This paper critically evaluates the prevalent assumption that machine learning models genuinely learn to identify paralinguistic traits.
By examining the lexical overlap in these datasets and testing the performance of machine learning models, we expose significant text-dependency in trait-labeling.
arXiv Detail & Related papers (2024-03-12T15:54:32Z)
- Revisiting Self-supervised Learning of Speech Representation from a Mutual Information Perspective [68.20531518525273]
We take a closer look into existing self-supervised methods of speech from an information-theoretic perspective.
We use linear probes to estimate the mutual information between the target information and learned representations.
We explore the potential of evaluating representations in a self-supervised fashion, where we estimate the mutual information between different parts of the data without using any labels.
arXiv Detail & Related papers (2024-01-16T21:13:22Z)
- Democratize with Care: The need for fairness specific features in user-interface based open source AutoML tools [0.0]
Automated Machine Learning (AutoML) streamlines the machine learning model development process.
This democratization allows more users (including non-experts) to access and utilize state-of-the-art machine-learning expertise.
However, AutoML tools may also propagate bias in the way these tools handle the data, model choices, and optimization approaches adopted.
arXiv Detail & Related papers (2023-12-16T19:54:00Z)
- SINC: Self-Supervised In-Context Learning for Vision-Language Tasks [64.44336003123102]
We propose a framework to enable in-context learning in large language models.
A meta-model can learn on self-supervised prompts consisting of tailored demonstrations.
Experiments show that SINC outperforms gradient-based methods in various vision-language tasks.
arXiv Detail & Related papers (2023-07-15T08:33:08Z)
- Analysing the Impact of Audio Quality on the Use of Naturalistic Long-Form Recordings for Infant-Directed Speech Research [62.997667081978825]
Modelling of early language acquisition aims to understand how infants bootstrap their language skills.
Recent developments have enabled the use of more naturalistic training data for computational models.
It is currently unclear how the sound quality could affect analyses and modelling experiments conducted on such data.
arXiv Detail & Related papers (2023-05-03T08:25:37Z)
- Analyzing Robustness of End-to-End Neural Models for Automatic Speech Recognition [11.489161072526677]
We investigate robustness properties of pre-trained neural models for automatic speech recognition.
In this work, we perform a robustness analysis of the pre-trained neural models wav2vec2, HuBERT and DistilHuBERT on the LibriSpeech and TIMIT datasets.
arXiv Detail & Related papers (2022-08-17T20:00:54Z)
- Plex: Towards Reliability using Pretrained Large Model Extensions [69.13326436826227]
We develop ViT-Plex and T5-Plex, pretrained large model extensions for vision and language modalities, respectively.
Plex greatly improves the state-of-the-art across reliability tasks, and simplifies the traditional protocol.
We demonstrate scaling effects over model sizes up to 1B parameters and pretraining dataset sizes up to 4B examples.
arXiv Detail & Related papers (2022-07-15T11:39:37Z)
- An Interactive Visualization Tool for Understanding Active Learning [12.345164513513671]
We present an interactive visualization tool to elucidate the training process of active learning.
The tool enables one to select a sample of interesting data points, view how their prediction values change at different querying stages, and thus better understand when and how active learning works.
arXiv Detail & Related papers (2021-11-09T03:33:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.