Semantic Complexity in End-to-End Spoken Language Understanding
- URL: http://arxiv.org/abs/2008.02858v1
- Date: Thu, 6 Aug 2020 20:18:53 GMT
- Title: Semantic Complexity in End-to-End Spoken Language Understanding
- Authors: Joseph P. McKenna, Samridhi Choudhary, Michael Saxon, Grant P.
Strimel, Athanasios Mouchtaris
- Abstract summary: We analyze the relationship between the performance of STI models and the difficulty of the use case to which they are applied.
We show that near-perfect performance metrics for STI models reported in the literature were obtained with datasets with low semantic complexity values.
- Score: 20.184305170102082
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: End-to-end spoken language understanding (SLU) models are a class of model
architectures that predict semantics directly from speech. Because of their
input and output types, we refer to them as speech-to-interpretation (STI)
models. Previous works have successfully applied STI models to targeted use
cases, such as recognizing home automation commands, however no study has yet
addressed how these models generalize to broader use cases. In this work, we
analyze the relationship between the performance of STI models and the
difficulty of the use case to which they are applied. We introduce empirical
measures of dataset semantic complexity to quantify the difficulty of the SLU
tasks. We show that near-perfect performance metrics for STI models reported in
the literature were obtained with datasets that have low semantic complexity
values. We perform experiments where we vary the semantic complexity of a
large, proprietary dataset and show that STI model performance correlates with
our semantic complexity measures, such that performance increases as complexity
values decrease. Our results show that it is important to contextualize an STI
model's performance with the complexity values of its training dataset to
reveal the scope of its applicability.
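The abstract does not restate the paper's complexity measures, so the following is only a hypothetical illustration of what a dataset-level semantic complexity proxy could look like: two simple statistics (intent-label entropy and lexical type-token ratio) computed over a toy set of (transcript, intent) pairs. All names and data below are assumptions for illustration, not the paper's actual measures.
```python
import math
from collections import Counter

def label_entropy(labels):
    """Shannon entropy (bits) of the intent-label distribution.
    A few dominant intents -> low entropy; a broad, balanced
    intent inventory -> high entropy."""
    counts = Counter(labels)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def type_token_ratio(transcripts):
    """Lexical diversity: unique word types / total word tokens."""
    tokens = [tok for t in transcripts for tok in t.lower().split()]
    return len(set(tokens)) / len(tokens)

# Hypothetical toy SLU dataset of (transcript, intent) pairs.
data = [
    ("turn on the kitchen lights", "lights_on"),
    ("turn off the kitchen lights", "lights_off"),
    ("play some jazz in the living room", "play_music"),
    ("what is the weather like tomorrow", "get_weather"),
]
transcripts, intents = zip(*data)
print(f"intent entropy:   {label_entropy(intents):.3f} bits")
print(f"type-token ratio: {type_token_ratio(transcripts):.3f}")
```
Under the paper's framing, a dataset scoring low on measures of this kind (few intents, narrow vocabulary) is exactly the setting in which near-perfect STI accuracy is least surprising.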
Related papers
- Matchmaker: Self-Improving Large Language Model Programs for Schema Matching [60.23571456538149]
We propose a compositional language model program for schema matching, comprising candidate generation, refinement, and confidence scoring.
Matchmaker self-improves in a zero-shot manner without the need for labeled demonstrations.
Empirically, we demonstrate on real-world medical schema matching benchmarks that Matchmaker outperforms previous ML-based approaches.
arXiv Detail & Related papers (2024-10-31T16:34:03Z)
- How Hard is this Test Set? NLI Characterization by Exploiting Training Dynamics [49.9329723199239]
We propose a method for the automated creation of a challenging test set without relying on the manual construction of artificial and unrealistic examples.
We categorize the test set of popular NLI datasets into three difficulty levels by leveraging methods that exploit training dynamics.
When our characterization method is applied to the training set, models trained with only a fraction of the data achieve comparable performance to those trained on the full dataset.
arXiv Detail & Related papers (2024-10-04T13:39:21Z)
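The entry above derives test-set difficulty from training dynamics. Here is a minimal sketch in the spirit of dataset-cartography-style scoring, assuming per-epoch gold-label probabilities have already been logged; the paper's exact characterization method may differ.
```python
import numpy as np

def difficulty_bins(gold_probs, n_bins=3):
    """Bin examples into difficulty levels from training dynamics.
    gold_probs: (n_epochs, n_examples) array holding the model's
    probability of the gold label for each example at each epoch."""
    confidence = gold_probs.mean(axis=0)   # low mean confidence -> harder
    variability = gold_probs.std(axis=0)   # spread across epochs
    order = np.argsort(confidence)         # example indices, hardest first
    return np.array_split(order, n_bins), confidence, variability

# Toy run: 4 epochs x 6 examples of synthetic gold-label probabilities.
rng = np.random.default_rng(0)
probs = rng.uniform(0.1, 1.0, size=(4, 6))
(hard, medium, easy), conf, var = difficulty_bins(probs)
print("hard:", hard, "medium:", medium, "easy:", easy)
```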
- Unveiling the Flaws: Exploring Imperfections in Synthetic Data and Mitigation Strategies for Large Language Models [89.88010750772413]
Synthetic data has been proposed as a solution to the scarcity of high-quality data for training large language models (LLMs).
Our work examines the flaws associated with question-answer (Q-A) pairs, a prevalent type of synthetic data, and presents an unlearning-based method to mitigate them.
Our work has yielded key insights into the effective use of synthetic data, aiming to promote more robust and efficient LLM training.
arXiv Detail & Related papers (2024-06-18T08:38:59Z)
- SUGARCREPE++ Dataset: Vision-Language Model Sensitivity to Semantic and Lexical Alterations [13.608653575298183]
We introduce the SUGARCREPE++ dataset to analyze the sensitivity of vision-and-language models to semantic alterations.
We show that models achieving better performance on compositionality datasets do not necessarily perform equally well on SUGARCREPE++.
arXiv Detail & Related papers (2024-06-17T03:22:20Z)
- Learning to Reduce: Optimal Representations of Structured Data in Prompting Large Language Models [42.16047343029512]
Large Language Models (LLMs) have been widely used as general-purpose AI agents.
We propose a framework, Learning to Reduce, that fine-tunes a language model to generate a reduced version of an input context.
We show that our model achieves comparable accuracy in selecting the relevant evidence from an input context.
arXiv Detail & Related papers (2024-02-22T00:41:23Z)
- Split and Rephrase with Large Language Models [2.499907423888049]
The Split and Rephrase (SPRP) task consists of splitting complex sentences into a sequence of shorter grammatical sentences.
We evaluate large language models on the task, showing that they can provide large improvements over the state of the art on the main metrics.
arXiv Detail & Related papers (2023-12-18T10:16:37Z)
- EvEntS ReaLM: Event Reasoning of Entity States via Language Models [24.077262847151232]
Nominally, large language models (LLMs) have been exposed to procedural knowledge about how objects interact, yet our benchmarking shows they fail to reason about the world.
In particular, our results indicate that our prompting technique is especially useful for unseen attributes (out-of-domain) or when only limited data is available.
arXiv Detail & Related papers (2022-11-10T07:48:01Z)
- Amortized Inference for Causal Structure Learning [72.84105256353801]
Learning causal structure poses a search problem that typically involves evaluating structures using a score or independence test.
We train a variational inference model to predict the causal structure from observational/interventional data.
Our models exhibit robust generalization capabilities under substantial distribution shift.
arXiv Detail & Related papers (2022-05-25T17:37:08Z)
- Did the Cat Drink the Coffee? Challenging Transformers with Generalized Event Knowledge [59.22170796793179]
Transformer language models (TLMs) were tested on a benchmark for the dynamic estimation of thematic fit.
Our results show that TLMs can reach performance comparable to that achieved by SDM.
However, additional analysis consistently suggests that TLMs do not capture important aspects of event knowledge.
arXiv Detail & Related papers (2021-07-22T20:52:26Z)
- Probing Linguistic Features of Sentence-Level Representations in Neural Relation Extraction [80.38130122127882]
We introduce 14 probing tasks targeting linguistic properties relevant to neural relation extraction (RE).
We use them to study representations learned by more than 40 combinations of encoder architectures and linguistic features, trained on two datasets.
We find that the bias induced by the architecture and the inclusion of linguistic features are clearly expressed in the probing task performance.
arXiv Detail & Related papers (2020-04-17T09:17:40Z)
This list is automatically generated from the titles and abstracts of the papers on this site.