Self-Supervised Speech Representations are More Phonetic than Semantic
- URL: http://arxiv.org/abs/2406.08619v1
- Date: Wed, 12 Jun 2024 20:04:44 GMT
- Title: Self-Supervised Speech Representations are More Phonetic than Semantic
- Authors: Kwanghee Choi, Ankita Pasad, Tomohiko Nakamura, Satoru Fukayama, Karen Livescu, Shinji Watanabe
- Abstract summary: Self-supervised speech models (S3Ms) have become an effective backbone for speech applications.
We seek a more fine-grained analysis of the word-level linguistic properties encoded in S3Ms.
Our study reveals that S3M representations consistently and significantly exhibit more phonetic than semantic similarity.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Self-supervised speech models (S3Ms) have become an effective backbone for speech applications. Various analyses suggest that S3Ms encode linguistic properties. In this work, we seek a more fine-grained analysis of the word-level linguistic properties encoded in S3Ms. Specifically, we curate a novel dataset of near homophone (phonetically similar) and synonym (semantically similar) word pairs and measure the similarities between S3M word representation pairs. Our study reveals that S3M representations consistently and significantly exhibit more phonetic than semantic similarity. Further, we question whether widely used intent classification datasets such as Fluent Speech Commands and Snips Smartlights are adequate for measuring semantic abilities. Our simple baseline, using only the word identity, surpasses S3M-based models. This corroborates our findings and suggests that high scores on these datasets do not necessarily guarantee the presence of semantic content.
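The two measurements described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code. It assumes that a word representation is the mean of the S3M frame features over the word's time span and that pair similarity is cosine similarity. The word vectors, word pairs, and intent examples are invented toy values. The second part is a hypothetical word-identity baseline: each word in a transcript votes with its training co-occurrence counts for each intent. The paper's actual baseline may be built differently. All function names (`mean_pool`, `cosine`, `mean_pair_similarity`, `train_word_identity_baseline`, `predict`) are illustrative.

```python
import math
from collections import Counter, defaultdict
from statistics import mean

# --- Part 1: phonetic vs. semantic similarity of word representations ---

def mean_pool(frames, start, end):
    """Average frame vectors frames[start:end] into one word vector.
    Mean pooling over a word's frames is an assumed choice for illustration."""
    segment = frames[start:end]
    if not segment:
        raise ValueError("empty word segment")
    dim = len(segment[0])
    return [sum(frame[d] for frame in segment) / len(segment) for d in range(dim)]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mean_pair_similarity(pairs, embed):
    """Average cosine similarity over (word_a, word_b) pairs; `embed` maps word -> vector."""
    return mean(cosine(embed[a], embed[b]) for a, b in pairs)


# Hypothetical frame features for one spoken "night", pooled into a word vector.
night_frames = [[1.0, 0.0, 0.0], [1.0, 0.2, 0.0]]

# Hypothetical toy word vectors standing in for pooled S3M features.
embed = {
    "night": mean_pool(night_frames, 0, len(night_frames)),
    "knight": [0.9, 0.2, 0.1],
    "big": [0.0, 1.0, 0.2],
    "large": [0.3, 0.2, 1.0],
}
homophone_pairs = [("night", "knight")]  # phonetically similar pair
synonym_pairs = [("big", "large")]       # semantically similar pair

phonetic = mean_pair_similarity(homophone_pairs, embed)
semantic = mean_pair_similarity(synonym_pairs, embed)
# For these toy vectors, phonetic > semantic.
print(f"near-homophone pairs: {phonetic:.3f}, synonym pairs: {semantic:.3f}")

# --- Part 2: a hypothetical word-identity intent baseline ---

def train_word_identity_baseline(examples):
    """Count how often each word type co-occurs with each intent label."""
    counts = defaultdict(Counter)
    for words, intent in examples:
        for word in set(words):
            counts[word][intent] += 1
    return counts


def predict(counts, words):
    """Each known word votes with its intent counts; unseen words are ignored."""
    votes = Counter()
    for word in set(words):
        votes.update(counts.get(word, Counter()))
    return votes.most_common(1)[0][0] if votes else None


train = [
    (["turn", "on", "the", "lights"], "activate_lights"),
    (["turn", "off", "the", "lights"], "deactivate_lights"),
    (["switch", "off", "the", "lamp"], "deactivate_lights"),
]
model = train_word_identity_baseline(train)
print(predict(model, ["lights", "off"]))  # deactivate_lights
```

The baseline never looks at audio. It uses only which words occur. If such a model matches or beats speech-based models on an intent dataset, that dataset can be solved from lexical identity alone, which is the concern the abstract raises about Fluent Speech Commands and Snips Smartlights.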
Related papers
- Tomato, Tomahto, Tomate: Measuring the Role of Shared Semantics among Subwords in Multilingual Language Models
We take an initial step toward measuring the role of shared semantics among subwords in encoder-only multilingual language models (mLMs).
We form "semantic tokens" by merging semantically similar subwords and their embeddings.
Inspections of the grouped subwords show that they exhibit a wide range of semantic similarities.
Date: 2024-11-07T08:38:32Z
- Fantastic Semantics and Where to Find Them: Investigating Which Layers of Generative LLMs Reflect Lexical Semantics
We investigate the bottom-up evolution of lexical semantics for a popular large language model, namely Llama2.
Our experiments show that the representations in lower layers encode lexical semantics, while the higher layers, with weaker semantic induction, are responsible for prediction.
This is in contrast to models with discriminative objectives, such as masked language modeling, where the higher layers obtain better lexical semantics.
Date: 2024-03-03T13:14:47Z
- Leveraging multilingual transfer for unsupervised semantic acoustic word embeddings
Acoustic word embeddings (AWEs) are fixed-dimensional vector representations of speech segments that encode phonetic content.
In this paper we explore semantic AWE modelling.
We show, for the first time, that AWEs can be used for downstream semantic query-by-example search.
Date: 2023-07-05T07:46:54Z
- What Do Self-Supervised Speech Models Know About Words?
Self-supervised speech models (S3Ms) have been introduced over the last few years, improving performance and data efficiency on various speech tasks.
Recent work has begun analyzing how S3Ms encode certain properties, such as phonetic and speaker information.
We use lightweight analysis methods to study segment-level linguistic properties encoded in S3Ms.
Date: 2023-06-30T22:36:41Z
- Syntax and Semantics Meet in the "Middle": Probing the Syntax-Semantics Interface of LMs Through Agentivity
We present the semantic notion of agentivity as a case study for probing interactions between syntax and semantics in LMs.
This suggests that LMs may serve as more useful tools for linguistic annotation, theory testing, and discovery.
Date: 2023-05-29T16:24:01Z
- Disentangling Learnable and Memorizable Data via Contrastive Learning for Semantic Communications
A novel machine reasoning framework is proposed to disentangle source data so as to make it semantic-ready.
In particular, a novel contrastive learning framework is proposed, whereby instance and cluster discrimination are performed on the data.
Deep semantic clusters of highest confidence are considered learnable, semantic-rich data.
Our simulation results showcase the superiority of our contrastive learning approach in terms of semantic impact and minimalism.
Date: 2022-12-18T12:00:12Z
- Audio-to-Intent Using Acoustic-Textual Subword Representations from End-to-End ASR
We present a novel approach to predict the user's intent (whether or not the user is speaking to the device) directly from acoustic and textual information encoded at subword tokens.
We show that our approach is highly accurate, correctly mitigating 93.3% of unintended user audio from invoking the smart assistant at a 99% true positive rate.
Date: 2022-10-21T17:45:00Z
- EDS-MEMBED: Multi-sense embeddings based on enhanced distributional semantic structures via a graph walk over word senses
We leverage the rich semantic structures in WordNet to enhance the quality of multi-sense embeddings (M-SE).
We derive new distributional semantic similarity measures for M-SE from prior ones.
We report evaluation results on 11 benchmark datasets involving word sense disambiguation (WSD) and word similarity tasks.
Date: 2021-02-27T14:36:55Z
- Named Entity Recognition for Social Media Texts with Semantic Augmentation
Existing approaches to named entity recognition suffer from data sparsity when applied to short and informal texts.
We propose a neural-based approach to NER for social media texts where both local (from running text) and augmented semantics are taken into account.
Date: 2020-10-29T10:06:46Z