Phonological Representation Learning for Isolated Signs Improves Out-of-Vocabulary Generalization
- URL: http://arxiv.org/abs/2509.04745v1
- Date: Fri, 05 Sep 2025 01:55:41 GMT
- Title: Phonological Representation Learning for Isolated Signs Improves Out-of-Vocabulary Generalization
- Authors: Lee Kezar, Zed Sehyr, Jesse Thomason,
- Abstract summary: Vector quantization is a promising approach for learning discrete, token-like representations.<n>It has not been evaluated whether the learned units capture spurious correlations that hinder out-of-vocabulary performance.<n>This work provides a quantitative analysis of how explicit, linguistically-motivated biases can improve the generalization of learned representations of sign language.
- Score: 9.324118291686906
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Sign language datasets are often not representative in terms of vocabulary, underscoring the need for models that generalize to unseen signs. Vector quantization is a promising approach for learning discrete, token-like representations, but it has not been evaluated whether the learned units capture spurious correlations that hinder out-of-vocabulary performance. This work investigates two phonological inductive biases: Parameter Disentanglement, an architectural bias, and Phonological Semi-Supervision, a regularization technique, to improve isolated sign recognition of known signs and reconstruction quality of unseen signs with a vector-quantized autoencoder. The primary finding is that the learned representations from the proposed model are more effective for one-shot reconstruction of unseen signs and more discriminative for sign identification compared to a controlled baseline. This work provides a quantitative analysis of how explicit, linguistically-motivated biases can improve the generalization of learned representations of sign language.
Related papers
- Signs as Tokens: A Retrieval-Enhanced Multilingual Sign Language Generator [55.94334001112357]
We introduce a multilingual sign language model, Signs as Tokens (SOKE), which can generate 3D sign avatars autoregressively from text inputs.<n>We propose a retrieval-enhanced SLG approach, which incorporates external sign dictionaries to provide accurate word-level signs.
arXiv Detail & Related papers (2024-11-26T18:28:09Z) - Self-Supervised Representation Learning with Spatial-Temporal Consistency for Sign Language Recognition [96.62264528407863]
We propose a self-supervised contrastive learning framework to excavate rich context via spatial-temporal consistency.
Inspired by the complementary property of motion and joint modalities, we first introduce first-order motion information into sign language modeling.
Our method is evaluated with extensive experiments on four public benchmarks, and achieves new state-of-the-art performance with a notable margin.
arXiv Detail & Related papers (2024-06-15T04:50:19Z) - Sign Languague Recognition without frame-sequencing constraints: A proof
of concept on the Argentinian Sign Language [42.27617228521691]
This paper presents a general probabilistic model for sign classification that combines sub-classifiers based on different types of features.
The proposed model achieved an accuracy rate of 97% on an Argentinian Sign Language dataset.
arXiv Detail & Related papers (2023-10-26T14:47:11Z) - Multi-Dialectal Representation Learning of Sinitic Phonology [0.0]
In Sinitic Historical Phonology, notable tasks that could benefit from machine learning include the comparison of dialects and reconstruction of proto-languages systems.
Motivated by this, this paper provides an approach for obtaining multi-dialectal representations of Sinitic syllables.
arXiv Detail & Related papers (2023-06-30T02:37:25Z) - Improving Sign Recognition with Phonology [8.27285154257448]
We use insights from research on American Sign Language phonology to train models for isolated sign language recognition.
We train ISLR models that take in pose estimations of a signer producing a single sign to predict not only the sign but additionally its phonological characteristics.
These auxiliary predictions lead to a nearly 9% absolute gain in sign recognition accuracy on the WLASL benchmark.
arXiv Detail & Related papers (2023-02-11T18:51:23Z) - Classification of Phonological Parameters in Sign Languages [0.0]
Linguistic research often breaks down signs into constituent parts to study sign languages.
We show how a single model can be used to recognise the individual phonological parameters within sign languages.
arXiv Detail & Related papers (2022-05-24T13:40:45Z) - A Latent-Variable Model for Intrinsic Probing [93.62808331764072]
We propose a novel latent-variable formulation for constructing intrinsic probes.<n>We find empirical evidence that pre-trained representations develop a cross-lingually entangled notion of morphosyntax.
arXiv Detail & Related papers (2022-01-20T15:01:12Z) - Learning De-identified Representations of Prosody from Raw Audio [7.025418443146435]
We propose a method for learning de-identified prosody representations from raw audio using a contrastive self-supervised signal.
We exploit the natural structure of prosody to minimize timbral information and decouple prosody from speaker representations.
arXiv Detail & Related papers (2021-07-17T14:37:25Z) - Discrete representations in neural models of spoken language [56.29049879393466]
We compare the merits of four commonly used metrics in the context of weakly supervised models of spoken language.
We find that the different evaluation metrics can give inconsistent results.
arXiv Detail & Related papers (2021-05-12T11:02:02Z) - Prototypical Representation Learning for Relation Extraction [56.501332067073065]
This paper aims to learn predictive, interpretable, and robust relation representations from distantly-labeled data.
We learn prototypes for each relation from contextual information to best explore the intrinsic semantics of relations.
Results on several relation learning tasks show that our model significantly outperforms the previous state-of-the-art relational models.
arXiv Detail & Related papers (2021-03-22T08:11:43Z) - Learning What Makes a Difference from Counterfactual Examples and
Gradient Supervision [57.14468881854616]
We propose an auxiliary training objective that improves the generalization capabilities of neural networks.
We use pairs of minimally-different examples with different labels, a.k.a counterfactual or contrasting examples, which provide a signal indicative of the underlying causal structure of the task.
Models trained with this technique demonstrate improved performance on out-of-distribution test sets.
arXiv Detail & Related papers (2020-04-20T02:47:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.