Aged to Perfection: Machine-Learning Maps of Age in Conversational English
- URL: http://arxiv.org/abs/2506.17708v1
- Date: Sat, 21 Jun 2025 13:08:57 GMT
- Title: Aged to Perfection: Machine-Learning Maps of Age in Conversational English
- Authors: MingZe Tang,
- Abstract summary: The study uses the British National Corpus 2014, a large sample of contemporary spoken British English, to investigate language patterns across different age groups.<n>Our research attempts to explore how language patterns vary between different age groups, exploring the connection between speaker demographics and linguistic factors such as utterance duration, lexical diversity, and word choice.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: The study uses the British National Corpus 2014, a large sample of contemporary spoken British English, to investigate language patterns across different age groups. Our research attempts to explore how language patterns vary between different age groups, exploring the connection between speaker demographics and linguistic factors such as utterance duration, lexical diversity, and word choice. By merging computational language analysis and machine learning methodologies, we attempt to uncover distinctive linguistic markers characteristic of multiple generations and create prediction models that can consistently estimate the speaker's age group from various aspects. This work contributes to our knowledge of sociolinguistic diversity throughout the life of modern British speech.
Related papers
- Syntactic Evolution in Language Usage [0.0]
The research uses a data set of blogs from blogger.com from 2004 and focuses on English for syntactic analysis.<n>The findings of this research can have implications for linguistics, psychology and communication studies, shedding light on the intricate relationship between age and language.
arXiv Detail & Related papers (2025-01-04T22:27:24Z) - A Computational Model for the Assessment of Mutual Intelligibility Among
Closely Related Languages [1.5773159234875098]
Closely related languages show linguistic similarities that allow speakers of one language to understand speakers of another language without having actively learned it.
Mutual intelligibility varies in degree and is typically tested in psycholinguistic experiments.
We propose a computer-assisted method using the Linear Discriminative Learner to approximate the cognitive processes by which humans learn languages.
arXiv Detail & Related papers (2024-02-05T11:32:13Z) - Quantifying the Dialect Gap and its Correlates Across Languages [69.18461982439031]
This work will lay the foundation for furthering the field of dialectal NLP by laying out evident disparities and identifying possible pathways for addressing them through mindful data collection.
arXiv Detail & Related papers (2023-10-23T17:42:01Z) - Cross-Lingual Cross-Age Group Adaptation for Low-Resource Elderly Speech
Emotion Recognition [48.29355616574199]
We analyze the transferability of emotion recognition across three different languages--English, Mandarin Chinese, and Cantonese.
This study concludes that different language and age groups require specific speech features, thus making cross-lingual inference an unsuitable method.
arXiv Detail & Related papers (2023-06-26T08:48:08Z) - BabySLM: language-acquisition-friendly benchmark of self-supervised
spoken language models [56.93604813379634]
Self-supervised techniques for learning speech representations have been shown to develop linguistic competence from exposure to speech without the need for human labels.
We propose a language-acquisition-friendly benchmark to probe spoken language models at the lexical and syntactic levels.
We highlight two exciting challenges that need to be addressed for further progress: bridging the gap between text and speech and between clean speech and in-the-wild speech.
arXiv Detail & Related papers (2023-06-02T12:54:38Z) - Corpus-Guided Contrast Sets for Morphosyntactic Feature Detection in
Low-Resource English Varieties [3.3536302616846734]
We present a human-in-the-loop approach to generate and filter effective contrast sets via corpus-guided edits.
We show that our approach improves feature detection for both Indian English and African American English, demonstrate how it can assist linguistic research, and release our fine-tuned models for use by other researchers.
arXiv Detail & Related papers (2022-09-15T21:19:31Z) - Perception Point: Identifying Critical Learning Periods in Speech for
Bilingual Networks [58.24134321728942]
We compare and identify cognitive aspects on deep neural-based visual lip-reading models.
We observe a strong correlation between these theories in cognitive psychology and our unique modeling.
arXiv Detail & Related papers (2021-10-13T05:30:50Z) - Discovering Representation Sprachbund For Multilingual Pre-Training [139.05668687865688]
We generate language representation from multilingual pre-trained models and conduct linguistic analysis.
We cluster all the target languages into multiple groups and name each group as a representation sprachbund.
Experiments are conducted on cross-lingual benchmarks and significant improvements are achieved compared to strong baselines.
arXiv Detail & Related papers (2021-09-01T09:32:06Z) - Predicting gender and age categories in English conversations using
lexical, non-lexical, and turn-taking features [3.2766169283137385]
We examine behavioural differences between speakers labelled for gender and age categories in the SpokenBNC.
We find that female speakers tend to produce more and slightly longer turns, while turns by male speakers feature a higher type-token ratio.
Across age groups, we observe, for instance, that swear words and laughter characterize young speakers' talk, while old speakers tend to produce more truncated words.
arXiv Detail & Related papers (2021-02-26T08:23:08Z) - Bridging Linguistic Typology and Multilingual Machine Translation with
Multi-View Language Representations [83.27475281544868]
We use singular vector canonical correlation analysis to study what kind of information is induced from each source.
We observe that our representations embed typology and strengthen correlations with language relationships.
We then take advantage of our multi-view language vector space for multilingual machine translation, where we achieve competitive overall translation accuracy.
arXiv Detail & Related papers (2020-04-30T16:25:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.