Putting Natural in Natural Language Processing
- URL: http://arxiv.org/abs/2305.04572v2
- Date: Tue, 23 May 2023 14:15:00 GMT
- Title: Putting Natural in Natural Language Processing
- Authors: Grzegorz Chrupała
- Abstract summary: The field of NLP has overwhelmingly focused on processing written rather than spoken language.
Recent advances in deep learning have led to a fortuitous convergence in methods between speech processing and mainstream NLP.
Truly natural language processing could lead to better integration with the rest of language science.
- Score: 11.746833714322156
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Human language is firstly spoken and only secondarily written. Text, however,
is a very convenient and efficient representation of language, and modern
civilization has made it ubiquitous. Thus the field of NLP has overwhelmingly
focused on processing written rather than spoken language. Work on spoken
language, on the other hand, has been siloed off within the largely separate
speech processing community which has been inordinately preoccupied with
transcribing speech into text. Recent advances in deep learning have led to a
fortuitous convergence in methods between speech processing and mainstream NLP.
Arguably, the time is ripe for a unification of these two fields, and for
starting to take spoken language seriously as the primary mode of human
communication. Truly natural language processing could lead to better
integration with the rest of language science and could lead to systems which
are more data-efficient and more human-like, and which can communicate beyond
the textual modality.
Related papers
- Scaling Speech-Text Pre-training with Synthetic Interleaved Data [31.77653849518526]
Speech language models (SpeechLMs) accept speech input and produce speech output, allowing for more natural human-computer interaction.
Traditional approaches for developing SpeechLMs are constrained by the limited availability of unsupervised speech data and parallel speech-text data.
We propose a novel approach to scaling speech-text pre-training by leveraging large-scale synthetic interleaved data derived from text corpora.
arXiv Detail & Related papers (2024-11-26T17:19:09Z)
- Evolution of Natural Language Processing Technology: Not Just Language Processing Towards General Purpose AI [0.0]
This report provides a technological explanation of how cutting-edge NLP has made it possible to realize the "practice makes perfect" principle.
Achievements exceeding the initial predictions have been reported from the results of learning vast amounts of textual data using deep learning.
It is an accurate example of the learner embodying the concept of "practice makes perfect" by using vast amounts of textual data.
arXiv Detail & Related papers (2023-10-10T00:41:38Z)
- Textless Unit-to-Unit training for Many-to-Many Multilingual Speech-to-Speech Translation [65.13824257448564]
This paper proposes a textless training method for many-to-many multilingual speech-to-speech translation.
By treating the speech units as pseudo-text, we can focus on the linguistic content of the speech.
We demonstrate that the proposed UTUT model can be effectively utilized not only for Speech-to-Speech Translation (S2ST) but also for multilingual Text-to-Speech Synthesis (T2S) and Text-to-Speech Translation (T2ST).
arXiv Detail & Related papers (2023-08-03T15:47:04Z)
- ComSL: A Composite Speech-Language Model for End-to-End Speech-to-Text Translation [79.66359274050885]
We present ComSL, a speech-language model built atop a composite architecture of public pretrained speech-only and language-only models.
Our approach has demonstrated effectiveness in end-to-end speech-to-text translation tasks.
arXiv Detail & Related papers (2023-05-24T07:42:15Z)
- MMSpeech: Multi-modal Multi-task Encoder-Decoder Pre-training for Speech Recognition [75.12948999653338]
We propose a novel multi-task encoder-decoder pre-training framework (MMSpeech) for Mandarin automatic speech recognition (ASR).
We employ a multi-task learning framework including five self-supervised and supervised tasks with speech and text data.
Experiments on AISHELL-1 show that our proposed method achieves state-of-the-art performance, with a more than 40% relative improvement compared with other pre-training methods.
arXiv Detail & Related papers (2022-11-29T13:16:09Z)
- Language-Agnostic Meta-Learning for Low-Resource Text-to-Speech with Articulatory Features [30.37026279162593]
In this work, we use embeddings derived from articulatory vectors rather than embeddings derived from phoneme identities to learn phoneme representations that hold across languages.
This enables us to fine-tune a high-quality text-to-speech model on just 30 minutes of data in a previously unseen language spoken by a previously unseen speaker.
arXiv Detail & Related papers (2022-03-07T07:58:01Z)
- Natural Language Generation Using Link Grammar for General Conversational Intelligence [0.0]
We propose a new technique to automatically generate grammatically valid sentences using the Link Grammar database.
This natural language generation method far outperforms current state-of-the-art baselines and may serve as the final component in a proto-AGI question answering pipeline.
arXiv Detail & Related papers (2021-04-19T06:16:07Z)
- Challenges Encountered in Turkish Natural Language Processing Studies [1.52292571922932]
Natural language processing is a branch of computer science that combines artificial intelligence with linguistics.
In this study, the interesting features of Turkish in terms of natural language processing are mentioned.
arXiv Detail & Related papers (2021-01-21T08:30:33Z)
- Speaker Independent and Multilingual/Mixlingual Speech-Driven Talking Head Generation Using Phonetic Posteriorgrams [58.617181880383605]
In this work, we propose a novel approach using phonetic posteriorgrams.
Our method doesn't need hand-crafted features and is more robust to noise compared to recent approaches.
Our model is the first to support multilingual/mixlingual speech as input with convincing results.
arXiv Detail & Related papers (2020-06-20T16:32:43Z)
- That Sounds Familiar: an Analysis of Phonetic Representations Transfer Across Languages [72.9927937955371]
We use the resources existing in other languages to train a multilingual automatic speech recognition model.
We observe significant improvements across all languages in the multilingual setting, and stark degradation in the crosslingual setting.
Our analysis uncovered that even the phones that are unique to a single language can benefit greatly from adding training data from other languages.
arXiv Detail & Related papers (2020-05-16T22:28:09Z)
- Experience Grounds Language [185.73483760454454]
Language understanding research is held back by a failure to relate language to the physical world it describes and to the social interactions it facilitates.
Despite the incredible effectiveness of language processing models to tackle tasks after being trained on text alone, successful linguistic communication relies on a shared experience of the world.
arXiv Detail & Related papers (2020-04-21T16:56:27Z)
This list is automatically generated from the titles and abstracts of the papers on this site.