Related papers: Putting Natural in Natural Language Processing

URL: http://arxiv.org/abs/2305.04572v2
Date: Tue, 23 May 2023 14:15:00 GMT
Title: Putting Natural in Natural Language Processing
Authors: Grzegorz Chrupała
Abstract summary: The field of NLP has overwhelmingly focused on processing written rather than spoken language. Recent advances in deep learning have led to a fortuitous convergence in methods between speech processing and mainstream NLP. Truly natural language processing could lead to better integration with the rest of language science.
Score: 11.746833714322156
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Human language is firstly spoken and only secondarily written. Text, however, is a very convenient and efficient representation of language, and modern civilization has made it ubiquitous. Thus the field of NLP has overwhelmingly focused on processing written rather than spoken language. Work on spoken language, on the other hand, has been siloed off within the largely separate speech processing community which has been inordinately preoccupied with transcribing speech into text. Recent advances in deep learning have led to a fortuitous convergence in methods between speech processing and mainstream NLP. Arguably, the time is ripe for a unification of these two fields, and for starting to take spoken language seriously as the primary mode of human communication. Truly natural language processing could lead to better integration with the rest of language science and could lead to systems which are more data-efficient and more human-like, and which can communicate beyond the textual modality.

Related papers

Proceedings of the ISCA/ITG Workshop on Diversity in Large Speech and Language Models [11.46358189300007]
Modern techniques rely on large models for representing general knowledge of one or several languages. When humans interact with such technologies, the effectiveness of the interaction will be influenced by how far humans make use of the same type of language.
arXiv Detail & Related papers (2025-03-12T17:58:57Z)
Real-Time Multilingual Sign Language Processing [4.626189039960495]
Sign Language Processing (SLP) is an interdisciplinary field comprised of Natural Language Processing (NLP) and Computer Vision. Traditional approaches have often been constrained by the use of gloss-based systems that are both language-specific and inadequate for capturing the multidimensional nature of sign language. We propose the use of SignWriting, a universal sign language transcription notation system, to serve as an intermediary link between the visual-gestural modality of signed languages and text-based linguistic representations.
arXiv Detail & Related papers (2024-12-02T21:51:41Z)
Scaling Speech-Text Pre-training with Synthetic Interleaved Data [31.77653849518526]
Speech language models (SpeechLMs) accept speech input and produce speech output, allowing for more natural human-computer interaction. Traditional approaches for developing SpeechLMs are constrained by the limited availability of unsupervised speech data and parallel speech-text data. We propose a novel approach to scaling speech-text pre-training by leveraging large-scale synthetic interleaved data derived from text corpora.
arXiv Detail & Related papers (2024-11-26T17:19:09Z)
Evolution of Natural Language Processing Technology: Not Just Language Processing Towards General Purpose AI [0.0]
This report provides a technological explanation of how cutting-edge NLP has made it possible to realize the "practice makes perfect" principle. Achievements exceeding the initial predictions have been reported from the results of learning vast amounts of textual data using deep learning. It is an accurate example of the learner embodying the concept of "practice makes perfect" by using vast amounts of textual data.
arXiv Detail & Related papers (2023-10-10T00:41:38Z)
Textless Unit-to-Unit training for Many-to-Many Multilingual Speech-to-Speech Translation [65.13824257448564]
This paper proposes a textless training method for many-to-many multilingual speech-to-speech translation. By treating the speech units as pseudo-text, we can focus on the linguistic content of the speech. We demonstrate that the proposed UTUT model can be effectively utilized not only for Speech-to-Speech Translation (S2ST) but also for multilingual Text-to-Speech Synthesis (T2S) and Text-to-Speech Translation (T2ST).
arXiv Detail & Related papers (2023-08-03T15:47:04Z)
ComSL: A Composite Speech-Language Model for End-to-End Speech-to-Text Translation [79.66359274050885]
We present ComSL, a speech-language model built atop a composite architecture of public pretrained speech-only and language-only models. Our approach has demonstrated effectiveness in end-to-end speech-to-text translation tasks.
arXiv Detail & Related papers (2023-05-24T07:42:15Z)
MMSpeech: Multi-modal Multi-task Encoder-Decoder Pre-training for Speech Recognition [75.12948999653338]
We propose a novel multi-task encoder-decoder pre-training framework (MMSpeech) for Mandarin automatic speech recognition (ASR). We employ a multi-task learning framework including five self-supervised and supervised tasks with speech and text data. Experiments on AISHELL-1 show that our proposed method achieves state-of-the-art performance, with a more than 40% relative improvement compared with other pre-training methods.
arXiv Detail & Related papers (2022-11-29T13:16:09Z)
Language-Agnostic Meta-Learning for Low-Resource Text-to-Speech with Articulatory Features [30.37026279162593]
In this work, we use embeddings derived from articulatory vectors rather than embeddings derived from phoneme identities to learn phoneme representations that hold across languages. This enables us to fine-tune a high-quality text-to-speech model on just 30 minutes of data in a previously unseen language spoken by a previously unseen speaker.
arXiv Detail & Related papers (2022-03-07T07:58:01Z)
Natural Language Generation Using Link Grammar for General Conversational Intelligence [0.0]
We propose a new technique to automatically generate grammatically valid sentences using the Link Grammar database. This natural language generation method far outperforms current state-of-the-art baselines and may serve as the final component in a proto-AGI question answering pipeline.
arXiv Detail & Related papers (2021-04-19T06:16:07Z)
Challenges Encountered in Turkish Natural Language Processing Studies [1.52292571922932]
Natural language processing is a branch of computer science that combines artificial intelligence with linguistics. In this study, the interesting features of Turkish in terms of natural language processing are mentioned.
arXiv Detail & Related papers (2021-01-21T08:30:33Z)
Speaker Independent and Multilingual/Mixlingual Speech-Driven Talking Head Generation Using Phonetic Posteriorgrams [58.617181880383605]
In this work, we propose a novel approach using phonetic posteriorgrams. Our method doesn't need hand-crafted features and is more robust to noise compared to recent approaches. Our model is the first to support multilingual/mixlingual speech as input with convincing results.
arXiv Detail & Related papers (2020-06-20T16:32:43Z)
That Sounds Familiar: an Analysis of Phonetic Representations Transfer Across Languages [72.9927937955371]
We use the resources existing in other languages to train a multilingual automatic speech recognition model. We observe significant improvements across all languages in the multilingual setting, and stark degradation in the crosslingual setting. Our analysis uncovered that even the phones that are unique to a single language can benefit greatly from adding training data from other languages.
arXiv Detail & Related papers (2020-05-16T22:28:09Z)
Experience Grounds Language [185.73483760454454]
Language understanding research is held back by a failure to relate language to the physical world it describes and to the social interactions it facilitates. Despite the incredible effectiveness of language processing models to tackle tasks after being trained on text alone, successful linguistic communication relies on a shared experience of the world.
arXiv Detail & Related papers (2020-04-21T16:56:27Z)

This list is automatically generated from the titles and abstracts of the papers on this site.