Speech Translation and the End-to-End Promise: Taking Stock of Where We Are
- URL: http://arxiv.org/abs/2004.06358v1
- Date: Tue, 14 Apr 2020 08:43:51 GMT
- Title: Speech Translation and the End-to-End Promise: Taking Stock of Where We Are
- Authors: Matthias Sperber, Matthias Paulik
- Abstract summary: Speech translation has experienced several shifts in its primary research themes.
Recent end-to-end modeling techniques promise a principled way of overcoming the issues inherent in cascaded approaches.
Many end-to-end models nevertheless fall short of solving these issues, due to compromises made to address data scarcity.
- Score: 16.45182811689674
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Over its three-decade history, speech translation has experienced
several shifts in its primary research themes: moving from loosely coupled
cascades of speech recognition and machine translation, to exploring questions
of tight coupling, and finally to end-to-end models that have recently
attracted much attention. This paper provides a brief survey of these
developments, along with a discussion of the main challenges of traditional
approaches, which stem from committing to intermediate representations from
the speech recognizer and from training cascaded models separately towards
different objectives.
Recent end-to-end modeling techniques promise a principled way of overcoming
these issues by allowing joint training of all model components and removing
the need for explicit intermediate representations. However, a closer look
reveals that many end-to-end models fall short of solving these issues, due to
compromises made to address data scarcity. This paper provides a unifying
categorization and nomenclature that covers both traditional and recent
approaches and that may help researchers by highlighting both trade-offs and
open research questions.
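To make that contrast concrete, here is a minimal sketch using stand-in PyTorch modules; all class names, layer choices, and dimensions below are illustrative assumptions, not the paper's implementation. The cascade commits to a discrete transcript at the ASR/MT interface, while the end-to-end model passes a continuous representation and can be trained jointly.

```python
import torch
import torch.nn as nn

class Cascade(nn.Module):
    """Loosely coupled cascade: ASR and MT are trained separately, and only
    the ASR's discrete transcript (argmax token ids) crosses the interface."""
    def __init__(self, asr: nn.Module, mt: nn.Module):
        super().__init__()
        self.asr, self.mt = asr, mt

    def forward(self, speech_feats):
        transcript_logits = self.asr(speech_feats)
        transcript_ids = transcript_logits.argmax(dim=-1)  # hard commitment
        return self.mt(transcript_ids)  # ASR errors propagate into MT

class EndToEnd(nn.Module):
    """Single network from speech features to target-language scores; all
    components can share one objective and be trained jointly."""
    def __init__(self, encoder: nn.Module, decoder: nn.Module):
        super().__init__()
        self.encoder, self.decoder = encoder, decoder

    def forward(self, speech_feats):
        hidden = self.encoder(speech_feats)  # continuous, no explicit transcript
        return self.decoder(hidden)

# Toy usage with stand-in layers: 80-dim filterbank frames, 100-word vocab.
speech = torch.randn(4, 50, 80)  # (batch, frames, features)
cascade = Cascade(asr=nn.Linear(80, 100), mt=nn.Embedding(100, 32))
e2e = EndToEnd(encoder=nn.Linear(80, 32), decoder=nn.Linear(32, 100))
print(cascade(speech).shape, e2e(speech).shape)
```

The `argmax` in `Cascade.forward` is where the cascade commits to an intermediate representation: a wrong transcript cannot be repaired downstream, and no gradient flows from the translation loss back into the ASR component.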
Related papers
- Enhanced Short Text Modeling: Leveraging Large Language Models for Topic Refinement [7.6115889231452964]
We introduce a novel approach termed "Topic Refinement".
This approach does not intervene in the initial topic modeling but focuses on improving topics after they have been mined.
By employing prompt engineering, we direct LLMs to eliminate off-topic words within a given topic, ensuring that only contextually relevant words are preserved or substituted with ones that fit better semantically (a minimal prompt sketch follows this entry).
arXiv Detail & Related papers (2024-03-26T13:50:34Z)
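As a concrete illustration of that prompt-engineering step, here is a minimal sketch; the prompt wording and the `call_llm` helper are assumptions for illustration, not the paper's actual prompt or API.

```python
def build_refinement_prompt(topic_words: list[str]) -> str:
    """Ask an LLM to keep contextually relevant words and swap out the rest."""
    return (
        f"The following words were mined as one topic: {', '.join(topic_words)}.\n"
        "Remove any off-topic word and replace it with a semantically better "
        "fitting one. Return only the refined, comma-separated word list."
    )

def refine_topic(topic_words: list[str], call_llm) -> list[str]:
    """`call_llm` is any function mapping a prompt string to a completion."""
    reply = call_llm(build_refinement_prompt(topic_words))
    return [w.strip() for w in reply.split(",") if w.strip()]

# e.g. refine_topic(["rocket", "orbit", "banana", "launch"], call_llm)
# might return ["rocket", "orbit", "spacecraft", "launch"].
```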
- Foundational Models Defining a New Era in Vision: A Survey and Outlook [151.49434496615427]
Vision systems that can see and reason about the compositional nature of visual scenes are fundamental to understanding our world.
Models that learn to bridge the gap between such modalities, coupled with large-scale training data, facilitate contextual reasoning, generalization, and prompting capabilities at test time.
The output of such models can be modified through human-provided prompts without retraining, e.g., segmenting a particular object by providing a bounding box, having interactive dialogues by asking questions about an image or video scene, or manipulating a robot's behavior through language instructions.
arXiv Detail & Related papers (2023-07-25T17:59:18Z)
- Recent Advances in Direct Speech-to-text Translation [58.692782919570845]
We categorize the existing research work into three directions based on the main challenges -- modeling burden, data scarcity, and application issues.
For the challenge of data scarcity, recent work resorts to many sophisticated techniques, such as data augmentation, pre-training, knowledge distillation, and multilingual modeling (a toy distillation sketch follows this entry).
We analyze and summarize the application issues, which include real-time translation, segmentation, named entities, gender bias, and code-switching.
arXiv Detail & Related papers (2023-06-20T16:14:27Z)
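Knowledge distillation, one of the data-scarcity techniques named above, is commonly instantiated by training a speech translation student against the output distribution of a stronger text-based MT teacher. The following is a minimal token-level sketch under that assumption; `alpha` and `temperature` are illustrative hyper-parameters, not values from the survey.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, gold_ids,
                      alpha=0.5, temperature=2.0):
    """Mix cross-entropy on gold tokens with a KL term pulling the student's
    per-token distribution toward the teacher's softened distribution."""
    ce = F.cross_entropy(student_logits.flatten(0, 1), gold_ids.flatten())
    t = temperature
    kl = F.kl_div(
        F.log_softmax(student_logits / t, dim=-1),
        F.softmax(teacher_logits / t, dim=-1),
        reduction="batchmean",
    ) * (t * t)
    return alpha * ce + (1 - alpha) * kl

# Toy shapes: batch=2, target length=7, vocab=50.
student = torch.randn(2, 7, 50, requires_grad=True)  # speech-translation model
teacher = torch.randn(2, 7, 50)                      # frozen text MT model
gold = torch.randint(0, 50, (2, 7))
distillation_loss(student, teacher, gold).backward()
```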
- Pre-training Multi-party Dialogue Models with Latent Discourse Inference [85.9683181507206]
We pre-train a model that understands the discourse structure of multi-party dialogues, namely, to whom each utterance is replying.
To fully utilize the unlabeled data, we propose to treat the discourse structures as latent variables, then jointly infer them and pre-train the discourse-aware model (a toy latent-variable sketch follows this entry).
arXiv Detail & Related papers (2023-05-24T14:06:27Z)
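A toy rendering of that latent-variable idea, assuming dot-product affinities and a squared-error stand-in for the real pre-training loss (none of this is the paper's code): treat "which earlier utterance does utterance t reply to" as a latent variable z_t, infer a posterior over it, and minimize the expected loss under that posterior.

```python
import torch
import torch.nn.functional as F

T, d = 6, 32
utt = torch.randn(T, d, requires_grad=True)  # stand-in utterance encodings

loss = utt.new_zeros(())
for t in range(1, T):
    scores = utt[:t] @ utt[t]      # affinity of utterance t to each predecessor
    q = F.softmax(scores, dim=0)   # inferred posterior q(z_t = j)
    # Toy objective: pull each reply toward its (soft) inferred parent.
    per_parent = ((utt[:t] - utt[t]) ** 2).mean(dim=1)
    loss = loss + (q.detach() * per_parent).sum()  # EM-style: freeze inference
loss.backward()
```

Detaching `q` mimics an E-step/M-step split: the inferred structure is held fixed while the representations are updated, then re-inferred on the next pass.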
- Collaborative Reasoning on Multi-Modal Semantic Graphs for Video-Grounded Dialogue Generation [53.87485260058957]
We study video-grounded dialogue generation, where a response is generated based on the dialogue context and the associated video.
The primary challenges of this task lie in (1) the difficulty of integrating video data into pre-trained language models (PLMs) and (2) the difficulty of reasoning jointly over multiple modalities.
We propose a multi-agent reinforcement learning method to collaboratively perform reasoning on different modalities.
arXiv Detail & Related papers (2022-10-22T14:45:29Z)
- Topic-Aware Contrastive Learning for Abstractive Dialogue Summarization [41.75442239197745]
This work proposes two topic-aware contrastive learning objectives, namely coherence detection and sub-summary generation (a toy coherence-detection sketch follows this entry).
Experiments on benchmark datasets demonstrate that the proposed simple method significantly outperforms strong baselines.
arXiv Detail & Related papers (2021-09-10T17:03:25Z)
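The coherence-detection objective lends itself to a small contrastive sketch: score a genuine utterance window above a corrupted (shuffled) one. The bilinear scorer and shuffling corruption below are assumptions about the general recipe, not the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def coherence_contrastive_loss(score_fn, window, corrupted):
    """Pairwise logistic loss: the genuine window should outscore the fake."""
    return F.softplus(score_fn(corrupted) - score_fn(window))

# Order-sensitive stand-in scorer: sum of bilinear affinities between
# adjacent utterance embeddings (shuffling changes adjacency, hence the score).
W = torch.randn(16, 16, requires_grad=True)
def score_fn(w):
    return (w[:-1] @ W * w[1:]).sum()

window = torch.randn(5, 16)            # 5 utterance embeddings
corrupted = window[torch.randperm(5)]  # shuffled negative
coherence_contrastive_loss(score_fn, window, corrupted).backward()
```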
- A Short Survey of Pre-trained Language Models for Conversational AI-A New Age in NLP [17.10418053437171]
Recently introduced pre-trained language models have the potential to address the issue of data scarcity.
These models have been shown to capture different facets of language such as hierarchical relations, long-term dependencies, and sentiment.
This paper intends to establish whether these pre-trained models can overcome the challenges pertinent to dialogue systems.
arXiv Detail & Related papers (2021-04-22T01:00:56Z)
- Probing Task-Oriented Dialogue Representation from Language Models [106.02947285212132]
This paper investigates pre-trained language models to find out which model intrinsically carries the most informative representation for task-oriented dialogue tasks.
We fine-tune a feed-forward layer as the classifier probe on top of a fixed pre-trained language model with annotated labels in a supervised way (a minimal probing sketch follows this entry).
arXiv Detail & Related papers (2020-10-26T21:34:39Z)
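The probing recipe above is simple enough to sketch end to end: freeze the pre-trained model and train only a small feed-forward classifier on its features. The stand-in encoder, dimensions, and label count below are assumptions; in practice the frozen model would be a pre-trained LM such as BERT.

```python
import torch
import torch.nn as nn

class StandInLM(nn.Module):
    """Placeholder for a fixed pre-trained LM; returns pooled features."""
    def __init__(self, vocab=1000, d=64):
        super().__init__()
        self.emb = nn.Embedding(vocab, d)

    def forward(self, ids):
        return self.emb(ids).mean(dim=1)  # (batch, d)

lm = StandInLM()
for p in lm.parameters():
    p.requires_grad = False               # the pre-trained model stays fixed

probe = nn.Linear(64, 5)                  # feed-forward probe, 5 dialogue labels
opt = torch.optim.Adam(probe.parameters(), lr=1e-3)

ids = torch.randint(0, 1000, (8, 20))     # toy batch of token ids
labels = torch.randint(0, 5, (8,))
loss = nn.functional.cross_entropy(probe(lm(ids)), labels)
opt.zero_grad(); loss.backward(); opt.step()
```

Because only `probe` has trainable parameters, any accuracy it reaches reflects what is already linearly recoverable from the frozen representation.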
- Topic-Aware Multi-turn Dialogue Modeling [91.52820664879432]
This paper presents a novel solution for multi-turn dialogue modeling, which segments and extracts topic-aware utterances in an unsupervised way.
Our topic-aware modeling is implemented by a newly proposed unsupervised topic-aware segmentation algorithm and Topic-Aware Dual-attention Matching (TADAM) Network.
arXiv Detail & Related papers (2020-09-26T08:43:06Z)