Recent Advances in Direct Speech-to-text Translation
- URL: http://arxiv.org/abs/2306.11646v1
- Date: Tue, 20 Jun 2023 16:14:27 GMT
- Title: Recent Advances in Direct Speech-to-text Translation
- Authors: Chen Xu, Rong Ye, Qianqian Dong, Chengqi Zhao, Tom Ko, Mingxuan Wang,
Tong Xiao, Jingbo Zhu
- Abstract summary: We categorize the existing research work into three directions based on the main challenges -- modeling burden, data scarcity, and application issues.
For the challenge of data scarcity, recent work resorts to many sophisticated techniques, such as data augmentation, pre-training, knowledge distillation, and multilingual modeling.
We analyze and summarize the application issues, which include real-time translation, segmentation, named entities, gender bias, and code-switching.
- Score: 58.692782919570845
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, speech-to-text translation has attracted increasing attention,
and many studies have emerged rapidly. In this paper, we present a
comprehensive survey on direct speech translation aiming to summarize the
current state-of-the-art techniques. First, we categorize the existing research
work into three directions based on the main challenges -- modeling burden,
data scarcity, and application issues. To tackle the problem of modeling
burden, two main structures have been proposed: the encoder-decoder framework
(Transformer and its variants) and multitask frameworks. For the challenge of
data scarcity, recent work resorts to many sophisticated techniques, such as
data augmentation, pre-training, knowledge distillation, and multilingual
modeling. We analyze and summarize the application issues, which include
real-time translation, segmentation, named entities, gender bias, and code-switching.
Finally, we discuss some promising directions for future work.
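The survey itself contains no code; purely as an illustration of the modeling structures the abstract names, the sketch below shows a Transformer encoder-decoder over subsampled speech features together with an auxiliary ASR head of the kind used in multitask frameworks. It is a hedged assumption on the editor's part, not the authors' implementation; the class name, dimensions, and loss weight are hypothetical.

```python
# Minimal sketch (not the paper's code) of direct speech-to-text translation:
# convolutional subsampler + Transformer encoder-decoder + auxiliary ASR head.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DirectSpeechTranslator(nn.Module):
    def __init__(self, n_mels=80, d_model=256, nhead=4, num_layers=6,
                 tgt_vocab=8000, src_vocab=8000):
        super().__init__()
        # Convolutional subsampling: acoustic frame sequences are far longer than text.
        self.subsample = nn.Sequential(
            nn.Conv1d(n_mels, d_model, kernel_size=3, stride=2, padding=1),
            nn.GELU(),
            nn.Conv1d(d_model, d_model, kernel_size=3, stride=2, padding=1),
            nn.GELU(),
        )
        # Positional encodings are omitted for brevity.
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=nhead, num_encoder_layers=num_layers,
            num_decoder_layers=num_layers, batch_first=True,
        )
        self.tgt_embed = nn.Embedding(tgt_vocab, d_model)
        self.st_head = nn.Linear(d_model, tgt_vocab)   # target-language logits
        self.asr_head = nn.Linear(d_model, src_vocab)  # auxiliary transcript logits

    def forward(self, feats, tgt_tokens):
        # feats: (batch, frames, n_mels); tgt_tokens: (batch, tgt_len)
        x = self.subsample(feats.transpose(1, 2)).transpose(1, 2)
        memory = self.transformer.encoder(x)
        tgt = self.tgt_embed(tgt_tokens)
        mask = self.transformer.generate_square_subsequent_mask(tgt.size(1))
        out = self.transformer.decoder(tgt, memory, tgt_mask=mask)
        return self.st_head(out), self.asr_head(memory)


model = DirectSpeechTranslator()
feats = torch.randn(2, 400, 80)           # two utterances of log-Mel features
tgt = torch.randint(0, 8000, (2, 20))     # target-language token ids
st_logits, asr_logits = model(feats, tgt)
print(st_logits.shape, asr_logits.shape)  # (2, 20, 8000), (2, 100, 8000)

# Multitask-style objective: translation cross-entropy plus a weighted auxiliary
# ASR term (here a CTC loss over encoder states, purely for illustration).
st_loss = F.cross_entropy(st_logits.reshape(-1, 8000), tgt.reshape(-1))
src = torch.randint(1, 8000, (2, 30))     # hypothetical source transcripts
ctc_loss = F.ctc_loss(asr_logits.log_softmax(-1).transpose(0, 1), src,
                      input_lengths=torch.full((2,), 100),
                      target_lengths=torch.full((2,), 30), blank=0)
loss = st_loss + 0.3 * ctc_loss
```

The data-scarcity techniques the abstract lists typically plug into this same structure, e.g. by pre-training the encoder on ASR data or distilling the decoder from a text-only machine translation teacher.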
Related papers
- From Word Vectors to Multimodal Embeddings: Techniques, Applications, and Future Directions For Large Language Models [17.04716417556556]
This review revisits foundational concepts such as the distributional hypothesis and contextual similarity.
We examine both static and contextualized embeddings, underscoring advancements in models such as ELMo, BERT, and GPT.
The discussion extends to sentence and document embeddings, covering aggregation methods and generative topic models.
Advanced topics such as model compression, interpretability, numerical encoding, and bias mitigation are analyzed, addressing both technical challenges and ethical implications.
arXiv Detail & Related papers (2024-11-06T15:40:02Z)
- A Survey on Multi-modal Machine Translation: Tasks, Methods and Challenges [35.873666277696096]
Multi-modal machine translation has attracted significant interest in both academia and industry.
It takes both textual and visual modalities as inputs, leveraging visual context to tackle the ambiguities in source texts.
arXiv Detail & Related papers (2024-05-21T10:34:47Z)
- Understanding Cross-Lingual Alignment -- A Survey [52.572071017877704]
Cross-lingual alignment is the meaningful similarity of representations across languages in multilingual language models.
We survey the literature of techniques to improve cross-lingual alignment, providing a taxonomy of methods and summarising insights from throughout the field.
arXiv Detail & Related papers (2024-04-09T11:39:53Z)
- Multilingual Multimodality: A Taxonomical Survey of Datasets, Techniques, Challenges and Opportunities [10.721189858694396]
We study the unification of multilingual and multimodal (MultiX) streams.
We review the languages studied and the gold or silver data with parallel annotations, and examine how these modalities and languages interact in modeling.
We present an account of the modeling approaches, along with their strengths and weaknesses, to better understand in which scenarios they can be used reliably.
arXiv Detail & Related papers (2022-10-30T21:46:01Z)
- Don't Copy the Teacher: Data and Model Challenges in Embodied Dialogue [92.01165203498299]
Embodied dialogue instruction following requires an agent to complete a complex sequence of tasks from a natural language exchange.
This paper argues that imitation learning (IL) and related low-level metrics are actually misleading and do not align with the goals of embodied dialogue research.
arXiv Detail & Related papers (2022-10-10T05:51:40Z)
- X-SCITLDR: Cross-Lingual Extreme Summarization of Scholarly Documents [12.493662336994106]
We present an abstractive cross-lingual summarization dataset for four different languages in the scholarly domain.
We train and evaluate models that process English papers and generate summaries in German, Italian, Chinese and Japanese.
arXiv Detail & Related papers (2022-05-30T12:31:28Z)
- Positioning yourself in the maze of Neural Text Generation: A Task-Agnostic Survey [54.34370423151014]
This paper surveys the components of modeling approaches relaying task impacts across various generation tasks such as storytelling, summarization, and translation.
We present an abstraction of the imperative techniques with respect to learning paradigms, pretraining, modeling approaches, decoding, and the key challenges outstanding in each of them.
arXiv Detail & Related papers (2020-10-14T17:54:42Z)
- Topic-Aware Multi-turn Dialogue Modeling [91.52820664879432]
This paper presents a novel solution for multi-turn dialogue modeling, which segments and extracts topic-aware utterances in an unsupervised way.
Our topic-aware modeling is implemented by a newly proposed unsupervised topic-aware segmentation algorithm and Topic-Aware Dual-attention Matching (TADAM) Network.
arXiv Detail & Related papers (2020-09-26T08:43:06Z)
- Speech Translation and the End-to-End Promise: Taking Stock of Where We Are [16.45182811689674]
Speech translation has experienced several shifts in its primary research themes.
Recent end-to-end modeling techniques promise a principled way of overcoming these issues.
Many end-to-end models fall short of solving these issues, due to compromises made to address data scarcity.
arXiv Detail & Related papers (2020-04-14T08:43:51Z)
- How Far are We from Effective Context Modeling? An Exploratory Study on Semantic Parsing in Context [59.13515950353125]
We present a grammar-based decoding semantic parser and adapt typical context modeling methods on top of it.
We evaluate 13 context modeling methods on two large cross-domain datasets, and our best model achieves state-of-the-art performances.
arXiv Detail & Related papers (2020-02-03T11:28:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.