Exploring Continuous Integrate-and-Fire for Adaptive Simultaneous Speech Translation
- URL: http://arxiv.org/abs/2204.09595v2
- Date: Thu, 21 Apr 2022 03:54:21 GMT
- Title: Exploring Continuous Integrate-and-Fire for Adaptive Simultaneous Speech Translation
- Authors: Chih-Chiang Chang, Hung-yi Lee
- Abstract summary: A SimulST system generally includes two components: the pre-decision that aggregates the speech information and the policy that decides to read or write.
This paper proposes to model the adaptive policy by adapting the Continuous Integrate-and-Fire (CIF) mechanism.
Compared with monotonic multihead attention (MMA), our method has the advantage of simpler computation, superior quality at low latency, and better generalization to long utterances.
- Score: 75.86581380817464
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Simultaneous speech translation (SimulST) is a challenging task aiming to
translate streaming speech before the complete input is observed. A SimulST
system generally includes two components: the pre-decision that aggregates the
speech information and the policy that decides to read or write. While recent
works have proposed various strategies to improve the pre-decision, they mainly
adopt the fixed wait-k policy, leaving adaptive policies rarely explored.
This paper proposes to model the adaptive policy by adapting the Continuous
Integrate-and-Fire (CIF). Compared with monotonic multihead attention (MMA),
our method has the advantage of simpler computation, superior quality at low
latency, and better generalization to long utterances. We conduct experiments
on the MuST-C V2 dataset and show the effectiveness of our approach.
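To make the CIF-based policy concrete, below is a minimal sketch of the accumulate-and-fire mechanism behind such an adaptive READ/WRITE policy. The firing threshold of 1.0, the function name `cif_policy`, and the toy weights are illustrative assumptions; only the accumulate-until-threshold-then-fire behavior reflects CIF.

```python
# Minimal sketch of a CIF-style adaptive READ/WRITE policy for streaming speech.
# Names (cif_policy, FIRE_THRESHOLD) and the toy weights are hypothetical; only
# the accumulate-and-fire idea follows CIF.
from typing import Iterable, List, Tuple

FIRE_THRESHOLD = 1.0  # CIF conventionally fires when the accumulated weight reaches 1.0


def cif_policy(frame_weights: Iterable[float]) -> List[Tuple[int, str]]:
    """Map per-frame weights alpha_t in [0, 1] to a READ/WRITE action sequence.

    READ  = consume one more source speech frame.
    WRITE = the accumulated weight crossed the threshold, so emit one target token.
    """
    actions: List[Tuple[int, str]] = []
    accumulated = 0.0
    for t, alpha in enumerate(frame_weights):
        actions.append((t, "READ"))
        accumulated += alpha
        while accumulated >= FIRE_THRESHOLD:  # a frame may trigger several fires
            actions.append((t, "WRITE"))
            accumulated -= FIRE_THRESHOLD     # carry the remainder forward
    return actions


if __name__ == "__main__":
    toy_alphas = [0.2, 0.5, 0.4, 0.1, 0.9, 0.3]  # stand-in for a learned predictor
    for step, action in cif_policy(toy_alphas):
        print(f"frame {step}: {action}")
```

In a full SimulST model, the per-frame weights would come from a learned scalar predictor over encoder states, and each fire would hand an integrated speech representation to the decoder, which then writes one target token.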
Related papers
- Policy Adaptation via Language Optimization: Decomposing Tasks for Few-Shot Imitation [49.43094200366251]
We propose a novel approach for few-shot adaptation to unseen tasks that exploits the semantic understanding of task decomposition.
Our method, Policy Adaptation via Language Optimization (PALO), combines a handful of demonstrations of a task with proposed language decompositions.
We find that PALO is able to consistently complete long-horizon, multi-tier tasks in the real world, outperforming state-of-the-art pre-trained generalist policies.
arXiv Detail & Related papers (2024-08-29T03:03:35Z)
- Stochastic Dynamic Power Dispatch with High Generalization and Few-Shot Adaption via Contextual Meta Graph Reinforcement Learning [7.251065697936476]
A novel contextual meta graph reinforcement learning (Meta-GRL) for a highly generalized multi-stage optimal dispatch policy is proposed.
An upper meta-learner is proposed to encode context for different dispatch scenarios and learn how to identify the dispatch task, while the lower policy learner learns a context-specified dispatch policy.
After sufficient offline learning, this approach can rapidly adapt to unseen and undefined scenarios with only a few updates of the hypothesis judgments generated by the meta-learner.
arXiv Detail & Related papers (2024-01-19T13:58:46Z)
- Using External Off-Policy Speech-To-Text Mappings in Contextual End-To-End Automated Speech Recognition [19.489794740679024]
We investigate the potential of leveraging external knowledge, particularly through off-policy key-value stores generated with text-to-speech methods.
In our approach, audio embeddings captured from text-to-speech, along with semantic text embeddings, are used to bias ASR.
Experiments on LibriSpeech and in-house voice assistant/search datasets show that the proposed approach can reduce domain adaptation time by up to 1K GPU-hours.
arXiv Detail & Related papers (2023-01-06T22:32:50Z)
- Data-Driven Adaptive Simultaneous Machine Translation [51.01779863078624]
We propose a novel and efficient training scheme for adaptive SimulMT.
Our method outperforms all strong baselines in terms of translation quality and latency.
arXiv Detail & Related papers (2022-04-27T02:40:21Z)
- End-to-End Active Speaker Detection [58.7097258722291]
We propose an end-to-end training network where feature learning and contextual predictions are jointly learned.
We also introduce intertemporal graph neural network (iGNN) blocks, which split the message passing according to the main sources of context in the ASD problem.
Experiments show that the aggregated features from the iGNN blocks are more suitable for ASD, resulting in state-of-the-art performance.
arXiv Detail & Related papers (2022-03-27T08:55:28Z)
- Unsupervised Cross-lingual Adaptation for Sequence Tagging and Beyond [58.80417796087894]
Cross-lingual adaptation with multilingual pre-trained language models (mPTLMs) mainly consists of two lines of work: the zero-shot approach and the translation-based approach.
We propose a novel framework to consolidate the zero-shot approach and the translation-based approach for better adaptation performance.
arXiv Detail & Related papers (2020-10-23T13:47:01Z)
- Non-Stationary Off-Policy Optimization [50.41335279896062]
We study the novel problem of off-policy optimization in piecewise-stationary contextual bandits.
In the offline learning phase, we partition logged data into categorical latent states and learn a near-optimal sub-policy for each state.
In the online deployment phase, we adaptively switch between the learned sub-policies based on their performance.
arXiv Detail & Related papers (2020-06-15T09:16:09Z)
- Simultaneous Translation Policies: From Fixed to Adaptive [29.699912674525056]
We design an algorithm to achieve adaptive policies via a simple composition of a set of fixed policies.
Our algorithm can outperform fixed ones by up to 4 BLEU points for the same latency.
It even surpasses the BLEU score of full-sentence translation in the greedy mode.
arXiv Detail & Related papers (2020-04-27T20:56:20Z)
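For the last entry above (Simultaneous Translation Policies: From Fixed to Adaptive), here is a toy sketch of how fixed wait-k policies can be composed into an adaptive one by thresholding model confidence at each lag. The threshold schedule `rho`, the scorer interface, and the copy-model demo are assumptions for illustration, not the paper's exact algorithm.

```python
# Toy composition of fixed wait-k policies into an adaptive READ/WRITE policy.
# All names and threshold values are illustrative assumptions.
from typing import Callable, Dict, List, Sequence, Tuple

# scorer(source_prefix, target_prefix) -> (candidate_token, confidence in [0, 1])
Scorer = Callable[[Sequence[str], Sequence[str]], Tuple[str, float]]


def adaptive_from_fixed(source: Sequence[str], scorer: Scorer,
                        rho: Dict[int, float], max_len: int = 50) -> List[str]:
    target: List[str] = []
    read = 1  # behave like wait-1 at first: read one source word before writing
    while len(target) < max_len:
        token, conf = scorer(source[:read], target)
        if token == "</s>":
            break
        lag = read - len(target)  # the "k" of the fixed policy currently mimicked
        # WRITE if confident enough at this lag (smaller lag demands higher
        # confidence), or if the whole source has already been read; else READ.
        if read == len(source) or conf >= rho.get(lag, 1.0):
            target.append(token)
        else:
            read += 1
    return target


if __name__ == "__main__":
    # Trivial stand-in "model": copies source words and is confident only after
    # seeing two more source words than it has written.
    def toy_scorer(src_prefix, tgt_prefix):
        if len(tgt_prefix) == len(src_prefix):
            return "</s>", 1.0
        conf = 1.0 if len(src_prefix) - len(tgt_prefix) >= 2 else 0.4
        return src_prefix[len(tgt_prefix)], conf

    print(adaptive_from_fixed("wir sehen uns morgen".split(), toy_scorer,
                              rho={1: 0.9, 2: 0.6, 3: 0.3}))
```

With a real simultaneous translation model as the scorer, lowering the thresholds trades latency for quality, which is the knob such a composition exposes.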
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.