Alternating Weak Triphone/BPE Alignment Supervision from Hybrid Model
Improves End-to-End ASR
- URL: http://arxiv.org/abs/2402.15594v1
- Date: Fri, 23 Feb 2024 20:26:54 GMT
- Title: Alternating Weak Triphone/BPE Alignment Supervision from Hybrid Model
Improves End-to-End ASR
- Authors: Jintao Jiang, Yingbo Gao, Mohammad Zeineldeen, Zoltan Tuske
- Abstract summary: Alternating weak triphone/BPE alignment supervision is proposed to improve end-to-end model training.
We show that either triphone- or BPE-alignment-based weak supervision improves ASR performance over a standard CTC auxiliary loss.
- Score: 9.24160000451216
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, alternating weak triphone/BPE alignment supervision is
proposed to improve end-to-end model training. Towards this end, triphone and
BPE alignments are extracted using a pre-existing hybrid ASR system. A
regularization effect is then obtained through cross-entropy-based intermediate
auxiliary losses computed on such alignments, at a mid-layer representation of
the encoder for triphone alignments and at the encoder output for BPE
alignments. Weak supervision is achieved through strong label smoothing with a
smoothing parameter of 0.5. Experimental results on TED-LIUM 2 indicate that
either triphone- or BPE-alignment-based weak supervision improves ASR
performance over a standard CTC auxiliary loss. Moreover, their combination
lowers the word error rate further. We also investigate alternating the two
auxiliary tasks during model training and observe an additional performance
gain. Overall, the proposed techniques yield over 10% relative error rate
reduction over a CTC-regularized baseline system.
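As a rough illustration of the auxiliary loss described above (not the authors' code), the label-smoothed frame-level cross-entropy might be sketched as follows; the function name and NumPy formulation are assumptions:

```python
import numpy as np

def weak_alignment_loss(logits, alignment, smoothing=0.5):
    """Label-smoothed cross-entropy against frame-level alignment labels.

    logits:    (T, C) frame scores from an intermediate or final encoder layer.
    alignment: (T,) frame-level triphone or BPE label indices produced by the
               pre-existing hybrid ASR system.
    smoothing: label-smoothing weight; 0.5 gives the strong smoothing that
               makes the supervision "weak" in the sense of the abstract.
    """
    T, C = logits.shape
    # Numerically stable log-softmax over the class dimension.
    z = logits - logits.max(axis=-1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    # Smoothed target: uniform mass smoothing/C everywhere, with the
    # remaining (1 - smoothing) placed on the aligned label.
    target = np.full((T, C), smoothing / C)
    target[np.arange(T), alignment] += 1.0 - smoothing
    return -(target * log_probs).sum(axis=-1).mean()
```

With `smoothing=0.0` this reduces to the ordinary one-hot cross-entropy; raising the parameter flattens the target distribution and weakens the pull toward the hybrid system's alignment.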
Related papers
- Silent Inconsistency in Data-Parallel Full Fine-Tuning: Diagnosing Worker-Level Optimization Misalignment [27.352639822596146]
Cross-worker divergence in losses and gradients can remain invisible under conventional monitoring signals. We propose a model-agnostic diagnostic framework that quantifies worker-level consistency using training signals readily available in standard pipelines.
arXiv Detail & Related papers (2026-02-16T04:42:30Z) - Joint Orientation and Weight Optimization for Robust Watertight Surface Reconstruction via Dirichlet-Regularized Winding Fields [77.36628820738271]
Dirichlet Winding Reconstruction (DiWR) is a robust method for reconstructing watertight surfaces from unoriented point clouds. Our method uses the generalized winding number (GWN) field as the target implicit representation.
arXiv Detail & Related papers (2026-02-14T14:27:07Z) - ERGO: Excess-Risk-Guided Optimization for High-Fidelity Monocular 3D Gaussian Splatting [63.138778159026934]
We propose an adaptive optimization framework guided by excess risk decomposition, termed ERGO. ERGO dynamically estimates the view-specific excess risk and adaptively adjusts loss weights during optimization. Experiments on the Google Scanned Objects dataset and the OmniObject3D dataset demonstrate the superiority of ERGO over existing state-of-the-art methods.
arXiv Detail & Related papers (2026-02-10T20:44:43Z) - A Constrained Optimization Perspective of Unrolled Transformers [77.12297732942095]
We introduce a constrained optimization framework for training transformers that behave like optimization descent algorithms. We observe that constrained transformers achieve stronger robustness to perturbations and maintain higher out-of-distribution generalization.
arXiv Detail & Related papers (2026-01-24T02:12:39Z) - RatioWaveNet: A Learnable RDWT Front-End for Robust and Interpretable EEG Motor-Imagery Classification [1.4069478981641936]
We present RatioWaveNet, which augments a strong temporal CNN-Transformer backbone with a trainable, Rationally-Dilated Wavelet Transform front end. Our goal is to test whether this principled wavelet front end improves robustness precisely where BCIs typically fail.
arXiv Detail & Related papers (2025-10-22T14:04:03Z) - DoRAN: Stabilizing Weight-Decomposed Low-Rank Adaptation via Noise Injection and Auxiliary Networks [47.58150560549918]
Weight-Decomposed Low-Rank Adaptation (DoRA) has been shown to improve both the learning capacity and training stability of the vanilla Low-Rank Adaptation (LoRA) method. We propose DoRAN, a new variant of DoRA designed to further stabilize training and boost the sample efficiency of DoRA.
arXiv Detail & Related papers (2025-10-05T19:27:48Z) - Momentum-constrained Hybrid Heuristic Trajectory Optimization Framework with Residual-enhanced DRL for Visually Impaired Scenarios [4.735413508037063]
This paper proposes a momentum-constrained hybrid trajectory optimization framework (MHHTOF) tailored for assistive navigation in visually impaired scenarios. It integrates trajectory sampling generation, optimization, and evaluation with residual deep reinforcement learning (DRL). Experimental results demonstrate that the proposed LSTM-BResPPO achieves significantly faster convergence, attaining stable policy performance in approximately half the training required by the PPO baseline.
arXiv Detail & Related papers (2025-09-19T04:33:39Z) - Backscatter Device-aided Integrated Sensing and Communication: A Pareto Optimization Framework [59.30060797118097]
Integrated sensing and communication (ISAC) systems potentially encounter significant performance degradation in densely obstructed urban non-line-of-sight scenarios. This paper proposes a backscatter device (BD)-assisted ISAC system, which leverages passive BDs naturally distributed in the environment for performance enhancement.
arXiv Detail & Related papers (2025-07-12T17:11:06Z) - A TRPCA-Inspired Deep Unfolding Network for Hyperspectral Image Denoising via Thresholded t-SVD and Top-K Sparse Transformer [20.17660504535571]
We propose a novel deep unfolding network (DU-TRPCA) that enforces stage-wise alternation between two tightly integrated modules: low-rank and sparse. Experiments on synthetic and real-world HSIs demonstrate that DU-TRPCA surpasses state-of-the-art methods under severe mixed noise.
arXiv Detail & Related papers (2025-06-03T02:01:39Z) - Joint Unsupervised and Supervised Training for Automatic Speech
Recognition via Bilevel Optimization [73.98386682604122]
We present a novel bilevel-optimization-based approach to training acoustic models for automatic speech recognition (ASR) that we term bi-level joint unsupervised and supervised training (BL-JUST).
BL-JUST employs lower- and upper-level optimizations with an unsupervised loss and a supervised loss, respectively, leveraging recent advances in penalty-based bilevel optimization to solve this challenging ASR problem with affordable complexity and rigorous convergence guarantees.
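A toy, hypothetical sketch of the penalty idea behind such bilevel training (not the BL-JUST algorithm itself): the lower-level objective is folded into the upper-level one with a growing penalty weight, illustrated here on scalar quadratics.

```python
def penalty_bilevel_solve(gammas=(1.0, 10.0, 100.0)):
    """Toy penalty-based bilevel optimization on scalar quadratics.

    Upper level (supervised stand-in):   f(w) = (w - 1)^2
    Lower level (unsupervised stand-in): g(w) = (w - 2)^2
    The penalty reformulation minimizes f(w) + gamma * g(w) for an
    increasing schedule of gamma, driving w toward the lower-level
    optimum while remaining shaped by the upper-level loss.
    """
    w = 0.0
    for gamma in gammas:
        # Step size scaled to the penalty weight to keep descent stable.
        lr = 0.4 / (1.0 + gamma)
        for _ in range(200):
            grad = 2.0 * (w - 1.0) + gamma * 2.0 * (w - 2.0)
            w -= lr * grad
    return w
```

For each fixed gamma the minimizer is (1 + 2*gamma) / (1 + gamma), so as the schedule grows the iterate approaches the lower-level optimum at w = 2 from below.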
arXiv Detail & Related papers (2024-01-13T05:01:47Z) - Weak Alignment Supervision from Hybrid Model Improves End-to-end ASR [5.2823268671093775]
We create weak alignment supervision from an existing hybrid system to aid the end-to-end modeling of automatic speech recognition.
We then create a cross-entropy loss at a certain layer of the encoder using the derived alignments.
In contrast to the usual one-hot cross-entropy loss, we use a cross-entropy loss with a label-smoothing parameter to regularize the supervision.
arXiv Detail & Related papers (2023-11-24T20:14:28Z) - Learning Repeatable Speech Embeddings Using An Intra-class Correlation
Regularizer [16.716653844774374]
We evaluate the repeatability of embeddings using the intra-class correlation coefficient (ICC).
We propose a novel regularizer, the ICC regularizer, as a complementary component for contrastive losses to guide deep neural networks to produce embeddings with higher repeatability.
We implement the ICC regularizer and apply it to three speech tasks: speaker verification, voice style conversion, and a clinical application for detecting dysphonic voice.
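The one-way ICC referred to above can be computed, in a simplified form, as ICC(1,1) from a between/within mean-square decomposition; the following function is an illustrative assumption, not the paper's implementation.

```python
import numpy as np

def icc_1_1(x):
    """One-way random-effects ICC(1,1) for x of shape (n_groups, k_reps).

    Each row holds k repeated measurements (scalars here for simplicity)
    of one class; ICC near 1 means repetitions of the same class cluster
    tightly relative to the between-class spread. A regularizer could
    then penalize 1 - ICC alongside a contrastive loss.
    """
    n, k = x.shape
    grand = x.mean()
    group_means = x.mean(axis=1)
    # Between-group and within-group mean squares of the one-way ANOVA.
    msb = k * ((group_means - grand) ** 2).sum() / (n - 1)
    msw = ((x - group_means[:, None]) ** 2).sum() / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)
```

Tightly clustered repetitions push the ratio toward 1, while groups whose means coincide but whose repetitions scatter widely drive it to zero or below.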
arXiv Detail & Related papers (2023-10-25T23:21:46Z) - Deep Autoencoder-based Z-Interference Channels with Perfect and
Imperfect CSI [14.04355073946466]
A deep autoencoder (DAE)-based structure for end-to-end communication over the two-user Z-interference channel (ZIC) with finite-alphabet inputs is designed in this paper.
The proposed structure jointly optimizes the two encoder/decoder pairs and generates interference-aware constellations that dynamically adapt their shape based on interference intensity to minimize the bit error rate (BER).
An in-phase/quadrature-phase (I/Q) power allocation layer is introduced in the DAE to guarantee an average power constraint and enable the architecture to generate constellations with nonuniform shapes.
arXiv Detail & Related papers (2023-10-23T15:23:42Z) - Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linear interpolation as a principled method for stabilizing (large-scale) neural network training.
We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear interpolation can help by leveraging the theory of nonexpansive operators.
arXiv Detail & Related papers (2023-10-20T12:45:12Z) - Mitigating the Alignment Tax of RLHF [76.4300447532456]
Aligning LLMs under Reinforcement Learning with Human Feedback (RLHF) can lead to forgetting pretrained abilities, also known as the alignment tax.
We propose model averaging to maximize alignment performance while incurring minimal alignment tax.
We validate HMA's performance across a range of RLHF algorithms over OpenLLaMA-3B and further extend our findings to Mistral-7B.
arXiv Detail & Related papers (2023-09-12T14:16:54Z) - Parameter-Efficient Learning for Text-to-Speech Accent Adaptation [58.356667204518985]
This paper presents parameter-efficient learning (PEL) to develop low-resource accent adaptation for text-to-speech (TTS).
A resource-efficient adaptation from a frozen pre-trained TTS model is developed by using only 1.2% to 0.8% of original trainable parameters.
Experiment results show that the proposed methods can achieve competitive naturalness with parameter-efficient decoder fine-tuning.
arXiv Detail & Related papers (2023-05-18T22:02:59Z) - Stochastic Primal-Dual Three Operator Splitting Algorithm with Extension to Equivariant Regularization-by-Denoising [12.187438033643797]
We propose a stochastic primal-dual three-operator splitting algorithm (TOS-SPDHG) for solving a class of convex three-composite optimization problems.
We provide theoretical convergence analysis showing an ergodic $O(1/K)$ convergence rate, and demonstrate the effectiveness of our approach on imaging inverse problems.
We also propose TOS-SPDHG-RED and TOS-SPDHG-eRED which utilize the regularization-by-denoising framework to leverage pretrained deep denoising networks as priors.
arXiv Detail & Related papers (2022-08-02T17:58:52Z) - ADC-Net: An Open-Source Deep Learning Network for Automated Dispersion
Compensation in Optical Coherence Tomography [0.0]
This study develops a deep learning network for automated dispersion compensation (ADC-Net) in optical coherence tomography (OCT).
The ADC-Net is based on a redesigned UNet architecture which employs an encoder-decoder pipeline.
Two numeric metrics, i.e., peak signal-to-noise ratio (PSNR) and the structural similarity index computed at multiple scales (MS-SSIM), were used for objective assessment of ADC-Net performance.
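Of the two metrics, PSNR has a standard closed form; a minimal sketch follows (MS-SSIM involves multi-scale filtering and is omitted).

```python
import numpy as np

def psnr(reference, test, max_val=1.0):
    """Peak signal-to-noise ratio in dB between two equal-shape images.

    max_val is the maximum possible pixel value (1.0 for normalized
    images, 255 for 8-bit images). Higher PSNR means the test image is
    closer to the reference.
    """
    ref = np.asarray(reference, dtype=float)
    tst = np.asarray(test, dtype=float)
    mse = np.mean((ref - tst) ** 2)
    if mse == 0.0:
        # Identical images: the ratio is unbounded.
        return float("inf")
    return 10.0 * np.log10(max_val ** 2 / mse)
```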
arXiv Detail & Related papers (2022-01-29T17:23:46Z) - The KFIoU Loss for Rotated Object Detection [115.334070064346]
In this paper, we argue that one effective alternative is to devise an approximate loss that can achieve trend-level alignment with the SkewIoU loss.
Specifically, we model the objects as Gaussian distribution and adopt Kalman filter to inherently mimic the mechanism of SkewIoU.
The resulting new loss called KFIoU is easier to implement and works better compared with exact SkewIoU.
arXiv Detail & Related papers (2022-01-29T10:54:57Z) - Reconcile Prediction Consistency for Balanced Object Detection [10.61438063305309]
We propose a Harmonic loss to harmonize the optimization of classification branch and localization branch.
The Harmonic loss enables these two branches to supervise and promote each other during training.
In order to prevent the localization loss from being dominated by outliers during the training phase, a Harmonic IoU loss is proposed to harmonize the weight of the localization loss across samples at different IoU levels.
arXiv Detail & Related papers (2021-08-24T15:52:11Z) - Improving Stability of LS-GANs for Audio and Speech Signals [70.15099665710336]
We show that encoding departure from normality computed in this vector space into the generator optimization formulation helps to craft more comprehensive spectrograms.
We demonstrate the effectiveness of binding this metric for enhancing stability in training with less mode collapse compared to baseline GANs.
arXiv Detail & Related papers (2020-08-12T17:41:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.