Speech segmentation using multilevel hybrid filters
- URL: http://arxiv.org/abs/2203.01819v1
- Date: Thu, 24 Feb 2022 00:03:02 GMT
- Title: Speech segmentation using multilevel hybrid filters
- Authors: Marcos Faundez-Zanuy, Francesc Vallverdu-Bayes
- Abstract summary: A novel approach for speech segmentation is proposed, based on Multilevel Hybrid (mean/min) Filters (MHF).
The proposed method is based on spectral changes, with the goal of segmenting the voice into homogeneous acoustic segments.
This algorithm is used in a phonetically-segmented speech coder, with successful results.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: A novel approach for speech segmentation is proposed, based on Multilevel
Hybrid (mean/min) Filters (MHF), with the following features: accurate transition
location and good performance in noisy environments (Gaussian and impulsive
noise). The proposed method is based on spectral changes, with the goal of
segmenting the voice into homogeneous acoustic segments. This algorithm is used
in a phonetically-segmented speech coder, with successful results.
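The abstract does not reproduce the filter definition, but the general idea of smoothing a spectral-change curve with combined mean/min filtering before picking segment boundaries can be sketched as follows. The frame sizes, the Euclidean spectral distance, and the particular mean/min combination (pointwise minimum across mean filters of several lengths) are illustrative assumptions, not the paper's exact MHF:

```python
import numpy as np

def multilevel_hybrid_filter(x, lengths=(3, 5)):
    """Smooth x with moving-average (mean) filters of several lengths and
    take the pointwise minimum across levels. This is one plausible reading
    of a mean/min hybrid filter; the paper's exact MHF may differ."""
    x = np.asarray(x, dtype=float)
    levels = []
    for L in lengths:
        pad = L // 2
        padded = np.pad(x, pad, mode="edge")
        levels.append(np.convolve(padded, np.ones(L) / L, mode="valid"))
    # Taking the minimum across levels suppresses isolated spikes,
    # which is consistent with the claimed robustness to impulsive noise.
    return np.min(np.stack(levels), axis=0)

def spectral_change(frames):
    """Euclidean distance between magnitude spectra of consecutive frames
    (a common spectral-change measure; the paper's distance is not given here)."""
    spectra = np.abs(np.fft.rfft(frames, axis=1))
    return np.linalg.norm(np.diff(spectra, axis=0), axis=1)

def segment_boundaries(signal, frame_len=256, hop=128, rel_thresh=0.5):
    """Frame the signal, measure spectral change, smooth it with the hybrid
    filter, and mark frame transitions above a relative threshold."""
    n = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] for i in range(n)])
    d = multilevel_hybrid_filter(spectral_change(frames))
    return np.flatnonzero(d > rel_thresh * d.max())
```

On a signal that switches between two steady tones, the returned indices cluster around the frame transition where the spectrum changes, i.e. the acoustic segment boundary.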
Related papers
- MaskCycleGAN-based Whisper to Normal Speech Conversion [0.0]
We present a MaskCycleGAN approach for the conversion of whispered speech to normal speech.
We find that tuning the mask parameters, and pre-processing the signal with a voice activity detector provides superior performance.
arXiv Detail & Related papers (2024-08-27T06:07:18Z)
- Multi-Dimensional and Multi-Scale Modeling for Speech Separation Optimized by Discriminative Learning [9.84949849886926]
Intra-SE-Conformer and Inter-Transformer (ISCIT) for speech separation.
New network SE-Conformer can model audio sequences in multiple dimensions and scales.
arXiv Detail & Related papers (2023-03-07T08:53:20Z)
- Speech Segmentation Optimization using Segmented Bilingual Speech Corpus for End-to-end Speech Translation [16.630616128169372]
We propose a speech segmentation method using a binary classification model trained using a segmented bilingual speech corpus.
Experimental results revealed that the proposed method is more suitable for cascade and end-to-end ST systems than conventional segmentation methods.
arXiv Detail & Related papers (2022-03-29T12:26:56Z)
- Speaker Embedding-aware Neural Diarization: a Novel Framework for Overlapped Speech Diarization in the Meeting Scenario [51.5031673695118]
We reformulate overlapped speech diarization as a single-label prediction problem.
We propose the speaker embedding-aware neural diarization (SEND) system.
arXiv Detail & Related papers (2022-03-18T06:40:39Z)
- Single-channel speech separation using Soft-minimum Permutation Invariant Training [60.99112031408449]
A long-lasting problem in supervised speech separation is finding the correct label for each separated speech signal.
Permutation Invariant Training (PIT) has been shown to be a promising solution in handling the label ambiguity problem.
In this work, we propose a probabilistic optimization framework to address the inefficiency of PIT in finding the best output-label assignment.
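The exhaustive-assignment idea behind standard PIT that the soft-minimum variant relaxes can be sketched as follows (an illustrative sketch of plain PIT, not that paper's probabilistic framework):

```python
import itertools
import numpy as np

def pit_mse_loss(estimates, targets):
    """Permutation Invariant Training loss: evaluate the mean MSE under every
    assignment of estimated sources to reference sources and keep the minimum.
    Exhaustive search is factorial in the number of sources, which is the
    inefficiency a soft-minimum relaxation aims to avoid."""
    n = len(targets)
    best_loss, best_perm = np.inf, None
    for perm in itertools.permutations(range(n)):
        loss = np.mean([np.mean((estimates[p] - targets[i]) ** 2)
                        for i, p in enumerate(perm)])
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm
```

When the estimates are a permutation of the targets, the minimizing assignment recovers that permutation with zero loss, resolving the label ambiguity.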
arXiv Detail & Related papers (2021-11-16T17:25:05Z)
- Any-to-Many Voice Conversion with Location-Relative Sequence-to-Sequence Modeling [61.351967629600594]
This paper proposes an any-to-many location-relative, sequence-to-sequence (seq2seq), non-parallel voice conversion approach.
In this approach, we combine a bottle-neck feature extractor (BNE) with a seq2seq synthesis module.
Objective and subjective evaluations show that the proposed any-to-many approach has superior voice conversion performance in terms of both naturalness and speaker similarity.
arXiv Detail & Related papers (2020-09-06T13:01:06Z)
- Delving Deeper into Anti-aliasing in ConvNets [42.82751522973616]
Aliasing refers to the phenomenon that high frequency signals degenerate into completely different ones after sampling.
We propose an adaptive content-aware low-pass filtering layer, which predicts separate filter weights for each spatial location and channel group.
arXiv Detail & Related papers (2020-08-21T17:56:04Z)
- Simultaneous Denoising and Dereverberation Using Deep Embedding Features [64.58693911070228]
We propose a joint training method for simultaneous speech denoising and dereverberation using deep embedding features.
At the denoising stage, the DC network is leveraged to extract noise-free deep embedding features.
At the dereverberation stage, instead of using the unsupervised K-means clustering algorithm, another neural network is utilized to estimate the anechoic speech.
arXiv Detail & Related papers (2020-04-06T06:34:01Z) - Continuous speech separation: dataset and analysis [52.10378896407332]
In natural conversations, a speech signal is continuous, containing both overlapped and overlap-free components.
This paper describes a dataset and protocols for evaluating continuous speech separation algorithms.
arXiv Detail & Related papers (2020-01-30T18:01:31Z) - Temporal-Spatial Neural Filter: Direction Informed End-to-End
Multi-channel Target Speech Separation [66.46123655365113]
Target speech separation refers to extracting the target speaker's speech from mixed signals.
Two main challenges are the complex acoustic environment and the real-time processing requirement.
We propose a temporal-spatial neural filter, which directly estimates the target speech waveform from multi-speaker mixture.
arXiv Detail & Related papers (2020-01-02T11:12:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.