Deep Diacritization: Efficient Hierarchical Recurrence for Improved Arabic Diacritization
- URL: http://arxiv.org/abs/2011.00538v1
- Date: Sun, 1 Nov 2020 15:33:43 GMT
- Title: Deep Diacritization: Efficient Hierarchical Recurrence for Improved Arabic Diacritization
- Authors: Badr AlKhamissi, Muhammad N. ElNokrashy and Mohamed Gabr
- Abstract summary: We propose a novel architecture for labelling character sequences that achieves state-of-the-art results on the Tashkeela Arabic diacritization benchmark.
The core is a two-level recurrence hierarchy that operates on the word and character levels separately.
A cross-level attention module further connects the two, and opens the door for network interpretability.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose a novel architecture for labelling character sequences that
achieves state-of-the-art results on the Tashkeela Arabic diacritization
benchmark. The core is a two-level recurrence hierarchy that operates on the
word and character levels separately---enabling faster training and inference
than comparable traditional models. A cross-level attention module further
connects the two, and opens the door for network interpretability. The task
module is a softmax classifier that enumerates valid combinations of
diacritics. This architecture can be extended with a recurrent decoder that
optionally accepts priors from partially diacritized text, which improves
results. We employ extra tricks such as sentence dropout and majority voting to
further boost the final result. Our best model achieves a WER of 5.34%,
outperforming the previous state-of-the-art with a 30.56% relative error
reduction.
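To make the described design concrete, below is a minimal PyTorch sketch of a two-level (word/character) recurrence with a cross-level attention module and a softmax over diacritic-combination classes. It follows the abstract's description only at a high level; the layer sizes, attention form, and class inventory are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class HierarchicalDiacritizer(nn.Module):
    """Illustrative two-level (word / character) encoder with cross-level
    attention and a softmax over diacritic-combination classes.
    Hyper-parameters and layer sizes are placeholders, not the paper's."""

    def __init__(self, n_chars, n_words, n_classes, d=128):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, d, padding_idx=0)
        self.word_emb = nn.Embedding(n_words, d, padding_idx=0)
        # Word-level recurrence: one step per word in the sentence.
        self.word_rnn = nn.LSTM(d, d, batch_first=True, bidirectional=True)
        # Character-level recurrence: runs over the characters of each word.
        self.char_rnn = nn.LSTM(d, d, batch_first=True, bidirectional=True)
        # Cross-level attention: character states query the word-level states.
        self.attn = nn.MultiheadAttention(2 * d, num_heads=4, batch_first=True)
        self.classifier = nn.Linear(4 * d, n_classes)

    def forward(self, word_ids, char_ids):
        # word_ids: (batch, n_words); char_ids: (batch, n_words, chars_per_word)
        B, W, C = char_ids.shape
        word_states, _ = self.word_rnn(self.word_emb(word_ids))         # (B, W, 2d)
        # Run the character RNN per word by folding words into the batch axis.
        char_in = self.char_emb(char_ids).view(B * W, C, -1)
        char_states, _ = self.char_rnn(char_in)                          # (B*W, C, 2d)
        char_states = char_states.view(B, W * C, -1)
        # Each character position attends over all word-level states.
        ctx, _ = self.attn(char_states, word_states, word_states)        # (B, W*C, 2d)
        logits = self.classifier(torch.cat([char_states, ctx], dim=-1))
        return logits.view(B, W, C, -1)                                  # per-character class scores
```

The efficiency argument behind the hierarchy is visible in the sketch: the character RNN only runs within each word, while long-range context is carried by the much shorter word-level sequence.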
Related papers
- An Ordinal Regression Framework for a Deep Learning Based Severity Assessment for Chest Radiographs [50.285682227571996]
We propose a framework that divides the ordinal regression problem into three parts: a model, a target function, and a classification function.
We show that the choice of encoding has a strong impact on performance and that the best encoding depends on the chosen weighting of Cohen's kappa.
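Since that result hinges on the chosen weighting of Cohen's kappa, here is a short, hedged sketch of the standard linear and quadratic weightings (textbook definitions; the exact variant used in that paper may differ).

```python
import numpy as np

def kappa_weights(n_classes: int, scheme: str = "quadratic") -> np.ndarray:
    """Disagreement weights w[i, j] for weighted Cohen's kappa.
    'linear' penalises |i - j|; 'quadratic' penalises (i - j)**2."""
    i, j = np.meshgrid(np.arange(n_classes), np.arange(n_classes), indexing="ij")
    if scheme == "linear":
        return np.abs(i - j) / (n_classes - 1)
    return ((i - j) ** 2) / (n_classes - 1) ** 2

def weighted_kappa(confusion: np.ndarray, scheme: str = "quadratic") -> float:
    """Weighted kappa computed from a confusion matrix of observed counts."""
    w = kappa_weights(confusion.shape[0], scheme)
    n = confusion.sum()
    observed = confusion / n
    expected = np.outer(confusion.sum(1), confusion.sum(0)) / n ** 2
    return 1.0 - (w * observed).sum() / (w * expected).sum()
```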
arXiv Detail & Related papers (2024-02-08T14:00:45Z)
- Improving Grammar-based Sequence-to-Sequence Modeling with Decomposition and Constraints [30.219318352970948]
We study two low-rank variants of Neural QCFG for faster inference.
We introduce two soft constraints over tree hierarchy and source coverage.
We find that our models outperform vanilla Neural QCFG in most settings.
arXiv Detail & Related papers (2023-06-05T08:05:05Z)
- Z-Code++: A Pre-trained Language Model Optimized for Abstractive Summarization [108.09419317477986]
Z-Code++ is a new pre-trained language model optimized for abstractive text summarization.
The model is first pre-trained using text corpora for language understanding, and then is continually pre-trained on summarization corpora for grounded text generation.
Our model is parameter-efficient in that it outperforms the 600x larger PaLM-540B on XSum, and the finetuned 200x larger GPT3-175B on SAMSum.
arXiv Detail & Related papers (2022-08-21T01:00:54Z)
- Improving Top-K Decoding for Non-Autoregressive Semantic Parsing via Intent Conditioning [11.307865386100993]
We propose a novel NAR semantic parser that introduces intent conditioning on the decoder.
As the top-level intent governs the syntax and semantics of a parse, intent conditioning allows the model to better control beam search.
We evaluate the proposed NAR model on the conversational semantic parsing datasets TOP & TOPv2.
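As a hedged illustration of intent conditioning in a non-autoregressive (NAR) decoder, the sketch below predicts the top-level intent from the encoder output and adds its embedding to every decoder position before all positions are decoded in parallel. Module names and sizes are assumptions for illustration, not the cited paper's architecture.

```python
import torch
import torch.nn as nn

class IntentConditionedNARDecoder(nn.Module):
    """Illustrative non-autoregressive decoder: all target positions are
    predicted in parallel, conditioned on a predicted top-level intent."""

    def __init__(self, d_model=256, n_intents=20, vocab_size=1000, max_len=64):
        super().__init__()
        self.intent_clf = nn.Linear(d_model, n_intents)
        self.intent_emb = nn.Embedding(n_intents, d_model)
        self.pos_emb = nn.Embedding(max_len, d_model)
        layer = nn.TransformerDecoderLayer(d_model, nhead=4, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=2)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, enc_out, tgt_len, gold_intent=None):
        # enc_out: (batch, src_len, d_model)
        intent_logits = self.intent_clf(enc_out.mean(dim=1))   # pooled encoder states
        intent = gold_intent if gold_intent is not None else intent_logits.argmax(-1)
        B = enc_out.size(0)
        positions = torch.arange(tgt_len, device=enc_out.device).expand(B, -1)
        # Decoder inputs are positional queries plus the intent embedding; with
        # no left-to-right dependence, every position is decoded in parallel.
        queries = self.pos_emb(positions) + self.intent_emb(intent).unsqueeze(1)
        dec = self.decoder(queries, enc_out)                   # no causal mask
        return self.out(dec), intent_logits
```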
arXiv Detail & Related papers (2022-04-14T04:06:39Z)
- Speaker Embedding-aware Neural Diarization: a Novel Framework for Overlapped Speech Diarization in the Meeting Scenario [51.5031673695118]
We reformulate overlapped speech diarization as a single-label prediction problem.
We propose the speaker embedding-aware neural diarization (SEND) system.
arXiv Detail & Related papers (2022-03-18T06:40:39Z)
- Long Document Summarization with Top-down and Bottom-up Inference [113.29319668246407]
We propose a principled inference framework to improve summarization models on two aspects.
Our framework assumes a hierarchical latent structure of a document where the top-level captures the long range dependency.
We demonstrate the effectiveness of the proposed framework on a diverse set of summarization datasets.
arXiv Detail & Related papers (2022-03-15T01:24:51Z)
- Hierarchical Conditional End-to-End ASR with CTC and Multi-Granular Subword Units [19.668440671541546]
In end-to-end automatic speech recognition, a model is expected to implicitly learn representations suitable for recognizing a word-level sequence.
We propose a hierarchical conditional model that is based on connectionist temporal classification (CTC).
Experimental results on LibriSpeech-100h, 960h and TEDLIUM2 demonstrate that the proposed model improves over a standard CTC-based model.
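A rough sketch of the hierarchical conditional idea, assuming PyTorch: an intermediate CTC head over a coarser subword vocabulary is attached partway up the encoder, and the upper layers are conditioned on that intermediate posterior before a fine-grained CTC head. The layer split, vocabularies, and conditioning form are placeholders, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class HierarchicalConditionalCTC(nn.Module):
    """Illustrative encoder stack with an intermediate coarse-unit CTC head;
    the upper layers see the coarse posterior before the fine-grained head."""

    def __init__(self, d_model=256, coarse_vocab=100, fine_vocab=1000, n_layers=6):
        super().__init__()
        make_layer = lambda: nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.lower = nn.TransformerEncoder(make_layer(), num_layers=n_layers // 2)
        self.upper = nn.TransformerEncoder(make_layer(), num_layers=n_layers // 2)
        self.coarse_head = nn.Linear(d_model, coarse_vocab)
        self.cond = nn.Linear(coarse_vocab, d_model)   # feeds the coarse posterior back
        self.fine_head = nn.Linear(d_model, fine_vocab)
        self.ctc = nn.CTCLoss(blank=0, zero_infinity=True)

    def forward(self, feats, feat_lens, coarse_tgt, coarse_lens, fine_tgt, fine_lens):
        h = self.lower(feats)                          # feats: (B, T, d_model)
        coarse_logits = self.coarse_head(h)
        coarse_post = coarse_logits.softmax(dim=-1)
        # Condition the upper layers on the intermediate (coarse-unit) prediction.
        h = self.upper(h + self.cond(coarse_post))
        fine_logits = self.fine_head(h)
        loss = (
            self.ctc(coarse_logits.log_softmax(-1).transpose(0, 1), coarse_tgt, feat_lens, coarse_lens)
            + self.ctc(fine_logits.log_softmax(-1).transpose(0, 1), fine_tgt, feat_lens, fine_lens)
        )
        return loss, fine_logits
```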
arXiv Detail & Related papers (2021-10-08T13:15:58Z)
- Less Is More: Improved RNN-T Decoding Using Limited Label Context and Path Merging [43.388004364072174]
We study the influence of the amount of label context on the model's accuracy, and its impact on the efficiency of the decoding process.
We find that we can limit the context of the recurrent neural network transducer (RNN-T) during training to just four previous word-piece labels, without degrading word error rate (WER) relative to the full-context baseline.
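The limited-context finding can be illustrated with a prediction network that sees only the last four emitted word-piece labels instead of the full history. The sketch below (hypothetical names, PyTorch) embeds a fixed window of recent labels and projects it, which is one simple way to realize a bounded label context.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LimitedContextPredictionNetwork(nn.Module):
    """Illustrative RNN-T prediction network conditioned on only the last
    `context` labels, rather than the full label history of an LSTM."""

    def __init__(self, vocab_size, d_emb=128, d_out=256, context=4):
        super().__init__()
        self.context = context
        # Label ids are assumed to start at 1 so id 0 can serve as padding.
        self.emb = nn.Embedding(vocab_size + 1, d_emb, padding_idx=0)
        self.proj = nn.Linear(context * d_emb, d_out)

    def forward(self, history):
        # history: (batch, n_emitted_labels); keep only the last `context` labels.
        B = history.size(0)
        padded = F.pad(history, (self.context, 0))      # left-pad with the pad id
        window = padded[:, -self.context:]               # (B, context)
        flat = self.emb(window).reshape(B, -1)            # (B, context * d_emb)
        return torch.relu(self.proj(flat))                # (B, d_out)
```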
arXiv Detail & Related papers (2020-12-12T07:39:21Z)
- Hierarchical Attention Transformer Architecture For Syntactic Spell Correction [1.0312968200748118]
We propose a multi-encoder, single-decoder variation of the conventional transformer.
We report significant improvements of 0.11%, 0.32% and 0.69% in character (CER), word (WER) and sentence (SER) error rates.
Our architecture also trains 7.8 times faster and is only about 1/3 the size of the next most accurate model.
arXiv Detail & Related papers (2020-05-11T06:19:01Z)
- Structure-Augmented Text Representation Learning for Efficient Knowledge Graph Completion [53.31911669146451]
Human-curated knowledge graphs provide critical supportive information to various natural language processing tasks.
These graphs are usually incomplete, motivating automatic completion.
Graph embedding approaches, e.g., TransE, learn structured knowledge by representing graph elements as dense embeddings.
Textual encoding approaches, e.g., KG-BERT, instead rely on a graph triple's text and triple-level contextualized representations.
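For the contrast drawn above, the standard TransE objective can be written compactly: a relation acts as a translation between head and tail embeddings, trained with a margin loss against corrupted triples. The sketch below uses textbook TransE (hyper-parameters illustrative); KG-BERT, by contrast, scores the triple's surface text with a pre-trained language model.

```python
import torch
import torch.nn as nn

class TransE(nn.Module):
    """Standard TransE: a triple (h, r, t) is scored by how well the relation
    vector translates the head to the tail, score = -||h + r - t||."""

    def __init__(self, n_entities, n_relations, dim=200, margin=1.0):
        super().__init__()
        self.ent = nn.Embedding(n_entities, dim)
        self.rel = nn.Embedding(n_relations, dim)
        nn.init.xavier_uniform_(self.ent.weight)
        nn.init.xavier_uniform_(self.rel.weight)
        self.margin = margin

    def score(self, h, r, t):
        return -(self.ent(h) + self.rel(r) - self.ent(t)).norm(p=2, dim=-1)

    def loss(self, pos, neg):
        # Margin ranking loss: positive triples should outscore corrupted ones.
        pos_s = self.score(*pos)
        neg_s = self.score(*neg)
        return torch.relu(self.margin - pos_s + neg_s).mean()
```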
arXiv Detail & Related papers (2020-04-30T13:50:34Z)
- Pseudo-Convolutional Policy Gradient for Sequence-to-Sequence Lip-Reading [96.48553941812366]
Lip-reading aims to infer the speech content from the lip movement sequence.
The traditional learning process of seq2seq models suffers from two problems.
We propose a novel pseudo-convolutional policy gradient (PCPG) based method to address these two problems.
arXiv Detail & Related papers (2020-03-09T09:12:26Z)
This list is automatically generated from the titles and abstracts of the papers on this site.