Information Extraction from Swedish Medical Prescriptions with
Sig-Transformer Encoder
- URL: http://arxiv.org/abs/2010.04897v1
- Date: Sat, 10 Oct 2020 04:22:07 GMT
- Title: Information Extraction from Swedish Medical Prescriptions with
Sig-Transformer Encoder
- Authors: John Pougue Biyong, Bo Wang, Terry Lyons and Alejo J Nevado-Holgado
- Abstract summary: We present a novel extension to the Transformer architecture, incorporating the signature transform into the self-attention model.
Experiments on a new Swedish prescription dataset show the proposed architecture to be superior in two of the three information extraction tasks.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Relying on large pretrained language models such as Bidirectional Encoder
Representations from Transformers (BERT) for encoding and adding a simple
prediction layer has led to impressive performance in many clinical natural
language processing (NLP) tasks. In this work, we present a novel extension to
the Transformer architecture that incorporates the signature transform into the
self-attention model. This architecture is added between the embedding and
prediction layers. Experiments on a new Swedish prescription dataset show the
proposed architecture to be superior in two of the three information extraction
tasks, compared to baseline models. Finally, we evaluate two embedding
approaches: applying Multilingual BERT, and translating the Swedish text to
English and then encoding it with a BERT model pretrained on clinical notes.
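The signature transform at the heart of the abstract has a simple discrete form: treating a sequence of d-dimensional embeddings as a piecewise-linear path, the depth-2 signature collects the path's total increment (level 1) and its iterated integrals (level 2). The following is a minimal, library-free sketch of that computation only, not the paper's actual Sig-Transformer layer:

```python
def signature_depth2(path):
    """Depth-2 signature of a piecewise-linear path.

    path: list of d-dimensional points (tuples/lists of floats).
    Returns (s1, s2): s1 is the level-1 term (a d-vector, the total
    increment), s2 the level-2 term (a d x d matrix of iterated integrals).
    """
    d = len(path[0])
    s1 = [0.0] * d
    s2 = [[0.0] * d for _ in range(d)]
    for prev, curr in zip(path, path[1:]):
        delta = [c - p for c, p in zip(curr, prev)]
        # Chen's identity for a linear segment: the level-2 update combines
        # the running level-1 term with the segment's self-product.
        for i in range(d):
            for j in range(d):
                s2[i][j] += s1[i] * delta[j] + 0.5 * delta[i] * delta[j]
        for i in range(d):
            s1[i] += delta[i]
    return s1, s2
```

The level-1 term only sees where the path ends, while the antisymmetric part of the level-2 term (the Lévy area) captures the order in which the coordinates moved, which is what makes signatures useful as sequence features.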
Related papers
- T2S-GPT: Dynamic Vector Quantization for Autoregressive Sign Language Production from Text [59.57676466961787]
We propose a novel dynamic vector quantization (DVA-VAE) model that can adjust the encoding length based on the information density in sign language.
Experiments conducted on the PHOENIX14T dataset demonstrate the effectiveness of our proposed method.
We propose a new large German sign language dataset, PHOENIX-News, which contains 486 hours of sign language videos, audio, and transcription texts.
arXiv Detail & Related papers (2024-06-11T10:06:53Z)
- Dual-Alignment Pre-training for Cross-lingual Sentence Embedding [79.98111074307657]
We propose a dual-alignment pre-training (DAP) framework for cross-lingual sentence embedding.
We introduce a novel representation translation learning (RTL) task, where the model learns to use one-side contextualized token representation to reconstruct its translation counterpart.
Our approach can significantly improve sentence embedding.
arXiv Detail & Related papers (2023-05-16T03:53:30Z)
- Detecting Unassimilated Borrowings in Spanish: An Annotated Corpus and Approaches to Modeling [2.741266294612776]
We introduce a new annotated corpus of Spanish newswire rich in unassimilated lexical borrowings.
We use it to evaluate how several sequence labeling models (CRF, BiLSTM-CRF, and Transformer-based models) perform.
arXiv Detail & Related papers (2022-03-30T09:46:51Z)
- Examining Scaling and Transfer of Language Model Architectures for Machine Translation [51.69212730675345]
Language models (LMs) process sequences in a single stack of layers, and encoder-decoder models (EncDec) utilize separate layer stacks for input and output processing.
In machine translation, EncDec has long been the favoured approach, but few studies have investigated the performance of LMs.
arXiv Detail & Related papers (2022-02-01T16:20:15Z)
- Sentence Bottleneck Autoencoders from Transformer Language Models [53.350633961266375]
We build a sentence-level autoencoder from a pretrained, frozen transformer language model.
We adapt the masked language modeling objective as a generative, denoising one, while only training a sentence bottleneck and a single-layer modified transformer decoder.
We demonstrate that the sentence representations discovered by our model achieve better quality than previous methods that extract representations from pretrained transformers on text similarity tasks, style transfer, and single-sentence classification tasks in the GLUE benchmark, while using fewer parameters than large pretrained models.
arXiv Detail & Related papers (2021-08-31T19:39:55Z)
- Time-Stamped Language Model: Teaching Language Models to Understand the Flow of Events [8.655294504286635]
We propose to formulate procedural text understanding as a question answering problem.
This enables us to use language models pre-trained on other QA benchmarks by adapting them to procedural text understanding.
Our model, evaluated on the Propara dataset, improves on the published state-of-the-art results with a 3.1% increase in F1 score.
arXiv Detail & Related papers (2021-04-15T17:50:41Z)
- Dual-decoder Transformer for Joint Automatic Speech Recognition and Multilingual Speech Translation [71.54816893482457]
We introduce the dual-decoder Transformer, a new model architecture that jointly performs automatic speech recognition (ASR) and multilingual speech translation (ST).
Our models are based on the original Transformer architecture but consist of two decoders, each responsible for one task (ASR or ST).
arXiv Detail & Related papers (2020-11-02T04:59:50Z)
- VECO: Variable and Flexible Cross-lingual Pre-training for Language Understanding and Generation [77.82373082024934]
We plug a cross-attention module into the Transformer encoder to explicitly build the interdependence between languages.
It can effectively avoid the degeneration of predicting masked words only conditioned on the context in its own language.
The proposed cross-lingual model delivers new state-of-the-art results on various cross-lingual understanding tasks of the XTREME benchmark.
arXiv Detail & Related papers (2020-10-30T03:41:38Z)
- Efficient Wait-k Models for Simultaneous Machine Translation [46.01342928010307]
Simultaneous machine translation consists of starting output generation before the entire input sequence is available.
Wait-k decoders offer a simple but efficient approach for this problem.
We investigate the behavior of wait-k decoding in low resource settings for spoken corpora using IWSLT datasets.
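The wait-k policy behind these decoders is simple: read k source tokens before emitting the first target token, then alternate one write per read until the source is exhausted. A toy schedule simulation (a hypothetical helper, independent of any MT toolkit) might look like:

```python
def wait_k_actions(src_len, tgt_len, k):
    """Simulate a wait-k read/write schedule.

    Reads k source tokens up front, then alternates one WRITE per READ;
    once the source is exhausted, the remaining target tokens are written.
    """
    actions = []
    read = written = 0
    while written < tgt_len:
        # We may read as long as we are fewer than k tokens ahead of the
        # writes and source tokens remain.
        if read < min(written + k, src_len):
            actions.append("READ")
            read += 1
        else:
            actions.append("WRITE")
            written += 1
    return actions
```

For example, `wait_k_actions(5, 5, 2)` yields two initial reads followed by alternating writes and reads, ending with the final writes once the source runs out, which is the fixed latency/quality trade-off the k parameter controls.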
arXiv Detail & Related papers (2020-05-18T11:14:23Z)
- Abstractive Text Summarization based on Language Model Conditioning and Locality Modeling [4.525267347429154]
We train a Transformer-based neural model built on the BERT language model.
In addition, we propose a new method of BERT-windowing, which allows chunk-wise processing of texts longer than the BERT window size.
The results of our models are compared to a baseline and the state-of-the-art models on the CNN/Daily Mail dataset.
arXiv Detail & Related papers (2020-03-29T14:00:17Z)
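The chunk-wise processing behind BERT-windowing can be sketched generically as a sliding window with overlap, so that no context is lost at chunk boundaries. This is a hypothetical helper illustrating the idea, not the authors' exact method:

```python
def window_chunks(tokens, window, stride):
    """Split a token sequence into overlapping chunks of at most `window`
    tokens, advancing by `stride` tokens each time.

    Choosing stride < window makes consecutive chunks overlap, so tokens
    near a chunk boundary still appear with context in the next chunk.
    """
    chunks = []
    start = 0
    while True:
        chunks.append(tokens[start:start + window])
        if start + window >= len(tokens):
            break  # this chunk already covers the end of the sequence
        start += stride
    return chunks
```

Each chunk can then be encoded independently by a fixed-window model such as BERT, with the per-chunk outputs merged downstream.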
This list is automatically generated from the titles and abstracts of the papers in this site.