AWTE-BERT: Attending to Wordpiece Tokenization Explicitly on BERT for
Joint Intent Classification and Slot Filling
- URL: http://arxiv.org/abs/2211.14829v2
- Date: Tue, 29 Nov 2022 07:59:12 GMT
- Title: AWTE-BERT: Attending to Wordpiece Tokenization Explicitly on BERT for
Joint Intent Classification and Slot Filling
- Authors: Yu Guo, Zhilong Xie, Xingyan Chen, Leilei Wang, Yu Zhao and Gang Wu
- Abstract summary: BERT (Bidirectional Encoder Representations from Transformers) achieves the joint optimization of the two tasks.
We propose a novel joint model based on BERT, which explicitly models the multiple sub-tokens features after wordpiece tokenization.
Experimental results demonstrate that our proposed model achieves significant improvement on intent classification accuracy, slot filling F1, and sentence-level semantic frame accuracy.
- Score: 5.684659127683238
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Intent classification and slot filling are two core tasks in natural language
understanding (NLU). Because the two tasks interact closely, joint models often
outperform single-task designs. One promising solution, BERT (Bidirectional
Encoder Representations from Transformers), achieves joint optimization of the
two tasks. BERT uses wordpiece tokenization to split each input token into
multiple sub-tokens, which causes a mismatch between the lengths of the token
sequence and the label sequence. Previous methods use only the hidden state of
the first sub-token as input to the classifier, which limits performance
because some hidden semantic information is discarded during fine-tuning. To
address this issue, we propose a novel joint model based on BERT that
explicitly models the features of the multiple sub-tokens produced by wordpiece
tokenization, thereby generating context features that contribute to slot
filling. Specifically, we encode the hidden states corresponding to the
multiple sub-tokens into a context vector via an attention mechanism. We then
feed each context vector into the slot filling encoder, which preserves the
integrity of the sentence. Experimental results on two public benchmark
datasets demonstrate that our model achieves significant improvements in intent
classification accuracy, slot filling F1, and sentence-level semantic frame
accuracy. In particular, the slot filling F1 score on the ATIS dataset improves
from 96.1 to 98.2 (2.1 points absolute).
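The sub-token aggregation described in the abstract can be illustrated with a short PyTorch sketch. This is a minimal illustration, not the authors' released code: the module name, the additive-attention scoring layer, and the grouping of sub-tokens via the tokenizer's word_ids() are assumptions made for exposition.

```python
# Minimal sketch: attention-pool the BERT hidden states of a word's
# wordpiece sub-tokens into one context vector before slot classification.
import torch
import torch.nn as nn

class SubTokenAttentionPool(nn.Module):
    """Additive-attention pooling over the sub-tokens of one original word."""
    def __init__(self, hidden_size: int):
        super().__init__()
        self.score = nn.Linear(hidden_size, 1)  # scores each sub-token state

    def forward(self, sub_token_states: torch.Tensor) -> torch.Tensor:
        # sub_token_states: (num_sub_tokens, hidden_size) for a single word
        weights = torch.softmax(self.score(sub_token_states), dim=0)
        # Weighted sum yields one context vector of shape (hidden_size,)
        return (weights * sub_token_states).sum(dim=0)

# Usage idea: run BERT over the wordpiece sequence, group hidden states by
# original word (e.g. via the tokenizer's word_ids()), pool each group with
# the module above, and feed the word-level vectors to the slot-filling
# classifier, so the label sequence length matches the word sequence again.
```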
Related papers
- Empowering Character-level Text Infilling by Eliminating Sub-Tokens [34.37743927032878]
We introduce FIM-SE, which stands for Fill-In-the-Middle with both Starting and Ending character constraints.
arXiv Detail & Related papers (2024-05-27T12:21:48Z) - SEP: Self-Enhanced Prompt Tuning for Visual-Language Model [68.68025991850115]
We introduce a novel approach named Self-Enhanced Prompt Tuning (SEP).
SEP explicitly incorporates discriminative prior knowledge to enhance both textual-level and visual-level embeddings.
Comprehensive evaluations across various benchmarks and tasks confirm SEP's efficacy in prompt tuning.
arXiv Detail & Related papers (2024-05-24T13:35:56Z) - Object Recognition as Next Token Prediction [99.40793702627396]
We present an approach to pose object recognition as next token prediction.
The idea is to apply a language decoder that auto-regressively predicts the text tokens from image embeddings to form labels.
arXiv Detail & Related papers (2023-12-04T18:58:40Z) - RetroMAE v2: Duplex Masked Auto-Encoder For Pre-Training
Retrieval-Oriented Language Models [3.4523793651427113]
We propose the duplex masked auto-encoder, a.k.a. DupMAE, which aims to improve the semantic representation capacity of contextualized embeddings for both the [CLS] token and ordinary tokens.
DupMAE is simple but empirically competitive: with a small decoding cost, it substantially contributes to the model's representation capability and transferability.
arXiv Detail & Related papers (2022-11-16T08:57:55Z) - Text Summarization with Oracle Expectation [88.39032981994535]
Extractive summarization produces summaries by identifying and concatenating the most important sentences in a document.
Most summarization datasets do not come with gold labels indicating whether document sentences are summary-worthy.
We propose a simple yet effective labeling algorithm that creates soft, expectation-based sentence labels.
arXiv Detail & Related papers (2022-09-26T14:10:08Z) - Tokenwise Contrastive Pretraining for Finer Speech-to-BERT Alignment in
End-to-End Speech-to-Intent Systems [31.18865184576272]
This work is a step towards cross-modal alignment in a much more efficient and fine-grained manner, aligning speech embeddings and BERT embeddings on a token-by-token basis.
We introduce a simple yet novel technique that uses a cross-modal attention mechanism to extract token-level contextual embeddings from a speech encoder.
Fine-tuning such a pretrained model to perform intent recognition using speech directly yields state-of-the-art performance on two widely used SLU datasets.
arXiv Detail & Related papers (2022-04-11T15:24:25Z) - Speaker Embedding-aware Neural Diarization: a Novel Framework for
Overlapped Speech Diarization in the Meeting Scenario [51.5031673695118]
We reformulate overlapped speech diarization as a single-label prediction problem.
We propose the speaker embedding-aware neural diarization (SEND) system.
arXiv Detail & Related papers (2022-03-18T06:40:39Z) - Offensive Language Detection with BERT-based models, By Customizing
Attention Probabilities [0.0]
We suggest a methodology to enhance the performance of BERT-based models on the 'Offensive Language Detection' task.
We customize attention probabilities by changing the 'Attention Mask' input to create more efficacious word embeddings.
The most improvement was 2% and 10% for English and Persian languages, respectively.
arXiv Detail & Related papers (2021-10-11T10:23:44Z) - Fast End-to-End Speech Recognition via a Non-Autoregressive Model and
Cross-Modal Knowledge Transferring from BERT [72.93855288283059]
We propose a non-autoregressive speech recognition model called LASO (Listen Attentively, and Spell Once).
The model consists of an encoder, a decoder, and a position dependent summarizer (PDS).
arXiv Detail & Related papers (2021-02-15T15:18:59Z) - R$^2$-Net: Relation of Relation Learning Network for Sentence Semantic
Matching [58.72111690643359]
We propose a Relation of Relation Learning Network (R2-Net) for sentence semantic matching.
We first employ BERT to encode the input sentences from a global perspective.
Then a CNN-based encoder is designed to capture keywords and phrase information from a local perspective.
To fully leverage labels for better relation information extraction, we introduce a self-supervised relation of relation classification task.
arXiv Detail & Related papers (2020-12-16T13:11:30Z) - Stacked DeBERT: All Attention in Incomplete Data for Text Classification [8.900866276512364]
We propose Stacked DeBERT, short for Stacked Denoising Bidirectional Representations from Transformers.
Our model shows improved F1-scores and better robustness in informal/incorrect texts present in tweets and in texts with Speech-to-Text error in sentiment and intent classification tasks.
arXiv Detail & Related papers (2020-01-01T04:49:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.