Jointly Optimizing State Operation Prediction and Value Generation for
Dialogue State Tracking
- URL: http://arxiv.org/abs/2010.14061v2
- Date: Thu, 8 Apr 2021 02:04:05 GMT
- Title: Jointly Optimizing State Operation Prediction and Value Generation for
Dialogue State Tracking
- Authors: Yan Zeng and Jian-Yun Nie
- Abstract summary: We investigate the problem of multi-domain Dialogue State Tracking (DST) with open vocabulary.
Existing approaches exploit a BERT encoder and a copy-based RNN decoder, where the encoder predicts the state operation and the decoder generates new slot values.
We propose a purely Transformer-based framework, where a single BERT works as both the encoder and the decoder.
- Score: 23.828348485513043
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We investigate the problem of multi-domain Dialogue State Tracking (DST) with
open vocabulary. Existing approaches exploit a BERT encoder and a copy-based RNN
decoder, where the encoder predicts the state operation, and the decoder
generates new slot values. However, in such a stacked encoder-decoder
structure, the operation prediction objective only affects the BERT encoder and
the value generation objective mainly affects the RNN decoder. In this paper,
we propose a purely Transformer-based framework, where a single BERT works as
both the encoder and the decoder. In so doing, the operation prediction
objective and the value generation objective can jointly optimize this BERT for
DST. At the decoding step, we re-use the hidden states of the encoder in the
self-attention mechanism of the corresponding decoder layers to construct a
flat encoder-decoder architecture for effective parameter updating.
Experimental results show that our approach substantially outperforms the
existing state-of-the-art framework, and it also achieves performance very
competitive with the best ontology-based approaches.
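To make the flat encoder-decoder idea concrete, here is a minimal PyTorch sketch (not the authors' code): one shared Transformer stack encodes the dialogue context, and at decoding each layer's self-attention runs over the concatenation of that layer's cached encoder states and the decoder states, so the operation-prediction head and the value-generation head update the same parameters. The layer sizes, the four-way operation set, and the omission of causal masking and the copy mechanism are simplifying assumptions.

```python
# Hedged sketch of a flat encoder-decoder built from a single shared Transformer
# stack; names and sizes are illustrative, not the paper's implementation.
import torch
import torch.nn as nn


class FlatEncoderDecoder(nn.Module):
    def __init__(self, vocab_size=30522, d_model=256, n_heads=4, n_layers=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # One shared stack: the same parameters encode the dialogue context
        # and decode slot values.
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model, n_heads, 4 * d_model, batch_first=True)
            for _ in range(n_layers)
        )
        self.op_head = nn.Linear(d_model, 4)      # e.g. CARRYOVER/DELETE/DONTCARE/UPDATE
        self.lm_head = nn.Linear(d_model, vocab_size)

    def encode(self, src_ids):
        h = self.embed(src_ids)
        cached = []                                # per-layer encoder hidden states
        for layer in self.layers:
            h = layer(h)
            cached.append(h)
        return h, cached

    def decode(self, tgt_ids, cached):
        h = self.embed(tgt_ids)
        for layer, enc_h in zip(self.layers, cached):
            # "Flat" decoding: self-attention runs over [encoder states; decoder
            # states] of the corresponding layer, so both objectives jointly
            # optimize the same parameters.
            joint = torch.cat([enc_h, h], dim=1)
            h = layer(joint)[:, enc_h.size(1):]    # keep only decoder positions
        return self.lm_head(h)


# Toy usage: operation prediction from encoder states, value generation from decoder.
model = FlatEncoderDecoder()
src = torch.randint(0, 30522, (2, 16))             # dialogue context + state tokens
tgt = torch.randint(0, 30522, (2, 5))              # slot-value tokens generated so far
enc_out, cached = model.encode(src)
op_logits = model.op_head(enc_out[:, 0])           # e.g. from a slot/[CLS] position
value_logits = model.decode(tgt, cached)
print(op_logits.shape, value_logits.shape)         # (2, 4) (2, 5, 30522)
```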
Related papers
- $ε$-VAE: Denoising as Visual Decoding [61.29255979767292]
In generative modeling, tokenization simplifies complex data into compact, structured representations, creating a more efficient, learnable space.
Current visual tokenization methods rely on a traditional autoencoder framework, where the encoder compresses data into latent representations, and the decoder reconstructs the original input.
We propose denoising as decoding, shifting from single-step reconstruction to iterative refinement. Specifically, we replace the decoder with a diffusion process that iteratively refines noise to recover the original image, guided by the latents provided by the encoder.
We evaluate our approach by assessing both reconstruction (rFID) and generation quality.
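A toy illustration of the denoising-as-decoding idea described above (not the $ε$-VAE code): the decoder is a small network that iteratively refines noise into a reconstruction, conditioned on the encoder's latent. The shapes, step count, and refinement rule are assumptions made for the sketch.

```python
# Toy "denoising as decoding" sketch: iterative refinement conditioned on a latent.
import torch
import torch.nn as nn


class DenoisingDecoder(nn.Module):
    def __init__(self, latent_dim=32, img_dim=64):
        super().__init__()
        self.img_dim = img_dim
        self.net = nn.Sequential(
            nn.Linear(img_dim + latent_dim + 1, 128), nn.ReLU(),
            nn.Linear(128, img_dim),               # predicts the noise to remove
        )

    def forward(self, z, steps=10):
        x = torch.randn(z.size(0), self.img_dim)   # start from pure noise
        for t in range(steps, 0, -1):
            t_embed = torch.full((z.size(0), 1), t / steps)
            eps = self.net(torch.cat([x, z, t_embed], dim=-1))
            x = x - eps / steps                     # simple iterative refinement
        return x


decoder = DenoisingDecoder()
z = torch.randn(4, 32)                              # latents from some encoder
recon = decoder(z)
print(recon.shape)                                  # torch.Size([4, 64])
```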
arXiv Detail & Related papers (2024-10-05T08:27:53Z)
- Efficient Encoder-Decoder Transformer Decoding for Decomposable Tasks [53.550782959908524]
We introduce a new configuration for encoder-decoder models that improves efficiency on structured output and decomposable tasks.
Our method, prompt-in-decoder (PiD), encodes the input once and decodes the output in parallel, boosting both training and inference efficiency.
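A rough sketch of the encode-once, decode-in-parallel pattern described above (not the PiD implementation): the shared input is encoded a single time, the cached encoder states are replicated, and one prompt per sub-task is decoded as a single batch. The model components and prompt batching below are illustrative assumptions.

```python
# Encode the input once, then decode several sub-task prompts in parallel.
import torch
import torch.nn as nn

d_model, vocab = 128, 1000
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model, 4, 256, batch_first=True), num_layers=2)
decoder = nn.TransformerDecoder(
    nn.TransformerDecoderLayer(d_model, 4, 256, batch_first=True), num_layers=2)
embed = nn.Embedding(vocab, d_model)

src = torch.randint(0, vocab, (1, 64))              # shared input, encoded once
memory = encoder(embed(src))

prompts = torch.randint(0, vocab, (3, 8))           # e.g. one prompt per sub-task
memory_batched = memory.repeat(prompts.size(0), 1, 1)
out = decoder(embed(prompts), memory_batched)        # all sub-tasks decoded together
print(out.shape)                                     # torch.Size([3, 8, 128])
```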
arXiv Detail & Related papers (2024-03-19T19:27:23Z)
- A blockBP decoder for the surface code [0.0]
We present a new decoder for the surface code, which combines the accuracy of the tensor-network decoders with the efficiency and parallelism of the belief-propagation algorithm.
Our decoder is therefore a belief-propagation decoder that works in the degenerate maximum likelihood decoding framework.
arXiv Detail & Related papers (2024-02-07T13:32:32Z)
- BPDec: Unveiling the Potential of Masked Language Modeling Decoder in BERT pretraining [0.5919433278490629]
BERT (Bidirectional Encoder Representations from Transformers) has revolutionized the field of natural language processing through its exceptional performance on numerous tasks.
DeBERTa introduced an enhanced mask decoder on top of the BERT-style encoder for pretraining, which proved highly effective.
We argue that the design and research around enhanced masked language modeling decoders have been underappreciated.
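As a rough illustration of what an enhanced masked language modeling decoder can look like in code (a hedged sketch, not BPDec's or DeBERTa's actual architecture): a couple of extra Transformer layers sit between the encoder and the MLM prediction head during pretraining and are discarded afterwards, so downstream tasks reuse only the encoder.

```python
# Pretraining-only "MLM decoder" layers on top of a BERT-style encoder (illustrative).
import torch
import torch.nn as nn

d_model, vocab = 256, 30522
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model, 4, 1024, batch_first=True), num_layers=6)
mlm_decoder = nn.TransformerEncoder(                 # extra layers used only for MLM
    nn.TransformerEncoderLayer(d_model, 4, 1024, batch_first=True), num_layers=2)
embed = nn.Embedding(vocab, d_model)
mlm_head = nn.Linear(d_model, vocab)

ids = torch.randint(0, vocab, (2, 32))               # masked input token ids
hidden = encoder(embed(ids))                         # reused for downstream tasks
logits = mlm_head(mlm_decoder(hidden))               # pretraining-only prediction path
print(logits.shape)                                  # torch.Size([2, 32, 30522])
```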
arXiv Detail & Related papers (2024-01-29T03:25:11Z)
- DEED: Dynamic Early Exit on Decoder for Accelerating Encoder-Decoder Transformer Models [22.276574156358084]
We build a multi-exit encoder-decoder transformer model which is trained with deep supervision so that each of its decoder layers is capable of generating plausible predictions.
We show our approach can reduce overall inference latency by 30%-60% with comparable or even higher accuracy compared to baselines.
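A hedged sketch of the multi-exit idea described above (not the DEED implementation): every decoder layer feeds a prediction head, training supervises all exits, and inference stops at the first layer whose confidence clears a threshold. The shared head, threshold value, and omission of masking are illustrative assumptions.

```python
# Multi-exit decoder with deep supervision and a confidence-based early exit.
import torch
import torch.nn as nn

d_model, vocab, n_layers = 128, 1000, 4
layers = nn.ModuleList(
    nn.TransformerDecoderLayer(d_model, 4, 256, batch_first=True) for _ in range(n_layers))
head = nn.Linear(d_model, vocab)                     # shared exit head
embed = nn.Embedding(vocab, d_model)

memory = torch.randn(1, 16, d_model)                 # encoder states
tgt = embed(torch.randint(0, vocab, (1, 5)))

# Training: deep supervision, i.e. one loss term per exit.
target = torch.randint(0, vocab, (1, 5))
loss, h = 0.0, tgt
for layer in layers:
    h = layer(h, memory)
    loss = loss + nn.functional.cross_entropy(head(h).transpose(1, 2), target)

# Inference: exit at the first sufficiently confident layer.
h = tgt
for i, layer in enumerate(layers):
    h = layer(h, memory)
    probs = head(h).softmax(-1)
    if probs.max(-1).values.min() > 0.9:             # every position confident enough
        break
print("exited after layer", i + 1)
```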
arXiv Detail & Related papers (2023-11-15T01:01:02Z)
- NASH: A Simple Unified Framework of Structured Pruning for Accelerating Encoder-Decoder Language Models [29.468888611690346]
We propose a simple and effective framework, NASH, that narrows the encoder and shortens the decoder networks of encoder-decoder models.
Our findings highlight two insights: (1) the number of decoder layers is the dominant factor of inference speed, and (2) low sparsity in the pruned encoder network enhances generation quality.
arXiv Detail & Related papers (2023-10-16T04:27:36Z)
- Think Twice before Driving: Towards Scalable Decoders for End-to-End Autonomous Driving [74.28510044056706]
Existing methods usually adopt the decoupled encoder-decoder paradigm.
In this work, we aim to alleviate the problem by two principles.
We first predict a coarse-grained future position and action based on the encoder features.
Then, conditioned on the predicted position and action, the future scene is imagined in order to check the consequences of driving accordingly.
arXiv Detail & Related papers (2023-05-10T15:22:02Z)
- Decoder-Only or Encoder-Decoder? Interpreting Language Model as a Regularized Encoder-Decoder [75.03283861464365]
The seq2seq task aims at generating the target sequence based on the given input source sequence.
Traditionally, most seq2seq tasks are resolved by an encoder that encodes the source sequence and a decoder that generates the target text.
Recently, a number of new approaches have emerged that apply decoder-only language models directly to the seq2seq task.
arXiv Detail & Related papers (2023-04-08T15:44:29Z)
- Adversarial Neural Networks for Error Correcting Codes [76.70040964453638]
We introduce a general framework to boost the performance and applicability of machine learning (ML) models.
We propose to combine ML decoders with a competing discriminator network that tries to distinguish between codewords and noisy words.
Our framework is game-theoretic, motivated by generative adversarial networks (GANs).
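A toy sketch of the adversarial idea described above (not the paper's setup): a neural decoder maps noisy words to codeword estimates, while a discriminator learns to tell valid codewords from noisy and decoded words; the decoder is trained to fool the discriminator in addition to a reconstruction loss. The code length, noise model, and network sizes are assumptions.

```python
# GAN-style training loop pairing a neural decoder with a competing discriminator.
import torch
import torch.nn as nn

n = 16                                               # toy codeword length
decoder = nn.Sequential(nn.Linear(n, 64), nn.ReLU(), nn.Linear(64, n), nn.Sigmoid())
discriminator = nn.Sequential(nn.Linear(n, 64), nn.ReLU(), nn.Linear(64, 1))
opt_dec = torch.optim.Adam(decoder.parameters(), lr=1e-3)
opt_disc = torch.optim.Adam(discriminator.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

for _ in range(100):
    codewords = torch.randint(0, 2, (32, n)).float() # placeholder "codewords"
    noisy = codewords + 0.5 * torch.randn_like(codewords)
    decoded = decoder(noisy)

    # Discriminator: valid codewords -> 1, noisy words and decoded estimates -> 0.
    d_loss = bce(discriminator(codewords), torch.ones(32, 1)) \
        + bce(discriminator(noisy), torch.zeros(32, 1)) \
        + bce(discriminator(decoded.detach()), torch.zeros(32, 1))
    opt_disc.zero_grad(); d_loss.backward(); opt_disc.step()

    # Decoder: reconstruct the codeword and make the output look like a codeword.
    rec = nn.functional.binary_cross_entropy(decoded, codewords)
    g_loss = rec + bce(discriminator(decoded), torch.ones(32, 1))
    opt_dec.zero_grad(); g_loss.backward(); opt_dec.step()
```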
arXiv Detail & Related papers (2021-12-21T19:14:44Z)
- Dynamic Neural Representational Decoders for High-Resolution Semantic Segmentation [98.05643473345474]
We propose a novel decoder, termed dynamic neural representational decoder (NRD).
As each location on the encoder's output corresponds to a local patch of the semantic labels, in this work, we represent these local patches of labels with compact neural networks.
This neural representation enables our decoder to leverage the smoothness prior in the semantic label space, and thus makes our decoder more efficient.
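A hedged sketch of representing each local patch of labels with a compact, dynamically generated network (not the NRD implementation): a hypernetwork predicts, for every encoder output location, the weights of a tiny MLP that maps patch coordinates to class logits. The patch size, hidden width, and weight layout are illustrative assumptions.

```python
# Per-location label patches represented by small, dynamically generated MLPs.
import torch
import torch.nn as nn

classes, d_enc, patch, hidden = 21, 64, 8, 16
# Parameter count of the tiny per-location MLP: coords(2) -> hidden -> classes.
n_params = (2 * hidden + hidden) + (hidden * classes + classes)
hypernet = nn.Linear(d_enc, n_params)                # generates MLP weights per location

feat = torch.randn(1, d_enc, 4, 4)                   # low-res encoder output (B, C, H, W)
params = hypernet(feat.permute(0, 2, 3, 1))          # (B, H, W, n_params)

# Normalized coordinates of the pixels inside one patch.
ys, xs = torch.meshgrid(torch.linspace(0, 1, patch), torch.linspace(0, 1, patch), indexing="ij")
coords = torch.stack([xs, ys], dim=-1).reshape(-1, 2)  # (patch*patch, 2)

def decode_patch(p):                                  # p: (n_params,) for one location
    w1, b1, w2, b2 = torch.split(p, [2 * hidden, hidden, hidden * classes, classes])
    h = torch.relu(coords @ w1.view(2, hidden) + b1)
    return h @ w2.view(hidden, classes) + b2          # (patch*patch, classes)

logits = decode_patch(params[0, 0, 0])                # labels for the top-left patch
print(logits.shape)                                   # torch.Size([64, 21])
```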
arXiv Detail & Related papers (2021-07-30T04:50:56Z)