Consistent Multiple Sequence Decoding
- URL: http://arxiv.org/abs/2004.00760v2
- Date: Wed, 15 Apr 2020 21:19:16 GMT
- Title: Consistent Multiple Sequence Decoding
- Authors: Bicheng Xu, Leonid Sigal
- Abstract summary: We introduce a consistent multiple sequence decoding architecture.
This architecture allows for consistent and simultaneous decoding of an arbitrary number of sequences.
We show the efficacy of our consistent multiple sequence decoder on the task of dense relational image captioning.
- Score: 36.46573114422263
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sequence decoding is one of the core components of most visual-lingual
models. However, typical neural decoders when faced with decoding multiple,
possibly correlated, sequences of tokens resort to simple independent decoding
schemes. In this paper, we introduce a consistent multiple sequence decoding
architecture which, while relatively simple, is general and allows for
consistent and simultaneous decoding of an arbitrary number of sequences. Our
formulation utilizes a consistency fusion mechanism, implemented using message
passing in a Graph Neural Network (GNN), to aggregate context from related
decoders. This context is then utilized as a secondary input, in addition to
previously generated output, to make a prediction at a given step of decoding.
Self-attention, in the GNN, is used to modulate the fusion mechanism locally at
each node and each step in the decoding process. We show the efficacy of our
consistent multiple sequence decoder on the task of dense relational image
captioning and illustrate state-of-the-art performance (+ 5.2% in mAP) on the
task. More importantly, we illustrate that the decoded sentences, for the same
regions, are more consistent (improvement of 9.5%), while diversity is
maintained across images and regions.
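To make the mechanism concrete, here is a minimal PyTorch-style sketch in the spirit of the abstract (not the authors' code): several decoders advance one step at a time, a self-attention message-passing layer over their hidden states produces a fused context, and that context is fed back as a secondary input alongside the previously generated token. The class names, the choice of GRU cells, the fully connected decoder graph, and greedy decoding are illustrative assumptions.
```python
# Minimal sketch (not the authors' implementation): N decoders whose hidden states
# are fused at every step via self-attention message passing over a decoder graph.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConsistencyFusion(nn.Module):
    """One round of attention-weighted message passing over decoder hidden states."""
    def __init__(self, hidden_dim):
        super().__init__()
        self.query = nn.Linear(hidden_dim, hidden_dim)
        self.key = nn.Linear(hidden_dim, hidden_dim)
        self.value = nn.Linear(hidden_dim, hidden_dim)

    def forward(self, h):                                  # h: (num_decoders, hidden_dim)
        q, k, v = self.query(h), self.key(h), self.value(h)
        attn = F.softmax(q @ k.t() / k.size(-1) ** 0.5, dim=-1)  # (N, N) edge weights
        return attn @ v                                    # fused context per decoder node

class ConsistentMultiDecoder(nn.Module):
    """Decodes several sequences in parallel, conditioning each step on fused context."""
    def __init__(self, vocab_size, embed_dim=256, hidden_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.fusion = ConsistencyFusion(hidden_dim)
        # GRU cell input = previous token embedding + fused context (secondary input)
        self.cell = nn.GRUCell(embed_dim + hidden_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, h0, bos_token, max_len=20):
        h = h0                                             # (num_decoders, hidden_dim)
        tokens = torch.full((h.size(0),), bos_token, dtype=torch.long)
        outputs = []
        for _ in range(max_len):
            context = self.fusion(h)                       # aggregate context from related decoders
            x = torch.cat([self.embed(tokens), context], dim=-1)
            h = self.cell(x, h)                            # advance every decoder one step
            tokens = self.out(h).argmax(dim=-1)            # greedy decoding, for the sketch only
            outputs.append(tokens)
        return torch.stack(outputs, dim=1)                 # (num_decoders, max_len)
```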
Related papers
- A Framework for Bidirectional Decoding: Case Study in Morphological
Inflection [4.602447284133507]
We propose a framework for decoding sequences from the "outside-in".
At each step, the model chooses to generate a token on the left, on the right, or join the left and right sequences.
Our model sets state-of-the-art (SOTA) on the 2022 and 2023 shared tasks, beating the next best systems by over 4.7 and 2.7 points in average accuracy respectively.
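As a rough illustration of the outside-in scheme described above (a sketch under stated assumptions, not the paper's implementation), the loop below keeps a growing left prefix and right suffix and stops when the model chooses to join them; score_step is a hypothetical stand-in for the learned model that picks one of the three actions.
```python
# Illustrative sketch only: greedy "outside-in" decoding where each step appends
# to the left prefix, appends to the right suffix, or joins the two sequences.
from typing import Callable, List, Tuple

Action = Tuple[str, str]  # ("left" | "right" | "join", token)

def decode_outside_in(score_step: Callable[[List[str], List[str]], Action],
                      max_len: int = 50) -> List[str]:
    left: List[str] = []    # grows left-to-right from the start of the output
    right: List[str] = []   # grows inward from the end of the output
    for _ in range(max_len):
        action, token = score_step(left, right)   # the model picks the next move
        if action == "join":                      # left and right meet: decoding is done
            break
        (left if action == "left" else right).append(token)
    return left + list(reversed(right))           # right was generated from the outside in
```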
arXiv Detail & Related papers (2023-05-21T22:08:31Z) - Learning to Compose Representations of Different Encoder Layers towards
Improving Compositional Generalization [29.32436551704417]
We propose CompoSition (Compose Syntactic and Semantic Representations).
CompoSition achieves competitive results on two comprehensive and realistic benchmarks.
arXiv Detail & Related papers (2023-05-20T11:16:59Z) - Towards Accurate Image Coding: Improved Autoregressive Image Generation
with Dynamic Vector Quantization [73.52943587514386]
Existing vector quantization (VQ) based autoregressive models follow a two-stage generation paradigm.
We propose a novel two-stage framework: (1) Dynamic-Quantization VAE (DQ-VAE), which encodes image regions into variable-length codes based on their information densities for accurate representation.
arXiv Detail & Related papers (2023-05-19T14:56:05Z) - Decoder-Only or Encoder-Decoder? Interpreting Language Model as a
Regularized Encoder-Decoder [75.03283861464365]
The seq2seq task aims at generating the target sequence based on the given input source sequence.
Traditionally, most seq2seq tasks are resolved with an encoder that encodes the source sequence and a decoder that generates the target text.
Recently, a number of new approaches have emerged that apply decoder-only language models directly to the seq2seq task.
arXiv Detail & Related papers (2023-04-08T15:44:29Z) - Transformer with Tree-order Encoding for Neural Program Generation [8.173517923612426]
We introduce a tree-based positional encoding and a shared natural-language subword vocabulary for Transformers.
Our findings suggest that employing a tree-based positional encoding in combination with a shared natural-language subword vocabulary improves generation performance over sequential positional encodings.
arXiv Detail & Related papers (2022-05-30T12:27:48Z) - UniXcoder: Unified Cross-Modal Pre-training for Code Representation [65.6846553962117]
We present UniXcoder, a unified cross-modal pre-trained model for programming languages.
We propose a one-to-one mapping method to transform an AST into a sequence structure that retains all structural information from the tree.
We evaluate UniXcoder on five code-related tasks over nine datasets.
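A generic way to realize such a lossless tree-to-sequence mapping is to emit explicit subtree boundary markers during a pre-order traversal, so the flat sequence can be parsed back into the original AST. The sketch below illustrates that idea with made-up marker tokens; it is not necessarily UniXcoder's exact scheme.
```python
# Generic sketch of a lossless AST-to-sequence mapping: emit explicit enter/exit
# markers so the flat token sequence can be parsed back into the original tree.
# The marker names are illustrative, not UniXcoder's actual vocabulary.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)

def ast_to_sequence(node: Node) -> List[str]:
    if not node.children:                      # leaves (identifiers, literals) stay as-is
        return [node.label]
    seq = [f"<{node.label}:left>"]             # marks where this subtree starts
    for child in node.children:
        seq.extend(ast_to_sequence(child))
    seq.append(f"<{node.label}:right>")        # marks where this subtree ends
    return seq

# Example: the expression (a + b) becomes
# ['<BinOp:left>', 'a', '+', 'b', '<BinOp:right>']
print(ast_to_sequence(Node("BinOp", [Node("a"), Node("+"), Node("b")])))
```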
arXiv Detail & Related papers (2022-03-08T04:48:07Z) - Less is More: Pre-training a Strong Siamese Encoder Using a Weak Decoder [75.84152924972462]
Many real-world applications use Siamese networks to efficiently match text sequences at scale.
This paper pre-trains language models dedicated to sequence matching in Siamese architectures.
arXiv Detail & Related papers (2021-02-18T08:08:17Z) - Rethinking and Improving Natural Language Generation with Layer-Wise
Multi-View Decoding [59.48857453699463]
In sequence-to-sequence learning, the decoder relies on the attention mechanism to efficiently extract information from the encoder.
Recent work has proposed to use representations from different encoder layers for diversified levels of information.
We propose layer-wise multi-view decoding: for each decoder layer, the representations from the last encoder layer serve as a global view, while those from the other encoder layers are supplemented for a stereoscopic view of the source sequences.
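The gist can be sketched with a toy decoder layer that cross-attends both to the last encoder layer (the global view) and to one additional encoder layer, then mixes the two contexts. The choice of auxiliary layer, the linear mixing, and the omission of layer norms and masking are simplifying assumptions, not the paper's exact design.
```python
# Hedged sketch of layer-wise multi-view decoding: one decoder layer combines a
# "global view" (last encoder layer) with an auxiliary view from another encoder layer.
import torch
import torch.nn as nn

class MultiViewDecoderLayer(nn.Module):
    def __init__(self, d_model=512, nhead=8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        self.global_attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        self.aux_attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        self.mix = nn.Linear(2 * d_model, d_model)   # fuse global + auxiliary contexts
        self.ffn = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.ReLU(),
                                 nn.Linear(4 * d_model, d_model))

    def forward(self, tgt, enc_last, enc_aux):
        x, _ = self.self_attn(tgt, tgt, tgt)             # standard decoder self-attention
        g, _ = self.global_attn(x, enc_last, enc_last)   # global view: last encoder layer
        a, _ = self.aux_attn(x, enc_aux, enc_aux)        # stereoscopic view: another layer
        x = x + self.mix(torch.cat([g, a], dim=-1))      # combine the two views
        return x + self.ffn(x)
```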
arXiv Detail & Related papers (2020-05-16T20:00:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.