Bypass Network for Semantics Driven Image Paragraph Captioning
- URL: http://arxiv.org/abs/2206.10059v1
- Date: Tue, 21 Jun 2022 00:48:22 GMT
- Title: Bypass Network for Semantics Driven Image Paragraph Captioning
- Authors: Qi Zheng, Chaoyue Wang, Dadong Wang
- Abstract summary: Image paragraph captioning aims to describe a given image with a sequence of coherent sentences.
Most existing methods model the coherence through the topic transition that dynamically infers a topic vector from preceding sentences.
We propose a bypass network that separately models semantics and linguistic syntax of preceding sentences.
- Score: 12.743882133781602
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Image paragraph captioning aims to describe a given image with a sequence of
coherent sentences. Most existing methods model the coherence through the topic
transition that dynamically infers a topic vector from preceding sentences.
However, these methods still suffer from immediate or delayed repetitions in
generated paragraphs because (i) the entanglement of syntax and semantics
distracts the topic vector from attending to pertinent visual regions; (ii) there
are few constraints or rewards for learning long-range transitions. In this
paper, we propose a bypass network that separately models semantics and
linguistic syntax of preceding sentences. Specifically, the proposed model
consists of two main modules, i.e. a topic transition module and a sentence
generation module. The former takes previous semantic vectors as queries and
applies attention mechanism on regional features to acquire the next topic
vector, which reduces immediate repetition by eliminating linguistics. The
latter decodes the topic vector and the preceding syntax state to produce the
following sentence. To further reduce delayed repetition in generated
paragraphs, we devise a replacement-based reward for the REINFORCE training.
Comprehensive experiments on the widely used benchmark demonstrate the
superiority of the proposed model over the state of the art for coherence while
maintaining high accuracy.
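To make the two-module design concrete, below is a minimal PyTorch sketch of the separation the abstract describes: a topic transition module in which the previous sentence's semantic vector queries regional visual features, and a sentence generation module that decodes the resulting topic vector together with the preceding syntax state. The module names, dimensions, and the single-layer GRU decoder are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the bypass idea: semantics and syntax travel on separate paths.
# Hypothetical names and shapes; not the paper's actual implementation.
import torch
import torch.nn as nn

class TopicTransition(nn.Module):
    """Previous semantic vector queries regional features -> next topic vector."""
    def __init__(self, dim=512, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, prev_semantics, regions):
        # prev_semantics: (B, 1, D) query from the preceding sentence's semantics
        # regions:        (B, R, D) regional visual features
        topic, _ = self.attn(prev_semantics, regions, regions)
        return topic  # (B, 1, D); no syntactic state enters this path

class SentenceGenerator(nn.Module):
    """Decodes the topic vector plus the preceding syntax state into a sentence."""
    def __init__(self, vocab_size, dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.cell = nn.GRUCell(2 * dim, dim)   # input = [word embedding ; topic]
        self.out = nn.Linear(dim, vocab_size)

    def forward(self, topic, syntax_state, tokens):
        # topic: (B, 1, D); syntax_state: (B, D); tokens: (B, T) teacher-forced words
        h, logits = syntax_state, []
        for t in range(tokens.size(1)):
            x = torch.cat([self.embed(tokens[:, t]), topic.squeeze(1)], dim=-1)
            h = self.cell(x, h)
            logits.append(self.out(h))
        # h is carried forward as the next sentence's preceding syntax state
        return torch.stack(logits, dim=1), h
```

The point of the split is that the topic path never sees the decoder's hidden (syntactic) state, while the generator still receives it; this is the separation the abstract credits with reducing immediate repetition.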
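The abstract also mentions a replacement-based reward used with REINFORCE to discourage delayed repetition, but gives no formula for it. The snippet below shows only a generic REINFORCE update with a greedy-decode baseline (the common self-critical setup, which is an assumption here) into which such a sentence-level reward would plug; `repetition_aware_reward` is a hypothetical placeholder, not the paper's reward.

```python
# Generic REINFORCE update with a self-critical baseline for caption training.
# `repetition_aware_reward` is a placeholder; the paper's replacement-based
# reward is not specified in the abstract.
import torch

def repetition_aware_reward(sequences):
    # Placeholder: fraction of distinct tokens per sequence, so repeated
    # tokens lower the reward. Replace with the actual paragraph-level reward.
    return torch.tensor(
        [len(set(s.tolist())) / max(len(s), 1) for s in sequences],
        dtype=torch.float,
    )

def reinforce_step(log_probs, sampled, greedy, optimizer):
    # log_probs: (B, T) log-probabilities of the sampled tokens (requires grad)
    # sampled, greedy: (B, T) sampled and greedy-decoded token ids
    r_sample = repetition_aware_reward(sampled)
    r_greedy = repetition_aware_reward(greedy)   # baseline from greedy decoding
    advantage = (r_sample - r_greedy).detach()
    loss = -(advantage.unsqueeze(1) * log_probs).sum(dim=1).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```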
Related papers
- Self-Adaptive Reconstruction with Contrastive Learning for Unsupervised
Sentence Embeddings [24.255946996327104]
The unsupervised sentence embedding task aims to convert sentences into semantic vector representations.
Due to token bias in pretrained language models, these models cannot capture the fine-grained semantics of sentences.
We propose a novel Self-Adaptive Reconstruction Contrastive Sentence Embeddings framework.
arXiv Detail & Related papers (2024-02-23T07:28:31Z) - On the Robustness of Text Vectorizers [9.904746542801838]
In natural language processing, models typically contain a first embedding layer, transforming a sequence of tokens into vector representations.
While the robustness with respect to changes of continuous inputs is well-understood, the situation is less clear when considering discrete changes.
Our work formally proves that popular embedding schemes, such as concatenation, TF-IDF, and Paragraph Vector (a.k.a. doc2vec), exhibit robustness in the Hölder or Lipschitz sense with respect to the Hamming distance.
arXiv Detail & Related papers (2023-03-09T16:37:37Z) - Semantic Operator Prediction and Applications [0.0]
The QDMR formalism in semantic parsing is implemented using a sequence-to-sequence model with attention, but uses only part-of-speech (POS) tags to represent the words of a sentence, keeping training as simple and fast as possible.
arXiv Detail & Related papers (2023-01-01T13:20:57Z) - Context-aware Fine-tuning of Self-supervised Speech Models [56.95389222319555]
We study the use of context, i.e., surrounding segments, during fine-tuning.
We propose a new approach called context-aware fine-tuning.
We evaluate the proposed approach using the SLUE and Libri-light benchmarks for several downstream tasks.
arXiv Detail & Related papers (2022-12-16T15:46:15Z) - TopicNet: Semantic Graph-Guided Topic Discovery [51.71374479354178]
Existing deep hierarchical topic models are able to extract semantically meaningful topics from a text corpus in an unsupervised manner.
We introduce TopicNet as a deep hierarchical topic model that can inject prior structural knowledge as an inductive bias to influence learning.
arXiv Detail & Related papers (2021-10-27T09:07:14Z) - Speech Summarization using Restricted Self-Attention [79.89680891246827]
We introduce a single model optimized end-to-end for speech summarization.
We demonstrate that the proposed model learns to directly summarize speech for the How-2 corpus of instructional videos.
arXiv Detail & Related papers (2021-10-12T18:21:23Z) - Enhanced Modality Transition for Image Captioning [51.72997126838352]
We build a Modality Transition Module (MTM) to transfer visual features into semantic representations before forwarding them to the language model.
During the training phase, the modality transition network is optimised by the proposed modality loss.
Experiments have been conducted on the MS-COCO dataset demonstrating the effectiveness of the proposed framework.
arXiv Detail & Related papers (2021-02-23T07:20:12Z) - Neural Syntactic Preordering for Controlled Paraphrase Generation [57.5316011554622]
Our work uses syntactic transformations to softly "reorder" the source sentence and guide our neural paraphrasing model.
First, given an input sentence, we derive a set of feasible syntactic rearrangements using an encoder-decoder model.
Next, we use each proposed rearrangement to produce a sequence of position embeddings, which encourages our final encoder-decoder paraphrase model to attend to the source words in a particular order.
arXiv Detail & Related papers (2020-05-05T09:02:25Z) - Multi-Step Inference for Reasoning Over Paragraphs [95.91527524872832]
Complex reasoning over text requires understanding and chaining together free-form predicates and logical connectives.
We present a compositional model reminiscent of neural module networks that can perform chained logical reasoning.
arXiv Detail & Related papers (2020-04-06T21:12:53Z)