Finding It at Another Side: A Viewpoint-Adapted Matching Encoder for
Change Captioning
- URL: http://arxiv.org/abs/2009.14352v1
- Date: Wed, 30 Sep 2020 00:13:49 GMT
- Title: Finding It at Another Side: A Viewpoint-Adapted Matching Encoder for
Change Captioning
- Authors: Xiangxi Shi, Xu Yang, Jiuxiang Gu, Shafiq Joty, and Jianfei Cai
- Abstract summary: We propose a novel visual encoder to explicitly distinguish viewpoint changes from semantic changes in the change captioning task.
We also propose a novel reinforcement learning process to fine-tune the attention directly with language evaluation rewards.
Our method outperforms state-of-the-art approaches by a large margin on both the Spot-the-Diff and CLEVR-Change datasets.
- Score: 41.044241265804125
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Change Captioning is a task that aims to describe the difference between
images in natural language. Most existing methods treat the problem as a
difference judgment and assume that no distractors, such as viewpoint
changes, are present. In practice, however, viewpoint changes occur often and
can overwhelm the semantic difference to be described. In this paper, we
propose a novel visual encoder that explicitly distinguishes viewpoint
changes from semantic changes in the change captioning task. Moreover, we
simulate the attention preference of humans and propose a novel
reinforcement learning process to fine-tune the attention directly with
language evaluation rewards. Extensive experimental results show that our
method outperforms state-of-the-art approaches by a large margin on both
the Spot-the-Diff and CLEVR-Change datasets.
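
For intuition, the sketch below shows one way a viewpoint-adapted matching step could be realized in PyTorch: "after" features attend to "before" features so that a pure camera shift is largely cancelled before the difference is taken. The class name, feature shapes, and layer choices are illustrative assumptions, not the paper's actual architecture, and the reinforcement-learning fine-tuning of attention with language evaluation rewards (in the spirit of self-critical training with sentence-level metrics such as CIDEr) is not shown.

```python
# Minimal, hypothetical sketch (not the authors' released code): match "after"
# features against "before" features with cross-attention before subtracting,
# so a pure viewpoint shift leaves a small residual while a genuine semantic
# change does not. Names and shapes here are assumptions for illustration.
import torch
import torch.nn as nn


class MatchingDifferenceEncoder(nn.Module):
    def __init__(self, dim: int = 512, heads: int = 8):
        super().__init__()
        # Cross-attention retrieves, for each "after" location, the best-matching
        # "before" content regardless of where the viewpoint moved it.
        self.match = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.proj = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU())

    def forward(self, feat_before: torch.Tensor, feat_after: torch.Tensor) -> torch.Tensor:
        # feat_*: (batch, num_regions, dim) grid features of the two images.
        matched_before, _ = self.match(feat_after, feat_before, feat_before)
        diff = feat_after - matched_before  # residual left after viewpoint matching
        return self.proj(torch.cat([diff, feat_after], dim=-1))


if __name__ == "__main__":
    enc = MatchingDifferenceEncoder()
    before, after = torch.randn(2, 196, 512), torch.randn(2, 196, 512)  # 14x14 grids
    print(enc(before, after).shape)  # torch.Size([2, 196, 512])
```
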
Related papers
- Distractors-Immune Representation Learning with Cross-modal Contrastive Regularization for Change Captioning [71.14084801851381]
Change captioning aims to succinctly describe the semantic change between a pair of similar images.
Most existing methods directly capture the difference between them, which risks obtaining error-prone difference features.
We propose a distractors-immune representation learning network that correlates the corresponding channels of two image representations.
arXiv Detail & Related papers (2024-07-16T13:00:33Z) - Context-aware Difference Distilling for Multi-change Captioning [106.72151597074098]
Multi-change captioning aims to describe complex and coupled changes within an image pair in natural language.
We propose a novel context-aware difference distilling network to capture all genuine changes for yielding sentences.
arXiv Detail & Related papers (2024-05-31T14:07:39Z) - Neighborhood Contrastive Transformer for Change Captioning [80.10836469177185]
We propose a neighborhood contrastive transformer to improve the model's ability to perceive various changes under different scenes.
The proposed method achieves the state-of-the-art performance on three public datasets with different change scenarios.
arXiv Detail & Related papers (2023-03-06T14:39:54Z) - Word-Level Fine-Grained Story Visualization [58.16484259508973]
Story visualization aims to generate a sequence of images to narrate each sentence in a multi-sentence story with a global consistency across dynamic scenes and characters.
Current works still struggle with output images' quality and consistency, and rely on additional semantic information or auxiliary captioning networks.
We first introduce a new sentence representation, which incorporates word information from all story sentences to mitigate the inconsistency problem.
Then, we propose a new discriminator with fusion features to improve image quality and story consistency.
arXiv Detail & Related papers (2022-08-03T21:01:47Z) - Audio-Adaptive Activity Recognition Across Video Domains [112.46638682143065]
We leverage activity sounds for domain adaptation as they have less variance across domains and can reliably indicate which activities are not happening.
We propose an audio-adaptive encoder and associated learning methods that discriminatively adjust the visual feature representation.
We also introduce the new task of actor shift, with a corresponding audio-visual dataset, to challenge our method with situations where the activity appearance changes dramatically.
arXiv Detail & Related papers (2022-03-27T08:15:20Z) - Changing the Narrative Perspective: From Deictic to Anaphoric Point of
View [0.0]
We introduce the task of changing the narrative point of view, where characters are assigned a narrative perspective that is different from the one originally used by the writer.
The resulting shift in the narrative point of view alters the reading experience and can be used as a tool in fiction writing.
We describe a pipeline for processing raw text that relies on a neural architecture for mention selection.
arXiv Detail & Related papers (2021-03-06T19:03:42Z) - Detection and Description of Change in Visual Streams [20.62923173347949]
We propose a new approach to incorporating unlabeled data into training to generate natural language descriptions of change.
We also develop a framework for estimating the time of change in a visual stream.
We use learned representations for change evidence and consistency of perceived change, and combine these in a regularized graph cut based change detector.
arXiv Detail & Related papers (2020-03-27T20:49:38Z)