An Unsupervised Sampling Approach for Image-Sentence Matching Using
Document-Level Structural Information
- URL: http://arxiv.org/abs/2104.02605v1
- Date: Sun, 21 Mar 2021 05:43:29 GMT
- Title: An Unsupervised Sampling Approach for Image-Sentence Matching Using
Document-Level Structural Information
- Authors: Zejun Li, Zhongyu Wei, Zhihao Fan, Haijun Shan, Xuanjing Huang
- Abstract summary: We focus on the problem of unsupervised image-sentence matching.
Existing research exploits document-level structural information to sample positive and negative instances for model training.
We propose a new sampling strategy to select additional intra-document image-sentence pairs as positive or negative samples.
- Score: 64.66785523187845
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we focus on the problem of unsupervised image-sentence
matching. Existing research exploits document-level structural
information to sample positive and negative instances for model training.
Although the approach achieves positive results, it introduces a sampling bias
and fails to distinguish instances with high semantic similarity. To alleviate
the bias, we propose a new sampling strategy to select additional
intra-document image-sentence pairs as positive or negative samples.
Furthermore, to recognize the complex patterns in intra-document samples, we
propose a Transformer based model to capture fine-grained features and
implicitly construct a graph for each document, where concepts in a document
are introduced to bridge the representation learning of images and sentences in
the context of a document. Experimental results show the effectiveness of our
approach to alleviate the bias and learn well-aligned multimodal
representations.
Related papers
- Sample-Specific Debiasing for Better Image-Text Models [6.301766237907306]
Self-supervised representation learning on image-text data facilitates crucial medical applications, such as image classification, visual grounding, and cross-modal retrieval.
One common approach involves contrasting semantically similar (positive) and dissimilar (negative) pairs of data points.
Drawing negative samples uniformly from the training data set introduces false negatives, i.e., samples that are treated as dissimilar but belong to the same class.
In healthcare data, the underlying class distribution is nonuniform, implying that false negatives occur at a highly variable rate.
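The false-negative issue described above can be illustrated with a toy sketch. Note that class labels are normally unavailable in the self-supervised setting this paper targets; the labeled data, function names, and skewed class ratio below are all hypothetical, used only to show why uniform negative sampling draws false negatives while a (hypothetical) class-aware sampler avoids them.

```python
import random

def sample_negatives_uniform(dataset, anchor_idx, k):
    """Uniform negative sampling: may return false negatives,
    i.e. items that share the anchor's class."""
    candidates = [i for i in range(len(dataset)) if i != anchor_idx]
    return random.sample(candidates, k)

def sample_negatives_class_aware(dataset, anchor_idx, k):
    """Class-aware sampling (hypothetical oracle): excludes same-class
    items, so it can never return a false negative."""
    anchor_label = dataset[anchor_idx][1]
    candidates = [i for i, (_, lbl) in enumerate(dataset) if lbl != anchor_label]
    return random.sample(candidates, k)

# Skewed class distribution, as in healthcare data: class 0 dominates,
# so for a majority-class anchor most uniform draws are false negatives.
data = [("x%d" % i, 0) for i in range(90)] + [("x%d" % i, 1) for i in range(10)]

anchor = 0  # a majority-class (label 0) anchor
negs = sample_negatives_class_aware(data, anchor, 5)
assert all(data[i][1] != data[anchor][1] for i in negs)
```

With 90% of the data in the anchor's class, roughly nine out of ten uniformly drawn "negatives" would actually be same-class items, which is the highly variable false-negative rate the summary refers to.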
arXiv Detail & Related papers (2023-04-25T22:23:41Z)
- Semi-Supervised Image Captioning by Adversarially Propagating Labeled Data [95.0476489266988]
We present a novel data-efficient semi-supervised framework to improve the generalization of image captioning models.
Our proposed method trains a captioner to learn from a paired data and to progressively associate unpaired data.
We present extensive empirical results on both (1) image-based and (2) dense region-based captioning datasets, followed by comprehensive analysis on the scarcely-paired dataset.
arXiv Detail & Related papers (2023-01-26T15:25:43Z)
- Sequence Level Contrastive Learning for Text Summarization [49.01633745943263]
We propose a contrastive learning model for supervised abstractive text summarization.
Our model achieves better faithfulness ratings compared to its counterpart without contrastive objectives.
arXiv Detail & Related papers (2021-09-08T08:00:36Z)
- Detection and Captioning with Unseen Object Classes [12.894104422808242]
Test images may contain visual objects with no corresponding visual or textual training examples.
We propose a detection-driven approach based on a generalized zero-shot detection model and a template-based sentence generation model.
Our experiments show that the proposed zero-shot detection model obtains state-of-the-art performance on the MS-COCO dataset.
arXiv Detail & Related papers (2021-08-13T10:43:20Z)
- Support-set bottlenecks for video-text representation learning [131.4161071785107]
The dominant paradigm for learning video-text representations -- noise contrastive learning -- is too strict.
We propose a novel method that alleviates this by leveraging a generative model to naturally push these related samples together.
Our proposed method outperforms others by a large margin on MSR-VTT, VATEX, ActivityNet, and MSVD for video-to-text and text-to-video retrieval.
arXiv Detail & Related papers (2020-10-06T15:38:54Z)
- CSI: Novelty Detection via Contrastive Learning on Distributionally Shifted Instances [77.28192419848901]
We propose a simple yet effective method named contrasting shifted instances (CSI).
In addition to contrasting a given sample with other instances as in conventional contrastive learning methods, our training scheme contrasts the sample with distributionally-shifted augmentations of itself.
Our experiments demonstrate the superiority of our method under various novelty detection scenarios.
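A minimal sketch of the contrasting idea, not the authors' implementation: an NT-Xent-style loss over hand-picked similarity scores, where a distributionally-shifted augmentation of the anchor (e.g. a rotated copy) is added to the negative set. All similarity values and the temperature are illustrative assumptions, not outputs of a real encoder.

```python
import math

def nt_xent(pos_sims, neg_sims, temperature=0.5):
    """NT-Xent-style loss over precomputed similarity scores in [-1, 1]."""
    denom = sum(math.exp(s / temperature) for s in pos_sims + neg_sims)
    return sum(-math.log(math.exp(p / temperature) / denom)
               for p in pos_sims) / len(pos_sims)

# Conventional contrastive learning: other instances serve as negatives.
pos_sims = [0.9]         # similarity to a standard augmentation of the anchor
other_negs = [0.1, 0.2]  # similarities to other instances in the batch

# CSI additionally contrasts the anchor with distributionally-shifted
# augmentations of itself (e.g. a 90-degree rotation), which act as hard
# negatives because they remain visually close to the anchor.
shifted_negs = [0.6]     # similarity to the rotated copy (illustrative)

loss_conventional = nt_xent(pos_sims, other_negs)
loss_csi = nt_xent(pos_sims, other_negs + shifted_negs)
assert loss_csi > loss_conventional  # the shifted copy adds a hard negative
```

Because the shifted copy of the anchor enters the softmax denominator with a relatively high similarity, it raises the loss more than an unrelated negative would, pushing the encoder to separate a sample from its shifted versions.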
arXiv Detail & Related papers (2020-07-16T08:32:56Z)
- Multi-Image Summarization: Textual Summary from a Set of Cohesive Images [17.688344968462275]
This paper proposes the new task of multi-image summarization.
It aims to generate a concise and descriptive textual summary given a coherent set of input images.
A dense average image feature aggregation network allows the model to focus on a coherent subset of attributes.
arXiv Detail & Related papers (2020-06-15T18:45:35Z)
- Self-Supervised Representation Learning on Document Images [8.927538538637783]
We show that patch-based pre-training performs poorly on document images because of their different structural properties and poor intra-sample semantic information.
We propose two context-aware alternatives to improve performance on the Tobacco-3482 image classification task.
arXiv Detail & Related papers (2020-04-18T10:14:06Z)
- Informative Sample Mining Network for Multi-Domain Image-to-Image Translation [101.01649070998532]
We show that improving the sample selection strategy is an effective solution for image-to-image translation tasks.
We propose a novel multi-stage sample training scheme to reduce sample hardness while preserving sample informativeness.
arXiv Detail & Related papers (2020-01-05T05:48:02Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences arising from its use.