The Role of Data Curation in Image Captioning
- URL: http://arxiv.org/abs/2305.03610v2
- Date: Fri, 2 Feb 2024 15:46:42 GMT
- Title: The Role of Data Curation in Image Captioning
- Authors: Wenyan Li, Jonas F. Lotz, Chen Qiu, Desmond Elliott
- Abstract summary: This paper contributes to this direction by actively curating difficult samples in datasets without increasing the total number of samples.
Experiments on the Flickr30K and COCO datasets with the BLIP and BEiT-3 models demonstrate that these curation methods do indeed yield improved image captioning models.
- Score: 26.61662352061468
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Image captioning models are typically trained by treating all samples
equally, neglecting to account for mismatched or otherwise difficult data
points. In contrast, recent work has shown the effectiveness of training models
by scheduling the data using curriculum learning strategies. This paper
contributes to this direction by actively curating difficult samples in
datasets without increasing the total number of samples. We explore the effect
of using three data curation methods within the training process: complete
removal of a sample, caption replacement, or image replacement via a
text-to-image generation model. Experiments on the Flickr30K and COCO datasets
with the BLIP and BEiT-3 models demonstrate that these curation methods do
indeed yield improved image captioning models, underscoring their efficacy.
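To make the three curation actions concrete, here is a minimal sketch of one loss-based curation pass, assuming difficult pairs are flagged by a per-sample difficulty score (for instance the captioner's own loss). The threshold, the scoring choice, and the `caption_fn`/`image_fn` helpers are illustrative assumptions, not the paper's implementation.

```python
from typing import Callable, Iterable, List, Optional, Tuple

Sample = Tuple[object, str]  # (image, caption) pair

def curate(
    samples: Iterable[Sample],
    difficulty: Callable[[object, str], float],        # e.g. per-sample caption loss
    threshold: float,
    strategy: str = "replace_caption",                 # "remove" | "replace_caption" | "replace_image"
    caption_fn: Optional[Callable[[object], str]] = None,   # e.g. the captioner itself
    image_fn: Optional[Callable[[str], object]] = None,     # e.g. a text-to-image model
) -> List[Sample]:
    """Apply one curation action to every sample scored as difficult."""
    curated = []
    for image, caption in samples:
        if difficulty(image, caption) < threshold:
            curated.append((image, caption))             # easy pair: keep unchanged
        elif strategy == "remove":
            continue                                      # drop the difficult pair
        elif strategy == "replace_caption":
            curated.append((image, caption_fn(image)))    # re-caption the image
        elif strategy == "replace_image":
            curated.append((image_fn(caption), caption))  # re-render the caption
        else:
            raise ValueError(f"unknown strategy: {strategy}")
    return curated
```

In this reading, each difficult pair is either dropped or swapped in place, so the total number of training samples never increases.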
Related papers
- CPSample: Classifier Protected Sampling for Guarding Training Data During Diffusion [58.64822817224639]
Diffusion models have a tendency to exactly replicate their training data, especially when trained on small datasets.
We present CPSample, a method that modifies the sampling process to prevent training data replication while preserving image quality.
CPSample achieves FID scores of 4.97 and 2.97 on CIFAR-10 and CelebA-64, respectively, without producing exact replicates of the training data.
arXiv Detail & Related papers (2024-09-11T05:42:01Z)
- DataDream: Few-shot Guided Dataset Generation [90.09164461462365]
We propose a framework for synthesizing classification datasets that more faithfully represent the real data distribution.
DataDream fine-tunes LoRA weights for the image generation model on the few real images before generating the training data using the adapted model.
We then fine-tune LoRA weights for CLIP using the synthetic data to improve downstream image classification over previous approaches on a large variety of datasets.
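As a rough illustration of that flow (not DataDream's actual code), the sketch below treats the two LoRA adaptation stages as opaque callables; `adapt_generator`, `train_classifier`, and the per-class sample count are placeholders.

```python
from typing import Callable, List, Sequence, Tuple

Image = object                 # stand-in for whatever image type is used
Example = Tuple[Image, int]    # (image, class label)

def few_shot_synthesis_pipeline(
    real_examples: List[Example],                              # the few real images
    adapt_generator: Callable[[List[Example]], Callable[[int], Image]],
    train_classifier: Callable[[List[Example]], object],
    classes: Sequence[int],
    per_class: int = 100,
) -> object:
    """Adapt the image generator (e.g. via LoRA) on the few real images,
    synthesize a labeled training set with the adapted model, then fine-tune
    the downstream classifier (e.g. CLIP with LoRA) on the synthetic data."""
    generate = adapt_generator(real_examples)                  # adapted generator
    synthetic = [(generate(c), c) for c in classes for _ in range(per_class)]
    return train_classifier(synthetic)
```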
arXiv Detail & Related papers (2024-07-15T17:10:31Z)
- Data Attribution for Text-to-Image Models by Unlearning Synthesized Images [71.23012718682634]
The goal of data attribution for text-to-image models is to identify the training images that most influence the generation of a new image.
We propose an efficient data attribution method by simulating unlearning the synthesized image.
We then identify training images with significant loss deviations after the unlearning process and label these as influential.
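A toy PyTorch sketch of that recipe follows; the single gradient-ascent step standing in for 'unlearning' and the plain loss differences standing in for the paper's deviation measure are both assumptions.

```python
import copy
from typing import Callable, List, Sequence

import torch

def influential_training_images(
    model: torch.nn.Module,
    loss_fn: Callable[[torch.nn.Module, object], torch.Tensor],
    synthesized: object,                       # the generated image to attribute
    train_samples: Sequence[object],
    lr: float = 1e-4,
    k: int = 10,
) -> List[int]:
    """Rank training samples by how much their loss shifts after a simulated
    unlearning step on the synthesized image."""
    with torch.no_grad():
        before = [loss_fn(model, s).item() for s in train_samples]
    unlearned = copy.deepcopy(model)
    opt = torch.optim.SGD(unlearned.parameters(), lr=lr)
    opt.zero_grad()
    (-loss_fn(unlearned, synthesized)).backward()   # ascend the loss: "forget" it
    opt.step()
    with torch.no_grad():
        after = [loss_fn(unlearned, s).item() for s in train_samples]
    order = sorted(range(len(train_samples)),
                   key=lambda i: abs(after[i] - before[i]), reverse=True)
    return order[:k]                                 # k most influential indices
```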
arXiv Detail & Related papers (2024-06-13T17:59:44Z)
- EfficientTrain++: Generalized Curriculum Learning for Efficient Visual Backbone Training [79.96741042766524]
We reformulate the training curriculum as a soft-selection function.
We show that gradually exposing the contents of natural images can be readily achieved by controlling the intensity of data augmentation.
The resulting method, EfficientTrain++, is simple, general, yet surprisingly effective.
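For intuition only, a small torchvision snippet in that spirit: crop size and augmentation strength grow with training progress. The specific schedule below is an assumption, not the EfficientTrain++ recipe.

```python
import torchvision.transforms as T

def curriculum_transform(progress: float, final_size: int = 224) -> T.Compose:
    """Weaker augmentation and smaller crops early in training, full strength
    late; `progress` runs from 0.0 to 1.0 over the training schedule."""
    size = int(final_size * (0.5 + 0.5 * progress))   # crops grow toward full size
    magnitude = int(1 + 8 * progress)                 # RandAugment strength ramps up
    return T.Compose([
        T.RandomResizedCrop(size),
        T.RandAugment(num_ops=2, magnitude=magnitude),
        T.Resize(final_size),                         # keep the model input size fixed
        T.ToTensor(),
    ])
```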
arXiv Detail & Related papers (2024-05-14T17:00:43Z)
- The Journey, Not the Destination: How Data Guides Diffusion Models [75.19694584942623]
Diffusion models trained on large datasets can synthesize photo-realistic images of remarkable quality and diversity.
We propose a framework that: (i) provides a formal notion of data attribution in the context of diffusion models, and (ii) allows us to counterfactually validate such attributions.
arXiv Detail & Related papers (2023-12-11T08:39:43Z)
- T-ADAF: Adaptive Data Augmentation Framework for Image Classification Network based on Tensor T-product Operator [0.0]
This paper proposes an Adaptive Data Augmentation Framework based on the tensor T-product Operator.
It triples each training image and aggregates the results from all three images, with less than a 0.1% increase in the number of parameters.
Numerical experiments show that our data augmentation framework can improve the performance of the original neural network model by 2%.
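For reference, the underlying t-product can be computed with an FFT along the third tensor mode; the NumPy sketch below shows the operator itself (how T-ADAF derives its three augmented views from it is not reproduced here).

```python
import numpy as np

def t_product(A: np.ndarray, B: np.ndarray) -> np.ndarray:
    """Tensor t-product of A (n1 x n2 x n3) and B (n2 x m x n3): FFT along the
    third mode, frontal-slice matrix products, then an inverse FFT."""
    A_hat = np.fft.fft(A, axis=2)
    B_hat = np.fft.fft(B, axis=2)
    n3 = A.shape[2]
    C_hat = np.stack([A_hat[:, :, k] @ B_hat[:, :, k] for k in range(n3)], axis=2)
    return np.real(np.fft.ifft(C_hat, axis=2))
```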
arXiv Detail & Related papers (2023-06-07T08:30:44Z)
- Vision-Language Modelling For Radiological Imaging and Reports In The Low Data Regime [70.04389979779195]
This paper explores training medical vision-language models (VLMs) where the visual and language inputs are embedded into a common space.
We explore several candidate methods to improve low-data performance, including adapting generic pre-trained models to novel image and text domains.
Using text-to-image retrieval as a benchmark, we evaluate the performance of these methods with variable sized training datasets of paired chest X-rays and radiological reports.
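A minimal version of such a text-to-image Recall@K evaluation, assuming L2-normalized report and image embeddings where row i of each matrix forms a matched pair:

```python
import numpy as np

def recall_at_k(text_emb: np.ndarray, image_emb: np.ndarray, k: int = 5) -> float:
    """Fraction of reports whose paired image appears in the top-k retrieved."""
    sims = text_emb @ image_emb.T                 # cosine similarity (unit-norm rows)
    ranks = np.argsort(-sims, axis=1)             # best-matching images first
    hits = [i in ranks[i, :k] for i in range(text_emb.shape[0])]
    return float(np.mean(hits))
```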
arXiv Detail & Related papers (2023-03-30T18:20:00Z)
- Semi-Supervised Image Captioning by Adversarially Propagating Labeled Data [95.0476489266988]
We present a novel data-efficient semi-supervised framework to improve the generalization of image captioning models.
Our proposed method trains a captioner to learn from paired data and to progressively associate unpaired data.
We report extensive empirical results on both (1) image-based and (2) dense region-based captioning datasets, followed by a comprehensive analysis of the scarcely-paired setting.
arXiv Detail & Related papers (2023-01-26T15:25:43Z)
- Denoising Diffusion Probabilistic Models for Generation of Realistic Fully-Annotated Microscopy Image Data Sets [1.07539359851877]
In this study, we demonstrate that diffusion models can effectively generate fully-annotated microscopy image data sets.
The proposed pipeline helps to reduce the reliance on manual annotations when training deep learning-based segmentation approaches.
arXiv Detail & Related papers (2023-01-02T14:17:08Z)
- Self-Paced Contrastive Learning for Semi-supervised Medical Image Segmentation with Meta-labels [6.349708371894538]
We propose to adapt contrastive learning to work with meta-label annotations.
We use the meta-labels for pre-training the image encoder as well as to regularize semi-supervised training.
Results on three different medical image segmentation datasets show that our approach substantially boosts the performance of a model trained on a few scans.
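One way to read 'contrastive learning with meta-labels' is a supervised-contrastive objective whose positives are scans sharing a meta-label; the PyTorch sketch below follows that reading and is an assumption, not the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def metalabel_contrastive_loss(features: torch.Tensor,
                               meta_labels: torch.Tensor,
                               temperature: float = 0.1) -> torch.Tensor:
    """Supervised-contrastive-style loss: two embeddings form a positive pair
    when their meta-labels (e.g. slice position) match."""
    z = F.normalize(features, dim=1)
    sim = z @ z.T / temperature
    n = z.size(0)
    eye = torch.eye(n, dtype=torch.bool, device=z.device)
    pos = (meta_labels.unsqueeze(0) == meta_labels.unsqueeze(1)) & ~eye
    log_prob = sim - torch.logsumexp(sim.masked_fill(eye, float("-inf")),
                                     dim=1, keepdim=True)
    per_sample = -(log_prob * pos).sum(dim=1) / pos.sum(dim=1).clamp(min=1)
    return per_sample.mean()
```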
arXiv Detail & Related papers (2021-07-29T04:30:46Z)
- Unlabeled Data Guided Semi-supervised Histopathology Image Segmentation [34.45302976822067]
Semi-supervised learning (SSL) based on generative methods has been proven to be effective in utilizing diverse image characteristics.
We propose a new data-guided generative method for histopathology image segmentation by leveraging unlabeled data distributions.
Our method is evaluated on glands and nuclei datasets.
arXiv Detail & Related papers (2020-12-17T02:54:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.