Protect, Show, Attend and Tell: Empowering Image Captioning Models with
Ownership Protection
- URL: http://arxiv.org/abs/2008.11009v2
- Date: Tue, 31 Aug 2021 09:36:59 GMT
- Title: Protect, Show, Attend and Tell: Empowering Image Captioning Models with
Ownership Protection
- Authors: Jian Han Lim, Chee Seng Chan, Kam Woh Ng, Lixin Fan, Qiang Yang
- Abstract summary: This paper demonstrates that the current digital watermarking framework is insufficient to protect image captioning tasks.
As a remedy, this paper studies and proposes two different embedding schemes in the hidden memory state of a recurrent neural network.
To the best of our knowledge, this work is the first to propose ownership protection on the image captioning task.
- Score: 24.50702655120905
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: By and large, existing Intellectual Property (IP) protection for deep neural networks i) focuses on the image classification task only, and ii) follows a standard digital watermarking framework conventionally used to protect the ownership of multimedia and video content. This paper demonstrates that the current digital watermarking framework is insufficient to protect image captioning tasks, which are often regarded as one of the frontier AI problems. As a remedy, this paper studies and proposes two different embedding schemes in the hidden memory state of a recurrent neural network to protect the image captioning model. Empirically, we show that a forged key yields an unusable image captioning model, defeating the purpose of infringement. To the best of our knowledge, this work is the first to propose ownership protection for the image captioning task. Extensive experiments also show that the proposed method does not compromise the original image captioning performance across all common captioning metrics on the Flickr30k and MS-COCO datasets, while withstanding both removal and ambiguity attacks. Code is available at https://github.com/jianhanlim/ipr-imagecaptioning
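As an illustration of the general idea, below is a minimal PyTorch sketch of a key-gated LSTM caption decoder: an owner's secret key modulates the hidden memory state at every decoding step, so a model trained with the true key produces degraded captions when run with a forged one. The class name, the sign-vector key, and the element-wise gating are illustrative assumptions, not the authors' exact embedding scheme.

```python
import torch
import torch.nn as nn

class KeyedLSTMDecoder(nn.Module):
    """Hypothetical caption decoder whose LSTM hidden state is gated by a key.

    The key is a {-1, +1} sign vector derived from the owner's secret seed and
    multiplied into the hidden state at every step. Training with the true key
    bakes it into the weights, so a forged key scrambles the hidden dynamics
    and degrades the generated captions.
    """

    def __init__(self, vocab_size, embed_dim=256, hidden_dim=512, seed=1234):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.cell = nn.LSTMCell(embed_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)
        gen = torch.Generator().manual_seed(seed)  # seed acts as the secret key
        key = torch.randint(0, 2, (hidden_dim,), generator=gen) * 2 - 1
        self.register_buffer("key", key.float())

    def forward(self, tokens, h, c, key=None):
        key = self.key if key is None else key      # pass a wrong key to verify
        logits = []
        for t in range(tokens.size(1)):
            h, c = self.cell(self.embed(tokens[:, t]), (h, c))
            h = h * key                             # key-gated hidden memory state
            logits.append(self.out(h))
        return torch.stack(logits, dim=1), (h, c)
```

Running the same decoder with a random sign vector in place of `self.key` should yield garbled captions, which is the behavior the paper exploits to defeat infringement.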
Related papers
- The Brittleness of AI-Generated Image Watermarking Techniques: Examining Their Robustness Against Visual Paraphrasing Attacks [5.708967043277477]
We propose a visual paraphraser that can remove watermarks from images.
The proposed visual paraphraser operates in two steps: first, it generates a caption for the given image using KOSMOS-2; second, it runs the image through a diffusion pipeline whose denoising step produces a visually similar image guided by that caption.
Our empirical findings demonstrate that visual paraphrase attacks can effectively remove watermarks from images.
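A minimal sketch of this two-step attack, assuming the Hugging Face transformers and diffusers APIs; BLIP stands in for KOSMOS-2 as the captioner for brevity, and the file names and denoising strength are illustrative.

```python
import torch
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration
from diffusers import StableDiffusionImg2ImgPipeline

device = "cuda" if torch.cuda.is_available() else "cpu"

# Step 1: caption the watermarked image.
proc = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
captioner = BlipForConditionalGeneration.from_pretrained(
    "Salesforce/blip-image-captioning-base").to(device)
image = Image.open("watermarked.png").convert("RGB")
inputs = proc(image, return_tensors="pt").to(device)
caption = proc.decode(captioner.generate(**inputs)[0], skip_special_tokens=True)

# Step 2: regenerate a visually similar image guided by the caption; the
# diffusion denoising rebuilds the pixels and discards the watermark signal.
pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5").to(device)
paraphrased = pipe(prompt=caption, image=image, strength=0.4).images[0]
paraphrased.save("paraphrased.png")
```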
arXiv Detail & Related papers (2024-08-19T22:58:30Z)
- AI-Based Copyright Detection Of An Image In a Video Using Degree Of Similarity And Image Hashing [0.0]
Strategies have been devised to identify the use of a copyrighted image in a report. Still, the issue of a copyrighted image being used in a video remains to be resolved. Machine learning (ML) and artificial intelligence (AI) are vital to addressing this problem.
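A minimal sketch of one plausible approach: match sampled video frames against the copyrighted image with perceptual hashing. It assumes the imagehash and opencv-python packages; the sampling rate and Hamming-distance threshold are illustrative, not the paper's method.

```python
import cv2
import imagehash
from PIL import Image

ref_hash = imagehash.phash(Image.open("copyrighted.jpg"))  # 64-bit perceptual hash

cap = cv2.VideoCapture("suspect_video.mp4")
frame_idx, hits = 0, []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    if frame_idx % 30 == 0:  # sample roughly one frame per second at 30 fps
        pil = Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        # A small Hamming distance between hashes suggests the frame
        # contains (or is) the copyrighted image.
        if ref_hash - imagehash.phash(pil) <= 10:
            hits.append(frame_idx)
    frame_idx += 1
cap.release()
print("possible uses of the copyrighted image at frames:", hits)
```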
arXiv Detail & Related papers (2024-06-14T09:47:07Z)
- Learning text-to-video retrieval from image captioning [59.81537951811595]
We describe a protocol to study text-to-video retrieval training with unlabeled videos.
We assume (i) no access to labels for any videos, and (ii) access to labeled images, i.e., images paired with text.
We show that automatically labeling video frames with image captioning allows text-to-video retrieval training.
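A minimal sketch of the pseudo-labeling idea: caption one sampled frame per unlabeled video to obtain (video, text) training pairs. It assumes BLIP as the image captioner and OpenCV for frame extraction; the model choice and paths are illustrative, not the paper's setup.

```python
import cv2
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

proc = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
captioner = BlipForConditionalGeneration.from_pretrained(
    "Salesforce/blip-image-captioning-base")

def pseudo_caption(video_path: str) -> str:
    """Caption the middle frame of a video to use as a pseudo text label."""
    cap = cv2.VideoCapture(video_path)
    cap.set(cv2.CAP_PROP_POS_FRAMES, cap.get(cv2.CAP_PROP_FRAME_COUNT) // 2)
    ok, frame = cap.read()
    cap.release()
    pil = Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    inputs = proc(pil, return_tensors="pt")
    return proc.decode(captioner.generate(**inputs)[0], skip_special_tokens=True)

# (video, pseudo_caption(video)) pairs can then supervise a standard
# text-to-video retrieval objective such as a contrastive loss.
```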
arXiv Detail & Related papers (2024-04-26T15:56:08Z)
- Recoverable Privacy-Preserving Image Classification through Noise-like Adversarial Examples [26.026171363346975]
Cloud-based image-related services such as classification have become crucial. In this study, we propose a novel privacy-preserving image classification scheme. Encrypted images can be decrypted back into their original form with high fidelity (i.e., they are recoverable) using a secret key.
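As a toy illustration of key-based recoverability, the sketch below scrambles an image with pseudo-random noise seeded by a secret key and inverts the operation exactly with the same key. Note that the actual paper crafts noise-like adversarial examples that remain classifiable; this sketch does not reproduce that property.

```python
import numpy as np

def encrypt(img: np.ndarray, key: int) -> np.ndarray:
    """Add key-seeded pseudo-random noise modulo 256 (uint8 image in, out)."""
    noise = np.random.default_rng(key).integers(0, 256, img.shape, dtype=np.int16)
    return ((img.astype(np.int16) + noise) % 256).astype(np.uint8)

def decrypt(enc: np.ndarray, key: int) -> np.ndarray:
    """Regenerate the same noise from the key and subtract it out."""
    noise = np.random.default_rng(key).integers(0, 256, enc.shape, dtype=np.int16)
    return ((enc.astype(np.int16) - noise) % 256).astype(np.uint8)

img = np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8)  # placeholder image
assert np.array_equal(decrypt(encrypt(img, key=42), key=42), img)  # exact recovery
```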
arXiv Detail & Related papers (2023-10-19T13:01:58Z)
- I See Dead People: Gray-Box Adversarial Attack on Image-To-Text Models [0.0]
We present a gray-box adversarial attack on image-to-text models, in both untargeted and targeted forms.
Our attack operates in a gray-box manner, requiring no knowledge about the decoder module.
We also show that our attacks fool the popular open-source platform Hugging Face.
arXiv Detail & Related papers (2023-06-13T07:35:28Z)
- Human-imperceptible, Machine-recognizable Images [76.01951148048603]
A major conflict is exposed for software engineers between developing better AI systems and keeping their distance from sensitive training data. This paper proposes an efficient privacy-preserving learning paradigm in which images are encrypted to become 'human-imperceptible, machine-recognizable'. We show that the proposed paradigm can ensure that encrypted images become human-imperceptible while preserving machine-recognizable information.
arXiv Detail & Related papers (2023-06-06T13:41:37Z)
- Docmarking: Real-Time Screen-Cam Robust Document Image Watermarking [97.77394585669562]
The proposed approach does not try to prevent the leak in the first place, but rather aims to determine its source. The method works by applying a unique identifying watermark to the screen as a semi-transparent image. The watermark image is static and stays on the screen at all times, so the watermark is present in every captured photograph of the screen.
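A minimal sketch of overlaying a semi-transparent identifying watermark with Pillow; in the paper the overlay sits on the live screen rather than on a saved screenshot, and the opacity value here is illustrative.

```python
from PIL import Image

screen = Image.open("screenshot.png").convert("RGBA")
mark = Image.open("identifying_mark.png").convert("RGBA").resize(screen.size)

# Scale the watermark's alpha channel down to ~7% opacity so it is barely
# visible on screen yet still detectable in photographs of it.
alpha = mark.getchannel("A").point(lambda a: a * 18 // 255)
mark.putalpha(alpha)

Image.alpha_composite(screen, mark).convert("RGB").save("marked.png")
```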
arXiv Detail & Related papers (2023-04-25T09:32:11Z)
- Exploring Discrete Diffusion Models for Image Captioning [104.69608826164216]
We present a diffusion-based captioning model, dubbed DDCap, to allow more decoding flexibility.
We propose several key techniques including best-first inference, concentrated attention mask, text length prediction, and image-free training.
With 4M vision-language pre-training images and the base-sized model, we reach a CIDEr score of 125.1 on COCO.
arXiv Detail & Related papers (2022-11-21T18:12:53Z)
- Controlled Caption Generation for Images Through Adversarial Attacks [85.66266989600572]
We study adversarial examples for vision-and-language models, which typically adopt a Convolutional Neural Network (CNN) for image feature extraction and a Recurrent Neural Network (RNN) for caption generation.
In particular, we investigate attacks on the visual encoder's hidden layer that is fed to the subsequent recurrent network.
We propose a GAN-based algorithm for crafting adversarial examples for neural image captioning that mimics the internal representation of the CNN.
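A minimal sketch of the underlying idea: craft a perturbation whose CNN features mimic a target image's internal representation by direct optimization. It assumes a torchvision ResNet-50 backbone and placeholder tensors, whereas the paper trains a GAN to generate such perturbations.

```python
import torch
import torchvision.models as models

backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT).eval()
for p in backbone.parameters():
    p.requires_grad_(False)
feat = torch.nn.Sequential(*list(backbone.children())[:-1])  # hidden-layer features

x = torch.rand(1, 3, 224, 224)       # source image (placeholder tensor)
target = torch.rand(1, 3, 224, 224)  # image whose representation we mimic
delta = torch.zeros_like(x, requires_grad=True)
opt = torch.optim.Adam([delta], lr=1e-2)

with torch.no_grad():
    target_feat = feat(target)

for _ in range(200):
    # Push the perturbed image's internal CNN features toward the target's,
    # so a downstream RNN captioner describes the target instead.
    loss = torch.nn.functional.mse_loss(feat((x + delta).clamp(0, 1)), target_feat)
    opt.zero_grad()
    loss.backward()
    opt.step()
    delta.data.clamp_(-8 / 255, 8 / 255)  # keep the perturbation imperceptible
```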
arXiv Detail & Related papers (2021-07-07T07:22:41Z)
- Intrinsic Image Captioning Evaluation [53.51379676690971]
We propose a learning-based metric for image captioning, which we call Intrinsic Image Captioning Evaluation (I2CE).
Experimental results show that the proposed method maintains robust performance and gives more flexible scores to candidate captions when faced with semantically similar expressions or less-aligned semantics.
arXiv Detail & Related papers (2020-12-14T08:36:05Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.