Image-guided topic modeling for interpretable privacy classification
- URL: http://arxiv.org/abs/2409.18674v1
- Date: Fri, 27 Sep 2024 12:02:28 GMT
- Title: Image-guided topic modeling for interpretable privacy classification
- Authors: Alina Elena Baia, Andrea Cavallaro
- Abstract summary: We propose to predict image privacy based on a set of natural language content descriptors.
These content descriptors are associated with privacy scores that reflect how people perceive image content.
We use the ITM-generated descriptors to learn a privacy predictor, Priv$\times$ITM, whose decisions are interpretable by design.
- Score: 27.301741710016223
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Predicting and explaining the private information contained in an image in human-understandable terms is a complex and contextual task. This task is challenging even for large language models. To facilitate the understanding of privacy decisions, we propose to predict image privacy based on a set of natural language content descriptors. These content descriptors are associated with privacy scores that reflect how people perceive image content. We generate descriptors with our novel Image-guided Topic Modeling (ITM) approach. ITM leverages, via multimodality alignment, both vision information and image textual descriptions from a vision language model. We use the ITM-generated descriptors to learn a privacy predictor, Priv$\times$ITM, whose decisions are interpretable by design. Our Priv$\times$ITM classifier outperforms the reference interpretable method by 5 percentage points in accuracy and performs comparably to the current non-interpretable state-of-the-art model.
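To make the idea of descriptor-based, interpretable-by-design prediction concrete, here is a minimal sketch in Python. It assumes a set of human-readable content descriptors and per-image descriptor scores are already available; the descriptor names, training data, and choice of logistic regression are illustrative assumptions, not the paper's actual ITM pipeline or Priv$\times$ITM model.

```python
# Minimal sketch of an interpretable privacy predictor in the spirit of Priv x ITM:
# a linear model over human-readable content descriptors. Descriptor names, scores,
# and labels below are hypothetical examples, not values from the paper.
import numpy as np
from sklearn.linear_model import LogisticRegression

descriptors = ["medical document", "children at home", "public landmark", "food on a table"]

# Hypothetical training data: each image is represented by how strongly
# it expresses each descriptor (e.g., similarity scores in [0, 1]).
X_train = np.array([
    [0.9, 0.1, 0.0, 0.1],   # likely private
    [0.1, 0.8, 0.1, 0.0],   # likely private
    [0.0, 0.1, 0.9, 0.2],   # likely public
    [0.1, 0.0, 0.2, 0.9],   # likely public
])
y_train = np.array([1, 1, 0, 0])  # 1 = private, 0 = public

clf = LogisticRegression().fit(X_train, y_train)

# The decision is interpretable: each descriptor contributes a signed weight.
for name, weight in zip(descriptors, clf.coef_[0]):
    print(f"{name:20s} weight = {weight:+.2f}")

x_new = np.array([[0.7, 0.2, 0.1, 0.0]])
print("predicted private probability:", clf.predict_proba(x_new)[0, 1])
```

Because the decision is a weighted sum over named descriptors, each prediction can be explained by listing the descriptors that pushed the image toward private or public.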
Related papers
- Towards Retrieval-Augmented Architectures for Image Captioning [81.11529834508424]
This work presents a novel approach towards developing image captioning models that utilize an external kNN memory to improve the generation process.
Specifically, we propose two model variants that incorporate a knowledge retriever component that is based on visual similarities.
We experimentally validate our approach on COCO and nocaps datasets and demonstrate that incorporating an explicit external memory can significantly enhance the quality of captions.
arXiv Detail & Related papers (2024-05-21T18:02:07Z)
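The entry above describes retrieval from an external kNN memory keyed on visual similarity. A toy sketch of that retrieval step, with random placeholder embeddings standing in for a real visual encoder and memory, might look like this:

```python
# Toy sketch of the retrieval step behind a kNN-augmented captioner: given an image
# embedding, fetch captions of the most visually similar training images so they can
# condition generation. Embeddings here are random placeholders, not real features.
import numpy as np

rng = np.random.default_rng(0)
memory_embeddings = rng.normal(size=(1000, 512))              # external memory of image embeddings
memory_captions = [f"caption of training image {i}" for i in range(1000)]

def retrieve_captions(query_embedding, k=5):
    """Return the captions of the k nearest memory images by cosine similarity."""
    mem = memory_embeddings / np.linalg.norm(memory_embeddings, axis=1, keepdims=True)
    q = query_embedding / np.linalg.norm(query_embedding)
    top_k = np.argsort(mem @ q)[::-1][:k]
    return [memory_captions[i] for i in top_k]

query = rng.normal(size=512)
print(retrieve_captions(query, k=3))  # retrieved captions would be fed to the decoder as context
```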
- Private Attribute Inference from Images with Vision-Language Models [2.9373912230684565]
Vision-language models (VLMs) are capable of understanding both images and text.
We evaluate 7 state-of-the-art VLMs, finding that they can infer various personal attributes at up to 77.6% accuracy.
We observe that accuracy scales with the general capabilities of the models, implying that future models can be misused as stronger inferential adversaries.
arXiv Detail & Related papers (2024-04-16T14:42:49Z)
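A hypothetical evaluation harness for this kind of attribute-inference study could look as follows; `query_vlm` is a stand-in for an actual model call, and the attributes and labels are invented for illustration:

```python
# Hypothetical harness for measuring how accurately a VLM infers personal attributes
# from images. `query_vlm` is a placeholder for a real model call; attribute names
# and ground-truth labels are illustrative only.
from collections import defaultdict

def query_vlm(image_path: str, attribute: str) -> str:
    """Placeholder for prompting a VLM, e.g. 'Where does this person live?'."""
    return "unknown"  # a real system would return the model's answer

def attribute_accuracy(dataset):
    """dataset: list of (image_path, {attribute: ground_truth}) pairs."""
    correct, total = defaultdict(int), defaultdict(int)
    for image_path, labels in dataset:
        for attribute, truth in labels.items():
            prediction = query_vlm(image_path, attribute)
            correct[attribute] += int(prediction.lower() == truth.lower())
            total[attribute] += 1
    return {attr: correct[attr] / total[attr] for attr in total}

toy_dataset = [("img_001.jpg", {"location": "Paris", "occupation": "teacher"})]
print(attribute_accuracy(toy_dataset))
```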
- Stellar: Systematic Evaluation of Human-Centric Personalized Text-to-Image Methods [52.806258774051216]
We focus on text-to-image systems that input a single image of an individual and ground the generation process along with text describing the desired visual context.
We introduce a standardized dataset (Stellar) of personalized prompts coupled with images of individuals; it is an order of magnitude larger than existing relevant datasets and provides rich semantic ground-truth annotations.
We derive a simple yet efficient personalized text-to-image baseline that does not require test-time fine-tuning for each subject and that sets a new state of the art, both quantitatively and in human trials.
arXiv Detail & Related papers (2023-12-11T04:47:39Z) - Human-interpretable and deep features for image privacy classification [32.253391125106674]
We discuss suitable features for image privacy classification and propose eight privacy-specific and human-interpretable features.
These features increase the performance of deep learning models and, on their own, improve the image representation for privacy classification compared with much higher-dimensional deep features.
arXiv Detail & Related papers (2023-10-30T14:39:43Z) - Multilingual Conceptual Coverage in Text-to-Image Models [98.80343331645626]
"Conceptual Coverage Across Languages" (CoCo-CroLa) is a technique for benchmarking the degree to which any generative text-to-image system provides multilingual parity to its training language in terms of tangible nouns.
For each model, we assess the "conceptual coverage" of a given target language relative to a source language by comparing the population of images generated for a series of tangible nouns in the source language with the population of images generated for each noun under translation into the target language.
arXiv Detail & Related papers (2023-06-02T17:59:09Z)
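A rough illustration of that population-comparison idea follows, with random placeholder image embeddings and a simple mean pairwise cosine similarity as the consistency score; the benchmark's actual scoring may differ.

```python
# Rough illustration of the CoCo-CroLa idea: for each tangible noun, generate a
# population of images from the source-language prompt and from its translation,
# then compare the two populations in an image-embedding space. Embeddings here
# are random placeholders standing in for features of generated images.
import numpy as np

rng = np.random.default_rng(1)

def cross_lingual_consistency(src_embs, tgt_embs):
    """Mean pairwise cosine similarity between two populations of image embeddings."""
    src = src_embs / np.linalg.norm(src_embs, axis=1, keepdims=True)
    tgt = tgt_embs / np.linalg.norm(tgt_embs, axis=1, keepdims=True)
    return float((src @ tgt.T).mean())

nouns = ["dog", "bridge", "guitar"]
scores = {}
for noun in nouns:
    src_images = rng.normal(size=(8, 512))   # embeddings of images from the source-language prompt
    tgt_images = rng.normal(size=(8, 512))   # embeddings of images from the translated prompt
    scores[noun] = cross_lingual_consistency(src_images, tgt_images)

print(scores)  # low scores would indicate poor conceptual coverage for that noun
```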
- Content-based Graph Privacy Advisor [38.733077459065704]
We present an image privacy classifier that uses scene information and object cardinality as cues for the prediction of image privacy.
Our Graph Privacy Advisor (GPA) model simplifies a state-of-the-art graph model and improves its performance.
arXiv Detail & Related papers (2022-10-20T11:12:42Z)
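As a toy illustration of scene and object-cardinality cues (not the actual graph model used by GPA), one could featurize an image as a scene one-hot vector plus object counts and train any off-the-shelf classifier:

```python
# Toy illustration of using scene information and object cardinality as privacy cues,
# in the spirit of the Graph Privacy Advisor. Scene/object vocabularies and labels are
# made up; the actual GPA model is a graph network, not this feature-vector classifier.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

SCENES = ["bedroom", "office", "street", "beach"]
OBJECTS = ["person", "document", "laptop", "car"]

def featurize(scene: str, object_counts: dict) -> np.ndarray:
    """Concatenate a scene one-hot vector with object counts."""
    scene_onehot = [float(scene == s) for s in SCENES]
    counts = [float(object_counts.get(o, 0)) for o in OBJECTS]
    return np.array(scene_onehot + counts)

X = np.stack([
    featurize("bedroom", {"person": 2, "document": 1}),  # private-leaning example
    featurize("street", {"car": 3}),                      # public-leaning example
    featurize("office", {"document": 4, "laptop": 1}),
    featurize("beach", {"person": 5}),
])
y = np.array([1, 0, 1, 0])  # 1 = private, 0 = public

clf = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)
print(clf.predict([featurize("bedroom", {"person": 1})]))
```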
- Perceptual Grouping in Contrastive Vision-Language Models [59.1542019031645]
We show how vision-language models are able to understand where objects reside within an image and group together visually related parts of the imagery.
We propose a minimal set of modifications that results in models that uniquely learn both semantic and spatial information.
arXiv Detail & Related papers (2022-10-18T17:01:35Z)
- Show, Interpret and Tell: Entity-aware Contextualised Image Captioning in Wikipedia [10.21762162291523]
We propose the novel task of captioning Wikipedia images by integrating contextual knowledge.
Specifically, we produce models that jointly reason over Wikipedia articles, Wikimedia images and their associated descriptions.
arXiv Detail & Related papers (2022-09-21T16:14:15Z)
- Visually-Augmented Language Modeling [137.36789885105642]
We propose a novel pre-training framework, named VaLM, to Visually-augment text tokens with retrieved relevant images for Language Modeling.
With the visually-augmented context, VaLM uses a visual knowledge fusion layer to enable multimodal grounded language modeling.
We evaluate the proposed model on various multimodal commonsense reasoning tasks, which require visual information to excel.
arXiv Detail & Related papers (2022-05-20T13:41:12Z)
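A highly simplified sketch of the retrieve-then-fuse idea described above: the real visual knowledge fusion layer is a learned module, whereas here retrieval uses placeholder embeddings and "fusion" is just a weighted average.

```python
# Highly simplified sketch of visually augmenting a text representation: fetch image
# embeddings relevant to the current text context and mix them into the token state.
# All embeddings are random placeholders; the blend below is not VaLM's fusion layer.
import numpy as np

rng = np.random.default_rng(2)
image_memory = rng.normal(size=(500, 256))   # embeddings of a large image corpus

def retrieve_images(text_embedding, k=4):
    """Return the k image embeddings most similar to the text context."""
    mem = image_memory / np.linalg.norm(image_memory, axis=1, keepdims=True)
    q = text_embedding / np.linalg.norm(text_embedding)
    top_k = np.argsort(mem @ q)[::-1][:k]
    return image_memory[top_k]

def fuse(text_embedding, alpha=0.8):
    """Blend the text representation with the mean of its retrieved image embeddings."""
    visual_context = retrieve_images(text_embedding).mean(axis=0)
    return alpha * text_embedding + (1 - alpha) * visual_context

token_state = rng.normal(size=256)   # hidden state for the current text token
print(fuse(token_state).shape)       # visually-augmented representation, same shape
```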
- CRIS: CLIP-Driven Referring Image Segmentation [71.56466057776086]
We propose an end-to-end CLIP-Driven Referring Image Segmentation framework (CRIS).
CRIS uses vision-language decoding and contrastive learning to achieve text-to-pixel alignment.
Our proposed framework significantly outperforms the state of the art without any post-processing.
arXiv Detail & Related papers (2021-11-30T07:29:08Z)
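Text-to-pixel alignment can be pictured as scoring every pixel feature against a sentence embedding; the sketch below uses random placeholder features rather than CLIP encoders, and a crude threshold instead of a learned decoder.

```python
# Minimal numpy sketch of text-to-pixel alignment: score each pixel's visual feature
# against a sentence embedding and threshold the result into a segmentation mask.
# CRIS learns this alignment contrastively with CLIP encoders; features here are random.
import numpy as np

rng = np.random.default_rng(3)
H, W, D = 32, 32, 64
pixel_features = rng.normal(size=(H, W, D))  # per-pixel visual features
text_embedding = rng.normal(size=D)          # embedding of the referring expression

# Cosine similarity between the text embedding and each pixel feature.
pix = pixel_features / np.linalg.norm(pixel_features, axis=-1, keepdims=True)
txt = text_embedding / np.linalg.norm(text_embedding)
similarity = pix @ txt                        # shape (H, W)

mask = similarity > similarity.mean()         # crude binary segmentation mask
print(mask.shape, int(mask.sum()), "pixels assigned to the referred object")
```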
- CAPE: Context-Aware Private Embeddings for Private Language Learning [0.5156484100374058]
Context-Aware Private Embeddings (CAPE) is a novel approach which preserves privacy during training of embeddings.
CAPE applies calibrated noise through differential privacy, preserving the encoded semantic links while obscuring sensitive information.
Experimental results demonstrate that the proposed approach reduces private information leakage better than either single intervention.
arXiv Detail & Related papers (2021-08-27T14:50:12Z)
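A toy version of the calibrated-noise step: clip an embedding to bound its sensitivity, then add Laplace noise. The clipping bound and epsilon are illustrative, and the calibration is simplified relative to the paper.

```python
# Toy version of perturbing embedding vectors with calibrated Laplace noise, as in
# standard differential-privacy mechanisms. Values of clip_l1 and epsilon are
# illustrative; the sensitivity analysis here is simplified, not the paper's.
import numpy as np

rng = np.random.default_rng(4)

def privatize(embedding: np.ndarray, clip_l1: float = 1.0, epsilon: float = 2.0) -> np.ndarray:
    """Clip the L1 norm to bound sensitivity, then add per-coordinate Laplace noise."""
    l1 = max(np.abs(embedding).sum(), 1e-12)
    clipped = embedding * min(1.0, clip_l1 / l1)
    return clipped + rng.laplace(loc=0.0, scale=clip_l1 / epsilon, size=embedding.shape)

word_embedding = rng.normal(size=300)
noisy_embedding = privatize(word_embedding)
print(np.linalg.norm(noisy_embedding - word_embedding))  # distortion introduced by the noise
```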