Blind Dates: Examining the Expression of Temporality in Historical
Photographs
- URL: http://arxiv.org/abs/2310.06633v1
- Date: Tue, 10 Oct 2023 13:51:24 GMT
- Title: Blind Dates: Examining the Expression of Temporality in Historical
Photographs
- Authors: Alexandra Barancová, Melvin Wevers, Nanne van Noord
- Abstract summary: We investigate the dating of images using OpenCLIP, an open-source implementation of CLIP, a multi-modal language and vision model.
We use the De Boer Scene Detection dataset, containing 39,866 gray-scale historical press photographs from 1950 to 1999.
Our analysis reveals that images featuring buses, cars, cats, dogs, and people are more accurately dated, suggesting the presence of temporal markers.
- Score: 57.07335632641355
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper explores the capacity of computer vision models to discern
temporal information in visual content, focusing specifically on historical
photographs. We investigate the dating of images using OpenCLIP, an open-source
implementation of CLIP, a multi-modal language and vision model. Our experiment
consists of three steps: zero-shot classification, fine-tuning, and analysis of
visual content. We use the De Boer Scene Detection dataset, containing
39,866 gray-scale historical press photographs from 1950 to 1999. The results
show that zero-shot classification is relatively ineffective for image dating,
with a bias towards predicting dates in the past. Fine-tuning OpenCLIP with a
logistic classifier improves performance and eliminates the bias. Additionally,
our analysis reveals that images featuring buses, cars, cats, dogs, and people
are more accurately dated, suggesting the presence of temporal markers. The
study highlights the potential of machine learning models like OpenCLIP in
dating images and emphasizes the importance of fine-tuning for accurate
temporal analysis. Future research should explore the application of these
findings to color photographs and diverse datasets.
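As a concrete illustration of the pipeline described in the abstract, the sketch below shows zero-shot date classification with OpenCLIP and a linear-probe variant in which a logistic classifier is fit on frozen image embeddings. This is a minimal sketch under stated assumptions, not the authors' code: the model variant (ViT-B-32), the pretrained checkpoint, the prompt template, and per-year class labels are all illustrative choices.

```python
# Minimal sketch (assumptions, not the paper's code): zero-shot dating with
# OpenCLIP prompts, then a logistic-regression probe on frozen image features.
import numpy as np
import torch
import open_clip
from PIL import Image
from sklearn.linear_model import LogisticRegression

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion2b_s34b_b79k"  # assumed model/checkpoint
)
tokenizer = open_clip.get_tokenizer("ViT-B-32")
model = model.to(device).eval()

# Candidate labels: one prompt per year in the dataset's range (1950-1999).
years = list(range(1950, 2000))
prompts = [f"a press photograph taken in {y}" for y in years]  # assumed template
text_tokens = tokenizer(prompts).to(device)

@torch.no_grad()
def embed_image(path: str) -> torch.Tensor:
    """Return a unit-normalized OpenCLIP image embedding."""
    img = preprocess(Image.open(path).convert("RGB")).unsqueeze(0).to(device)
    feats = model.encode_image(img)
    return feats / feats.norm(dim=-1, keepdim=True)

@torch.no_grad()
def zero_shot_year(path: str) -> int:
    """Step 1: pick the year whose prompt is most similar to the image."""
    img_feats = embed_image(path)
    txt_feats = model.encode_text(text_tokens)
    txt_feats = txt_feats / txt_feats.norm(dim=-1, keepdim=True)
    sims = (img_feats @ txt_feats.T).squeeze(0)  # cosine similarities
    return years[int(sims.argmax())]

def fit_linear_probe(paths: list[str], labels: list[int]) -> LogisticRegression:
    """Step 2: logistic classifier on frozen embeddings (linear probe)."""
    feats = np.vstack([embed_image(p).cpu().numpy() for p in paths])
    clf = LogisticRegression(max_iter=1000)
    clf.fit(feats, labels)
    return clf
```

A usage note: the probe predicts whichever label granularity it is trained on (years or decades), and because the backbone stays frozen, the same cached embeddings can serve both the zero-shot and fine-tuned comparisons.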
Related papers
- Identifying Implicit Social Biases in Vision-Language Models [34.53206726136747]
We conduct a systematic analysis of the social biases that are present in vision-language models.
We find that CLIP frequently displays undesirable associations between harmful words and specific demographic groups.
Our findings highlight the importance of evaluating and addressing bias in vision-language models.
arXiv Detail & Related papers (2024-11-01T19:41:28Z) - Dataset Scale and Societal Consistency Mediate Facial Impression Bias in Vision-Language AI [17.101569078791492]
We study 43 CLIP vision-language models to determine whether they learn human-like facial impression biases.
We show for the first time that the degree to which a bias is shared across a society predicts the degree to which it is reflected in a CLIP model.
arXiv Detail & Related papers (2024-08-04T08:26:58Z) - Reinforcing Pre-trained Models Using Counterfactual Images [54.26310919385808]
This paper proposes a novel framework to reinforce classification models using language-guided generated counterfactual images.
We identify model weaknesses by testing the model using the counterfactual image dataset.
We employ the counterfactual images as an augmented dataset to fine-tune and reinforce the classification model.
arXiv Detail & Related papers (2024-06-19T08:07:14Z) - Harnessing the Power of Text-image Contrastive Models for Automatic
Detection of Online Misinformation [50.46219766161111]
We develop a self-learning model to explore contrastive learning in the domain of misinformation identification.
Our model shows superior performance in detecting non-matched image-text pairs when training data is insufficient.
arXiv Detail & Related papers (2023-04-19T02:53:59Z) - Non-Contrastive Learning Meets Language-Image Pre-Training [145.6671909437841]
We study the validity of non-contrastive language-image pre-training (nCLIP).
We introduce xCLIP, a multi-tasking framework combining CLIP and nCLIP, and show that nCLIP aids CLIP in enhancing feature semantics.
arXiv Detail & Related papers (2022-10-17T17:57:46Z) - Exploring CLIP for Assessing the Look and Feel of Images [87.97623543523858]
We introduce Contrastive Language-Image Pre-training (CLIP) models for assessing both the quality perception (look) and abstract perception (feel) of images in a zero-shot manner (see the prompt-pair scoring sketch after this list).
Our results show that CLIP captures meaningful priors that generalize well to different perceptual assessments.
arXiv Detail & Related papers (2022-07-25T17:58:16Z) - There is a Time and Place for Reasoning Beyond the Image [63.96498435923328]
Images often convey more to human eyes than their pixels alone, as we can infer, associate, and reason with contextual information from other sources to establish a more complete picture.
We introduce TARA: a dataset with 16k images with their associated news, time and location automatically extracted from New York Times (NYT), and an additional 61k examples as distant supervision from WIT.
We show that there exists a 70% gap between a state-of-the-art joint model and human performance, which is slightly narrowed by our proposed model that uses segment-wise reasoning, motivating higher-level vision-language joint models.
arXiv Detail & Related papers (2022-03-01T21:52:08Z) - Museum Painting Retrieval [0.0]
We implement a query by example retrieval system for finding paintings in a museum image collection using classic computer vision techniques.
We study the performance of the color, texture, text and feature descriptors in datasets with different perturbations in the images.
arXiv Detail & Related papers (2021-05-11T09:28:14Z)
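For the "Exploring CLIP for Assessing the Look and Feel of Images" entry above, the sketch below illustrates the general idea of zero-shot perceptual scoring with an antonym prompt pair. It is a minimal sketch under assumptions (OpenCLIP ViT-B-32 backbone, illustrative prompt wording), not that paper's released code.

```python
# Minimal sketch (assumed prompts and checkpoint): score an image by a softmax
# over a positive/negative prompt pair, CLIP zero-shot style.
import torch
import open_clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion2b_s34b_b79k"  # assumed model/checkpoint
)
tokenizer = open_clip.get_tokenizer("ViT-B-32")
model = model.to(device).eval()

@torch.no_grad()
def prompt_pair_score(path: str, positive: str, negative: str) -> float:
    """Return the probability assigned to the positive prompt (0..1)."""
    img = preprocess(Image.open(path).convert("RGB")).unsqueeze(0).to(device)
    txt = tokenizer([positive, negative]).to(device)
    img_f = model.encode_image(img)
    txt_f = model.encode_text(txt)
    img_f = img_f / img_f.norm(dim=-1, keepdim=True)
    txt_f = txt_f / txt_f.norm(dim=-1, keepdim=True)
    logits = 100.0 * img_f @ txt_f.T  # temperature-scaled cosine similarities
    return logits.softmax(dim=-1)[0, 0].item()

# Example: a quality ("look") score and a simple affective ("feel") score.
# look = prompt_pair_score("photo.jpg", "a good photo", "a bad photo")
# feel = prompt_pair_score("photo.jpg", "a happy photo", "a gloomy photo")
```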