Self-supervised similarity search for large scientific datasets
- URL: http://arxiv.org/abs/2110.13151v1
- Date: Mon, 25 Oct 2021 18:00:00 GMT
- Title: Self-supervised similarity search for large scientific datasets
- Authors: George Stein, Peter Harrington, Jacqueline Blaum, Tomislav Medan,
Zarija Lukic
- Abstract summary: We present the use of self-supervised learning to explore and exploit large unlabeled datasets.
We first train a self-supervised model to distil low-dimensional representations that are robust to symmetries, uncertainties, and noise in each image.
We then use the representations to construct and publicly release an interactive semantic similarity search tool.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present the use of self-supervised learning to explore and exploit large
unlabeled datasets. Focusing on 42 million galaxy images from the latest data
release of the Dark Energy Spectroscopic Instrument (DESI) Legacy Imaging
Surveys, we first train a self-supervised model to distil low-dimensional
representations that are robust to symmetries, uncertainties, and noise in each
image. We then use the representations to construct and publicly release an
interactive semantic similarity search tool. We demonstrate how our tool can be
used to rapidly discover rare objects given only a single example, increase the
speed of crowd-sourcing campaigns, and construct and improve training sets for
supervised applications. While we focus on images from sky surveys, the
technique is straightforward to apply to any scientific dataset of any
dimensionality. The similarity search web app can be found at
https://github.com/georgestein/galaxy_search
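To make the core idea concrete, here is a minimal sketch of semantic similarity search over self-supervised embeddings (not the authors' released code): given an (N, D) array of representations from a pretrained encoder, a single example retrieves its nearest neighbours by cosine similarity. The random `embeddings` array below is a stand-in for real galaxy representations; at the scale of 42 million images one would use an approximate-nearest-neighbour library such as FAISS rather than a dense matrix product.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for representations distilled from galaxy images by the
# self-supervised encoder; shape (N, D).
embeddings = rng.normal(size=(100_000, 128)).astype(np.float32)

# L2-normalize so inner products equal cosine similarities.
embeddings /= np.linalg.norm(embeddings, axis=1, keepdims=True)

def most_similar(query_idx: int, k: int = 10) -> np.ndarray:
    """Indices of the k images most semantically similar to one example."""
    sims = embeddings @ embeddings[query_idx]        # cosine similarities
    top = np.argpartition(-sims, k + 1)[: k + 1]     # top-(k+1), incl. self
    top = top[np.argsort(-sims[top])]                # sort best-first
    return top[top != query_idx][:k]                 # drop the query itself

print(most_similar(42))  # e.g. rare-object discovery from a single example
```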
Related papers
- Contrasting Deepfakes Diffusion via Contrastive Learning and Global-Local Similarities [88.398085358514]
Contrastive Deepfake Embeddings (CoDE) is a novel embedding space specifically designed for deepfake detection.
CoDE is trained via contrastive learning by additionally enforcing global-local similarities.
arXiv Detail & Related papers (2024-07-29T18:00:10Z)
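For context on the contrastive objective mentioned in this entry, below is a generic NT-Xent (normalized temperature-scaled cross-entropy) loss of the kind such methods build on; CoDE's additional global-local similarity terms are omitted, so this is an illustrative sketch rather than the paper's exact loss. PyTorch is assumed.

```python
import torch
import torch.nn.functional as F

def nt_xent(z1: torch.Tensor, z2: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    """z1, z2: (B, D) embeddings of two views of the same B images."""
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)  # (2B, D), unit norm
    sim = z @ z.t() / tau                               # pairwise similarities
    sim.fill_diagonal_(float("-inf"))                   # exclude self-pairs
    B = z1.shape[0]
    # The positive for row i is the other view of the same image: i+B mod 2B.
    targets = torch.cat([torch.arange(B) + B, torch.arange(B)])
    return F.cross_entropy(sim, targets)

loss = nt_xent(torch.randn(32, 128), torch.randn(32, 128))
```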
- Rethinking Transformers Pre-training for Multi-Spectral Satellite Imagery [78.43828998065071]
Recent advances in unsupervised learning have demonstrated the ability of large vision models to achieve promising results on downstream tasks.
Such pre-training techniques have also been explored recently in the remote sensing domain due to the availability of large amount of unlabelled data.
In this paper, we re-visit transformers pre-training and leverage multi-scale information that is effectively utilized with multiple modalities.
arXiv Detail & Related papers (2024-03-08T16:18:04Z)
- Zero-shot sketch-based remote sensing image retrieval based on multi-level and attention-guided tokenization [8.678089483952474]
This study introduces a novel zero-shot, sketch-based retrieval method for remote sensing images.
It employs multi-level feature extraction, self-attention-guided tokenization and filtering, and cross-modality attention update.
Our method significantly outperforms existing sketch-based remote sensing image retrieval techniques.
arXiv Detail & Related papers (2024-02-03T13:11:14Z)
- Stellar: Systematic Evaluation of Human-Centric Personalized Text-to-Image Methods [52.806258774051216]
We focus on text-to-image systems that input a single image of an individual and ground the generation process along with text describing the desired visual context.
We introduce Stellar, a standardized dataset of personalized prompts paired with images of individuals; it is an order of magnitude larger than existing relevant datasets and comes with rich semantic ground-truth annotations.
We derive a simple yet efficient personalized text-to-image baseline that does not require test-time fine-tuning for each subject and that sets a new state of the art, both quantitatively and in human trials.
arXiv Detail & Related papers (2023-12-11T04:47:39Z)
- AugNet: End-to-End Unsupervised Visual Representation Learning with Image Augmentation [3.6790362352712873]
We propose AugNet, a new deep learning training paradigm to learn image features from a collection of unlabeled pictures.
Our experiments demonstrate that the method learns low-dimensional representations of images.
Unlike many deep-learning-based image retrieval algorithms, our approach does not require access to external annotated datasets.
arXiv Detail & Related papers (2021-06-11T09:02:30Z)
- The Intrinsic Dimension of Images and Its Impact on Learning [60.811039723427676]
It is widely believed that natural image data exhibits low-dimensional structure despite the high dimensionality of conventional pixel representations.
In this work, we apply dimension estimation tools to popular datasets and investigate the role of low-dimensional structure in deep learning.
arXiv Detail & Related papers (2021-04-18T16:29:23Z)
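As an example of the kind of dimension-estimation tool such a study might apply, here is the TwoNN estimator (Facco et al. 2017), which infers intrinsic dimension from the ratio of each point's second- to first-nearest-neighbour distance; this is an illustrative choice, not necessarily the exact estimator used in the paper.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def twonn_dimension(X: np.ndarray) -> float:
    """MLE of intrinsic dimension from 2nd/1st nearest-neighbour distance ratios."""
    # Column 0 is each point's zero distance to itself.
    dists, _ = NearestNeighbors(n_neighbors=3).fit(X).kneighbors(X)
    mu = dists[:, 2] / dists[:, 1]       # r2 / r1 for every point
    return len(X) / np.sum(np.log(mu))   # d_hat = N / sum_i log(mu_i)

# Sanity check: a 2-D plane embedded in a 50-D "pixel" space.
rng = np.random.default_rng(0)
X = rng.normal(size=(5_000, 2)) @ rng.normal(size=(2, 50))
print(twonn_dimension(X))  # close to 2
```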
- Data Augmentation for Object Detection via Differentiable Neural Rendering [71.00447761415388]
It is challenging to train a robust object detector when annotated data is scarce.
Existing approaches to this problem include semi-supervised learning, which exploits unlabeled data alongside the scarce labeled data.
We introduce an offline data augmentation method for object detection, which semantically interpolates the training data with novel views.
arXiv Detail & Related papers (2021-03-04T06:31:06Z)
- Estimating Galactic Distances From Images Using Self-supervised Representation Learning [1.0499611180329804]
We use a contrastive self-supervised learning framework to estimate distances to galaxies from their photometric images.
We incorporate data augmentations from computer vision as well as an application-specific augmentation accounting for galactic dust.
We show that pretraining on a large corpus of unlabeled data followed by fine-tuning on a small set of labels can attain the accuracy of a fully-supervised model.
arXiv Detail & Related papers (2021-01-12T04:39:26Z)
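A minimal sketch of the pretrain-then-fine-tune recipe described in this entry: freeze a self-supervised encoder and fit a small head on the few available labels. The `encoder` here is a hypothetical stand-in for the pretrained network, and the data are fake; only the overall pattern is meant to carry over.

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins: a pretrained self-supervised encoder and a small
# regression head predicting galactic distance (e.g. redshift).
encoder = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64, 128))
head = nn.Linear(128, 1)

for p in encoder.parameters():         # freeze the pretrained weights
    p.requires_grad = False

opt = torch.optim.Adam(head.parameters(), lr=1e-3)
images = torch.randn(256, 1, 64, 64)   # the small labeled set (fake data)
labels = torch.rand(256, 1)

for _ in range(100):                   # fine-tune only the head
    loss = nn.functional.mse_loss(head(encoder(images)), labels)
    opt.zero_grad()
    loss.backward()
    opt.step()
```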
- Self-Supervised Representation Learning for Astronomical Images [1.0499611180329804]
Self-supervised learning recovers representations of sky survey images that are semantically useful.
We show that our approach can achieve the accuracy of supervised models while using 2-4 times fewer labels for training.
arXiv Detail & Related papers (2020-12-24T03:25:36Z)
- Geography-Aware Self-Supervised Learning [79.4009241781968]
We show that due to their different characteristics, a non-trivial gap persists between contrastive and supervised learning on standard benchmarks.
We propose novel training methods that exploit the spatially aligned structure of remote sensing data.
Our experiments show that our proposed method closes the gap between contrastive and supervised learning on image classification, object detection and semantic segmentation for remote sensing.
arXiv Detail & Related papers (2020-11-19T17:29:13Z)
- Self-supervised Learning for Astronomical Image Classification [1.2891210250935146]
In Astronomy, a huge amount of image data is generated daily by photometric surveys.
We propose a technique to leverage unlabeled astronomical images to pre-train deep convolutional neural networks.
arXiv Detail & Related papers (2020-04-23T17:32:19Z)
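To illustrate one standard way of pre-training a CNN on unlabeled images, here is a rotation-prediction pretext task (Gidaris et al. 2018): the network classifies which of four rotations was applied, so labels come for free. This is a generic example, not necessarily the technique this paper proposes.

```python
import torch
import torch.nn as nn

cnn = nn.Sequential(                          # stand-in convolutional backbone
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(16, 4),                         # 4 classes: 0/90/180/270 degrees
)

images = torch.randn(64, 1, 32, 32)           # unlabeled batch (fake data)
k = torch.randint(0, 4, (64,))                # rotation labels come for free
rotated = torch.stack([torch.rot90(img, int(r), dims=(1, 2))
                       for img, r in zip(images, k)])

loss = nn.functional.cross_entropy(cnn(rotated), k)
loss.backward()                               # gradients pre-train the backbone
```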