Internet Explorer: Targeted Representation Learning on the Open Web
- URL: http://arxiv.org/abs/2302.14051v2
- Date: Thu, 7 Sep 2023 01:47:22 GMT
- Title: Internet Explorer: Targeted Representation Learning on the Open Web
- Authors: Alexander C. Li, Ellis Brown, Alexei A. Efros, Deepak Pathak
- Abstract summary: Modern vision models typically rely on fine-tuning general-purpose models pre-trained on large, static datasets.
We propose dynamically utilizing the Internet to quickly train a small-scale model that does extremely well on the task at hand.
Our approach, called Internet Explorer, explores the web in a self-supervised manner to progressively find relevant examples that improve performance on a desired target dataset.
- Score: 121.02587846761627
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modern vision models typically rely on fine-tuning general-purpose models
pre-trained on large, static datasets. These general-purpose models only
capture the knowledge within their pre-training datasets, which are tiny,
out-of-date snapshots of the Internet -- where billions of images are uploaded
each day. We suggest an alternate approach: rather than hoping our static
datasets transfer to our desired tasks after large-scale pre-training, we
propose dynamically utilizing the Internet to quickly train a small-scale model
that does extremely well on the task at hand. Our approach, called Internet
Explorer, explores the web in a self-supervised manner to progressively find
relevant examples that improve performance on a desired target dataset. It
cycles between searching for images on the Internet with text queries,
self-supervised training on downloaded images, determining which images were
useful, and prioritizing what to search for next. We evaluate Internet Explorer
across several datasets and show that it outperforms or matches CLIP oracle
performance by using just a single GPU desktop to actively query the Internet
for 30--40 hours. Results, visualizations, and videos at
https://internet-explorer-ssl.github.io/
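The cycle described in the abstract (search the web with text queries, train with self-supervision on the downloads, score which images helped, and reprioritize the next queries) amounts to a simple control loop. Below is a minimal sketch of that loop, not the authors' implementation; every helper name (search_images, ssl_train, relevance_reward) is a hypothetical stand-in.

```python
"""Sketch of the exploration loop described in the abstract above.

All names here (search_images, ssl_train, relevance_reward, ...) are
hypothetical stand-ins, not the authors' actual code or API.
"""
import random


def search_images(query, n=8):
    # Stand-in for a text-query web image search; returns placeholder "images".
    return [f"{query}_img_{i}" for i in range(n)]


def ssl_train(model, images):
    # Stand-in for a self-supervised update (e.g. a contrastive step) on images.
    model["steps"] += len(images)
    return model


def relevance_reward(model, images, target_images):
    # Stand-in for judging how useful the downloads were for the target
    # dataset (e.g. feature similarity to target examples). Random here.
    return random.random()


def internet_explorer_loop(target_images, concept_vocab, iterations=5):
    model = {"steps": 0}                          # toy stand-in for a small model
    weights = {c: 1.0 for c in concept_vocab}     # prior over text queries

    for _ in range(iterations):
        # 1. Search the Internet with text queries, sampled in proportion to
        #    how useful each concept has been so far.
        queries = random.choices(list(weights), weights=list(weights.values()), k=4)
        downloads = {q: search_images(q) for q in set(queries)}

        # 2. Self-supervised training on the downloaded images plus target data.
        batch = [img for imgs in downloads.values() for img in imgs]
        model = ssl_train(model, batch + list(target_images))

        # 3. Determine which images were useful, and
        # 4. prioritize what to search for next.
        for q, imgs in downloads.items():
            reward = relevance_reward(model, imgs, target_images)
            weights[q] = 0.5 * weights[q] + 0.5 * reward

    return model, weights


if __name__ == "__main__":
    _, weights = internet_explorer_loop(
        target_images=["target_0", "target_1"],
        concept_vocab=["monarch butterfly", "fire truck", "birch tree"])
    print(weights)
```

In the actual system the reward reflects how much each concept's downloads improve features on the target dataset; the moving-average update above is only a placeholder for that reprioritization step.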
Related papers
- Which Backbone to Use: A Resource-efficient Domain Specific Comparison for Computer Vision [4.600687314645625]
Architectural backbones pre-trained on large datasets like ImageNet are commonly employed as feature extractors.
Our study systematically evaluates multiple lightweight, pre-trained CNN backbones under consistent training settings.
Our findings provide actionable insights into the performance trade-offs and effectiveness of different backbones.
arXiv Detail & Related papers (2024-06-09T02:01:25Z) - Raising the Bar of AI-generated Image Detection with CLIP [50.345365081177555]
The aim of this work is to explore the potential of pre-trained vision-language models (VLMs) for universal detection of AI-generated images.
We develop a lightweight detection strategy based on CLIP features and study its performance in a wide variety of challenging scenarios.
arXiv Detail & Related papers (2023-11-30T21:11:20Z) - A Simple and Efficient Baseline for Data Attribution on Images [107.12337511216228]
Current state-of-the-art approaches require a large ensemble of as many as 300,000 models to accurately attribute model predictions.
In this work, we focus on a minimalist baseline, utilizing the feature space of a backbone pretrained via self-supervised learning to perform data attribution.
Our method is model-agnostic and scales easily to large datasets.
arXiv Detail & Related papers (2023-11-03T17:29:46Z) - Vision Learners Meet Web Image-Text Pairs [32.36188289972377]
In this work, we consider self-supervised pre-training on noisy, web-sourced image-text paired data.
We compare a range of methods, including single-modal ones that use masked training objectives and multi-modal ones that use image-text contrastive training.
We present a new visual representation pre-training method, MUlti-modal Generator (MUG), that learns from scalable web-sourced image-text data.
arXiv Detail & Related papers (2023-01-17T18:53:24Z) - FoPro: Few-Shot Guided Robust Webly-Supervised Prototypical Learning [82.75157675790553]
We propose a Few-shot guided Prototypical (FoPro) representation learning method.
FoPro is trained on web datasets guided by a few real-world examples and is evaluated on real-world datasets.
Our method achieves the state-of-the-art performance on three fine-grained datasets and two large-scale datasets.
arXiv Detail & Related papers (2022-12-01T12:39:03Z) - GROWN+UP: A Graph Representation Of a Webpage Network Utilizing Pre-training [0.2538209532048866]
We introduce an agnostic deep graph neural network feature extractor that can ingest webpage structures, pre-train in a self-supervised manner on massive unlabeled data, and fine-tune effectively to arbitrary tasks on webpages.
We show that our pre-trained model achieves state-of-the-art results using multiple datasets on two very different benchmarks: webpage boilerplate removal and genre classification.
arXiv Detail & Related papers (2022-08-03T13:37:27Z) - Towards Efficient and Data Agnostic Image Classification Training Pipeline for Embedded Systems [0.0]
This work focuses on reviewing the latest augmentation and regularization methods for image classification.
We achieve reasonable performance on a variety of downstream image classification tasks without manually tuning parameters for each particular task.
Resulting models are computationally efficient and can be deployed to CPU using the OpenVINO toolkit.
arXiv Detail & Related papers (2021-08-16T12:38:05Z) - Learning Transferable Visual Models From Natural Language Supervision [13.866297967166089]
Learning directly from raw text about images is a promising alternative.
We demonstrate that the simple pre-training task of predicting which caption goes with which image is an efficient and scalable way to learn (a minimal sketch of this contrastive objective appears after the related-papers list below).
SOTA image representations are learned from scratch on a dataset of 400 million (image, text) pairs collected from the internet.
arXiv Detail & Related papers (2021-02-26T19:04:58Z) - Self-Supervised Viewpoint Learning From Image Collections [116.56304441362994]
We propose a novel learning framework which incorporates an analysis-by-synthesis paradigm to reconstruct images in a viewpoint-aware manner.
We show that our approach performs competitively with fully-supervised approaches for several object categories such as human faces, cars, buses, and trains.
arXiv Detail & Related papers (2020-04-03T22:01:41Z) - Neural Data Server: A Large-Scale Search Engine for Transfer Learning Data [78.74367441804183]
We introduce Neural Data Server (NDS), a large-scale search engine for finding the most useful transfer learning data for the target domain.
NDS consists of a dataserver which indexes several large popular image datasets, and aims to recommend data to a client.
We show the effectiveness of NDS in various transfer learning scenarios, demonstrating state-of-the-art performance on several target datasets.
arXiv Detail & Related papers (2020-01-09T01:21:30Z)
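The CLIP entry above ("Learning Transferable Visual Models From Natural Language Supervision") pre-trains by predicting which caption goes with which image. A minimal sketch of that symmetric image-text contrastive objective, assuming PyTorch and random features in place of real image and text encoders:

```python
"""Sketch of a CLIP-style contrastive loss; encoders are replaced by random
feature tensors, so this illustrates the objective only."""
import torch
import torch.nn.functional as F


def clip_style_loss(image_features, text_features, temperature=0.07):
    # Normalize so dot products become cosine similarities.
    image_features = F.normalize(image_features, dim=-1)
    text_features = F.normalize(text_features, dim=-1)

    # logits[i, j] = similarity between image i and caption j.
    logits = image_features @ text_features.t() / temperature

    # The matching caption for image i is caption i (and vice versa).
    targets = torch.arange(logits.size(0))
    loss_images = F.cross_entropy(logits, targets)      # image -> caption
    loss_texts = F.cross_entropy(logits.t(), targets)   # caption -> image
    return (loss_images + loss_texts) / 2


# Toy usage: a batch of 4 (image, caption) pairs with 512-dim features.
images = torch.randn(4, 512)
captions = torch.randn(4, 512)
print(clip_style_loss(images, captions))
```

In practice the features come from learned image and text encoders trained jointly over very large batches; the random tensors above only demonstrate the shape of the loss.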