Related papers: Visual Exploration of Large-Scale Image Datasets for Machine Learning with Treemaps

URL: http://arxiv.org/abs/2205.06935v1
Date: Sat, 14 May 2022 00:26:20 GMT
Title: Visual Exploration of Large-Scale Image Datasets for Machine Learning with Treemaps
Authors: Donald Bertucci, Md Montaser Hamid, Yashwanthi Anand, Anita Ruangrotsakun, Delyar Tabatabai, Melissa Perez, and Minsuk Kahng
Abstract summary: We develop DendroMap, a novel approach to exploring large-scale image datasets for machine learning. It organizes images by extracting hierarchical cluster structures from high-dimensional representations of images. It enables users to make sense of the overall distributions of datasets and interactively zoom into specific areas of interest.
Score: 1.881768127321966
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In this paper, we present DendroMap, a novel approach to interactively exploring large-scale image datasets for machine learning. Machine learning practitioners often explore image datasets by generating a grid of images or projecting high-dimensional representations of images into 2-D using dimensionality reduction techniques (e.g., t-SNE). However, neither approach effectively scales to large datasets because images are ineffectively organized and interactions are insufficiently supported. To address these challenges, we develop DendroMap by adapting Treemaps, a well-known visualization technique. DendroMap effectively organizes images by extracting hierarchical cluster structures from high-dimensional representations of images. It enables users to make sense of the overall distributions of datasets and interactively zoom into specific areas of interest at multiple levels of abstraction. Our case studies with widely used image datasets for deep learning demonstrate that users can discover insights about datasets and trained models by examining the diversity of images, identifying underperforming subgroups, and analyzing classification errors. We conducted a user study that evaluates the effectiveness of DendroMap in grouping and searching tasks by comparing it with a gridified version of t-SNE and found that participants preferred DendroMap over the compared method.
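The abstract's central idea, extracting a hierarchy of clusters from image representations and showing only a few groups at a time, can be illustrated with a small sketch. This is not DendroMap's implementation: the function names `build_hierarchy` and `cut`, the centroid-linkage rule, the "expand the largest group first" policy, and the toy 2-D vectors standing in for image embeddings are all assumptions made for illustration. The clustering is a naive version that only suits tiny inputs.

```python
import math


def centroid(points):
    """Mean of a list of equal-length vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]


def build_hierarchy(vectors):
    """Naive agglomerative clustering (centroid linkage, assumed here).

    Returns a nested tree of dicts. Each node records the indices of the
    items it contains, its size (usable as a treemap cell area), and its
    two children (empty for single-item leaves).
    """
    clusters = [{"items": [i], "size": 1, "children": []}
                for i in range(len(vectors))]
    while len(clusters) > 1:
        best = None
        # Find the pair of clusters whose centroids are closest.
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                ca = centroid([vectors[i] for i in clusters[a]["items"]])
                cb = centroid([vectors[i] for i in clusters[b]["items"]])
                d = math.dist(ca, cb)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        left, right = clusters[a], clusters[b]
        merged = {
            "items": left["items"] + right["items"],
            "size": left["size"] + right["size"],
            "children": [left, right],
        }
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return clusters[0]


def cut(tree, max_groups):
    """Pick the groups to display at one level of abstraction.

    Starting from the root, repeatedly split the largest visible group
    until max_groups groups are visible or nothing can be split further.
    """
    visible = [tree]
    while len(visible) < max_groups:
        expandable = [node for node in visible if node["children"]]
        if not expandable:
            break
        largest = max(expandable, key=lambda node: node["size"])
        visible.remove(largest)
        visible.extend(largest["children"])
    return visible


# Toy "embeddings": two tight pairs and one outlier.
embeddings = [[0, 0], [0, 1], [10, 10], [10, 11], [50, 50]]
tree = build_hierarchy(embeddings)
for group in cut(tree, 3):
    print(group["size"], sorted(group["items"]))
```

Calling `cut` again on a single returned group with a larger `max_groups` is one way to mimic zooming into a region: the group's subtree is re-expanded while the rest of the dataset stays out of view.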

Related papers

Efficient Curation of Invertebrate Image Datasets Using Feature Embeddings and Automatic Size Comparison [5.480305055542485]
We present a method for curating large-scale image datasets of invertebrates. Our approach is based on extracting feature embeddings with pretrained deep neural networks. We also show that a simple area-based size comparison approach can find many common erroneous images.
arXiv Detail & Related papers (2024-12-20T12:35:41Z)
Masked Image Modeling: A Survey [73.21154550957898]
Masked image modeling emerged as a powerful self-supervised learning technique in computer vision. We construct a taxonomy and review the most prominent papers in recent years. We aggregate the performance results of various masked image modeling methods on the most popular datasets.
arXiv Detail & Related papers (2024-08-13T07:27:02Z)
Deep Domain Adaptation: A Sim2Real Neural Approach for Improving Eye-Tracking Systems [80.62854148838359]
Eye image segmentation is a critical step in eye tracking that has great influence over the final gaze estimate. We use dimensionality-reduction techniques to measure the overlap between the target eye images and synthetic training data. Our methods result in robust, improved performance when tackling the discrepancy between simulation and real-world data samples.
arXiv Detail & Related papers (2024-03-23T22:32:06Z)
SepHRNet: Generating High-Resolution Crop Maps from Remote Sensing imagery using HRNet with Separable Convolution [3.717258819781834]
We propose a novel deep learning approach that integrates HRNet with separable convolutional layers to capture spatial patterns and self-attention to capture temporal patterns of the data. The proposed algorithm achieves a classification accuracy of 97.5% and an IoU of 55.2% in generating crop maps.
arXiv Detail & Related papers (2023-07-11T18:07:25Z)
CSP: Self-Supervised Contrastive Spatial Pre-Training for Geospatial-Visual Representations [90.50864830038202]
We present Contrastive Spatial Pre-Training (CSP), a self-supervised learning framework for geo-tagged images. We use a dual-encoder to separately encode the images and their corresponding geo-locations, and use contrastive objectives to learn effective location representations from images. CSP significantly boosts model performance, with 10-34% relative improvement across various labeled training data sampling ratios.
arXiv Detail & Related papers (2023-05-01T23:11:18Z)
Learning Efficient Representations for Enhanced Object Detection on Large-scene SAR Images [16.602738933183865]
Detecting and recognizing targets in complex large-scene Synthetic Aperture Radar (SAR) images is a challenging problem. Recently developed deep learning algorithms can automatically learn the intrinsic features of SAR images. We propose an efficient and robust deep-learning-based target detection method.
arXiv Detail & Related papers (2022-01-22T03:25:24Z)
Learning Hierarchical Graph Representation for Image Manipulation Detection [50.04902159383709]
The objective of image manipulation detection is to identify and locate the manipulated regions in images. Recent approaches mostly adopt sophisticated Convolutional Neural Networks (CNNs) to capture the tampering artifacts left in the images. We propose a hierarchical Graph Convolutional Network (HGCN-Net), which consists of two parallel branches.
arXiv Detail & Related papers (2022-01-15T01:54:25Z)
Learning Co-segmentation by Segment Swapping for Retrieval and Discovery [67.6609943904996]
The goal of this work is to efficiently identify visually similar patterns from a pair of images. We generate synthetic training pairs by selecting object segments in an image and copy-pasting them into another image. We show our approach provides clear improvements for artwork details retrieval on the Brueghel dataset.
arXiv Detail & Related papers (2021-10-29T16:51:16Z)
Homography augumented momentum constrastive learning for SAR image retrieval [3.9743795764085545]
We propose a deep learning-based image retrieval approach using homography transformation augmented contrastive learning. We also propose a training method for the DNNs induced by contrastive learning that does not require any labeling procedure.
arXiv Detail & Related papers (2021-09-21T17:27:07Z)
From Heatmaps to Structural Explanations of Image Classifiers [31.44267537307587]
The paper starts by describing the explainable neural network (XNN), which attempts to extract and visualize several high-level concepts purely from the deep network. Realizing that an important missing piece is a reliable heatmap visualization tool, we have developed I-GOS and iGOS++. Through the research process, we have gained insights into building deep network explanations.
arXiv Detail & Related papers (2021-09-13T23:39:57Z)
Salient Objects in Clutter [130.63976772770368]
This paper identifies and addresses a serious design bias of existing salient object detection (SOD) datasets. This design bias has led to a saturation in performance for state-of-the-art SOD models when evaluated on existing datasets. We propose a new high-quality dataset and update the previous saliency benchmark.
arXiv Detail & Related papers (2021-05-07T03:49:26Z)
The Intrinsic Dimension of Images and Its Impact on Learning [60.811039723427676]
It is widely believed that natural image data exhibits low-dimensional structure despite the high dimensionality of conventional pixel representations. In this work, we apply dimension estimation tools to popular datasets and investigate the role of low-dimensional structure in deep learning.
arXiv Detail & Related papers (2021-04-18T16:29:23Z)
Sparse data to structured imageset transformation [0.0]
Machine learning problems involving sparse datasets may benefit from the use of convolutional neural networks if the numbers of samples and features are very large. We convert such datasets to imagesets while attempting to give each image structure that is amenable for use with convolutional neural networks.
arXiv Detail & Related papers (2020-05-07T20:36:59Z)

This list is automatically generated from the titles and abstracts of the papers on this site.