Are Deep Image Embedding Clustering Methods Effective for Heterogeneous
Tabular Data?
- URL: http://arxiv.org/abs/2212.14111v1
- Date: Wed, 28 Dec 2022 22:29:10 GMT
- Title: Are Deep Image Embedding Clustering Methods Effective for Heterogeneous
Tabular Data?
- Authors: Sakib Abrar and Manar D. Samad
- Abstract summary: This paper performs one of the first studies on deep embedding clustering of seven tabular data sets using six state-of-the-art baseline methods proposed for image data sets.
Traditional clustering of tabular data ranks second out of eight methods and is superior to most deep embedding clustering baselines.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Deep learning methods in the literature are invariably benchmarked on image
data sets and then assumed to work on all data problems. Unfortunately,
architectures designed for image learning are often not ready or optimal for
non-image data without considering data-specific learning requirements. In this
paper, we take a data-centric view to argue that deep image embedding
clustering methods are not equally effective on heterogeneous tabular data
sets. This paper performs one of the first studies on deep embedding clustering
of seven tabular data sets using six state-of-the-art baseline methods proposed
for image data sets. Our results reveal that the traditional clustering of
tabular data ranks second out of eight methods and is superior to most deep
embedding clustering baselines. Our observation is in line with the recent
literature that traditional machine learning of tabular data is still a
competitive approach against deep learning. Although surprising to many deep
learning researchers, traditional clustering methods can be competitive
baselines for tabular data, and outperforming these baselines remains a
challenge for deep embedding clustering. Therefore, deep learning methods for
image learning may not be fair or suitable baselines for tabular data without
considering data-specific contrasts and learning requirements.
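A statement such as "ranks second out of eight methods" is typically obtained by ranking every method on each data set and averaging those ranks. The sketch below illustrates that bookkeeping, assuming a mean-rank comparison, which the abstract does not spell out. The function name `average_ranks`, the method names, and all scores are hypothetical placeholders, not results from the paper.

```python
from statistics import mean


def average_ranks(scores):
    """Mean rank per method across data sets (rank 1 = best).

    scores maps a method name to a list of scores, one per data set,
    where a higher score is better. Tied methods share the average of
    the rank positions they occupy.
    """
    methods = list(scores)
    n_sets = len(next(iter(scores.values())))
    ranks = {m: [] for m in methods}
    for j in range(n_sets):
        # Order methods from best to worst on data set j.
        column = sorted(methods, key=lambda m: scores[m][j], reverse=True)
        i = 0
        while i < len(column):
            # Extend k over the run of methods tied with column[i].
            k = i
            while (k + 1 < len(column)
                   and scores[column[k + 1]][j] == scores[column[i]][j]):
                k += 1
            shared = ((i + 1) + (k + 1)) / 2  # average of positions i+1..k+1
            for m in column[i:k + 1]:
                ranks[m].append(shared)
            i = k + 1
    return {m: mean(r) for m, r in ranks.items()}


# Placeholder scores for illustration only; NOT taken from the paper.
toy_scores = {
    "kmeans": [0.70, 0.55, 0.80],
    "deep_a": [0.75, 0.60, 0.75],
    "deep_b": [0.65, 0.50, 0.70],
}
avg = average_ranks(toy_scores)
ordering = sorted(avg, key=avg.get)  # best mean rank first
print(ordering)
```

With these made-up numbers the traditional baseline lands in second place overall even though it wins on one data set, which is the kind of outcome a mean-rank summary can hide or reveal.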
Related papers
- Attention versus Contrastive Learning of Tabular Data -- A Data-centric
Benchmarking [0.0]
This article extensively evaluates state-of-the-art attention and contrastive learning methods on a wide selection of 28 data sets.
We find that a hybrid attention-contrastive learning strategy mostly wins on hard-to-classify data sets.
Traditional methods are frequently superior on easy-to-classify data sets with presumably simpler decision boundaries.
arXiv Detail & Related papers (2024-01-08T22:36:05Z)
- Exploring Data Redundancy in Real-world Image Classification through
Data Selection [20.389636181891515]
Deep learning models often require large amounts of data for training, leading to increased costs.
We present two data valuation metrics based on Synaptic Intelligence and gradient norms, respectively, to study redundancy in real-world image data.
Online and offline data selection algorithms are then proposed via clustering and grouping based on the examined data values.
arXiv Detail & Related papers (2023-06-25T03:31:05Z)
- Data-Free Sketch-Based Image Retrieval [56.96186184599313]
We propose Data-Free (DF)-SBIR, where pre-trained, single-modality classification models have to be leveraged to learn cross-modal metric-space for retrieval without access to any training data.
We present a methodology for DF-SBIR, which can leverage knowledge from models independently trained to perform classification on photos and sketches.
Our method also achieves mAPs competitive with data-dependent approaches, all the while requiring no training data.
arXiv Detail & Related papers (2023-03-14T10:34:07Z)
- Semi-Supervised Image Captioning by Adversarially Propagating Labeled
Data [95.0476489266988]
We present a novel data-efficient semi-supervised framework to improve the generalization of image captioning models.
Our proposed method trains a captioner to learn from paired data and to progressively associate unpaired data.
We report extensive empirical results on both (1) image-based and (2) dense region-based captioning datasets, followed by a comprehensive analysis on the scarcely-paired dataset.
arXiv Detail & Related papers (2023-01-26T15:25:43Z)
- Deep Clustering of Tabular Data by Weighted Gaussian Distribution Learning [0.0]
This paper develops one of the first deep clustering methods for tabular data: Gaussian Cluster Embedding in Autoencoder Latent Space (G-CEALS).
The G-CEALS method presents average rank orderings of 2.9(1.7) and 2.8(1.7) based on clustering accuracy and adjusted Rand index (ARI) scores on sixteen data sets, respectively, and outperforms nine state-of-the-art clustering methods.
arXiv Detail & Related papers (2023-01-02T18:45:53Z)
- Is margin all you need? An extensive empirical study of active learning
on tabular data [66.18464006872345]
We analyze the performance of a variety of active learning algorithms on 69 real-world datasets from the OpenML-CC18 benchmark.
Surprisingly, we find that the classical margin sampling technique matches or outperforms all others, including the current state of the art.
arXiv Detail & Related papers (2022-10-07T21:18:24Z)
- Deep Neural Networks and Tabular Data: A Survey [6.940394595795544]
This work provides an overview of state-of-the-art deep learning methods for tabular data.
We start by categorizing them into three groups: data transformations, specialized architectures, and regularization models.
We then provide a comprehensive overview of the main approaches in each group.
arXiv Detail & Related papers (2021-10-05T09:22:39Z)
- Tune It or Don't Use It: Benchmarking Data-Efficient Image
Classification [9.017660524497389]
We design a benchmark for data-efficient image classification consisting of six diverse datasets spanning various domains.
We re-evaluate the standard cross-entropy baseline and eight methods for data-efficient deep learning published between 2017 and 2021 at renowned venues.
Tuning the learning rate, weight decay, and batch size on a separate validation split results in a highly competitive baseline.
arXiv Detail & Related papers (2021-08-30T11:24:51Z)
- Learning Topology from Synthetic Data for Unsupervised Depth Completion [66.26787962258346]
We present a method for inferring dense depth maps from images and sparse depth measurements.
We learn the association of sparse point clouds with dense natural shapes, and use the image as evidence to validate the predicted depth map.
arXiv Detail & Related papers (2021-06-06T00:21:12Z)
- Data Consistent CT Reconstruction from Insufficient Data with Learned
Prior Images [70.13735569016752]
We investigate the robustness of deep learning in CT image reconstruction by showing false negative and false positive lesion cases.
We propose a data consistent reconstruction (DCR) method to improve their image quality, which combines the advantages of compressed sensing and deep learning.
The efficacy of the proposed method is demonstrated in cone-beam CT with truncated data, limited-angle data and sparse-view data, respectively.
arXiv Detail & Related papers (2020-05-20T13:30:49Z)
- Automatically Discovering and Learning New Visual Categories with
Ranking Statistics [145.89790963544314]
We tackle the problem of discovering novel classes in an image collection given labelled examples of other classes.
We learn a general-purpose clustering model and use the latter to identify the new classes in the unlabelled data.
We evaluate our approach on standard classification benchmarks and outperform current methods for novel category discovery by a significant margin.
arXiv Detail & Related papers (2020-02-13T18:53:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.