Contributing Dimension Structure of Deep Feature for Coreset Selection
- URL: http://arxiv.org/abs/2401.16193v2
- Date: Sat, 2 Mar 2024 08:27:52 GMT
- Title: Contributing Dimension Structure of Deep Feature for Coreset Selection
- Authors: Zhijing Wan, Zhixiang Wang, Yuran Wang, Zheng Wang, Hongyuan Zhu, Shin'ichi Satoh
- Abstract summary: Coreset selection seeks to choose a subset of crucial training samples for efficient learning.
Sample selection hinges on two main aspects: a sample's representation in enhancing performance and the role of sample diversity in averting overfitting.
Existing methods typically measure both the representation and diversity of data based on similarity metrics.
- Score: 26.759457501199822
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Coreset selection seeks to choose a subset of crucial training samples for
efficient learning. It has gained traction in deep learning, particularly with
the surge in training dataset sizes. Sample selection hinges on two main
aspects: a sample's representation in enhancing performance and the role of
sample diversity in averting overfitting. Existing methods typically measure
both the representation and diversity of data based on similarity metrics, such
as the L2-norm. They have capably tackled representation via distribution matching
guided by the similarities of features, gradients, or other information between
data. However, selecting an effectively diverse set of samples remains
sub-optimal. This is because the similarity metrics usually simply aggregate
dimension similarities without acknowledging disparities among the dimensions
that significantly contribute to the final similarity. As a result, they fall
short of adequately capturing diversity. To address this, we propose a
feature-based diversity constraint, compelling the chosen subset to exhibit
maximum diversity. Our key idea is the introduction of a novel Contributing
Dimension Structure (CDS) metric. Different from similarity metrics that
measure the overall similarity of high-dimensional features, our CDS metric
considers not only the reduction of redundancy in feature dimensions, but also
the difference between dimensions that contribute significantly to the final
similarity. We reveal that existing methods tend to favor samples with similar
CDS, leading to a reduced variety of CDS types within the coreset and
subsequently hindering model performance. In response, we enhance the
performance of five classical selection methods by integrating the CDS
constraint. Our experiments on three datasets demonstrate the general
effectiveness of the proposed method in boosting existing methods.
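To make the abstract's central observation concrete, the following is a minimal sketch, not the paper's CDS metric. It shows, in plain Python, that three samples can sit at the same L2 distance from a reference feature while different dimensions account for that distance. The helper `contributing_dims` and its `share` threshold are hypothetical names chosen for illustration; the abstract does not specify how CDS is computed.

```python
import math


def l2_distance(a, b):
    """Euclidean (L2) distance: aggregates per-dimension differences into one number."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contributing_dims(a, b, share=0.25):
    """Hypothetical helper (an assumption, not from the paper).

    Returns the indices of dimensions whose squared difference accounts for
    at least `share` of the total squared distance between a and b.
    """
    squared = [(x - y) ** 2 for x, y in zip(a, b)]
    total = sum(squared)
    if total == 0:
        return set()
    return {i for i, s in enumerate(squared) if s / total >= share}


# Toy 4-dimensional "features" (made-up values for illustration only).
reference = [0.0, 0.0, 0.0, 0.0]
sample_a = [3.0, 0.0, 0.0, 0.0]   # all of the distance comes from dimension 0
sample_b = [0.0, 0.0, 0.0, 3.0]   # all of the distance comes from dimension 3
sample_c = [1.5, 1.5, 1.5, 1.5]   # the distance is spread evenly over all dimensions

dist_a = l2_distance(reference, sample_a)
dist_b = l2_distance(reference, sample_b)
dist_c = l2_distance(reference, sample_c)

dims_a = contributing_dims(reference, sample_a)
dims_b = contributing_dims(reference, sample_b)
dims_c = contributing_dims(reference, sample_c)

print(dist_a, dist_b, dist_c)   # the three distances are identical
print(dims_a, dims_b, dims_c)   # the contributing dimensions are not
```

A selection rule that looks only at the aggregated distance would treat these three samples as interchangeable, even though the dimensions behind the distance differ. That gap is what the abstract argues a diversity constraint should account for.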
Related papers
- Unsupervised Feature Selection Algorithm Based on Dual Manifold Re-ranking [5.840228332438659]
This paper proposes an unsupervised feature selection algorithm based on dual manifold re-ranking (DMRR).
Different similarity matrices are constructed to depict the manifold structures among samples, between samples and features, and among features themselves.
By comparing DMRR with three original unsupervised feature selection algorithms and two unsupervised feature selection post-processing algorithms, experimental results confirm that the importance information of different samples and the dual relationship between sample and feature are beneficial for achieving better feature selection.
arXiv Detail & Related papers (2024-10-27T09:29:17Z)
- Out-of-Distribution Detection via Deep Multi-Comprehension Ensemble [11.542472900306745]
Multi-Comprehension (MC) Ensemble is proposed as a strategy to augment the Out-of-Distribution (OOD) feature representation field.
Our experimental results demonstrate the superior performance of the MC Ensemble strategy in OOD detection.
This underscores the effectiveness of our proposed approach in enhancing the model's capability to detect instances outside its training distribution.
arXiv Detail & Related papers (2024-03-24T18:43:04Z)
- Convolutional autoencoder-based multimodal one-class classification [80.52334952912808]
One-class classification refers to approaches of learning using data from a single class only.
We propose a deep learning one-class classification method suitable for multimodal data.
arXiv Detail & Related papers (2023-09-25T12:31:18Z)
- Detail Reinforcement Diffusion Model: Augmentation Fine-Grained Visual Categorization in Few-Shot Conditions [11.121652649243119]
Diffusion models have been widely adopted in data augmentation due to their outstanding diversity in data generation.
We propose a novel approach termed the detail reinforcement diffusion model (DRDM).
It leverages the rich knowledge of large models for fine-grained data augmentation and comprises two key components: discriminative semantic recombination (DSR) and spatial knowledge reference (SKR).
arXiv Detail & Related papers (2023-09-15T01:28:59Z)
- Revisiting the Evaluation of Image Synthesis with GANs [55.72247435112475]
This study presents an empirical investigation into the evaluation of synthesis performance, with generative adversarial networks (GANs) as a representative of generative models.
In particular, we make in-depth analyses of various factors, including how to represent a data point in the representation space, how to calculate a fair distance using selected samples, and how many instances to use from each set.
arXiv Detail & Related papers (2023-04-04T17:54:32Z)
- Deep Diversity-Enhanced Feature Representation of Hyperspectral Images [87.47202258194719]
We rectify 3D convolution by modifying its topology to enhance the rank upper-bound.
We also propose a novel diversity-aware regularization (DA-Reg) term that acts on the feature maps to maximize independence among elements.
To demonstrate the superiority of the proposed Re$^3$-ConvSet and DA-Reg, we apply them to various HS image processing and analysis tasks.
arXiv Detail & Related papers (2023-01-15T16:19:18Z)
- Parallel feature selection based on the trace ratio criterion [4.30274561163157]
This work presents a novel parallel feature selection approach for classification, namely Parallel Feature Selection using Trace criterion (PFST).
Our method uses the trace criterion, a measure of class separability used in Fisher's Discriminant Analysis, to evaluate feature usefulness.
The experiments show that our method can produce a small set of features in a fraction of the time required by the other methods under comparison.
arXiv Detail & Related papers (2022-03-03T10:50:33Z)
- Compactness Score: A Fast Filter Method for Unsupervised Feature Selection [66.84571085643928]
We propose a fast unsupervised feature selection method, named Compactness Score (CSUFS), to select desired features.
Our proposed algorithm seems to be more accurate and efficient compared with existing algorithms.
arXiv Detail & Related papers (2022-01-31T13:01:37Z)
- A Novel Intrinsic Measure of Data Separability [0.0]
In machine learning, the performance of a classifier depends on the separability/complexity of datasets.
We create an intrinsic measure, the Distance-based Separability Index (DSI).
We show that the DSI can indicate whether the distributions of datasets are identical for any dimensionality.
arXiv Detail & Related papers (2021-09-11T04:20:08Z)
- Spherical Feature Transform for Deep Metric Learning [58.35971328774927]
This work proposes a novel spherical feature transform approach.
It relaxes the assumption of identical covariance between classes to an assumption of similar covariances of different classes on a hypersphere.
We provide a simple and effective training method, and an in-depth analysis of the relation between the two different transforms.
arXiv Detail & Related papers (2020-08-04T11:32:23Z)
- K-Shot Contrastive Learning of Visual Features with Multiple Instance Augmentations [67.46036826589467]
$K$-Shot Contrastive Learning is proposed to investigate sample variations within individual instances.
It aims to combine the advantages of inter-instance discrimination by learning discriminative features to distinguish between different instances.
Experimental results demonstrate that the proposed $K$-shot contrastive learning achieves superior performance to state-of-the-art unsupervised methods.
arXiv Detail & Related papers (2020-07-27T04:56:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.