A Multi-criteria Approach for Fast and Outlier-aware Representative
Selection from Manifolds
- URL: http://arxiv.org/abs/2003.05989v1
- Date: Thu, 12 Mar 2020 19:31:10 GMT
- Title: A Multi-criteria Approach for Fast and Outlier-aware Representative
Selection from Manifolds
- Authors: Mahlagha Sedghi, George Atia, Michael Georgiopoulos
- Abstract summary: MOSAIC is a novel representative selection approach from high-dimensional data that may exhibit non-linear structures.
Our method advances a multi-criteria selection approach that maximizes the global representation power of the sampled subset.
MOSAIC's superiority in achieving the desired characteristics of a representative subset all at once is demonstrated.
- Score: 1.5469452301122175
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The problem of representative selection amounts to sampling few informative
exemplars from large datasets. This paper presents MOSAIC, a novel
representative selection approach from high-dimensional data that may exhibit
non-linear structures. Resting upon a novel quadratic formulation, our method
advances a multi-criteria selection approach that maximizes the global
representation power of the sampled subset, ensures diversity, and rejects
disruptive information by effectively detecting outliers. Through theoretical
analyses we characterize the obtained sketch and reveal that the sampled
representatives maximize a well-defined notion of data coverage in a
transformed space. In addition, we present a highly scalable randomized
implementation of the proposed algorithm shown to bring about substantial
speedups. MOSAIC's superiority in achieving the desired characteristics of a
representative subset all at once while exhibiting remarkable robustness to
various outlier types is demonstrated via extensive experiments conducted on
both real and synthetic data with comparisons to state-of-the-art algorithms.
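The three criteria named in the abstract (coverage, diversity, outlier rejection) can be illustrated with a small sketch. This is not the MOSAIC quadratic formulation, which the abstract does not spell out. It is a swapped-in stand-in that greedily maximizes a facility-location coverage score after a k-nearest-neighbor outlier filter. The function name `select_representatives`, the parameters `n_neighbors` and `outlier_quantile`, and the median-distance kernel bandwidth are all assumptions made for illustration.

```python
import numpy as np


def select_representatives(X, k, n_neighbors=5, outlier_quantile=0.95):
    """Greedy coverage-based selection with a simple outlier filter.

    Illustrative stand-in only; NOT the MOSAIC algorithm from the paper.
    Returns (indices of selected representatives, boolean outlier mask).
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]

    # Pairwise Euclidean distances (O(n^2) memory; fine for small n).
    diff = X[:, None, :] - X[None, :, :]
    d2 = (diff ** 2).sum(axis=-1)
    dist = np.sqrt(d2)

    # Outlier score: mean distance to the n_neighbors nearest other points.
    # Column 0 of the sorted rows is the zero self-distance, so skip it.
    knn_score = np.sort(dist, axis=1)[:, 1:n_neighbors + 1].mean(axis=1)
    inlier = knn_score <= np.quantile(knn_score, outlier_quantile)

    # Gaussian similarity; the median-distance bandwidth is an assumed heuristic.
    bandwidth = np.median(dist[dist > 0])
    sim = np.exp(-d2 / (2.0 * bandwidth ** 2))

    candidates = np.flatnonzero(inlier)
    covered = np.zeros(n)  # best similarity of each point to any selected rep
    selected = []
    for _ in range(min(k, candidates.size)):
        # Marginal gain: how much each candidate improves coverage of inliers.
        # Points already well covered add little, which encourages diversity.
        improvement = sim[candidates][:, inlier] - covered[inlier]
        gains = np.maximum(improvement, 0.0).sum(axis=1)
        gains[np.isin(candidates, selected)] = -np.inf  # never pick twice
        best = int(candidates[np.argmax(gains)])
        selected.append(best)
        covered = np.maximum(covered, sim[best])
    return selected, ~inlier


# Usage: two tight clusters plus one far-away point.
rng = np.random.default_rng(0)
cluster_a = rng.normal(loc=0.0, scale=0.1, size=(30, 2))
cluster_b = rng.normal(loc=5.0, scale=0.1, size=(30, 2))
far_point = np.array([[100.0, 100.0]])
data = np.vstack([cluster_a, cluster_b, far_point])
reps, flagged = select_representatives(data, k=2)
print("representatives:", reps, "flagged outliers:", np.flatnonzero(flagged))
```

In this toy setup the greedy step picks one point per cluster, because a second pick inside an already covered cluster yields almost no gain, and the isolated point is excluded by the filter before selection begins. The quantile filter always flags a fixed fraction of points, so it also marks a few peripheral cluster members; a real method would need a more principled outlier criterion.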
Related papers
- Unified Bayesian representation for high-dimensional multi-modal biomedical data for small-sample classification [0.8890696402391598]
BALDUR is a novel Bayesian algorithm designed to deal with multi-modal datasets and small sample sizes in high-dimensional settings.
This model was tested over two different neurodegeneration datasets, outperforming the state-of-the-art models and detecting features aligned with markers already described in the scientific literature.
arXiv Detail & Related papers (2024-11-11T14:51:24Z) - Diversified Batch Selection for Training Acceleration [68.67164304377732]
A prevalent research line, known as online batch selection, explores selecting informative subsets during the training process.
Vanilla reference-model-free methods involve independently scoring and selecting data in a sample-wise manner.
We propose Diversified Batch Selection (DivBS), which is reference-model-free and can efficiently select diverse and representative samples.
arXiv Detail & Related papers (2024-06-07T12:12:20Z) - Tackling Diverse Minorities in Imbalanced Classification [80.78227787608714]
Imbalanced datasets are commonly observed in various real-world applications, presenting significant challenges in training classifiers.
We propose generating synthetic samples iteratively by mixing data samples from both minority and majority classes.
We demonstrate the effectiveness of our proposed framework through extensive experiments conducted on seven publicly available benchmark datasets.
arXiv Detail & Related papers (2023-08-28T18:48:34Z) - Large-scale Fully-Unsupervised Re-Identification [78.47108158030213]
We propose two strategies to learn from large-scale unlabeled data.
The first strategy performs a local neighborhood sampling to reduce the dataset size in each iteration without violating neighborhood relationships.
A second strategy leverages a novel Re-Ranking technique, which has a lower upper bound on time complexity and reduces the memory complexity from O(n^2) to O(kn) with k << n.
arXiv Detail & Related papers (2023-07-26T16:19:19Z) - Fast Empirical Scenarios [0.0]
We seek to extract representative scenarios from large panel data consistent with sample moments.
Among two novel algorithms, the first identifies scenarios that have not been observed before.
The second proposal selects important data points from states of the world that have already realized.
arXiv Detail & Related papers (2023-07-08T07:58:53Z) - Adversarial Lagrangian Integrated Contrastive Embedding for Limited Size
Datasets [8.926248371832852]
This study presents a novel adversarial Lagrangian integrated contrastive embedding (ALICE) method for small-sized datasets.
The accuracy improvement and training convergence of the proposed pre-trained adversarial transfer are shown.
A novel adversarial integrated contrastive model using various augmentation techniques is investigated.
arXiv Detail & Related papers (2022-10-06T23:59:28Z) - Distributed Dynamic Safe Screening Algorithms for Sparse Regularization [73.85961005970222]
We propose a new distributed dynamic safe screening (DDSS) method for sparsity-regularized models and apply it on shared-memory and distributed-memory architectures, respectively.
We prove that the proposed method achieves the linear convergence rate with lower overall complexity and can eliminate almost all the inactive features in a finite number of iterations almost surely.
arXiv Detail & Related papers (2022-04-23T02:45:55Z) - Characterizing Fairness Over the Set of Good Models Under Selective
Labels [69.64662540443162]
We develop a framework for characterizing predictive fairness properties over the set of models that deliver similar overall performance.
We provide tractable algorithms to compute the range of attainable group-level predictive disparities.
We extend our framework to address the empirically relevant challenge of selectively labelled data.
arXiv Detail & Related papers (2021-01-02T02:11:37Z) - Informative Sample Mining Network for Multi-Domain Image-to-Image
Translation [101.01649070998532]
We show that improving the sample selection strategy is an effective solution for image-to-image translation tasks.
We propose a novel multi-stage sample training scheme to reduce sample hardness while preserving sample informativeness.
arXiv Detail & Related papers (2020-01-05T05:48:02Z)