Towards High-Performance Exploratory Data Analysis (EDA) Via Stable
Equilibrium Point
- URL: http://arxiv.org/abs/2306.04425v1
- Date: Wed, 7 Jun 2023 13:31:57 GMT
- Title: Towards High-Performance Exploratory Data Analysis (EDA) Via Stable
Equilibrium Point
- Authors: Yuxuan Song, Yongyu Wang
- Abstract summary: We introduce a stable equilibrium point (SEP)-based framework for improving the efficiency and solution quality of EDA.
A unique property of the proposed method is that the SEPs will directly encode the clustering properties of data sets.
- Score: 5.825190876052149
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Exploratory data analysis (EDA) is a vital procedure for data science
projects. In this work, we introduce a stable equilibrium point (SEP)-based
framework for improving the efficiency and solution quality of EDA. By
using the SEPs as representative points, our approach aims to
generate high-quality clustering and data visualization for large-scale data
sets. A unique property of the proposed method is that the SEPs will
directly encode the clustering properties of data sets. Compared with prior
state-of-the-art clustering and data visualization methods, the proposed
methods substantially improve computing efficiency and solution quality
for large-scale data analysis tasks.
Related papers
- Data Advisor: Dynamic Data Curation for Safety Alignment of Large Language Models [79.65071553905021]
We propose Data Advisor, a method for generating data that takes into account the characteristics of the desired dataset.
Data Advisor monitors the status of the generated data, identifies weaknesses in the current dataset, and advises the next iteration of data generation.
arXiv Detail & Related papers (2024-10-07T17:59:58Z)
- Importance-Aware Adaptive Dataset Distillation [53.79746115426363]
Development of deep learning models is enabled by the availability of large-scale datasets.
Dataset distillation aims to synthesize a compact dataset that retains the essential information from the large original dataset.
We propose an importance-aware adaptive dataset distillation (IADD) method that can improve distillation performance.
arXiv Detail & Related papers (2024-01-29T03:29:39Z)
- A Weighted K-Center Algorithm for Data Subset Selection [70.49696246526199]
Subset selection is a fundamental problem that can play a key role in identifying smaller portions of the training data.
We develop a novel factor 3-approximation algorithm to compute subsets based on the weighted sum of both k-center and uncertainty sampling objective functions.
arXiv Detail & Related papers (2023-12-17T04:41:07Z)
- Data-Centric Long-Tailed Image Recognition [49.90107582624604]
Long-tail models exhibit a strong demand for high-quality data.
Data-centric approaches aim to enhance both the quantity and quality of data to improve model performance.
There is currently a lack of research into the underlying mechanisms explaining the effectiveness of information augmentation.
arXiv Detail & Related papers (2023-11-03T06:34:37Z)
- A Comparative Evaluation of FedAvg and Per-FedAvg Algorithms for Dirichlet Distributed Heterogeneous Data [2.5507252967536522]
We investigate Federated Learning (FL), a paradigm of machine learning that allows for decentralized model training on devices without sharing raw data.
We compare two strategies within this paradigm: Federated Averaging (FedAvg) and Personalized Federated Averaging (Per-FedAvg).
Our results provide insights into the development of more effective and efficient machine learning strategies in a decentralized setting.
arXiv Detail & Related papers (2023-09-03T21:33:15Z)
- DBGSA: A Novel Data Adaptive Bregman Clustering Algorithm [2.0232038310495435]
We present a clustering algorithm that is highly sensitive to the initial selection and robustness of datasets.
Extensive experiments are conducted on four simulated datasets and six real datasets.
Results demonstrate that our algorithm improves the accuracy of various algorithms by an average of 63.8%.
arXiv Detail & Related papers (2023-07-25T16:37:09Z)
- Towards Efficient Deep Hashing Retrieval: Condensing Your Data via Feature-Embedding Matching [7.908244841289913]
The cost of training state-of-the-art deep hashing retrieval models has increased.
State-of-the-art dataset distillation methods cannot be extended to all deep hashing retrieval methods.
We propose an efficient condensation framework that addresses these limitations by matching the feature-embedding between synthetic set and real set.
arXiv Detail & Related papers (2023-05-29T13:23:55Z)
- Adaptive Weighted Multiview Kernel Matrix Factorization with its application in Alzheimer's Disease Analysis -- A clustering Perspective [3.3843930118195407]
We propose a novel model to leverage data from all different modalities/views, which can learn the weights of each view adaptively.
Experimental results on the ADNI dataset demonstrate the effectiveness of our proposed method.
arXiv Detail & Related papers (2023-03-07T16:05:24Z)
- Cluster-level pseudo-labelling for source-free cross-domain facial expression recognition [94.56304526014875]
We propose the first Source-Free Unsupervised Domain Adaptation (SFUDA) method for Facial Expression Recognition (FER).
Our method exploits self-supervised pretraining to learn good feature representations from the target data.
We validate the effectiveness of our method in four adaptation setups, proving that it consistently outperforms existing SFUDA methods when applied to FER.
arXiv Detail & Related papers (2022-10-11T08:24:50Z)
- Another Use of SMOTE for Interpretable Data Collaboration Analysis [8.143750358586072]
Data collaboration (DC) analysis has been developed for privacy-preserving integrated analysis across multiple institutions.
This study proposes an anchor data construction technique to improve the recognition performance without increasing the risk of data leakage.
arXiv Detail & Related papers (2022-08-26T06:39:13Z)
- Causal Feature Selection for Algorithmic Fairness [61.767399505764736]
We consider fairness in the integration component of data management.
We propose an approach to identify a sub-collection of features that ensure the fairness of the dataset.
arXiv Detail & Related papers (2020-06-10T20:20:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.