Towards High-Performance Exploratory Data Analysis (EDA) Via Stable
Equilibrium Point
- URL: http://arxiv.org/abs/2306.04425v1
- Date: Wed, 7 Jun 2023 13:31:57 GMT
- Title: Towards High-Performance Exploratory Data Analysis (EDA) Via Stable
Equilibrium Point
- Authors: Yuxuan Song, Yongyu Wang
- Abstract summary: We introduce a stable equilibrium point (SEP)-based framework for improving the efficiency and solution quality of EDA.
A unique property of the proposed method is that the SEPs directly encode the clustering properties of data sets.
- Score: 5.825190876052149
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Exploratory data analysis (EDA) is a vital procedure for data science
projects. In this work, we introduce a stable equilibrium point (SEP)-based
framework for improving the efficiency and solution quality of EDA. By
using the SEPs as representative points, our approach aims to
generate high-quality clustering and data visualization for large-scale data
sets. A unique property of the proposed method is that the SEPs
directly encode the clustering properties of data sets. Compared with prior
state-of-the-art clustering and data visualization methods, the proposed
methods substantially improve computing efficiency and solution quality
for large-scale data analysis tasks.
Related papers
- Efficient Multi-Agent System Training with Data Influence-Oriented Tree Search [59.75749613951193]
We propose Data Influence-oriented Tree Search (DITS) to guide both tree search and data selection.
By leveraging influence scores, we effectively identify the most impactful data for system improvement.
We derive influence score estimation methods tailored for non-differentiable metrics.
arXiv Detail & Related papers (2025-02-02T23:20:16Z)
- Data Advisor: Dynamic Data Curation for Safety Alignment of Large Language Models [79.65071553905021]
We propose Data Advisor, a method for generating data that takes into account the characteristics of the desired dataset.
Data Advisor monitors the status of the generated data, identifies weaknesses in the current dataset, and advises the next iteration of data generation.
arXiv Detail & Related papers (2024-10-07T17:59:58Z)
- Targeted synthetic data generation for tabular data via hardness characterization [0.0]
We introduce a simple augmentation pipeline that generates only high-value training points based on hardness characterization.
Our approach improves the quality of out-of-sample predictions and is computationally more efficient than non-targeted methods.
arXiv Detail & Related papers (2024-10-01T14:54:26Z)
- Importance-Aware Adaptive Dataset Distillation [53.79746115426363]
Development of deep learning models is enabled by the availability of large-scale datasets.
Dataset distillation aims to synthesize a compact dataset that retains the essential information from the large original dataset.
We propose an importance-aware adaptive dataset distillation (IADD) method that can improve distillation performance.
arXiv Detail & Related papers (2024-01-29T03:29:39Z)
- A Weighted K-Center Algorithm for Data Subset Selection [70.49696246526199]
Subset selection is a fundamental problem that can play a key role in identifying smaller portions of the training data.
We develop a novel factor 3-approximation algorithm to compute subsets based on the weighted sum of both k-center and uncertainty sampling objective functions.
arXiv Detail & Related papers (2023-12-17T04:41:07Z)
- DBGSA: A Novel Data Adaptive Bregman Clustering Algorithm [2.0232038310495435]
We present a clustering algorithm that is highly sensitive to the initial selection and robustness of datasets.
Extensive experiments are conducted on four simulated datasets and six real datasets.
Results demonstrate that our algorithm improves the accuracy of various algorithms by an average of 63.8%.
arXiv Detail & Related papers (2023-07-25T16:37:09Z)
- Towards Efficient Deep Hashing Retrieval: Condensing Your Data via Feature-Embedding Matching [7.908244841289913]
The cost of training state-of-the-art deep hashing retrieval models has increased.
State-of-the-art dataset distillation methods cannot be extended to all deep hashing retrieval methods.
We propose an efficient condensation framework that addresses these limitations by matching the feature-embedding between synthetic set and real set.
arXiv Detail & Related papers (2023-05-29T13:23:55Z)
- Adaptive Weighted Multiview Kernel Matrix Factorization with its application in Alzheimer's Disease Analysis -- A clustering Perspective [3.3843930118195407]
We propose a novel model to leverage data from all different modalities/views, which can learn the weights of each view adaptively.
Experimental results on the ADNI dataset demonstrate the effectiveness of our proposed method.
arXiv Detail & Related papers (2023-03-07T16:05:24Z)
- Cluster-level pseudo-labelling for source-free cross-domain facial expression recognition [94.56304526014875]
We propose the first Source-Free Unsupervised Domain Adaptation (SFUDA) method for Facial Expression Recognition (FER).
Our method exploits self-supervised pretraining to learn good feature representations from the target data.
We validate the effectiveness of our method in four adaptation setups, proving that it consistently outperforms existing SFUDA methods when applied to FER.
arXiv Detail & Related papers (2022-10-11T08:24:50Z)
- Another Use of SMOTE for Interpretable Data Collaboration Analysis [8.143750358586072]
Data collaboration (DC) analysis has been developed for privacy-preserving integrated analysis across multiple institutions.
This study proposes an anchor data construction technique to improve the recognition performance without increasing the risk of data leakage.
arXiv Detail & Related papers (2022-08-26T06:39:13Z)
- Causal Feature Selection for Algorithmic Fairness [61.767399505764736]
We consider fairness in the integration component of data management.
We propose an approach to identify a sub-collection of features that ensure the fairness of the dataset.
arXiv Detail & Related papers (2020-06-10T20:20:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.