Robust Consensus Clustering and its Applications for Advertising
Forecasting
- URL: http://arxiv.org/abs/2301.00717v1
- Date: Tue, 27 Dec 2022 21:49:04 GMT
- Title: Robust Consensus Clustering and its Applications for Advertising
Forecasting
- Authors: Deguang Kong, Miao Lu, Konstantin Shmakov and Jian Yang
- Abstract summary: We propose a novel algorithm -- robust consensus clustering that can find common ground truth among experts' opinions.
We apply the proposed method to the real-world advertising campaign segmentation and forecasting tasks.
- Score: 18.242055675730253
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Consensus clustering aggregates partitions in order to find a better
fit by reconciling clustering results from different sources/executions. In
practice, clustering tasks contain noise and outliers, which may significantly
degrade performance. To address this issue, we propose a novel algorithm,
robust consensus clustering, which finds common ground truth among experts'
opinions and tends to be minimally affected by the bias caused by outliers. In
particular, we formalize robust consensus clustering as a constrained
optimization problem and derive an effective algorithm based on the
alternating direction method of multipliers (ADMM) with a rigorous convergence
guarantee. Our method outperforms the baselines on benchmarks. We apply the
proposed method to real-world advertising campaign segmentation and
forecasting tasks, using consensus clustering results based on similarities
computed via the Kolmogorov-Smirnov statistic. The accurate clustering results
help build advertiser profiles and thereby support forecasting.
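The abstract names two concrete ingredients: pairwise similarity via the Kolmogorov-Smirnov statistic, and a consensus over multiple clusterings. The minimal Python sketch below illustrates both, assuming scipy and scikit-learn (>= 1.2 for the `metric` parameter); the function names are illustrative, and the co-association consensus shown is a common baseline, not the paper's robust ADMM formulation.

```python
import numpy as np
from scipy.stats import ks_2samp
from sklearn.cluster import AgglomerativeClustering, KMeans

def ks_distance_matrix(samples):
    """Pairwise Kolmogorov-Smirnov statistics between 1-D samples
    (e.g., per-campaign metric distributions)."""
    n = len(samples)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            D[i, j] = D[j, i] = ks_2samp(samples[i], samples[j]).statistic
    return D

def consensus_labels(X, n_clusters=5, n_runs=20):
    """Co-association consensus over repeated k-means runs (a standard
    consensus baseline; the paper instead solves a robust constrained
    program via ADMM)."""
    n = X.shape[0]
    coassoc = np.zeros((n, n))
    for r in range(n_runs):
        labels = KMeans(n_clusters=n_clusters, n_init=1,
                        random_state=r).fit_predict(X)
        coassoc += labels[:, None] == labels[None, :]
    coassoc /= n_runs
    # 1 - co-association frequency acts as a consensus distance.
    model = AgglomerativeClustering(n_clusters=n_clusters,
                                    metric="precomputed", linkage="average")
    return model.fit_predict(1.0 - coassoc)
```

When the items to cluster are distributions rather than feature vectors, the KS distance matrix from `ks_distance_matrix` can be fed directly to the same precomputed-metric agglomerative step.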
Related papers
- A Bayesian Approach to Clustering via the Proper Bayesian Bootstrap: the Bayesian Bagged Clustering (BBC) algorithm [0.0]
The paper presents a novel approach to unsupervised clustering.
A new method is proposed that enhances existing models with the proper Bayesian bootstrap to improve robustness and interpretability.
arXiv Detail & Related papers (2024-09-13T16:14:54Z)
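The BBC summary above is high-level; as a rough illustration of the Bayesian-bootstrap ingredient only (Rubin-style Dirichlet reweighting, not the paper's full algorithm), one can refit a weighted clustering per bootstrap replicate. Function name and the k-means base learner are assumptions here.

```python
import numpy as np
from sklearn.cluster import KMeans

def bayesian_bootstrap_kmeans(X, n_clusters=3, n_boot=50, seed=0):
    """Illustrative only: Bayesian bootstrap around weighted k-means.
    Each replicate draws Dirichlet(1, ..., 1) observation weights; the
    spread of the replicate centers gives a robustness/uncertainty signal."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    centers = []
    for _ in range(n_boot):
        w = rng.dirichlet(np.ones(n))            # posterior weights over points
        km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
        km.fit(X, sample_weight=w * n)           # rescale so weights sum to n
        # Crude label alignment: order centers by their first coordinate.
        order = np.argsort(km.cluster_centers_[:, 0])
        centers.append(km.cluster_centers_[order])
    return np.stack(centers)                     # shape (n_boot, k, d)
```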
- GCC: Generative Calibration Clustering [55.44944397168619]
We propose a novel Generative Calibration Clustering (GCC) method that incorporates feature learning and augmentation into the clustering procedure.
First, we develop a discriminative feature alignment mechanism to discover the intrinsic relationship between real and generated samples.
Second, we design a self-supervised metric learning scheme to generate more reliable cluster assignments.
arXiv Detail & Related papers (2024-04-14T01:51:11Z)
- Deep Embedding Clustering Driven by Sample Stability [16.53706617383543]
We propose a deep embedding clustering algorithm driven by sample stability (DECS).
Specifically, we start by constructing the initial feature space with an autoencoder and then learn cluster-oriented embedded features constrained by sample stability.
Experimental results on five datasets illustrate that the proposed method achieves superior performance compared to state-of-the-art clustering approaches.
arXiv Detail & Related papers (2024-01-29T09:19:49Z)
- Cluster Purging: Efficient Outlier Detection based on Rate-Distortion Theory [6.929025509877642]
Cluster Purging is an extension of clustering-based outlier detection.
We show that Cluster Purging improves upon outliers detected from raw clusterings.
arXiv Detail & Related papers (2023-02-22T09:32:37Z)
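For intuition about clustering-based outlier purging in general, here is a hedged baseline sketch: points whose distortion (distance to their assigned center) falls in the upper tail are flagged and removed. This is a generic heuristic, not the paper's rate-distortion criterion; the function name and quantile threshold are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def purge_outliers(X, n_clusters=5, quantile=0.95):
    """Illustrative distortion-based purging: cluster, measure each point's
    distance to its assigned center, and purge the top (1 - quantile) tail."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(X)
    dist = np.linalg.norm(X - km.cluster_centers_[km.labels_], axis=1)
    keep = dist <= np.quantile(dist, quantile)
    return X[keep], np.where(~keep)[0]   # cleaned data, outlier indices
```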
- Rethinking Clustering-Based Pseudo-Labeling for Unsupervised Meta-Learning [146.11600461034746]
CACTUs, a method for unsupervised meta-learning, is a clustering-based approach with pseudo-labeling.
This approach is model-agnostic and can be combined with supervised algorithms to learn from unlabeled data.
We prove that the core reason for this is the lack of a clustering-friendly property in the embedding space.
arXiv Detail & Related papers (2022-09-27T19:04:36Z)
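The clustering-based pseudo-labeling idea above admits a short sketch: cluster unlabeled embeddings, then treat cluster ids as classes when sampling few-shot tasks. This simplifies CACTUs considerably (no partition ensembles or balancing); the function name and parameters are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

def pseudo_label_tasks(embeddings, n_clusters=10, n_tasks=100, n_way=5, seed=0):
    """Sketch of clustering-based pseudo-labeling: cluster once, then sample
    n_way pseudo-classes (clusters) per task for meta-training."""
    rng = np.random.default_rng(seed)
    labels = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=seed).fit_predict(embeddings)
    tasks = []
    for _ in range(n_tasks):
        classes = rng.choice(n_clusters, size=n_way, replace=False)
        # Indices per pseudo-class; split into support/query sets downstream.
        tasks.append({c: np.where(labels == c)[0] for c in classes})
    return tasks
```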
- A One-shot Framework for Distributed Clustered Learning in Heterogeneous Environments [54.172993875654015]
The paper proposes a family of communication-efficient methods for distributed learning in heterogeneous environments.
A one-shot approach, based on local computations at the users and a clustering-based aggregation step at the server, is shown to provide strong learning guarantees.
For strongly convex problems it is shown that, as long as the number of data points per user is above a threshold, the proposed approach achieves order-optimal mean-squared error rates in terms of the sample size.
arXiv Detail & Related papers (2022-09-22T09:04:10Z)
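A hedged sketch of the one-shot clustered aggregation pattern described above: each user sends a single local estimate, and the server clusters the estimates so that users drawn from the same underlying distribution are averaged together. The k-means aggregator and function name are assumptions, not the paper's exact procedure.

```python
import numpy as np
from sklearn.cluster import KMeans

def one_shot_clustered_mean(local_estimates, n_clusters):
    """One communication round: cluster user estimates at the server and
    average within each cluster."""
    E = np.asarray(local_estimates)                 # shape (n_users, d)
    labels = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=0).fit_predict(E)
    per_cluster = {c: E[labels == c].mean(axis=0) for c in range(n_clusters)}
    return per_cluster, labels
```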
- Self-Evolutionary Clustering [1.662966122370634]
Most existing deep clustering methods are based on simple distance comparison and depend heavily on the target distribution generated by a handcrafted nonlinear mapping.
A novel modular Self-Evolutionary Clustering (Self-EvoC) framework is constructed, which boosts clustering performance through classification in a self-supervised manner.
The framework can efficiently discriminate sample outliers and generate a better target distribution with the assistance of self-supervision.
arXiv Detail & Related papers (2022-02-21T19:38:18Z)
- Tight integration of neural- and clustering-based diarization through deep unfolding of infinite Gaussian mixture model [84.57667267657382]
This paper introduces a trainable clustering algorithm into the integration framework.
Speaker embeddings are optimized during training so that they better fit iGMM clustering.
Experimental results show that the proposed approach outperforms the conventional approach in terms of diarization error rate.
arXiv Detail & Related papers (2022-02-14T07:45:21Z)
- Gradient Based Clustering [72.15857783681658]
We propose a general approach for distance-based clustering, using the gradient of the cost function that measures clustering quality.
The approach is an iterative two-step procedure (alternating between cluster assignments and cluster center updates) and is applicable to a wide range of functions.
arXiv Detail & Related papers (2022-02-01T19:31:15Z)
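The two-step procedure described above is easy to instantiate for the quadratic cost: alternate nearest-center assignment with a gradient step on the centers instead of the closed-form mean update. This is a generic sketch of the idea, not the paper's full family of methods.

```python
import numpy as np

def gradient_kmeans(X, k, lr=0.1, n_iters=100, seed=0):
    """Alternate (i) nearest-center assignment and (ii) a gradient step on
    the mean quadratic cost J = (1/n) * sum_i ||x_i - c_{a(i)}||^2."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iters):
        # (i) assignment step: nearest center for every point.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        assign = d2.argmin(axis=1)
        # (ii) gradient step: dJ/dc_j = (2 * n_j / n) * (c_j - mean of cluster j).
        for j in range(k):
            pts = X[assign == j]
            if len(pts):
                grad = 2.0 * len(pts) / len(X) * (centers[j] - pts.mean(axis=0))
                centers[j] -= lr * grad
    return centers, assign
```

With lr = n / (2 * n_j) per center this step recovers the exact k-means mean update, which is one way to see the approach as a generalization.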
- Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed.
We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z)
- Robust Grouped Variable Selection Using Distributionally Robust Optimization [11.383869751239166]
We propose a Distributionally Robust Optimization (DRO) formulation with a Wasserstein-based uncertainty set for selecting grouped variables under perturbations.
We prove probabilistic bounds on the out-of-sample loss and the estimation bias, and establish the grouping effect of our estimator.
We show that our formulation produces an interpretable and parsimonious model that encourages sparsity at the group level.
arXiv Detail & Related papers (2020-06-10T22:32:52Z)
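The DRO entry above pairs a Wasserstein uncertainty set with group-level sparsity. Wasserstein DRO formulations of regression are known to reduce to norm-regularized problems under suitable norms, so as a hedged surrogate (not the paper's exact program) here is proximal gradient descent with a grouped-l2 penalty, whose block soft-thresholding prox zeroes out entire groups; names and parameters are illustrative.

```python
import numpy as np

def group_prox(beta, groups, tau):
    """Block soft-thresholding: prox of tau * sum_g ||beta_g||_2,
    which induces sparsity at the group level."""
    out = beta.copy()
    for g in groups:                        # g: index array for one group
        norm = np.linalg.norm(beta[g])
        out[g] = 0.0 if norm <= tau else (1 - tau / norm) * beta[g]
    return out

def grouped_estimator(X, y, groups, lam=0.1, lr=0.01, n_iters=500):
    """Proximal gradient on (1/2n) * ||X beta - y||^2 + lam * sum_g ||beta_g||_2."""
    n, d = X.shape
    beta = np.zeros(d)
    for _ in range(n_iters):
        grad = X.T @ (X @ beta - y) / n     # gradient of the smooth loss
        beta = group_prox(beta - lr * grad, groups, lr * lam)
    return beta
```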