Bi-objective Optimization of Biclustering with Binary Data
- URL: http://arxiv.org/abs/2002.04711v1
- Date: Sun, 9 Feb 2020 21:49:26 GMT
- Title: Bi-objective Optimization of Biclustering with Binary Data
- Authors: Fred Glover, Said Hanafi, and Gintaras Palubeckis
- Abstract summary: Clustering consists of partitioning data objects into subsets called clusters according to some similarity criteria.
This paper addresses a generalization called quasi-clustering that allows overlapping of clusters, and which we link to biclustering.
Biclustering simultaneously groups the objects and features so that a specific group of objects has a special group of features.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Clustering consists of partitioning data objects into subsets called clusters
according to some similarity criteria. This paper addresses a generalization
called quasi-clustering that allows overlapping of clusters, and which we link
to biclustering. Biclustering simultaneously groups the objects and features so
that a specific group of objects has a special group of features. In recent
years, biclustering has received a lot of attention in several practical
applications. In this paper we consider a bi-objective optimization of the
biclustering problem with binary data. First, we present integer programming
formulations for the bi-objective optimization of biclustering. Next, we
propose a constructive heuristic based on the set intersection operation,
together with an efficient implementation, for solving the series of
mono-objective problems used inside the epsilon-constraint method (each
obtained by keeping one objective function and integrating the other into the
constraints). Finally, our experimental results show that using the CPLEX
solver as an exact algorithm for finding an optimal solution drastically
increases the computational cost on large instances, while our proposed
heuristic provides very good results and significantly reduces the
computational expense.
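For readers unfamiliar with the epsilon-constraint method referred to in the abstract, the generic scalarization it relies on is sketched below; the biclustering-specific objective functions and feasible set of the paper's integer programming formulations are not reproduced here.

```latex
% Generic epsilon-constraint scalarization of a bi-objective problem
%   max { f_1(x), f_2(x) }  subject to  x \in X.
% One objective is kept; the other becomes a constraint whose right-hand
% side \varepsilon is varied to trace (weakly) efficient solutions.
\[
  \max_{x \in X} \; f_1(x)
  \quad \text{subject to} \quad
  f_2(x) \ge \varepsilon .
\]
```

The sketch below shows how such a scheme can be driven by a set-intersection-based constructive heuristic on a binary matrix. It is only a minimal illustration under assumptions that are not taken from the paper: the two objectives are taken to be the number of rows and the number of columns of an all-ones bicluster, and the greedy growth rule is a plausible stand-in, not the authors' actual heuristic or its efficient implementation.

```python
# Illustrative sketch only: the objectives and the greedy rule below are
# assumptions made for demonstration, not the paper's formulation or heuristic.

def column_support(matrix, row):
    """Set of column indices where the given row of a binary matrix has a 1."""
    return {j for j, v in enumerate(matrix[row]) if v == 1}


def greedy_bicluster(matrix, min_rows):
    """Set-intersection-based constructive heuristic (illustrative).

    From each row taken as a seed, greedily add the row whose 1-column set
    keeps the running intersection largest, until `min_rows` rows are chosen.
    Returns the (rows, columns) pair with the largest surviving column set.
    """
    n_rows = len(matrix)
    best_rows, best_cols = set(), set()
    for seed in range(n_rows):
        rows = {seed}
        cols = column_support(matrix, seed)
        candidates = set(range(n_rows)) - rows
        while len(rows) < min_rows and candidates and cols:
            # Candidate row preserving the most columns of the intersection.
            nxt = max(candidates,
                      key=lambda r: len(cols & column_support(matrix, r)))
            rows.add(nxt)
            cols &= column_support(matrix, nxt)
            candidates.remove(nxt)
        if len(rows) >= min_rows and len(cols) > len(best_cols):
            best_rows, best_cols = rows, cols
    return best_rows, best_cols


def epsilon_constraint_front(matrix):
    """Epsilon-constraint loop: keep one objective (number of common columns)
    and push the other (number of rows) into a constraint whose bound is
    tightened step by step; each mono-objective problem is solved heuristically."""
    front = []
    for eps in range(1, len(matrix) + 1):      # required number of rows
        rows, cols = greedy_bicluster(matrix, eps)
        if cols:
            front.append((sorted(rows), sorted(cols)))
    return front


if __name__ == "__main__":
    data = [
        [1, 1, 0, 1],
        [1, 1, 1, 0],
        [0, 1, 1, 1],
        [1, 1, 0, 0],
    ]
    for rows, cols in epsilon_constraint_front(data):
        print(f"{len(rows)} rows {rows} share {len(cols)} columns {cols}")
```

In a faithful implementation, greedy_bicluster would be replaced by the paper's heuristic or by an exact mono-objective solve (e.g., with CPLEX), and dominated points would be filtered from the returned list.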
Related papers
- Interpretable Target-Feature Aggregation for Multi-Task Learning based on Bias-Variance Analysis [53.38518232934096]
Multi-task learning (MTL) is a powerful machine learning paradigm designed to leverage shared knowledge across tasks to improve generalization and performance.
We propose an MTL approach at the intersection between task clustering and feature transformation based on a two-phase iterative aggregation of targets and features.
In both phases, a key aspect is to preserve the interpretability of the reduced targets and features through the aggregation with the mean, which is motivated by applications to Earth science.
arXiv Detail & Related papers (2024-06-12T08:30:16Z) - Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model [79.46465138631592]
We devise an efficient algorithm that recovers clusters using the observed labels.
We present Instance-Adaptive Clustering (IAC), the first algorithm whose performance matches these lower bounds both in expectation and with high probability.
arXiv Detail & Related papers (2023-06-18T08:46:06Z) - Unified Multi-View Orthonormal Non-Negative Graph Based Clustering Framework [74.25493157757943]
We formulate a novel clustering model, which exploits the non-negative feature property and incorporates the multi-view information into a unified joint learning framework.
We also explore, for the first time, the multi-model non-negative graph-based approach to clustering data based on deep features.
arXiv Detail & Related papers (2022-11-03T08:18:27Z) - Clustering with Penalty for Joint Occurrence of Objects: Computational Aspects [0.0]
The method of Holý, Sokol and Černý clusters objects based on their incidence in a large number of given sets.
The idea is to minimize the occurrence of multiple objects from the same cluster in the same set.
In the current paper, we study computational aspects of the method.
arXiv Detail & Related papers (2021-02-02T10:39:27Z) - Clustering Ensemble Meets Low-rank Tensor Approximation [50.21581880045667]
This paper explores the problem of clustering ensemble, which aims to combine multiple base clusterings to produce better performance than any individual one.
We propose a novel low-rank tensor approximation-based method to solve the problem from a global perspective.
Experimental results over 7 benchmark data sets show that the proposed model achieves a breakthrough in clustering performance, compared with 12 state-of-the-art methods.
arXiv Detail & Related papers (2020-12-16T13:01:37Z) - Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed.
We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z) - Biclustering with Alternating K-Means [5.089110111757978]
We provide a new formulation of the biclustering problem based on the idea of minimizing the empirical clustering risk.
We propose a simple and novel algorithm that finds a local minimum by alternating the use of an adapted version of the k-means clustering algorithm between columns and rows.
The results demonstrate that our algorithm is able to detect meaningful structures in the data and outperform other competing biclustering methods in various settings and situations.
arXiv Detail & Related papers (2020-09-09T20:15:24Z) - Multi-View Spectral Clustering with High-Order Optimal Neighborhood Laplacian Matrix [57.11971786407279]
Multi-view spectral clustering can effectively reveal the intrinsic cluster structure among data.
This paper proposes a multi-view spectral clustering algorithm that learns a high-order optimal neighborhood Laplacian matrix.
Our proposed algorithm generates the optimal Laplacian matrix by searching the neighborhood of the linear combination of both the first-order and high-order base Laplacian matrices.
arXiv Detail & Related papers (2020-08-31T12:28:40Z) - Non-Exhaustive, Overlapping Co-Clustering: An Extended Analysis [32.15852903039789]
The goal of co-clustering is to simultaneously identify a clustering of the rows as well as the columns of a two-dimensional data matrix.
We develop an efficient iterative algorithm which we call the NEO-CC algorithm.
Experimental results show that the NEO-CC algorithm is able to effectively capture the underlying co-clustering structure of real-world data.
arXiv Detail & Related papers (2020-04-24T04:39:14Z) - Point-Set Kernel Clustering [11.093960688450602]
This paper introduces a new similarity measure called the point-set kernel, which computes the similarity between an object and a set of objects.
We show that the new clustering procedure is both effective and efficient, enabling it to deal with large-scale datasets.
arXiv Detail & Related papers (2020-02-14T00:00:03Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.