Fast Redescription Mining Using Locality-Sensitive Hashing
- URL: http://arxiv.org/abs/2406.04148v1
- Date: Thu, 6 Jun 2024 15:13:48 GMT
- Title: Fast Redescription Mining Using Locality-Sensitive Hashing
- Authors: Maiju Karjalainen, Esther Galbrun, Pauli Miettinen,
- Abstract summary: We present new algorithms that perform the matching and extension orders of magnitude faster than the existing approaches.
Our algorithms are based on locality-sensitive hashing with a tailored approach to handle the discretisation of numerical attributes.
- Score: 1.126524823245055
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Redescription mining is a data analysis technique that has found applications in diverse fields. The most used redescription mining approaches involve two phases: finding matching pairs among data attributes and extending the pairs. This process is relatively efficient when the number of attributes remains limited and when the attributes are Boolean, but becomes almost intractable when the data consist of many numerical attributes. In this paper, we present new algorithms that perform the matching and extension orders of magnitude faster than the existing approaches. Our algorithms are based on locality-sensitive hashing with a tailored approach to handle the discretisation of numerical attributes as used in redescription mining.
Related papers
- Attributes Grouping and Mining Hashing for Fine-Grained Image Retrieval [24.8065557159198]
We propose an Attributes Grouping and Mining Hashing (AGMH) for fine-grained image retrieval.
AGMH groups and embeds the category-specific visual attributes in multiple descriptors to generate a comprehensive feature representation.
AGMH consistently yields the best performance against state-of-the-art methods on fine-grained benchmark datasets.
arXiv Detail & Related papers (2023-11-10T14:01:56Z) - ParlayANN: Scalable and Deterministic Parallel Graph-Based Approximate
Nearest Neighbor Search Algorithms [5.478671305092084]
We introduce ParlayANN, a library of deterministic and parallel graph-based approximate nearest neighbor search algorithms.
We develop novel parallel implementations for four state-of-the-art graph-based ANNS algorithms that scale to billion-scale datasets.
arXiv Detail & Related papers (2023-05-07T19:28:23Z) - Unified Functional Hashing in Automatic Machine Learning [58.77232199682271]
We show that large efficiency gains can be obtained by employing a fast unified functional hash.
Our hash is "functional" in that it identifies equivalent candidates even if they were represented or coded differently.
We show dramatic improvements on multiple AutoML domains, including neural architecture search and algorithm discovery.
arXiv Detail & Related papers (2023-02-10T18:50:37Z) - Representation Learning for Efficient and Effective Similarity Search
and Recommendation [6.280255585012339]
This thesis makes contributions to representation learning that improve effectiveness of hash codes through more expressive representations and a more effective similarity measure.
The contributions are empirically validated on several tasks related to similarity search and recommendation.
arXiv Detail & Related papers (2021-09-04T08:19:01Z) - Estimating leverage scores via rank revealing methods and randomization [50.591267188664666]
We study algorithms for estimating the statistical leverage scores of rectangular dense or sparse matrices of arbitrary rank.
Our approach is based on combining rank revealing methods with compositions of dense and sparse randomized dimensionality reduction transforms.
arXiv Detail & Related papers (2021-05-23T19:21:55Z) - Accelerating Text Mining Using Domain-Specific Stop Word Lists [57.76576681191192]
We present a novel approach for the automatic extraction of domain-specific words called the hyperplane-based approach.
The hyperplane-based approach can significantly reduce text dimensionality by eliminating irrelevant features.
Results indicate that the hyperplane-based approach can reduce the dimensionality of the corpus by 90% and outperforms mutual information.
arXiv Detail & Related papers (2020-11-18T17:42:32Z) - A Systematic Characterization of Sampling Algorithms for Open-ended
Language Generation [71.31905141672529]
We study the widely adopted ancestral sampling algorithms for auto-regressive language models.
We identify three key properties that are shared among them: entropy reduction, order preservation, and slope preservation.
We find that the set of sampling algorithms that satisfies these properties performs on par with the existing sampling algorithms.
arXiv Detail & Related papers (2020-09-15T17:28:42Z) - Reinforcing Short-Length Hashing [61.75883795807109]
Existing methods have poor performance in retrieval using an extremely short-length hash code.
In this study, we propose a novel reinforcing short-length hashing (RSLH)
In this proposed RSLH, mutual reconstruction between the hash representation and semantic labels is performed to preserve the semantic information.
Experiments on three large-scale image benchmarks demonstrate the superior performance of RSLH under various short-length hashing scenarios.
arXiv Detail & Related papers (2020-04-24T02:23:52Z) - New advances in enumerative biclustering algorithms with online
partitioning [80.22629846165306]
This paper further extends RIn-Close_CVC, a biclustering algorithm capable of performing an efficient, complete, correct and non-redundant enumeration of maximal biclusters with constant values on columns in numerical datasets.
The improved algorithm is called RIn-Close_CVC3, keeps those attractive properties of RIn-Close_CVC, and is characterized by: a drastic reduction in memory usage; a consistent gain in runtime.
arXiv Detail & Related papers (2020-03-07T14:54:26Z) - A Novel Incremental Cross-Modal Hashing Approach [21.99741793652628]
We propose a novel incremental cross-modal hashing algorithm termed "iCMH"
The proposed approach consists of two sequential stages, namely, learning the hash codes and training the hash functions.
Experiments across a variety of cross-modal datasets and comparisons with state-of-the-art cross-modal algorithms shows the usefulness of our approach.
arXiv Detail & Related papers (2020-02-03T12:34:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.