Related papers: Unsupervised Thematic Clustering Of hadith Texts Using The Apriori Algorithm

URL: http://arxiv.org/abs/2512.16694v1
Date: Thu, 18 Dec 2025 15:59:46 GMT
Title: Unsupervised Thematic Clustering Of hadith Texts Using The Apriori Algorithm
Authors: Wisnu Uriawan, Achmad Ajie Priyajie, Angga Gustian, Fikri Nur Hidayat, Sendi Ahmad Rafiudin, Muhamad Fikri Zaelani
Abstract summary: The unsupervised learning approach with the Apriori algorithm has proven effective in identifying association patterns and semantic relations in unlabeled text data. Results show meaningful association patterns, such as the rakaat-prayer, verse-revelation, and hadith-story relationships.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This research stems from the urgency to automate the thematic grouping of hadith in line with the growing digitalization of Islamic texts. Based on a literature review, the unsupervised learning approach with the Apriori algorithm has proven effective in identifying association patterns and semantic relations in unlabeled text data. The dataset used is the Indonesian Translation of the hadith of Bukhari, which first goes through preprocessing stages including case folding, punctuation cleaning, tokenization, stopword removal, and stemming. Next, an association rule mining analysis was conducted using the Apriori algorithm with support, confidence, and lift parameters. The results show the existence of meaningful association patterns such as the relationship between rakaat-prayer, verse-revelation, and hadith-story, which describe the themes of worship, revelation, and hadith narration. These findings demonstrate that the Apriori algorithm has the ability to automatically uncover latent semantic relationships, while contributing to the development of digital Islamic studies and technology-based learning systems.
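The pipeline described in the abstract can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the stopword list, the sample sentences, the thresholds, and the function names `preprocess` and `pair_rules` are assumptions made for illustration, stemming is omitted, and the mining step is limited to item pairs rather than the full level-wise Apriori search. Here support is the fraction of texts containing both terms, confidence is the fraction of texts with the antecedent that also contain the consequent, and lift is confidence divided by the consequent's own support.

```python
import string
from collections import Counter
from itertools import combinations

# Hypothetical stopword list for illustration only (not the paper's list).
STOPWORDS = {"the", "a", "of", "in", "and", "is", "was", "to", "he"}


def preprocess(text):
    """Case folding, punctuation cleaning, tokenization, stopword removal."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    tokens = text.split()
    # Stemming is omitted; a real pipeline would apply a stemmer
    # suited to the target language at this point.
    return {t for t in tokens if t not in STOPWORDS}


def pair_rules(transactions, min_support=0.5, min_confidence=0.6):
    """Return (antecedent, consequent, support, confidence, lift) for term pairs."""
    n = len(transactions)
    item_counts = Counter(item for t in transactions for item in t)
    # Apriori pruning: a pair can only be frequent if both of its items are.
    frequent = {i for i, c in item_counts.items() if c / n >= min_support}
    pair_counts = Counter(
        pair
        for t in transactions
        for pair in combinations(sorted(t & frequent), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # Each frequent pair yields two candidate rules: a -> b and b -> a.
        for x, y in ((a, b), (b, a)):
            confidence = count / item_counts[x]
            lift = confidence / (item_counts[y] / n)
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence, lift))
    return rules


# Hypothetical sample texts, not taken from the dataset used in the paper.
docs = [
    "The prayer has four rakaat.",
    "He performed two rakaat of prayer.",
    "The verse was revealed in Mecca.",
    "Prayer and rakaat in the morning.",
]
rules = pair_rules([preprocess(d) for d in docs])
for x, y, support, confidence, lift in rules:
    print(f"{x} -> {y}: support={support:.2f}, "
          f"confidence={confidence:.2f}, lift={lift:.2f}")
```

A lift above 1 indicates that the two terms co-occur more often than they would if they were independent, which is the property that lets such rules serve as candidate thematic groupings.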

Related papers

A Critical Review of the Need for Knowledge-Centric Evaluation of Quranic Recitation [0.9332987715848714]
The sacred practice of Quranic recitation (Tajweed) faces significant pedagogical challenges in the modern era. While digital technologies promise unprecedented access to education, automated tools for evaluation have failed to achieve widespread adoption or pedagogical efficacy. This review concludes that the future of automated Quranic evaluation lies in hybrid systems that integrate deep linguistic knowledge with advanced audio analysis.
arXiv Detail & Related papers (2025-10-14T13:39:49Z)
Data Leakage and Deceptive Performance: A Critical Examination of Credit Card Fraud Detection Methodologies [3.6234531924374527]
This study critically examines the methodological rigor in credit card fraud detection research. We demonstrate that even simple models can achieve deceptively impressive results when basic methodological principles are violated.
arXiv Detail & Related papers (2025-06-03T09:56:43Z)
Advanced Deep Learning Approaches for Automated Recognition of Cuneiform Symbols [0.3749861135832073]
Five distinct deep-learning models were trained on a comprehensive dataset of cuneiform characters. Two models demonstrated outstanding performance and were subsequently assessed using cuneiform symbols from the Hammurabi law acquisition. Each model effectively recognized the relevant Akkadian meanings of the symbols and delivered precise English translations.
arXiv Detail & Related papers (2025-05-07T12:05:23Z)
SLAck: Semantic, Location, and Appearance Aware Open-Vocabulary Tracking [89.43370214059955]
Open-vocabulary Multiple Object Tracking (MOT) aims to generalize trackers to novel categories not in the training set. We present a unified framework that jointly considers semantics, location, and appearance priors in the early steps of association. Our method eliminates complex post-processings for fusing different cues and boosts the association performance significantly for large-scale open-vocabulary tracking.
arXiv Detail & Related papers (2024-09-17T14:36:58Z)
H-STAR: LLM-driven Hybrid SQL-Text Adaptive Reasoning on Tables [56.73919743039263]
This paper introduces a novel algorithm that integrates both symbolic and semantic (textual) approaches in a two-stage process to address limitations. Our experiments demonstrate that H-STAR significantly outperforms state-of-the-art methods across three question-answering (QA) and fact-verification datasets.
arXiv Detail & Related papers (2024-06-29T21:24:19Z)
The Short Text Matching Model Enhanced with Knowledge via Contrastive Learning [8.350445155753167]
This paper proposes a short Text Matching model that combines contrastive learning and external knowledge. To avoid noise, we use keywords as the main semantics of the original sentence to retrieve corresponding knowledge words in the knowledge base. Our designed model achieves state-of-the-art performance on two publicly available Chinese Text Matching datasets.
arXiv Detail & Related papers (2023-04-08T03:24:05Z)
A Human Word Association based model for topic detection in social networks [1.8749305679160366]
This paper introduces a topic detection framework for social networks based on the concept of imitating the mental ability of word association. The performance of this framework is evaluated using the FA-CUP dataset, a benchmark in the field of topic detection.
arXiv Detail & Related papers (2023-01-30T17:10:34Z)
Textual Entailment Recognition with Semantic Features from Empirical Text Representation [60.31047947815282]
A text entails a hypothesis if and only if the true value of the hypothesis follows the text. In this paper, we propose a novel approach to identifying the textual entailment relationship between text and hypothesis. We employ an element-wise Manhattan distance vector-based feature that can identify the semantic entailment relationship between the text-hypothesis pair.
arXiv Detail & Related papers (2022-10-18T10:03:51Z)
Speaker Embedding-aware Neural Diarization for Flexible Number of Speakers with Textual Information [55.75018546938499]
We propose the speaker embedding-aware neural diarization (SEND) method, which predicts the power set encoded labels. Our method achieves a lower diarization error rate than target-speaker voice activity detection.
arXiv Detail & Related papers (2021-11-28T12:51:04Z)
Relation Clustering in Narrative Knowledge Graphs [71.98234178455398]
Relational sentences in the original text are embedded (with SBERT) and clustered in order to merge semantically similar relations. Preliminary tests show that such clustering might successfully detect similar relations and provide valuable preprocessing for semi-supervised approaches.
arXiv Detail & Related papers (2020-11-27T10:43:04Z)
Leveraging Cognitive Search Patterns to Enhance Automated Natural Language Retrieval Performance [0.0]
Cognitive reformulation patterns that mimic user search behaviour are highlighted. We formalize the application of these patterns by considering a query conceptual representation. A genetic algorithm-based weighting process allows placing emphasis on terms according to their conceptual role-type.
arXiv Detail & Related papers (2020-04-21T14:13:33Z)
Temporal Embeddings and Transformer Models for Narrative Text Understanding [72.88083067388155]
We present two approaches to narrative text understanding for character relationship modelling. The temporal evolution of these relations is described by dynamic word embeddings, that are designed to learn semantic changes over time. A supervised learning approach based on the state-of-the-art transformer model BERT is used instead to detect static relations between characters.
arXiv Detail & Related papers (2020-03-19T14:23:12Z)

This list is automatically generated from the titles and abstracts of the papers in this site.