KG-HTC: Integrating Knowledge Graphs into LLMs for Effective Zero-shot Hierarchical Text Classification
- URL: http://arxiv.org/abs/2505.05583v1
- Date: Thu, 08 May 2025 18:27:27 GMT
- Title: KG-HTC: Integrating Knowledge Graphs into LLMs for Effective Zero-shot Hierarchical Text Classification
- Authors: Qianbo Zang, Christophe Zgrzendek, Igor Tchappi, Afshin Khadangi, Johannes Sedlmeir,
- Abstract summary: Hierarchical Text Classification (HTC) involves assigning documents to labels organized within a taxonomy.<n>We present Knowledge Graphs for zero-shot HTC, which integrates knowledge graphs with Large Language Models (LLMs)<n>Our method retrieves relevant subgraphs from knowledge graphs related to the input text using a Retrieval-Augmented Generation (RAG) approach.
- Score: 1.335683508747434
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Hierarchical Text Classification (HTC) involves assigning documents to labels organized within a taxonomy. Most previous research on HTC has focused on supervised methods. However, in real-world scenarios, employing supervised HTC can be challenging due to a lack of annotated data. Moreover, HTC often faces issues with large label spaces and long-tail distributions. In this work, we present Knowledge Graphs for zero-shot Hierarchical Text Classification (KG-HTC), which aims to address these challenges of HTC in applications by integrating knowledge graphs with Large Language Models (LLMs) to provide structured semantic context during classification. Our method retrieves relevant subgraphs from knowledge graphs related to the input text using a Retrieval-Augmented Generation (RAG) approach. Our KG-HTC can enhance LLMs to understand label semantics at various hierarchy levels. We evaluate KG-HTC on three open-source HTC datasets: WoS, DBpedia, and Amazon. Our experimental results show that KG-HTC significantly outperforms three baselines in the strict zero-shot setting, particularly achieving substantial improvements at deeper levels of the hierarchy. This evaluation demonstrates the effectiveness of incorporating structured knowledge into LLMs to address HTC's challenges in large label spaces and long-tailed label distributions. Our code is available at: https://github.com/QianboZang/KG-HTC.
Related papers
- GLTW: Joint Improved Graph Transformer and LLM via Three-Word Language for Knowledge Graph Completion [52.026016846945424]
We propose a new method called GLTW, which encodes the structural information of KGs and merges it with Large Language Models.<n>Specifically, we introduce an improved Graph Transformer (iGT) that effectively encodes subgraphs with both local and global structural information.<n>Also, we develop a subgraph-based multi-classification training objective, using all entities within KG as classification objects, to boost learning efficiency.
arXiv Detail & Related papers (2025-02-17T06:02:59Z) - Revisiting Hierarchical Text Classification: Inference and Metrics [4.057349748970303]
Hierarchical text classification (HTC) is the task of assigning labels to a text within a structured space organized as a hierarchy.
Recent works treat HTC as a conventional multilabel classification problem, therefore evaluating it as such.
We propose to evaluate models based on specifically designed hierarchical metrics and we demonstrate the intricacy of metric choice and prediction inference method.
arXiv Detail & Related papers (2024-10-02T07:57:33Z) - Retrieval-style In-Context Learning for Few-shot Hierarchical Text Classification [34.06292178703825]
We introduce the first ICL-based framework with large language models (LLMs) for few-shot HTC.
We exploit a retrieval database to identify relevant demonstrations, and an iterative policy to manage multi-layer hierarchical labels.
We can achieve state-of-the-art results in few-shot HTC.
arXiv Detail & Related papers (2024-06-25T13:19:41Z) - Utilizing Local Hierarchy with Adversarial Training for Hierarchical Text Classification [30.353876890557984]
Hierarchical text classification (HTC) is a challenging subtask due to its complex taxonomic structure.
We propose a HiAdv framework that can fit in nearly all HTC models and optimize them with the local hierarchy as auxiliary information.
arXiv Detail & Related papers (2024-02-29T03:20:45Z) - MixSup: Mixed-grained Supervision for Label-efficient LiDAR-based 3D
Object Detection [59.1417156002086]
MixSup is a more practical paradigm simultaneously utilizing massive cheap coarse labels and a limited number of accurate labels for Mixed-grained Supervision.
MixSup achieves up to 97.31% of fully supervised performance, using cheap cluster annotations and only 10% box annotations.
arXiv Detail & Related papers (2024-01-29T17:05:19Z) - Hierarchical Verbalizer for Few-Shot Hierarchical Text Classification [10.578682558356473]
hierarchical text classification (HTC) suffers a poor performance when low-resource or few-shot settings are considered.
In this work, we propose the hierarchical verbalizer ("HierVerb"), a multi-verbalizer framework treating HTC as a single- or multi-label classification problem.
In this manner, HierVerb fuses label hierarchy knowledge into verbalizers and remarkably outperforms those who inject hierarchy through graph encoders.
arXiv Detail & Related papers (2023-05-26T12:41:49Z) - HaGRID - HAnd Gesture Recognition Image Dataset [79.21033185563167]
This paper introduces an enormous dataset, HaGRID, to build a hand gesture recognition system concentrating on interaction with devices to manage them.
Although the gestures are static, they were picked up, especially for the ability to design several dynamic gestures.
The HaGRID contains 554,800 images and bounding box annotations with gesture labels to solve hand detection and gesture classification tasks.
arXiv Detail & Related papers (2022-06-16T14:41:32Z) - HPT: Hierarchy-aware Prompt Tuning for Hierarchical Text Classification [45.314357107687286]
We propose HPT, a Hierarchy-aware Prompt Tuning method to handle HTC from a multi-label perspective.
Specifically, we construct dynamic virtual template and label words which take the form of soft prompts to fuse the label hierarchy knowledge.
Experiments show HPT achieves the state-of-the-art performances on 3 popular HTC datasets.
arXiv Detail & Related papers (2022-04-28T11:22:49Z) - HTCInfoMax: A Global Model for Hierarchical Text Classification via
Information Maximization [75.45291796263103]
The current state-of-the-art model HiAGM for hierarchical text classification has two limitations.
It correlates each text sample with all labels in the dataset which contains irrelevant information.
We propose HTCInfoMax to address these issues by introducing information which includes two modules.
arXiv Detail & Related papers (2021-04-12T06:04:20Z) - KACC: A Multi-task Benchmark for Knowledge Abstraction, Concretization
and Completion [99.47414073164656]
A comprehensive knowledge graph (KG) contains an instance-level entity graph and an ontology-level concept graph.
The two-view KG provides a testbed for models to "simulate" human's abilities on knowledge abstraction, concretization, and completion.
We propose a unified KG benchmark by improving existing benchmarks in terms of dataset scale, task coverage, and difficulty.
arXiv Detail & Related papers (2020-04-28T16:21:57Z) - EHSOD: CAM-Guided End-to-end Hybrid-Supervised Object Detection with
Cascade Refinement [53.69674636044927]
We present EHSOD, an end-to-end hybrid-supervised object detection system.
It can be trained in one shot on both fully and weakly-annotated data.
It achieves comparable results on multiple object detection benchmarks with only 30% fully-annotated data.
arXiv Detail & Related papers (2020-02-18T08:04:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.