Revisiting Hierarchical Text Classification: Inference and Metrics
- URL: http://arxiv.org/abs/2410.01305v2
- Date: Fri, 11 Oct 2024 15:44:28 GMT
- Title: Revisiting Hierarchical Text Classification: Inference and Metrics
- Authors: Roman Plaud, Matthieu Labeau, Antoine Saillenfest, Thomas Bonald
- Abstract summary: Hierarchical text classification (HTC) is the task of assigning labels to a text within a structured space organized as a hierarchy.
Recent works treat HTC as a conventional multilabel classification problem and therefore evaluate it as such.
We propose to evaluate models based on specifically designed hierarchical metrics and we demonstrate the intricacy of metric choice and prediction inference method.
- Score: 4.057349748970303
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Hierarchical text classification (HTC) is the task of assigning labels to a text within a structured space organized as a hierarchy. Recent works treat HTC as a conventional multilabel classification problem and therefore evaluate it as such. We instead propose to evaluate models based on specifically designed hierarchical metrics, and we demonstrate the intricacy of metric choice and prediction inference method. We introduce a new challenging dataset, and we fairly evaluate recent sophisticated models, comparing them with a range of simple but strong baselines, including a new theoretically motivated loss. Finally, we show that those baselines are very often competitive with the latest models. This highlights the importance of carefully considering the evaluation methodology when proposing new methods for HTC. Code implementation and dataset are available at https://github.com/RomanPlaud/revisitingHTC.
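The abstract's claim that metric choice and inference method interact can be made concrete with a toy example. The sketch below is an assumption for illustration, not the paper's code: it uses one common set-based hierarchical F1 (predicted and gold label sets are first augmented with all their ancestors, then micro precision and recall are computed) and two generic inference rules (independent per-node thresholding and greedy top-down descent). The taxonomy, labels, and probabilities are hypothetical.

```python
# Hypothetical toy taxonomy: child -> parent ("root" has no parent).
PARENT = {
    "root": None,
    "science": "root",
    "sports": "root",
    "physics": "science",
    "biology": "science",
    "football": "sports",
}

# Inverse map: parent -> list of children, in insertion order.
CHILDREN = {}
for child, parent in PARENT.items():
    if parent is not None:
        CHILDREN.setdefault(parent, []).append(child)


def ancestor_closure(labels):
    """Add every ancestor of each label (the root itself is excluded)."""
    closed = set()
    for label in labels:
        while label is not None and label != "root":
            closed.add(label)
            label = PARENT[label]
    return closed


def micro_f1(pred_sets, gold_sets, hierarchical=True):
    """Micro-averaged F1; with hierarchical=True, sets are ancestor-closed first."""
    tp = n_pred = n_gold = 0
    for pred, gold in zip(pred_sets, gold_sets):
        if hierarchical:
            pred, gold = ancestor_closure(pred), ancestor_closure(gold)
        tp += len(set(pred) & set(gold))
        n_pred += len(pred)
        n_gold += len(gold)
    if tp == 0:
        return 0.0
    precision, recall = tp / n_pred, tp / n_gold
    return 2 * precision * recall / (precision + recall)


def threshold_inference(probs, t=0.5):
    """Keep every node whose own probability reaches t (may be inconsistent)."""
    return {label for label, p in probs.items() if p >= t}


def top_down_inference(probs):
    """Greedy descent from the root to the most probable child at each level."""
    node, path = "root", set()
    while node in CHILDREN:
        node = max(CHILDREN[node], key=lambda c: probs.get(c, 0.0))
        path.add(node)
    return path


if __name__ == "__main__":
    gold = [{"physics"}]
    # Metric choice: a sibling error gets no flat credit but partial hierarchical credit.
    print(micro_f1([{"biology"}], gold, hierarchical=False))  # 0.0
    print(micro_f1([{"biology"}], gold))                      # 0.5
    # Inference choice: identical scores, different predictions.
    probs = {"science": 0.45, "sports": 0.40, "physics": 0.35,
             "biology": 0.10, "football": 0.30}
    print(micro_f1([threshold_inference(probs)], gold))  # 0.0 (nothing predicted)
    print(micro_f1([top_down_inference(probs)], gold))   # 1.0
```

With the same model scores, thresholding at 0.5 predicts nothing while greedy descent recovers the gold path, so the reported score depends on the inference rule as much as on the model.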
Related papers
- Retrieval-style In-Context Learning for Few-shot Hierarchical Text Classification [34.06292178703825]
We introduce the first ICL-based framework with large language models (LLMs) for few-shot HTC.
We exploit a retrieval database to identify relevant demonstrations, and an iterative policy to manage multi-layer hierarchical labels.
The framework achieves state-of-the-art results in few-shot HTC.
arXiv Detail & Related papers (2024-06-25T13:19:41Z)
- Open-Vocabulary Object Detection with Meta Prompt Representation and Instance Contrastive Optimization [63.66349334291372]
We propose a framework with Meta prompt and Instance Contrastive learning (MIC) schemes.
First, we simulate a novel-class-emerging scenario to help the prompt learner, which learns class and background prompts, generalize to novel classes.
Secondly, we design an instance-level contrastive strategy to promote intra-class compactness and inter-class separation, which benefits generalization of the detector to novel class objects.
arXiv Detail & Related papers (2024-03-14T14:25:10Z)
- Utilizing Local Hierarchy with Adversarial Training for Hierarchical Text Classification [30.353876890557984]
Hierarchical text classification (HTC) is a challenging subtask due to its complex taxonomic structure.
We propose the HiAdv framework, which can fit nearly all HTC models and optimizes them with the local hierarchy as auxiliary information.
arXiv Detail & Related papers (2024-02-29T03:20:45Z)
- Evaluating Graph Neural Networks for Link Prediction: Current Pitfalls and New Benchmarking [66.83273589348758]
Link prediction attempts to predict whether an unseen edge exists based on only a portion of edges of a graph.
A flurry of methods have been introduced in recent years that attempt to make use of graph neural networks (GNNs) for this task.
New and diverse datasets have also been created to better evaluate the effectiveness of these new models.
arXiv Detail & Related papers (2023-06-18T01:58:59Z)
- Hierarchical Verbalizer for Few-Shot Hierarchical Text Classification [10.578682558356473]
Hierarchical text classification (HTC) suffers from poor performance in low-resource or few-shot settings.
In this work, we propose the hierarchical verbalizer ("HierVerb"), a multi-verbalizer framework treating HTC as a single- or multi-label classification problem.
In this manner, HierVerb fuses label hierarchy knowledge into verbalizers and remarkably outperforms methods that inject the hierarchy through graph encoders.
arXiv Detail & Related papers (2023-05-26T12:41:49Z)
- Constrained Sequence-to-Tree Generation for Hierarchical Text Classification [10.143177923523407]
Hierarchical Text Classification (HTC) is a challenging task where a document can be assigned to multiple hierarchically structured categories within a taxonomy.
In this paper, we formulate HTC as a sequence generation task and introduce a sequence-to-tree framework (Seq2Tree) for modeling the hierarchical label structure.
arXiv Detail & Related papers (2022-04-02T08:35:39Z)
- A Closer Look at Debiased Temporal Sentence Grounding in Videos: Dataset, Metric, and Approach [53.727460222955266]
Temporal Sentence Grounding in Videos (TSGV) aims to ground a natural language sentence in an untrimmed video.
Recent studies have found that current benchmark datasets may have obvious moment annotation biases.
We introduce a new evaluation metric "dR@n,IoU@m" that discounts the basic recall scores to alleviate the inflating evaluation caused by biased datasets.
arXiv Detail & Related papers (2022-03-10T08:58:18Z)
- HTCInfoMax: A Global Model for Hierarchical Text Classification via Information Maximization [75.45291796263103]
The current state-of-the-art model HiAGM for hierarchical text classification has two limitations.
It correlates each text sample with all labels in the dataset which contains irrelevant information.
We propose HTCInfoMax to address these issues by introducing information maximization, which includes two modules.
arXiv Detail & Related papers (2021-04-12T06:04:20Z)
- Evaluating Large-Vocabulary Object Detectors: The Devil is in the Details [107.2722027807328]
We find that the default implementation of AP is neither category independent, nor does it directly reward properly calibrated detectors.
We show that the default implementation produces a gameable metric, where a simple, nonsensical re-ranking policy can improve AP by a large margin.
We benchmark recent advances in large-vocabulary detection and find that many reported gains do not translate to improvements under our new per-class independent evaluation.
arXiv Detail & Related papers (2021-02-01T18:56:02Z)
- Small but Mighty: New Benchmarks for Split and Rephrase [18.959219419951083]
Split and Rephrase is a text simplification task of rewriting a complex sentence into simpler ones.
We find that the widely used benchmark dataset universally contains easily exploitable syntactic cues.
We show that even a simple rule-based model can perform on par with the state-of-the-art model.
arXiv Detail & Related papers (2020-09-17T23:37:33Z)
- Frustratingly Simple Few-Shot Object Detection [98.42824677627581]
We find that fine-tuning only the last layer of existing detectors on rare classes is crucial to the few-shot object detection task.
Such a simple approach outperforms the meta-learning methods by roughly 2 to 20 points on current benchmarks.
arXiv Detail & Related papers (2020-03-16T00:29:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.